SC08 Cluster Challenge Rules

The Cluster Challenge will showcase the amazing power of clusters and the ability to harness open source software to solve interesting and important problems. Teams will compete in real time on the exhibit floor, where they will run a workload of real-world problems on clusters of their own design. The winning team will be chosen based on workload accomplished, benchmark performance and overall knowledge and presentation.

Teams

A team consists of up to six students, a supervisor, and optional vendor partners. Students must not have been granted any academic degree prior to the start of the contest. Team members must agree to a number of safety rules for the event.

The supervisor must be an employee of the team’s educational institution, is responsible for the team at all times, and must be available 24 hours a day during the contest. The supervisor is not allowed to provide technical assistance, but is encouraged to run for fuel (pizza and soda) for the team.

Teams are encouraged to join with one or more vendors, who may support team activities with equipment, training and financial support. Some conference travel funds may be available.

If you are a team in search of a vendor, or a vendor in search of a team, please contact us.

Proposals

Team selection will be based on the proposal submitted by the team, which will be judged by a panel of high performance computing experts from industry, academia, and the national laboratories. We are looking for hardware and software combinations that will be generally applicable to any computational science domain. While novel system configurations are encouraged, systems designed to target a single application or just HPL will generally not be favorably considered.

The proposal should contain detailed information about both the hardware and the software stack that will be used in the challenge, in sufficient detail for the judging panel to determine whether all the applications will easily port to and run on the proposed computational infrastructure. The proposal should also describe the demonstrations planned for the showcase period of the conference, after the challenge is complete; in other words, explain how your team will impress the SC08 conference attendees. Teams should also describe why they are participating and why they will win the Cluster Challenge.

Finally, the proposal should delineate the institution's commitment to educating its broader student community about the usefulness and accessibility of high performance computing, and explain how cluster computing is integrated into the institution's educational curriculum. The proposal is limited to four pages.

Hardware

The computational hardware (processors, switch, storage, etc.) must fit into a single rack. All components associated with the system, and access to it, must be powered through the two 120-volt, 20-amp circuits (each with a soft limit of 13 amps) provided by the conference. Power to each system will be provided via metered power distribution units (http://www.apc.com/resource/include/techspec_index.cfm?base_sku=AP7801).

The equipment rack must be able to physically hold these metered power distribution units. Electronic alarms will be sent if the power draw exceeds the soft limit, and penalties may be assessed for excess draw and/or for not responding appropriately to the issue.
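
As a rough illustration of the arithmetic behind the soft limit, the C sketch below estimates the total power budget available on the two metered circuits. The 120-volt and 13-amp figures come from the rules above; the per-node wattage is a hypothetical placeholder, and real planning should be based on measured load.

    /* power_budget.c - rough head-room estimate for the two metered circuits.
     * The 120 V and 13 A figures come from the contest rules; the per-node
     * wattage below is a hypothetical example, not a measured value. */
    #include <stdio.h>

    int main(void)
    {
        const double volts           = 120.0;  /* circuit voltage */
        const double soft_limit_amps = 13.0;   /* soft limit per circuit */
        const int    circuits        = 2;      /* circuits provided by the conference */

        const double budget_watts = volts * soft_limit_amps * circuits;  /* 3120 W */

        const double watts_per_node = 250.0;   /* hypothetical node draw under load */
        const int    nodes = (int)(budget_watts / watts_per_node);

        printf("Soft-limit budget: %.0f W\n", budget_watts);
        printf("Hypothetical %g W nodes that fit: %d\n", watts_per_node, nodes);
        return 0;
    }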

Other systems (such as laptops and monitors) may be powered from separate power sources provided by the conference.

The computational hardware must be commercially available and teams must display, for public view, a complete list of hardware and software used in the system. With the exception of spare components, all hardware must be present in the rack and powered at all times, even when idle. (The configuration may not be changed by physically turning equipment on and off).

Teams must provide a large visual display (LCD or projector), on which they are to continually showcase their progress by displaying visualization output from the applications and other dynamic content of the team's choosing. The contest area is in a public area of the conference, and the intention is to attract visitors to the contest activities.

A network drop will be provided for outgoing connections only. Offsite access to the computational equipment will not be permitted. Wireless for laptops will be available throughout the convention center via SCinet. Computational hardware may be connected via wired connections only – wireless access is not permitted.

Booths will be 12 x 12 feet and will back onto a 30-foot solid wall. Teams must fit into this space for all activities and must have the display visible to the viewing public. Since thermal issues may be a factor, teams should exhaust hot air vertically from their systems. Air-circulating fans external to the cluster will not be connected to the metered power.

Software

The goal of the event is to run the HPC Challenge benchmarks and scientific applications chosen to be OS-neutral and to provide real-world workloads. Points will be awarded for successful processing of data sets and displaying output on the monitors for visitors to observe.

Teams may choose any operating system and software stack that will run the contest and display software. Teams may pre-load and test the applications and other software.

At the start of the event, teams will first run the HPC Challenge benchmarks found at: http://icl.cs.utk.edu/hpcc/

The teams will submit the benchmark results prior to obtaining data for the applications. Once benchmarks have been submitted, they may not be re-run.

Teams must capture all output produced by the applications, including the command-line responses and text output to the terminal window used to launch the application (stdout and stderr in Unix terms).
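
One straightforward way to satisfy this requirement is to copy everything a run prints into a log file as well as to the terminal. The C sketch below is a minimal illustration of that idea, assuming a hypothetical command line ("./solver input.dat") rather than an actual contest binary; in practice a shell redirect or the tee utility accomplishes the same thing.

    /* capture_run.c - minimal sketch: run a command and copy its combined
     * stdout/stderr to both the terminal and a log file.
     * "./solver input.dat" is a hypothetical command line, not a contest binary. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* Merge stderr into stdout so both streams are captured. */
        FILE *child = popen("./solver input.dat 2>&1", "r");
        FILE *log   = fopen("solver-run.log", "w");
        if (child == NULL || log == NULL) {
            perror("capture_run");
            return EXIT_FAILURE;
        }

        char line[4096];
        while (fgets(line, sizeof line, child) != NULL) {
            fputs(line, stdout);  /* show progress on the terminal */
            fputs(line, log);     /* keep a copy for the judges */
        }

        fclose(log);
        return pclose(child);     /* child's termination status as reported by pclose */
    }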

The event will strive to provide more data than the teams can expect to process in the time allotted. One aspect of the contest will be determining the strategy for running the applications to maximize the team’s points. Teams may study the applications and modify them for their platforms in advance of the event, with the restriction that only the student team members may edit the applications; no outside help!

The contest will be formed from the applications listed below, with the final list announced prior to the contest to give teams an opportunity to finalize their strategy.

Benchmarks

HPCC
http://icl.cs.utk.edu/hpcc

Since 1986, the high performance computing community has used LINPACK to rate its systems. LINPACK solves a dense system of linear equations. The historical and current listings of high-performance systems are available on the TOP500 website at www.top500.org. Recently, the HPCC benchmarks, which include LINPACK, have been growing in popularity. The Cluster Challenge will use this benchmark suite.

HPCC was developed to study future Petascale computing systems and is intended to provide a realistic measurement of modern computing workloads. HPCC is made up of seven common computational kernels: STREAM, HPL, DGEMM (matrix multiply), PTRANS (parallel matrix transpose), FFT, RandomAccess, and b_eff (bandwidth/latency tests). The benchmarks attempt to span both high and low spatial and temporal locality in memory access. The tests are scalable and can be run on a wide range of platforms, from single processors to the largest parallel supercomputers.

The HPCC benchmarks test three particular regimes: local or single processor, embarrassingly parallel, and global, where all processors compute and exchange data with each other. STREAM measures a processor’s memory bandwidth. HPL is the LINPACK TPP (Toward Peak Performance) benchmark. RandomAccess measures the rate of random updates of memory, PTRANS measures the rate of transfer of very large arrays of data from memory, and b_eff measures the latency and bandwidth of increasingly complex communication patterns.
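
To make the flavor of these kernels concrete, the fragment below is a simplified, single-node sketch of a STREAM-style triad loop. It is not the official benchmark; the real STREAM code handles timing, repetition, and validation far more carefully, but the memory-bandwidth-bound access pattern it measures is the same.

    /* triad.c - simplified STREAM-style triad, a[i] = b[i] + scalar*c[i].
     * Illustrative only; the official STREAM benchmark is considerably
     * more careful about timing, repetition, and validation. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)   /* ~16M doubles per array, large enough to defeat cache */

    int main(void)
    {
        double *a = malloc(N * sizeof *a);
        double *b = malloc(N * sizeof *b);
        double *c = malloc(N * sizeof *c);
        if (!a || !b || !c) return EXIT_FAILURE;

        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        clock_t start = clock();
        const double scalar = 3.0;
        for (long i = 0; i < N; i++)
            a[i] = b[i] + scalar * c[i];          /* 2 reads + 1 write per element */
        double secs = (double)(clock() - start) / CLOCKS_PER_SEC;

        double gbytes = 3.0 * N * sizeof(double) / 1e9;  /* bytes moved, in GB */
        printf("Triad bandwidth: %.2f GB/s (a[0] = %.1f)\n", gbytes / secs, a[0]);

        free(a); free(b); free(c);
        return 0;
    }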

All of the benchmarks are run in two modes: base and optimized. The base run allows no source modifications to any of the benchmarks but does allow generally available optimized libraries to be used. The optimized run allows significant changes to the source code; the optimizations can include alternative programming languages and libraries that are specifically targeted at the platform being tested.

A C compiler and an implementation of MPI are required to run the benchmark. The report Introduction to the HPC Challenge Benchmark Suite by Dongarra and Luszczek describes how HPCC was used at SC06:

http://icl.cs.utk.edu/projectsfiles/hpcc/pubs/sc06_hpcc.pdf
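
Before attempting a full HPCC build, it is worth verifying that the required toolchain (a C compiler plus an MPI library) works end to end. A minimal program such as the sketch below, compiled with mpicc and launched with mpirun, is one way to do that; exact compiler and launcher names depend on the MPI implementation the team chooses.

    /* mpi_check.c - minimal toolchain check: compile with "mpicc mpi_check.c"
     * and launch with "mpirun -np 4 ./a.out" (launcher names vary by MPI). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }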

Applications

OpenFOAM
http://www.opencfd.co.uk/openfoam/

The OpenFOAM (Open Field Operation and Manipulation) CFD Toolbox can simulate anything from complex fluid flows involving chemical reactions, turbulence and heat transfer, to solid dynamics, electromagnetics and the pricing of financial options. OpenFOAM, produced by OpenCFD Ltd, is freely available and open source, licensed under the GNU General Public License.

OpenFOAM uses finite volume numerics to solve systems of partial differential equations ascribed on any 3D unstructured mesh of polyhedral cells. The fluid flow solvers are developed within a robust, implicit, pressure-velocity, iterative solution framework, although alternative techniques are applied to other continuum mechanics solvers. Domain decomposition parallelism is fundamental to the design of OpenFOAM and integrated at a low level so that solvers can generally be developed without the need for any "parallel-specific" coding. OpenFOAM is freely available as source or Linux binaries.

WPP
https://computation.llnl.gov/casc/serpentine/
and
User's Guide to the Wave Propagation Program (WPP) version 1.1
https://computation.llnl.gov/casc/serpentine/download/WPP-UsersGuide-v1.1.pdf

WPP is a parallel computer program for simulating time-dependent elastic and viscoelastic wave propagation, with some provisions for acoustic wave propagation. WPP solves the governing equations in displacement formulation using a node-based finite difference approach on a Cartesian grid.

WPP implements substantial capabilities for 3-D seismic modeling, with a free surface condition on the top boundary, non-reflective far-field boundary conditions on the other boundaries, (many) point force and point moment tensor source terms with many time dependencies, fully 3-D material model specification, output of synthetic seismograms in the SAC format, output of GMT scripts for laying out simulation information on a map, and output of 2-D slices of (derived quantities of) the solution field as well as the material model.

The primary goal of WPP is to advance computational science by developing and analyzing numerical methods for wave propagation simulation. The partial differential equations are approximated by finite differences on Cartesian grids in second order formulation, and the geometry is handled by the embedded boundary technique. WPP is available to the public from LLNL through the link above. Registration is required.
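
As a toy illustration of the kind of scheme described above, the C sketch below advances the 1-D scalar wave equation with a second-order finite-difference update in the displacement formulation. This is not WPP's method: WPP solves the 3-D elastic and viscoelastic equations with free-surface and non-reflecting boundary conditions; the fragment only shows the basic stencil idea.

    /* wave1d.c - toy 1-D scalar wave equation, u_tt = c^2 u_xx, second-order
     * finite differences in the displacement formulation. Purely illustrative;
     * WPP itself solves the 3-D (visco)elastic equations with far more
     * sophisticated boundary treatment. Compile with -lm. */
    #include <stdio.h>
    #include <math.h>

    #define NX 201

    int main(void)
    {
        double u_prev[NX] = {0}, u_curr[NX] = {0}, u_next[NX] = {0};
        const double c = 1.0, dx = 0.01, dt = 0.005;      /* CFL: c*dt/dx = 0.5 */
        const double r2 = (c * dt / dx) * (c * dt / dx);

        /* Initial displacement: a Gaussian pulse in the middle of the domain. */
        for (int i = 0; i < NX; i++) {
            double x = (i - NX / 2) * dx;
            u_curr[i] = u_prev[i] = exp(-x * x / 0.001);
        }

        for (int n = 0; n < 200; n++) {                   /* time stepping */
            for (int i = 1; i < NX - 1; i++)
                u_next[i] = 2.0 * u_curr[i] - u_prev[i]
                          + r2 * (u_curr[i+1] - 2.0 * u_curr[i] + u_curr[i-1]);
            /* Fixed ends here; WPP uses free-surface and non-reflecting boundaries. */
            u_next[0] = u_next[NX-1] = 0.0;
            for (int i = 0; i < NX; i++) { u_prev[i] = u_curr[i]; u_curr[i] = u_next[i]; }
        }

        printf("u at domain centre after 200 steps: %f\n", u_curr[NX/2]);
        return 0;
    }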

POY
http://research.amnh.org/scicomp/projects/poy.php

POY4 is a flexible, multi-platform program for phylogenetic analysis of molecular and other data. An essential feature of POY4 is that it implements the concept of dynamic homology allowing optimization of unaligned sequences. POY4 offers flexibility for designing heuristic search strategies and implements an array of algorithms including multiple random addition sequence, swapping, tree fusing, tree drifting, and ratcheting. As output, POY4 generates a comprehensive character diagnosis, graphical representations of cladograms and their user-specified consensus, support values, and implied alignments. POY4 provides a unified approach to co-optimizing different types of data, such as morphological and molecular sequence data. In addition, POY4 can analyze entire chromosomes and genomes, taking into account large-scale genomic events (translocations, inversions, and duplications).

POY4 is freely available in source and binary format for Linux, Mac OS, and Windows.

RAxML
http://icwww.epfl.ch/~stamatak/index-Dateien/Page443.htm

RAxML (Randomized Axelerated Maximum Likelihood) is a program for sequential and parallel Maximum Likelihood based inference of large phylogenetic trees. It was originally derived from fastDNAml, which, in turn, was derived from Joe Felsenstein's dnaml, part of the PHYLIP package.

In addition to the sequential version, RAxML offers two ways to exploit parallelism: fine-grained parallelism that can be exploited on shared-memory machines or multi-core architectures, and coarse-grained parallelism that can be exploited on Linux clusters. The current version of RAxML is a highly optimized program that handles DNA and AA alignments under various models of substitution and several distinct methods of rate heterogeneity. In addition, it implements a significantly improved version (a run-time improvement of a factor of 2.5) of the rapid hill-climbing algorithm. At the same time, these new heuristics yield qualitatively comparable results.

RAxML also offers a novel, unpublished rapid Bootstrapping algorithm that is faster than other current implementations by at least one order of magnitude. Once again, the results obtained by the rapid bootstrapping algorithm are qualitatively comparable to those obtained via the standard RAxML BS algorithm and, more importantly, the deviations in support values between the rapid and the standard RAxML BS algorithm are smaller than those induced by using a different search strategy, e.g., GARLI or PHYML. This rapid BS search can be combined with a rapid ML search on the original alignment and thus allows users to conduct a full ML analysis within a single program run.

RAxML is freely available in source and binary format for Mac OS and Windows.

GAMESS
http://www.msg.ameslab.gov/GAMESS

The General Atomic and Molecular Electronic Structure System (GAMESS) is a general ab initio quantum chemistry package. It can be used to compute the properties of molecules and chemical reactions using a wide variety of theoretical models, including equilibrium geometries and vibrational frequencies with IR or Raman intensities. The Fragment Molecular Orbital method permits many of these sophisticated treatments to be applied to very large systems by dividing the computation into small fragments. The code can run serially or in parallel.

The MacMolPlt program, available for Macintosh, Windows, or Linux desktops, can be used to view the final results; see:

http://www.scl.ameslab.gov/~brett/MacMolPlt

The Ghemical program can assist with preparation of inputs. The input is a simple pseudo-formatted text file, and the main output is a text file as well. Graphics programs automatically parse the output to produce images.

To download GAMESS you must agree to the user agreement; additional download information is then sent via email. There is no cost for GAMESS. The download pages describe an optional program, TINKER; for the SC08 Cluster Challenge, TINKER will not be required. You will want the source code distribution of GAMESS, even though many binary packages are available.
