SC Conference - Activity Details

50.5 Mflops/dollar and 8.5 Tflops Cosmological N-body Simulation on a GPU Cluster

Tsuyoshi Hamada (Nagasaki University)
Keigo Nitadori (University of Tokyo)
Tomonari Masada (Nagasaki University)
Makoto Taiji (RIKEN)
Posters Session
Tuesday, 05:15PM - 07:00PM
Room Rotunda Lobby
Many GPGPU studies have been reported, including studies of astrophysical N-body simulation, which is one of the grand-challenge problems in computational science. However, most of these studies were based on the simple O(N^2) algorithm, and their performance was no higher than that of conventional CPUs running O(N log N) algorithms such as the tree or Particle-Particle Particle-Mesh algorithms. Because these algorithms are difficult to implement efficiently on GPUs, a GPU cluster had no practical advantage over general PC clusters for N-body simulation. In this paper, we report a new parallel implementation of the tree algorithm that runs with high efficiency on GPUs. Our novel tree code achieves higher N-body simulation performance on a GPU cluster than on general PC clusters. In practice, we performed a cosmological simulation with 562 million particles on a GPU cluster of 128 GeForce 8800 GTS GPUs that cost 168,172 dollars. The sustained performance was 20.1 Tflops, which is equivalent to 8.50 Tflops on a general CPU. The achieved cost/performance was 50.5 Mflops/dollar.
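The cost/performance figure can be checked from the numbers quoted above. The following is a minimal sketch, assuming the 50.5 Mflops/dollar value is the CPU-equivalent 8.50 Tflops divided by the 168,172 dollar cost; the abstract does not state this explicitly, but the arithmetic is consistent with it. The function name `mflops_per_dollar` is hypothetical and used only for illustration.

```python
# Figures quoted in the abstract.
SUSTAINED_FLOPS = 20.1e12       # sustained performance on the GPU cluster
CPU_EQUIVALENT_FLOPS = 8.50e12  # equivalent performance on a general CPU
SYSTEM_COST_DOLLARS = 168_172   # cost of the GPU cluster
NUM_GPUS = 128
NUM_PARTICLES = 562e6


def mflops_per_dollar(flops: float, cost_dollars: float) -> float:
    """Return cost/performance in Mflops per dollar (hypothetical helper)."""
    return flops / cost_dollars / 1e6


# Assumption: the quoted ratio uses the CPU-equivalent figure, not the raw one.
equivalent_ratio = mflops_per_dollar(CPU_EQUIVALENT_FLOPS, SYSTEM_COST_DOLLARS)
raw_ratio = mflops_per_dollar(SUSTAINED_FLOPS, SYSTEM_COST_DOLLARS)
particles_per_gpu = NUM_PARTICLES / NUM_GPUS

print(f"CPU-equivalent: {equivalent_ratio:.1f} Mflops/dollar")
print(f"Raw sustained:  {raw_ratio:.1f} Mflops/dollar")
print(f"Particles per GPU: {particles_per_gpu:.3g}")
```

Under this assumption the CPU-equivalent ratio rounds to the 50.5 Mflops/dollar given in the title, while the raw 20.1 Tflops figure would give a ratio more than twice as large.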
IEEE Computer Society / ACM    20 YEARS - UNLEASHING THE POWER OF HPC