SC Conference - Activity Details

Enhancing the Performance of Dense Linear Algebra Solvers on GPUs

Marc Baboulin  (University of Coimbra)
James Demmel  (University of California, Berkeley)
Jack Dongarra  (University of Tennessee, Knoxville)
Stanimire Tomov  (University of Tennessee, Knoxville)
Vasily Volkov  (University of California, Berkeley)
Posters Session
Tuesday,  05:15PM - 07:00PM
Room Rotunda Lobby
As the peak performance of GPUs has reached 1 teraflop and support for double-precision arithmetic has been added, the appeal of GPUs for general-purpose HPC has grown even stronger. Our poster explores various ways to speed up linear algebra solvers on GPUs. The design principles are BLAS-level parallelism and hybrid CPU-GPU computation. We discuss several approaches to minimizing the cost of pivoting in the LU factorization, including data structure optimization and randomization techniques. We also emphasize mixed-precision iterative refinement, which exploits the single-precision performance of the GPU while still delivering a solution with the accuracy of double-precision computation. We provide experimental results on NVIDIA's next-generation 'G90' GPU for linear system solvers based on LU, Cholesky, and QR factorizations, and we mention possible uses for linear least squares problems.
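The mixed-precision iterative refinement idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' GPU implementation: it is a pure-Python emulation in which single precision is imitated by rounding every intermediate result to 32 bits, and the names `f32`, `lu_factor_f32`, `lu_solve_f32`, and `solve_mixed` are hypothetical helpers introduced only for illustration. On real hardware the single-precision factorization and triangular solves would presumably be the part offloaded to the GPU.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting, emulated in single precision.

    Returns the packed L\\U factors and the row permutation.
    """
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k at or below the diagonal.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve with the single-precision factors: forward, then backward substitution."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def solve_mixed(a, b, tol=1e-14, max_iter=20):
    """Solve a x = b: factor once in single precision, refine in double precision."""
    n = len(a)
    lu, perm = lu_factor_f32(a)      # the O(n^3) work, done in single precision
    x = lu_solve_f32(lu, perm, b)    # initial solution, single-precision accuracy
    b_norm = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual and solution update are kept in double precision.
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        d = lu_solve_f32(lu, perm, r)  # cheap O(n^2) correction, reusing the factors
        x = [x[i] + d[i] for i in range(n)]
    return x
```

The design point the sketch tries to convey is that the expensive factorization runs once in the fast precision, while each refinement step costs only a residual computation and two triangular solves. Convergence to double-precision accuracy should be expected only when the matrix is reasonably well conditioned relative to single precision.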
IEEE Computer Society / ACM     20 Years - Unleashing the Power of HPC