Student Contribution

SC Conference - Activity Details

Stencil Computation Optimization and Autotuning on State-of-the-art Multicore Architectures

Kaushik Datta (University of California, Berkeley)
Mark Murphy (University of California, Berkeley)
Vasily Volkov (University of California, Berkeley)
Samuel Williams (University of California, Berkeley)
Jonathan Carter (Lawrence Berkeley National Laboratory)
Leonid Oliker (Lawrence Berkeley National Laboratory)
David Patterson (University of California, Berkeley)
John Shalf (Lawrence Berkeley National Laboratory)
Katherine Yelick (University of California, Berkeley)
Papers Session
HPC Systems
Tuesday, 11:00AM - 11:30AM
Room Ballroom F
Understanding the most efficient design and utilization of emerging multicore systems is one of the most challenging questions faced by the mainstream and scientific computing industries in several decades. Our work explores multicore stencil (nearest-neighbor) computations, a class of algorithms at the heart of many structured-grid codes, including PDE solvers. We develop a number of effective optimization strategies and build an autotuning environment that searches over these optimizations and their parameters to minimize runtime while maximizing performance portability. To evaluate the effectiveness of these strategies, we explore the broadest set of multicore architectures in the current HPC literature, including the Intel Clovertown, AMD Barcelona, Sun Victoria Falls, STI Cell, and NVIDIA GTX280. Overall, our autotuning optimization methodology results in the fastest multicore stencil performance to date. Finally, we present several key insights into the architectural trade-offs of emerging multicore designs and their implications for scientific algorithm development.
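To make the two ideas in the abstract concrete, here is a minimal sketch, not the authors' code. It assumes a 3D 7-point stencil (each interior point combined with its six face neighbors) using hypothetical coefficients `alpha` and `beta`, with one optimization (cache blocking in x and y) exposed as tunable tile sizes. A hypothetical `autotune` function then exhaustively times each candidate tile and keeps the fastest. Pure Python timings are dominated by interpreter overhead, so the sketch shows the shape of the search rather than real cache effects. A practical autotuner would time compiled kernel variants on the target machine.

```python
import time


# Hypothetical grid layout: nested Python lists indexed grid[z][y][x].
def make_grid(n, value_fn):
    """Build an n x n x n grid whose entry at (z, y, x) is value_fn(z, y, x)."""
    return [[[value_fn(z, y, x) for x in range(n)] for y in range(n)]
            for z in range(n)]


def stencil_7pt(src, dst, n, alpha, beta, bx, by):
    """One out-of-place sweep of a 7-point stencil over the grid interior.

    Each interior point becomes alpha * (itself) + beta * (sum of its six
    face neighbors); boundary points of dst are left untouched. The x-y
    plane is walked in bx-by-by tiles (cache blocking), so the tile size
    changes only the traversal order, not the computed values.
    """
    for yy in range(1, n - 1, by):
        for xx in range(1, n - 1, bx):
            for z in range(1, n - 1):
                for y in range(yy, min(yy + by, n - 1)):
                    for x in range(xx, min(xx + bx, n - 1)):
                        dst[z][y][x] = alpha * src[z][y][x] + beta * (
                            src[z - 1][y][x] + src[z + 1][y][x]
                            + src[z][y - 1][x] + src[z][y + 1][x]
                            + src[z][y][x - 1] + src[z][y][x + 1])


def autotune(n, candidates, alpha=0.5, beta=0.1, trials=3):
    """Time every (bx, by) candidate and return ((bx, by), best_seconds).

    Each candidate runs `trials` times and keeps its minimum time to damp
    timing noise; the candidate with the lowest time wins.
    """
    src = make_grid(n, lambda z, y, x: float((x + 2 * y + 3 * z) % 7))
    dst = make_grid(n, lambda z, y, x: 0.0)
    best_params, best_time = None, float("inf")
    for bx, by in candidates:
        elapsed = float("inf")
        for _ in range(trials):
            start = time.perf_counter()
            stencil_7pt(src, dst, n, alpha, beta, bx, by)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_params, best_time = (bx, by), elapsed
    return best_params, best_time
```

For example, `autotune(24, [(b, b) for b in (2, 4, 8, 16)])` returns the fastest tile shape and its per-sweep time on the current machine. Because blocking never changes the arithmetic performed at each point, the search can freely pick whichever tile is fastest without affecting correctness. A fuller search space would add further optimization parameters as extra dimensions.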
The full paper can be found in the IEEE Xplore Digital Library and the ACM Digital Library.
IEEE Computer Society / ACM: 20 Years - Unleashing the Power of HPC