SC Conference - Activity Details
Lessons Learned at 208K: Toward Debugging Millions of Cores
Authors:
Gregory L. Lee (Lawrence Livermore National Laboratory)
Dong H. Ahn (Lawrence Livermore National Laboratory)
Dorian C. Arnold (University of Wisconsin-Madison)
Bronis R. de Supinski (Lawrence Livermore National Laboratory)
Matthew Legendre (University of Wisconsin-Madison)
Barton P. Miller (University of Wisconsin-Madison)
Martin Schulz (Lawrence Livermore National Laboratory)
Ben Liblit (University of Wisconsin-Madison)
Papers Session: Large-Scale System Performance
Time: Tuesday, 04:30PM - 05:00PM
Room: Ballroom E
Abstract:
In this paper, we present challenges to petascale tool development, using the Stack Trace Analysis Tool (STAT) as a case study. STAT is a lightweight tool that gathers and merges stack traces from a parallel application to identify process equivalence classes. We use results gathered with thousands of tasks on an InfiniBand cluster and results at up to 208K processes on BlueGene/L to identify current scalability issues as well as challenges that will be faced at the petascale. We then present solutions to these challenges that have been implemented and show the resulting performance improvements. We also discuss future plans to meet the debugging demands of petascale machines.
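
The idea of merging stack traces to find process equivalence classes can be illustrated with a minimal Python sketch. This is not STAT's implementation or API: the function names, the dictionary-based tree layout, and the sample traces are all hypothetical, and the sketch runs in a single process rather than across a distributed tool infrastructure. It assumes each task contributes one stack trace as a list of frame names, outermost frame first.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group task ranks whose stack traces are identical.

    traces: dict mapping rank -> list of frame names, outermost first.
    Returns a dict mapping each distinct trace (as a tuple) to its sorted ranks.
    """
    classes = defaultdict(list)
    for rank, frames in traces.items():
        classes[tuple(frames)].append(rank)
    return {trace: sorted(ranks) for trace, ranks in classes.items()}


def merge_prefix_tree(traces):
    """Merge traces into a call-prefix tree.

    Each node records the set of ranks whose trace passes through it,
    so shared call paths are stored once (hypothetical layout).
    """
    root = {"ranks": set(), "children": {}}
    for rank, frames in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in frames:
            node = node["children"].setdefault(
                frame, {"ranks": set(), "children": {}}
            )
            node["ranks"].add(rank)
    return root


# Hypothetical sample: three tasks wait in a barrier, one is still computing.
sample_traces = {
    0: ["main", "solve", "MPI_Barrier"],
    1: ["main", "solve", "MPI_Barrier"],
    2: ["main", "solve", "compute"],
    3: ["main", "solve", "MPI_Barrier"],
}

if __name__ == "__main__":
    for trace, ranks in equivalence_classes(sample_traces).items():
        print(" > ".join(trace), ranks)
```

In this sample, the task that has not reached the barrier falls into its own class, which is the kind of outlier a developer would then inspect with a full-featured debugger on a single representative process.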