Student Contribution

SC Conference - Activity Details

Massively Parallel Genomic Sequence Search on the Blue Gene/P Architecture

Heshan Lin (North Carolina State University)
Pavan Balaji (Argonne National Laboratory)
Ruth Poole (IBM Corporation)
Carlos Sosa (IBM Corporation)
Xiaosong Ma (North Carolina State University)
Wu-chun Feng (Virginia Tech)
Papers Session
Biomedical Informatics
Wednesday, 03:30PM - 04:00PM
Room Ballroom E
This paper presents our first experiences in mapping and optimizing genomic sequence search onto the massively parallel IBM Blue Gene/P (BG/P) platform. Specifically, we performed our work on mpiBLAST, a parallel sequence-search code that has been optimized on numerous supercomputing environments. In doing so, we identify several critical performance issues. Consequently, we propose and study different approaches for mapping sequence-search and parallel I/O tasks on such massively parallel architectures. We demonstrate that our optimizations can deliver nearly linear scaling (93% efficiency) on up to 32,768 cores of BG/P. In addition, we show that such scalability enables us to complete a large-scale bioinformatics problem (sequence searching a microbial genome database against itself to support the discovery of missing genes in genomes) in only a few hours on BG/P. Previously, this problem was viewed as computationally intractable in practice.
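To make the "93% efficiency" figure concrete, the following is a minimal sketch of how relative parallel efficiency is commonly computed for a strong-scaling run. The abstract does not state the baseline core count or any run times, so the function names, the 1,024-core baseline, and every number other than 0.93 and 32,768 are illustrative assumptions, not values from the paper.

```python
def parallel_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Relative parallel efficiency of a strong-scaling run.

    t_base, t_scaled: wall-clock times on p_base and p_scaled cores.
    Returns observed speedup divided by ideal (linear) speedup.
    """
    speedup = t_base / t_scaled   # how much faster the larger run was
    ideal = p_scaled / p_base     # speedup under perfect linear scaling
    return speedup / ideal


def implied_speedup(efficiency, p_base, p_scaled):
    """Speedup over the baseline implied by a given efficiency."""
    return efficiency * (p_scaled / p_base)


if __name__ == "__main__":
    # Hypothetical baseline of 1,024 cores (an assumption; not in the abstract).
    s = implied_speedup(0.93, p_base=1024, p_scaled=32768)
    print(f"Implied speedup over the assumed baseline: {s:.2f}x")
```

Under that assumed baseline, moving to 32 times as many cores at 93% efficiency would correspond to a speedup somewhat below the ideal 32x, which is the sense in which the scaling is described as nearly linear.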
The full paper can be found in the IEEE Xplore Digital Library and the ACM Digital Library.
IEEE Computer Society / ACM. 20 YEARS - UNLEASHING THE POWER OF HPC