
Communications of the ACM

ACM TechNews

Bug Repellent For Supercomputers Proves Effective



Lawrence Livermore National Laboratory (LLNL) researchers have developed the Stack Trace Analysis Tool (STAT), a highly scalable, lightweight tool that has been used to debug a program running more than one million MPI processes on the IBM Blue Gene/Q-based Sequoia supercomputer. The debugging tool is part of a multi-year collaboration between LLNL, the University of Wisconsin-Madison, and the University of New Mexico.

The researchers say STAT has helped early access users and system integrators quickly isolate a wide range of errors, including complicated issues that only appeared at extremely large scales. "STAT has been indispensable in this capacity, helping the multi-disciplined integration team keep pace with the aggressive system scale-up schedule," says LLNL's Greg Lee.

During testing, STAT identified a single rank, out of more than one million MPI processes, that was consistently stuck in a system call, according to LLNL's Dong Ahn.
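
The article does not describe how STAT works internally. As a rough illustration of the general idea behind stack trace analysis, the Python sketch below groups processes that report identical call stacks and surfaces the smallest group as the likely outlier. The function names, the sample ranks, and the frame names are all hypothetical and are not taken from STAT.

```python
from collections import defaultdict


def group_ranks_by_trace(traces):
    """Map each distinct call stack to the sorted list of ranks that reported it."""
    groups = defaultdict(list)
    for rank, frames in traces.items():
        groups[tuple(frames)].append(rank)
    return {stack: sorted(ranks) for stack, ranks in groups.items()}


def smallest_groups(groups, limit=1):
    """Return the `limit` smallest groups, which are the likeliest outliers."""
    return sorted(groups.items(), key=lambda item: len(item[1]))[:limit]


# Hypothetical sample: 999 ranks wait in a barrier, one is stuck in a read call.
sample_traces = {rank: ["main", "solve", "MPI_Barrier"] for rank in range(1000)}
sample_traces[417] = ["main", "solve", "write_checkpoint", "read"]

outliers = smallest_groups(group_ranks_by_trace(sample_traces))
```

Grouping first means a person inspects a handful of distinct stacks rather than one trace per process, which is what makes this style of analysis practical at very large process counts.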

"It is critical that our development teams have a comprehensive parallel debugging tool set as they iron out the inevitable issues that come up with running on a new system like Sequoia," says LLNL's Kim Cupps.

From Lawrence Livermore National Laboratory

 

Abstracts Copyright © 2012 Information Inc., Bethesda, Maryland, USA


 
