
Communications of the ACM

ACM TechNews

Bug Repellent For Supercomputers Proves Effective



Lawrence Livermore National Laboratory (LLNL) researchers have developed the Stack Trace Analysis Tool (STAT), a highly scalable, lightweight tool that has been used to debug a program running more than one million MPI processes on the IBM Blue Gene/Q-based Sequoia supercomputer. STAT is part of a multi-year collaboration between LLNL, the University of Wisconsin-Madison, and the University of New Mexico.

The researchers say STAT has helped early-access users and system integrators quickly isolate a wide range of errors, including complicated issues that appeared only at extremely large scales. "STAT has been indispensable in this capacity, helping the multi-disciplined integration team keep pace with the aggressive system scale-up schedule," says LLNL's Greg Lee.

During testing, STAT singled out the one MPI rank, among more than one million processes, that was consistently stuck in a system call, according to LLNL's Dong Ahn.
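STAT's published design merges the stack traces of all processes into a call-graph prefix tree and groups processes whose traces are identical into equivalence classes, so a lone rank stuck somewhere unusual stands out as a tiny class. The Python sketch below illustrates only that general idea on a toy, in-memory set of traces. The function names, frame names, and data are hypothetical, and the sketch does not reflect how STAT actually collects traces from a running job at scale.

```python
from collections import defaultdict


def equivalence_classes(traces):
    """Group ranks whose stack traces are identical.

    traces: dict mapping rank (int) -> sequence of frame names, outermost first.
    Returns a dict mapping each distinct trace (as a tuple) to a sorted list of ranks.
    """
    classes = defaultdict(list)
    for rank, trace in traces.items():
        classes[tuple(trace)].append(rank)
    return {trace: sorted(ranks) for trace, ranks in classes.items()}


def merge_prefix_tree(traces):
    """Merge all traces into a prefix tree; each node records the ranks passing through it."""
    root = {"ranks": set(), "children": {}}
    for rank, trace in traces.items():
        node = root
        node["ranks"].add(rank)
        for frame in trace:
            node = node["children"].setdefault(frame, {"ranks": set(), "children": {}})
            node["ranks"].add(rank)
    return root


def print_tree(node, name="(all ranks)", depth=0):
    """Print the merged tree with the number of ranks at each frame."""
    print("  " * depth + f"{name} (ranks: {len(node['ranks'])})")
    for frame, child in node["children"].items():
        print_tree(child, frame, depth + 1)


def outliers(traces, max_size=1):
    """Return ranks that fall in equivalence classes with at most max_size members."""
    return sorted(
        rank
        for ranks in equivalence_classes(traces).values()
        if len(ranks) <= max_size
        for rank in ranks
    )


# Hypothetical toy job: 8 ranks waiting in a barrier, except rank 5,
# which is stuck in a read() system call.
traces = {r: ("main", "solve", "MPI_Barrier") for r in range(8)}
traces[5] = ("main", "solve", "write_checkpoint", "read")

print_tree(merge_prefix_tree(traces))
print(outliers(traces))  # [5]
```

Grouping by whole traces makes the outlier easy to list, while the prefix tree shows where that rank's path diverges from the rest (here, below `solve`), which is the kind of view that lets a developer focus on a handful of processes instead of a million.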

"It is critical that our development teams have a comprehensive parallel debugging tool set as they iron out the inevitable issues that come up with running on a new system like Sequoia," says LLNL's Kim Cupps.

From Lawrence Livermore National Laboratory


Abstracts Copyright © 2012 Information Inc., Bethesda, Maryland, USA

