On July 1, Argonne National Laboratory dedicated Mira, its new massively parallel supercomputer. An IBM Blue Gene/Q system, the $100-million Mira comprises 48 racks housing 786,432 processor cores and 768 terabytes of memory; it is capable of 10 quadrillion calculations per second, making it the fifth-fastest supercomputer in the world. Mira is 20 times faster than its predecessor, Intrepid, which was the fourth-fastest supercomputer globally when it was installed in 2008.
During the eight months it took to install all of Mira’s components and get it up and running, Argonne’s Early Science Program allocated time to 16 research projects chosen to help prepare key scientific applications for Mira’s architecture and scale. These projects "eat up a significant portion of the machine," says Michael Papka, Argonne Leadership Computing Facility director and deputy associate laboratory director, Computing, Environment and Life Sciences. Papka says that, in general, code for such massively parallel machines is not as efficient as it could be, due to a dearth of high-performance computing expertise, particularly in parallel programming, both to write new code and to port legacy code.
"Most parallel code is 10-15% efficient," Papka says. "It’s still faster and you’re doing more science than in the past, but how efficiently?" There are major challenges in moving applications to a massively parallel machine like Mira, he says, because "you need to figure out how to build code that uses all the capabilities of the machine at once; 786K things happening simultaneously, and it’s even more complicated since each core is capable of multiple threads."
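The 10%–15% figure Papka cites is parallel efficiency: achieved speedup divided by the number of cores used. A minimal sketch of the arithmetic, using Amdahl's law as an idealized scaling model (the serial fraction used below is an illustrative assumption, not a measurement from Mira):

```python
def amdahl_speedup(serial_frac, n_cores):
    """Idealized speedup for a program with a fixed serial
    fraction that cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_cores)

def parallel_efficiency(serial_frac, n_cores):
    """Speedup divided by core count; 1.0 means perfect scaling."""
    return amdahl_speedup(serial_frac, n_cores) / n_cores

# Even a tiny serial fraction crushes efficiency at Mira's scale:
# with 0.1% serial work, 786,432 cores speed the code up only ~1,000x.
for cores in (16, 1024, 786_432):
    eff = parallel_efficiency(0.001, cores)
    print(f"{cores:>7} cores: efficiency {eff:.1%}")
```

The same program that looks well parallelized on 16 cores (roughly 98% efficient under this model) drops below 1% efficiency at Mira's core count, which is why code must be restructured, not merely recompiled, for machines at this scale.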
Programmers of early massively parallel supercomputers faced a similar situation 30 years ago, but at a smaller scale; those machines were powered by thousands of RISC processors. In comparison, today’s supercomputers are equipped with hundreds of thousands of cores spread across multi-core chips. "We don't have a great software story to go with (advances in hardware)," says Papka. "It’s easy to talk about hardware and show machines, but software is so fuzzy and people can’t get their heads around it, and that’s why we have legacy code."
In 2010, the U.S. Department of Energy (DOE) commissioned International Data Corporation (IDC) to conduct a study of the talent and skills available to high-performance computing (HPC) data centers. The study found the HPC workforce is aging and retiring, and that, as a result, 93% of HPC centers have difficulty hiring staff with the requisite skills.
In March, the Networking and Information Technology Research and Development Program (NITRD), self-described as "the primary mechanism by which the Government coordinates its unclassified networking and information technology (IT) research and development (R&D) investments," published "Education and Workforce Development in the High End Computing Community" on behalf of its High End Computing Interagency Working Group. That position paper concluded: "Current approaches to HEC (high-end computing) workforce development and education are inadequate to address today’s needs in HEC centers and scientific disciplines that depend upon HEC; as demands upon HEC increase, this gap will widen." Among the paper’s recommendations is collaborative curriculum development by universities and the "federal HEC community" to establish base skills. A "complete approach," the study suggests, also would include internships, scholarships, graduate and post-doctoral fellowships, and funding for research, as well as support for professionals transitioning from other disciplines.
However, not everyone agrees about how to implement such programs, or about their scope.
DOE’s main program for training people in HPC skills is Scientific Discovery Through Advanced Computing (SciDAC), which for a decade has funded interdisciplinary teams of researchers to collaborate in high-performance computing. One of those teams, the Scientific Data Group led by Scott Klasky of Oak Ridge National Laboratory, claims improvements in performance over other parallel I/O systems through the use of its Adaptable I/O System for Big Data (ADIOS), an I/O middleware package (Klasky says ADIOS has sometimes shown "an improvement of more than 1,000 times over well-known parallel file formats").
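ADIOS's actual interface is not shown here; the following is a generic Python sketch of the aggregation idea behind parallel I/O middleware of this kind (all function names are hypothetical). Instead of every process issuing its own small write, and all of them contending for the file system, ranks are grouped and one aggregator per group issues a single large write:

```python
# Toy illustration of I/O aggregation, the general idea behind
# middleware like ADIOS (this is NOT the ADIOS API; names invented).
# Rather than hundreds of thousands of processes each issuing small
# writes, ranks are grouped and one aggregator per group writes once.

def assign_aggregator(rank, ranks_per_aggregator):
    """Map an MPI-style rank to the rank of its group's aggregator."""
    return (rank // ranks_per_aggregator) * ranks_per_aggregator

def plan_writes(n_ranks, ranks_per_aggregator, bytes_per_rank):
    """Return {aggregator_rank: total_bytes} for the combined writes."""
    plan = {}
    for rank in range(n_ranks):
        agg = assign_aggregator(rank, ranks_per_aggregator)
        plan[agg] = plan.get(agg, 0) + bytes_per_rank
    return plan

# 1,024 ranks with 128 ranks per aggregator -> 8 large sequential
# writes instead of 1,024 small contending ones.
plan = plan_writes(1024, 128, 4096)
print(len(plan), "writers, each writing", set(plan.values()), "bytes")
```

Fewer, larger, sequential writes are far friendlier to parallel file systems than many small contending ones, which is one way such middleware can post large speedups over naive per-process I/O.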
In addition, DOE funds HPC workshops, including a two-week training course at Argonne for those already experienced in parallel programming who need guidance in transferring legacy code to new systems. Demand for the course was strong and immediate: 180 graduate students, post-docs, and assistant professors applied for 60 spots. Argonne director of science Paul Messina, who runs the course, estimates the number of "leading-edge people skilled at the multi- and mega-core level" is somewhere in the range of several hundred to the low 1,000s worldwide. Further, he says, it is clear from applications to use Argonne’s systems, as well as from journal articles, that many in the field know little about parallel computing. "They say they can use six cores; if that’s all they can use, they really are not doing parallel processing."
Messina says he wishes he could offer a six-week course, because attendees and their advisors believe universities are not doing enough to answer the need. "People think, naively, 'well, these are smart kids, they’ll pick up (parallel processing) on their own,'" says Messina. "That’s true, but they’re likely to have gaps in their knowledge. Those gaps are among the things we’re trying to fill."
William Harrod, director of DOE’s Advanced Scientific Computing Research (ASCR) program, says researchers recognize the problem, but blame lack of student interest and enrollment. "Today, everybody’s talking about developing apps for their cellphones," says Harrod. "That’s hot now. Before that, it was the Web, and high-performance computing, up until a couple of years ago, was a side issue and did not capture the imagination." Now that phones have two cores and multi-core chips are ubiquitous, "you can no longer ignore parallel processing. It’s mainstream. It's a question of getting students to take classes and understand it’s an area they need to explore," he says.
Says Irene Qualters, program director in the NSF Division of Advanced Cyberinfrastructure, "Computational science should be part of the curriculum offered across scientific disciplines." Qualters favors a long-term, systematic approach that funds collaboration across disciplines and embeds some computational science curriculum into the scientific disciplines.
William Gropp, director of the Parallel Computing Institute at the University of Illinois at Urbana-Champaign, suggests computer science departments that treat parallelism as a central topic are making a mistake. "The core should be built around different kinds of correctness, including performance correctness, defined as knowing that the code will perform fast enough to meet the needs of the user," he says. "That might be achieved by picking the right algorithm; the right problem formulation; the right hardware; the right realization of the algorithm as efficient code; the use of parallelism at the core, chip, node, and internode level; and, most likely, as a combination of all of the above."
Gropp acknowledges that parallelism is an important and powerful tool, but "Until we train computer scientists to think scientifically about performance, it will be hard to find enough programmers, developers, and innovators to meet the growing need for greater computing power, especially since we're not getting more performance from individual cores any more."
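Gropp's point that performance correctness "might be achieved by picking the right algorithm" can be made concrete with a small illustrative example (not taken from the article): two correct ways to compute prefix sums, one of which does asymptotically more work and so cannot be rescued by parallel hardware at scale:

```python
# A sketch of the "right algorithm" point (illustrative only):
# both functions are correct, but the first performs O(n^2) additions,
# so no amount of parallel hardware makes it competitive for large n.

def prefix_sums_quadratic(xs):
    """Recompute each prefix from scratch: O(n^2) additions."""
    return [sum(xs[: i + 1]) for i in range(len(xs))]

def prefix_sums_linear(xs):
    """Carry a running total: O(n) additions."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out

data = list(range(1, 6))
assert prefix_sums_quadratic(data) == prefix_sums_linear(data) == [1, 3, 6, 10, 15]
```

Both versions pass the same functional tests; only a performance-correctness mindset, of the kind Gropp describes, distinguishes them before the code reaches a machine like Mira.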
Meanwhile, the Chinese are making a big push, not just in education and workforce development, but in chips and software for HPC. In response, President Obama requested $466 million for DOE’s ASCR program for fiscal year 2014, and the House and Senate have responded with proposals for $432 million and $495 million, respectively. "Until we know where Congress stands on this, we can’t move forward," says ASCR program officer Christine Chalk.
Legacy codes that don’t get ported to new, massively parallel machines "very much impact the science that can be done," Chalk says. "What these codes all have in common is that they model very complex systems. Modeling a fusion reactor takes a lot more computing power than we can give, and that’s equally true of weapons codes."
As for Big Data analytics, Harrod adds, "that has massive computational data sitting behind it, and you would think that it would be an opportunity to capture some excitement."
Karen A. Frenkel writes about science and technology and lives in New York City.