How the INCITE Project Allots Supercomputer Hours

One INCITE-approved project will utilize 112.2 million core hours to uncover the physics of earthquake processes.
In November 2013, the INCITE program granted awards of supercomputer time to 59 projects that will share nearly 6 billion core hours on two of America’s fastest supercomputers.

Every year, the U.S. Department of Energy’s Innovative and Novel Computational Impact on Theory and Experiment (INCITE) program solicits proposals from researchers seeking time on supercomputers at the two national Leadership Computing Facilities (LCF): the Oak Ridge National Laboratory (ORNL) in Tennessee, and the Argonne National Laboratory (ANL) near Chicago.

“They are high-performance computing centers with two of the top computers in the world,” says INCITE program manager Julia White. The ORNL’s 27-petaflops Cray XK7 “Titan” and the ANL’s 10-petaflops IBM Blue Gene/Q “Mira” were ranked in November 2013 as the second- and fifth-fastest supercomputers in the world, respectively, by the TOP500 Project.

The INCITE program accounts for 60 percent or more of the time available on the two supercomputers.

Open to researchers around the world, the program receives applications across a wide variety of domains, such as computer science, accelerator physics, chemical sciences, and life sciences, with no set minimum of hours earmarked for any particular domain. According to the 2014 Call for Proposals, “campaigns chosen by the INCITE program typically cannot be performed anywhere else and require extremely large high-performance computing systems, large awards of time, very large memory, or other unique LCF architectural infrastructure in order to succeed.”

The program receives around 140 proposals per year, says White. For 2014, the total requested time was about 14 billion core hours. “Individuals may submit requests for one, two, or three years. We may grant one, two, or three years. We don’t always give them what they’re asking for, for a variety of reasons. If the case hasn’t been made for a multiyear award but we still find the science compelling, we may grant a one-year award.”

After the proposals are received, they are sent for peer review. “My baseline is to have three peer review experts look at each proposal,” says White; “sometimes more, rarely fewer.” The peer reviewers look primarily at the proposed research’s potential impact, and how innovative and far-reaching it is. Examples of past projects the INCITE program touts as successes include a modeling of the molecular basis of Parkinson’s disease—ranked the top breakthrough in 2008 by a panel of computational scientists, applied mathematicians, and computer scientists—and a simulation of the effects of a magnitude 8.0 earthquake over 125 square miles.

The proposal instructions give a good idea of what is required of applicants: “Explain what advances you expect to be enabled by an INCITE award that justifies an allocation of petascale resources (e.g., anticipated impact on community paradigms, valuable insights into or solving a long-standing challenge, etc.). Place the proposed research in the context of competing work in your discipline or business.” The instructions also ask applicants to describe the personnel already in place and to be hired, to outline time and personnel management plans, and to list any publications that may have resulted from previous INCITE awards to the project team for related work.

After the peer review process, two experts from the LCFs—one from each facility—examine each proposal. Their job is to draw on their experience with their systems and evaluate how efficient an applicant’s code is, and whether it is ready to run at the scale of a supercomputer. Once those judgments are in, the final selection process begins.

“We’re not looking at ‘these are all good, therefore they’re all worthy of awards,’” says White. “We’re just looking at the top of the top. If something ranks very highly from an impact perspective—that is, the engineering or the computer science is very exciting—and it’s ready to run, we make the award.”

Even limiting themselves to the top of the top, White says, leaves them short on available time. “If we were to grant all the time requested, we would be oversubscribed.” That’s why the Awards Committee, which includes LCF directors, senior management, and White, sometimes will decide to grant time to a proposal, but less than was requested. “We attempt as best as possible to meet the requests, but in cases where through recommendations of reviewers or where we feel the amount of time could be adjusted and all or most of the goals achieved, we will make modifications.”

If, for example, the facility experts conclude that more efficient code would let the researchers reach their goals in less time than they requested, the committee can reduce the number of hours granted accordingly. “If it’s exciting stuff but the applicants haven’t really shown us that it’s ready to go on the machine, we may make an allocation with the idea that we’ll provide some additional support.”

Another possibility is that the peer reviewers may advise that some parts of the proposal were more compelling than others; again, the hours allotted could be adjusted accordingly.

In other cases, the committee might simply suggest the applicant try again another year.

Not considered by the committee is whether the proposed research fits in with research that has been done at these facilities before (with the exception of applications for renewed awards), or whether it falls within governmental or Department of Energy priorities. “We have no metrics or set number of project or hours we are targeting in particular areas,” says White.

For 2014, this process resulted in awards of time to 36 percent of new submittals and 91 percent of renewals. In November 2013, INCITE announced awards of 5.78 billion processor hours, in amounts ranging from 16 to 340 million hours; the average award was almost 98 million hours.
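The figures quoted in this article can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes the 5.78 billion hours were divided among the 59 projects mentioned at the top of the article, and it treats "about 14 billion" requested hours as exactly 14 billion. The function and constant names are made up for this example.

```python
# Figures as quoted in the article for the 2014 INCITE cycle.
# Assumption: the 59 projects and the 5.78 billion hours refer to the same set of awards.
TOTAL_AWARDED_HOURS = 5.78e9     # processor hours awarded
TOTAL_REQUESTED_HOURS = 14e9     # "about 14 billion" hours requested (rounded)
NUM_PROJECTS = 59                # projects receiving awards


def average_award(total_hours: float, num_projects: int) -> float:
    """Mean award size in hours."""
    return total_hours / num_projects


def oversubscription_ratio(requested: float, awarded: float) -> float:
    """How many times larger the requested time is than the awarded time."""
    return requested / awarded


avg = average_award(TOTAL_AWARDED_HOURS, NUM_PROJECTS)
ratio = oversubscription_ratio(TOTAL_REQUESTED_HOURS, TOTAL_AWARDED_HOURS)

print(f"Average award: {avg / 1e6:.1f} million hours")
print(f"Requested vs. awarded: {ratio:.2f}x")
```

Under these assumptions the mean comes out just under 98 million hours, which agrees with the "almost 98 million" figure above, and requests exceed the awarded time by a factor of roughly 2.4.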

Asked about specific awards that have been granted, White mentions the 229 million hours granted to a team led by principal investigator C. S. Chang of the Princeton Plasma Physics Laboratory. In a project titled “High-fidelity Simulation of Tokamak Edge Plasma Transport,” White explains, Chang’s team is shedding light on a long-known but little-understood phenomenon known as ‘blobby’ turbulence, in which formations of strong density clumps flow together and move around large amounts of plasma in fusion reactors, greatly affecting performance. Chang’s results could advance the effort to develop a fusion reactor, with its promise of virtually limitless clean energy.

Logan Kugler is a freelance technology writer based in Silicon Valley. He has written for over 60 major publications.
