Entering a Parallel Universe

University of Illinois at Urbana-Champaign's Universal Parallel Computing Research Center

For years, software programmers had it easy when they wanted to write faster and more feature-laden applications. They just did it, thanks to the seemingly eternal verity of Moore’s Law, which states that the density of transistors on a chip will double every two years. The size of transistors has continued to shrink, making it possible for more transistors to be placed on a die of the same size, but clock speeds haven’t increased, due to thermal issues. Intel and other chip manufacturers have opted to use the extra transistors to build multiple processor cores on the same die, as opposed to increasing the functionality or sophistication of each individual core. In order to take advantage of these multiple cores, applications must be rewritten to accomplish their task, using multiple parallel threads of execution.

Parallel programming is not new; it has been a mainstay in high-performance computing for decades. However, the vast majority of general-purpose platforms have been designed to operate sequential applications, in which one instruction logically follows its predecessor to accomplish a given task. In parallel programming, numerous calculations are performed simultaneously, operating on the principle that large problems can often be divided into many smaller ones, which are then solved concurrently. However, generations of programmers for mainstream platforms have never had to work in parallel.

“We have come to parallel programming not because of the success of our software, but because of the failure of our hardware,” says Tim Mattson, a senior research scientist at Intel. “If the hardware can’t give it to you with a single thread anymore you have to figure out how to do parallel. So it’s kind of an urgent situation here. We have to crack this one.” (See “Face the Inevitable, Embrace Parallelism,” on p. 36.)

What to Parallelize

There are numerous “yes, but…” scenarios inherent in bringing parallel programming to mainstream machines and developers. For instance, many programming experts say the return on investment for changing sequential code for the next generation of dualcore or quad-core machines might not be large enough to make the transition worthwhile. As a result, researchers and toolmakers have a short grace period in which to examine exactly which applications are worth parallelizing.

“The consequence of having to move to multicore is that we have to figure out how to use parallelism in places where we need more performance,” says Jim Larus, director of software architecture for cloud computing futures at Microsoft. “Not evrything needs to be parallel. What’s it going to do for you to make Word run a little faster?”

Larus says some applications, such as speech recognition, for which parallel programming is seen as a requirement, might benefit more from algorithmic improvements of existing serially written applications instead of converting to parallel processing. “The people working on [speech recognition] at Microsoft say a machine that’s 10 times faster would probably reduce the error rate by a few tenths of a percent,” says Larus. “They see the future in terms of better algorithms, not more computation. We’re saying we can keep giving you exponential growth in compute power for certain types of programs, and people are telling us ‘That’s not really what we need for what we’re doing’ or ‘That’s not enough for what we’re doing.”‘

Larus’s cautions aside, the computer industry is moving en masse to multicore machines, and users will expect to receive additional performance for their money—performance that will often depend on parallel applications. Therefore, some experts say, there is a danger in not immediately starting to train programmers on the requirements of parallel programming. This consciousness raising is important at all levels, from industry veterans to undergraduate students. “Sequential programming models do not work well enough,” says Maurice Herlihy, a professor of computer science at Brown University. “We can more or less keep busy” four cores or fewer, he says, “but beyond that we’ll have to rethink some things. If you can’t deliver more value with more cores, why would anybody ever upgrade?” Herlihy sees a peril that the engine of progress that has driven computer science for decades could run out of fuel, with dire consequences. “If this were to dissipate, then all the smart students would go to bioengineering or something, and computer science as a field would suffer.” Indeed, he says, “even one generation of stagnation could do lasting damage.”

Incremental Integration

Computer scientists on university faculties say academia is debating how and when to introduce parallel programming throughout the curriculum, instead of just offering an upperlevel course as is now common. Both Brown’s Herlihy and Guy Blelloch, a professor of computer science at Carnegie Mellon University, say the early emphasis should be on higher-level parallel concepts and not on coding particulars such as languages or development frameworks.

Yet without some sort of introduction to the emerging post-collegiate parallel programming practice and tools environment, these new engineers might need even more training. Herlihy says that existing parallel frameworks—such as OpenMP, which dates to 1997, and the newly released OpenCL—are well suited for professional programmers, but not for students, who largely program in Java. This lack of grounding in the fundamentals of parallel code writing could lead to a looming disconnect between what new programmers know and what industry needs.

Intel’s Mattson, who worked on both frameworks, says one of the major blind spots of both OpenMP and Open-CL is a lack of support for managed languages such as Java. He also says the idea that there may be some type of revolutionary parallel programming language or approach on the near-term horizon that solves the multicore conundrum is misplaced. “Programmers insist on incremental approaches,” Mattson says. “They have a huge base of code written in established languages they will not throw away to adopt a whole new language, and they have to be able to incrementally evolve this legacy of code into the parallel space.”

The good news is that tools to assist programmers in this task of incrementally parallelizing code are proliferating on the market. Examples include Cilk++ from Cilk Arts, a Burlington, MA, company that extends the work of the Cilk Project at MIT. Cilk++ allows parallel programs written in C++ to retain serial semantics, which in turn permits programmers to use serial methodologies. CriticalBlue, an Edinburgh, Scotland-based company, recently released Prism, a parallel analysis and coding tool that CEO David Stewart says works with C or C++ and that allows users to explore parallelization strategies—which pieces to run in parallel, which dependencies to break, how many cores to use, and so on—before touching the code.

The most sensible way to implement parallelism, Stewart contends, is to enable software developers to analyze how much potential parallelism is in their code and to determine the minimum set of code changes required to exploit that parallelism. “Put another way,” Stewart says, “how much does Amdahl’s Law screw me up?”

Academia is debating when and how to add parallel programming to the curriculum, instead of only offering an upper-level course as is now common.

A Firm Ceiling

For years, the promise of parallel computing has run afoul of the harsh reality of an axiom posited by computer architect Gene Amdahl in 1967. Amdahl’s Law puts a firm ceiling on the benefit of converting code from sequential to parallel. It states that the speedup of an application using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program. The upshot: going down the path of parallelism will not necessarily reap rewards.

“If 50% of your program is serial and the other half can be parallelized, the biggest speedup you’re going to see is a factor of two,” says Microsoft’s Larus. “It doesn’t matter how many cores you have. And that doesn’t seem very compelling if you’re going to have to rewrite a huge amount of software.” Amdahl’s Law, Larus says, might very well mean that a wholesale rush to convert serial applications to parallel platforms in order to preserve a Moore’s Law pace of progress would be misguided. In many cases, it will be more cost-effective to improve serial applications’ performance via algorithmic advances and custom circuitry rather than going for the marginal return on investment that parallelizing those applications might provide.

For Larus, reconciling the true computational needs of future applications with the overall move to parallel-capable multicore processors must entail rigorous evaluation of what type of application might deliver the most benefit to users. Because the bulk of general-purpose computing has been done successfully on serial platforms, he says it’s been difficult to pin down exactly which applications might derive the most benefit from parallelization.

“This is a challenging problem,” he says. “A lot of people take the attitude, ‘If we build it they will come,’ and that may very well happen—there might be a killer app. But not knowing what that is makes it really hard to build the infrastructure and the tools to facilitate the app.”

So far, Larus says, the computer engineering community is basing its idea of what future general-purpose multicore platforms are capable of due to niche applications such as high-performance scientific data analysis software, where parallelization has shown its value. But there’s no guarantee it will be possible to extrapolate from this experience to create the development frameworks most programmers will need as parallelism goes mainstream in general-purpose computing.

“In this case we’re going backwards; we’re building the tools based on our experience with high-performance computing or scientific computing and saying people are going to need this. And that may be true, but it has a funny feel to me—to have the tools leading.”

Larus says the key to successful parallel platforms might be in finding a way to combine existing discrete serial platforms. One example, he says, might be a virtual receptionist that needs to process visual cues from a camera taking images of a visitor while also responding to spoken queries such as the visitor’s request for directions to a nearby restaurant. “This type of thing has been gradually building up in discrete fields over the years,” Larus says. “When you put it together there are all these big, independent pieces that only interact at these well-defined boundaries. That’s a problem that’s actually easy to parallelize.”

Figures

Figure. Students participating in a class on multicore programming at the University of Illinois at Urbana-Champaign’s Universal Parallel Computing Research Center in June 2009.

Figure. Named after computer architect Gene Amdahl, Amdahl’s Law is frequently used in parallel programming to predict the theoretical maximum speedup using multiple processors.

Footnotes

DOI: http://doi.acm.org/10.1145/1562164.1562171

What to Parallelize

Incremental Integration

A Firm Ceiling

Figures

Entering a Parallel Universe

DOI

September 2009 Issue

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.

What to Parallelize

Incremental Integration

A Firm Ceiling

Figures

Entering a Parallel Universe

DOI

September 2009 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.