High-Performance Computing: Where

You are a young entrepreneur with a “can’t miss” idea for the next great social networking service, one so hip, so cool and so awesome that it will spread like a virus, while venture capital firms line up to beg for a piece of the action. It’s just a simple matter of programming to turn your idea into code that scales seamlessly across tens of thousands of data center servers.

You know the drill. Fire up the coffee percolator, break out your copy of Kernighan and Ritchie, grab your cheat sheet of TCP/IP functions and parameters, slam a cassette into the tape player and start programming. Oh, wait; that’s so 70s and 80s! In the hierarchy of abstractions, it’s only slightly above toggling absolute binary into the front panel of the machine.

In the web service world, we have moved beyond these low level tools. Over the past twenty years, we have built and embraced a suite of powerful libraries, scripting languages, software services and tools that allow developers to create complex software systems, while hiding the low level attributes of networks and computer systems. We focus on composition, abstraction, rapid deployment, software scaling and human productivity.

Meanwhile, in the world of high-performance computing, message passing has remained the programming paradigm of choice for over twenty years. The durable Message Passing Interface (MPI) standard, with send/receive, broadcast and reduction operators is still used to construct parallel programs composed of tens to hundreds of thousands of communicating processes. Each interprocess communication is orchestrated by the developer based on knowledge of code function and overhead.

To date, attempts to develop higher level programming abstractions, tools and environments for high-performance computing have largely failed. There are many reasons for this failure, but I believe many are rooted in our excessive focus on hardware performance measures. By definition, the raison d’être for high-performance computing is high performance, but floating point operations per second (FLOPS) need not be the only measure. Human productivity, total cost and time to solution are equally, if not more important.

I am confident that high-performance computing can and should learn a few tricks from the world of web services. We need a Ruby on Rails for defining parallel application frameworks and an Erlang for concurrent specification. We need to focus on high-productivity computing, balancing human and machine performance.

You know the drill. Pour some hot water in the French press, break out your copy of Lattice QCD for Dummies, grab your cheat sheet of high-performance computing abstractions, download some digital tunes and start programming. In all seriousness, a new world of opportunity and scientific discovery awaits those who first embrace and master the abstractions needed to create rich, multidisciplinary parallel applications.