Looking for some new insight into an old problem? The following research paper by Milind Kulkarni et al. addresses the familiar problem of writing parallel applications and uses a fresh approach based on data abstraction to allow some challenging programs to be parallelized. Going back more than 30 years to the foundational work by Owicki and Gries on the semantics of parallel programs, and through decades of work on automatic parallelization of Fortran and C programs, the focus in the research and commercial communities has been on reasoning at the concrete level of read and write operations: if two iterations of a loop access the same variable, and at least one of them performs a write, then the iterations cannot execute in parallel. When combined with a static compile-time approach, the necessarily conservative analysis techniques mean that many loops cannot be parallelized, especially for programs with pointerbased data structures.
With a computer industry betting on multicore, the need for solutions to the parallel software problem has reached a new level of criticality. There has been a resurgence of parallelism research, much of it focused on dynamic discovery of parallelism using speculative techniques. The authors build on the idea of dynamic parallelism discovery by combining loop constructs with conflict analysis and a rollback mechanism to support speculative parallelism. The Galois system described in the paper has both ordered and unordered loops, but presents users with a serial semantics in both cases. Galois uses the type system to further control the behavior of such loops. Objects that use a traditional model of speculative parallelism are instances of so-called "catch-and-keep" classes, because the runtime system holds a lock throughout an iteration to ensure that the iterations appear to execute serially.
But an interesting class of algorithms allow for correct behavior even in the presence of conflicting accesses. For example, branch and bound search can proceed in any order but must update the value of a variable representing the current bound. The authors add to speculative execution: programmers are allowed to specify that two method invocations on an object commute, meaning they can safely be reordered. In the branch and bound example, if a class is defined for the bound, with only operations to read the bound and update it monotonically by performing a max operation with a newly discovered bound, then the update operations commute with one another. Moreover, methods that have no effect on the abstract value, but "benevolent side effects" on the underlying state will commute at the abstract level despite having conflicting accesses.
Classes that have commutativity specifications are called "catch-and-release" classes, because the implementation holds a lock only during a method invocation to ensure atomicity of the operation, but not throughout the entire iteration. Serialization of the iterations is ensured instead through commutativity of the methods and, if abstract conflicts are discovered, though rollbacks.
The commutativity specification combined with unordered loops gives a programming model that is somewhere between explicit parallelism and sequential programming, as the programmer may give many opportunities for reordering operations, but the program still has a serial semantics. The one final concept needed to parallelize some complex irregular applications is the idea that methods with side effects on catch-and-release classes must have inverse methods to "undo" the effects in case an abstract-level conflict is discovered. This allows iterations to execute in parallel, possibly with conflicting reads and writes to variables, as long as the methods involved are known to commute. If an abstract conflict is discovered between two method invocations that are not commutative, the runtime layer will roll back one of the iterations using the inverse of the operations that had been executed.
The authors successfully apply these ideas to two complex parallelization problems—an unstructured mesh refinement algorithm and a clustering algorithm used in data mining. Both problems involve irregular control flow and pointer-based data structures, making them intractable for static parallelism approaches or even speculative parallelism based on concrete conflict detection.
This paper presents the general ideas using these two compelling examples, and the concepts are both original and thought provoking. Abstraction mechanisms like object orientation are often considered deterrents to performance and parallelization, but in this approach data abstraction provides the specification point for commutativity and therefore a key to parallelization. The authors raise a number of interesting semantic issues in the examples and overall approach, and for those interested in a new perspective on parallel programming, I highly recommend this paper.
Join the Discussion (0)
Become a Member or Sign In to Post a Comment