Exploiting parallelism has become the primary means to higher performance. Shared memory is a pervasively used programming model, where parallel tasks or threads communicate through a global address space. Popular languages, such as C, C++, and Java have (or will soon have) standardized support for shared-memory threads. Unfortunately, shared-memory programs are notoriously prone to subtle bugs, often due to data races.
Languages that allow data races obfuscate true communication and synchronization. Since any load or store may communicate or synchronize with another thread through a data race, it becomes difficult to reason about sections of code in isolation. Data races also create non-determinism, since the ordering between two racing accesses is timing-dependent. Racy programs therefore require reasoning about many interleavings of individual memory accesses, and their executions are difficult to reproduce. Data races have therefore been widely considered symptoms of bugs, and there is much research on automatically detecting them.
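The interleaving explosion can be illustrated deterministically. The sketch below (not from the papers; the thread model is hypothetical) treats each thread's unsynchronized `counter += 1` as a load followed by a store, and enumerates every interleaving of the four steps to show the result is timing-dependent:

```python
from itertools import permutations

def run(schedule):
    """Execute one interleaving; schedule is a sequence of thread ids."""
    counter = 0
    regs = {}            # per-thread register holding the loaded value
    step = {0: 0, 1: 0}  # next step (0 = load, 1 = store) for each thread
    for tid in schedule:
        if step[tid] == 0:
            regs[tid] = counter      # load counter into a register
        else:
            counter = regs[tid] + 1  # store the incremented value back
        step[tid] += 1
    return counter

# All distinct interleavings of thread 0's (load, store) with thread 1's.
schedules = set(permutations([0, 0, 1, 1]))
outcomes = {run(s) for s in schedules}
print(sorted(outcomes))  # [1, 2]: both a lost update and the intended result
```

Two unsynchronized increments can therefore leave the counter at either 1 or 2, and which outcome a real execution produces depends on timing.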
An arguably more fundamental problem with data races concerns semantics. Every programming language must specify what value a load can return, also called the memory model. It has been surprisingly difficult to specify an acceptable model that balances ease-of-use and performance for languages that allow data races.1 Programmers usually expect a sequential interleaving-based model called sequential consistency. For data-race-free programs, it is straightforward to provide sequential consistency and high performance. With racy code, however, routine compiler and hardware optimizations can result in surprising behaviors. Consequently, the upcoming C and C++ memory models specify "undefined" behavior for programs with data races. This gives freedom to implementers but poses challenges for debugging racy code. Java’s safety requirements preclude the use of "undefined" behavior. The Java memory model, therefore, specifies very weak semantics for racy programs, but is extremely complex and currently has known bugs.
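The classic "store buffering" litmus test makes the surprise concrete. The sketch below (a hypothetical model, not code from the papers) has thread 0 run `x = 1; r1 = y` and thread 1 run `y = 1; r2 = x`, with `x` and `y` initially 0; enumerating every sequential interleaving shows `r1 == r2 == 0` is impossible under sequential consistency, yet a routine reordering of each thread's two accesses allows it:

```python
from itertools import permutations

def run(schedule, program):
    """Execute one interleaving of the two threads' instructions."""
    mem = {"x": 0, "y": 0}
    regs = {}
    pc = {0: 0, 1: 0}  # per-thread program counter
    for tid in schedule:
        kind, a, b = program[tid][pc[tid]]
        pc[tid] += 1
        if kind == "store":
            mem[a] = b      # store constant b to location a
        else:
            regs[a] = mem[b]  # load location b into register a
    return regs["r1"], regs["r2"]

original = {0: [("store", "x", 1), ("load", "r1", "y")],
            1: [("store", "y", 1), ("load", "r2", "x")]}
# Each thread's store delayed past its load, as a store buffer may do:
reordered = {0: [("load", "r1", "y"), ("store", "x", 1)],
             1: [("load", "r2", "x"), ("store", "y", 1)]}

schedules = set(permutations([0, 0, 1, 1]))
sc_outcomes = {run(s, original) for s in schedules}
weak_outcomes = {run(s, reordered) for s in schedules}
print((0, 0) in sc_outcomes)    # False: no interleaving produces it
print((0, 0) in weak_outcomes)  # True: reordering allows it
```

For data-race-free programs such reorderings are invisible, which is why sequential consistency and high performance coexist; it is precisely the racy accesses that expose them.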
Despite the debugging and semantics difficulties, some prior work refers to certain data races (for example, some unsynchronized reads) as benign and useful for performance. Unfortunately, reasonable semantics for programs with any type of data race remain elusive; C, C++, and Java do not provide usable semantics with any (benign or not) data race.
A natural conclusion is to strive for languages that eliminate data races. Although there is much progress, such general-purpose languages2 are not yet commercially available. The following two papers, on Goldilocks and FastTrack, suggest alternative solutions that use always-on data race detection for current languages.
Goldilocks was the first work to propose that a data race be treated as a language-level exception, just like a null pointer dereference. This insight cleans up the most difficult part of the memory-model mess: executions are either sequentially consistent or throw an exception. No complicated semantics of Java and no unsafe behavior of C/C++!
The challenge, compared to much prior work on race detection, is that an always-on language-level exception mechanism must be both fast and precise. The well-established race detection algorithm based on vector clocks is precise, but incurs orders-of-magnitude slowdown. Faster algorithms (for example, based on a lockset approach) exist, but produce false positives, which are unacceptable for enforcing language-level semantics.
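To make the precision side of that trade-off concrete, here is a minimal happens-before detector built on vector clocks (a simplified sketch; the `Detector` class, its two-thread setup, and the variable and lock names are illustrative, not the papers' actual implementation):

```python
NUM_THREADS = 2

def vc():            # a fresh vector clock
    return [0] * NUM_THREADS

def join(a, b):      # pointwise maximum of two vector clocks
    return [max(x, y) for x, y in zip(a, b)]

def leq(a, b):       # does a happen-before (or equal) b?
    return all(x <= y for x, y in zip(a, b))

class Detector:
    def __init__(self):
        self.C = [vc() for _ in range(NUM_THREADS)]  # per-thread clocks
        for t in range(NUM_THREADS):
            self.C[t][t] = 1
        self.L = {}  # per-lock clocks
        self.W = {}  # last-write clock per variable
        self.R = {}  # accumulated read clock per variable

    def acquire(self, t, lock):
        self.C[t] = join(self.C[t], self.L.get(lock, vc()))

    def release(self, t, lock):
        self.L[lock] = list(self.C[t])
        self.C[t][t] += 1

    def write(self, t, var):
        if not leq(self.W.get(var, vc()), self.C[t]) or \
           not leq(self.R.get(var, vc()), self.C[t]):
            raise RuntimeError(f"data race on {var} by thread {t}")
        self.W[var] = list(self.C[t])

    def read(self, t, var):
        if not leq(self.W.get(var, vc()), self.C[t]):
            raise RuntimeError(f"data race on {var} by thread {t}")
        self.R[var] = join(self.R.get(var, vc()), self.C[t])

d = Detector()
d.write(0, "x")    # thread 0 writes x...
d.release(0, "m")  # ...then releases lock m
d.acquire(1, "m")  # thread 1 acquires m, so the write happens-before
d.write(1, "x")    # this write: no race is reported
try:
    d2 = Detector()
    d2.write(0, "x")
    d2.write(1, "x")  # no synchronization in between
except RuntimeError as e:
    print(e)          # a race is reported
```

The precision comes at a price: every access compares and copies whole vector clocks, which is where the orders-of-magnitude slowdown originates.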
Goldilocks extends the lockset approach to make it precise, while running much faster than the previous precise vector-clock algorithm. FastTrack followed up by optimizing the vector-clock approach itself, preserving precision while performing even better than Goldilocks. For the first time, these papers made it possible to believe that "data race as an exception" may be a viable solution to the vexing debugging and semantics problems of shared memory.
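FastTrack's central observation can be sketched as follows: writes to a given variable are almost always totally ordered, so the last write can usually be summarized by a single "epoch" (a clock value paired with a thread id) instead of a full vector clock, shrinking the common-case check from O(n) in the thread count to O(1). The function and clock values below are hypothetical illustrations, not the paper's code:

```python
def epoch_hb(epoch, clock):
    """O(1) check: does the access at `epoch` happen before vector `clock`?"""
    c, t = epoch
    return c <= clock[t]

# Hypothetical state: thread 1's vector clock after synchronizing with thread 0.
C1 = [3, 5]
print(epoch_hb((2, 0), C1))  # True: a write at clock 2 on thread 0 is ordered
print(epoch_hb((4, 0), C1))  # False: that write would race
```

Only when this fast path fails (for example, concurrent reads by multiple threads) does FastTrack fall back to full vector clocks, which is why it keeps vector-clock precision at near-lockset speed.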
The absolute slowdown of both techniques is still quite significant, but they expose an exciting research agenda. An immediate challenge is to improve performance. Another concerns mapping races detected in compiler-generated code to source code. Others are exploring hardware for similar goals.3
More broadly, these papers encourage a fundamental rethinking of programming models, languages, compilers, and hardware. Should languages be designed to eliminate data races by design?2 Or should the runtime automatically detect races? Or should the hardware?3 Can similar techniques be used for enforcing even stronger properties such as determinism and atomicity? Does this impact how we view shared memory as a programming model? The answer to all these questions is probably "yes."
It is unlikely that any one of language, runtime, or hardware techniques alone will strike the best balance between ease-of-use, generality, performance, and complexity, but designing systems that combine the strengths of these techniques remains challenging. It is also unclear what high-level language properties such systems will finally ensure; for example, race elimination is certainly a critical factor in ensuring determinism and atomicity. What is clear is that these papers provide key insights to shape the final solutions and are important steps toward making parallel programming accessible to the masses.