Multithreaded programs that communicate through shared memory are pervasive. They originally provided a convenient way for an application to perform, for example, a long compute task while remaining responsive to an interactive user. Today they are the most obvious route to using multiple available processor cores for a single task, so the user can benefit from the increased number of available cores.
Unfortunately, a surprising amount of confusion surrounds the basic rules obeyed by shared memory. If a variable is updated by one thread, when will the new value become visible to another thread? What does it mean, if anything, to have two threads updating the same variable at the same time? Do all threads have to see updates in a consistent order?
The confusion surrounding these issues has resulted in many intermittent software bugs, often in low-level libraries that affect large numbers of applications. On at least one occasion, it has resulted in a pervasively used, but usually incorrect, programming idiom. (Try searching for "double-checked locking").
This problem arises at different levels. At the programming language level, there must be clear rules for the programmer's use of shared variables. Compilers and low-level libraries must enforce these rules by relying on corresponding hardware properties for memory-access instructions, the subject of the following paper.
Most of the early work on shared-memory semantics focused on the instruction set level and trade-offs with hardware optimization. Roughly concurrently, some older programming language designs, notably Ada 83, made very credible attempts to address the language-level issue. Unfortunately, none of this prevented major problems in the most widely used languages from escaping attention until very recently. The Java specification was drastically revised around 2005,2 and has still not completely settled.1,3 Similarly, the C and C++ specifications are being revised to finally address similar issues.1 As a result, hardware architects often could not have a clear picture of the programming language semantics they needed to support, making a fully satisfactory resolution of the hardware-level issues more difficult or impossible.
The recent work on shared variables in programming languages highlighted some remaining questions about hardware memory models. For example, Java's volatile requires that updates of all such fields become visible in the same order to all other threads. Is it even possible to enforce that on common hardware at reasonable cost? What instruction sequences do compilers need to generate? Until 2007 or so, the answers to such questions remained unclear, and sometimes unraised, on several major architectures; x86 is probably the most visible of these. Raising these issues resulted in extended discussions involving the machine architects. As a result of this process, Intel and AMD have released a sequence of specifications for x86 shared-memory accesses. These are far more precise than they were at the start of the process, and directly address many interesting test cases that arose during the discussions. However, they are still not a precise mathematical description that could be used to, for example, prove that an implementation of Java volatile is correct.
The x86-TSO model fills that gap, by providing precise mathematical and empirically accurate models of x86 shared memory as it is visible to user programs. These include an operational model presented here in a very intuitive fashion. In the process of making the model precise and directly testing it against existing implementations, Sewell et al. expose new issues not currently addressed by the manufacturers' specifications, while also confirming their model is compatible with existing implementations.
The examples here are interesting, not just because it may be surprising that there are tiny program fragments (often with only four instructions) for your desktop computer whose meaning is still open to debate, but also because these same small examples are often at the core of important algorithms or software. The first example is an abstraction of Dekker's mutual exclusion algorithm, first described in 1965. The same property is important to many modern lock-free algorithms. We already mentioned the importance of a consistent write-visibility ordering for Java volatiles. The upcoming C and C++ standards introduce an approximate analog, aspects of which also rely on the issues surrounding single-variable coherence (examples n5 and n4b in the paper) being resolved exactly as suggested by x86-TSO. The paper itself discusses the impact of these issues on Linux spin-lock code.
X86-TSO describes the behavior of x86 memory accesses, behavior that depends on processors produced by multiple vendors and on chipsets produced by an even larger number. Since no single vendor's documentation could speak for all implementations, this helped to put academic researchers into the best position to write such a specification.
©2010 ACM 0001-0782/10/0700 $10.00