The long tradition of building ever-faster processors is ending, with the computer industry instead putting more processing "cores" on each processor chip. Therefore, to continue to benefit from advances in technology, applications must increasingly be able to perform useful work on multiple cores at the same time.
To use multiple cores concurrently, programmers must not only identify independent pieces of a task that can be executed in parallel, but also coordinate their execution, managing communication and synchronization between them. A program with a communication or synchronization bottleneck will be unable to take full advantage of the available cores. Scalable programs that avoid such bottlenecks are surprisingly difficult to construct.
The complex interactions between concurrently executing pieces of code are often managed by the careful use of locks to ensure a critical section of code on one core executes "atomically," that is, without interference from other cores. However, it is challenging to avoid synchronization bottlenecks with this approach, and in many cases the overhead of such bottlenecks negates the advantage gained by running pieces of the application concurrently.
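To make the lock-based approach concrete, here is a minimal sketch in C using POSIX threads; the shared counter and function name are illustrative, not drawn from the paper. Every core must take turns in the critical section, whether or not their updates would actually interfere:

```c
#include <pthread.h>

/* Illustrative shared state (not from the paper): a counter guarded
 * by a single global lock. */
static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

void increment_counter(void)
{
    /* Only one core at a time may execute this critical section;
     * every other core blocks on the lock, even when their updates
     * would not actually have interfered. */
    pthread_mutex_lock(&counter_lock);
    shared_counter++;
    pthread_mutex_unlock(&counter_lock);
}
```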
Research in Transactional Memory (TM) has allowed programmers to express that a critical section of code should be executed atomically, without specifying how this should be achieved. The term "Transactional Memory" originated with seminal work by Herlihy and Moss (ISCA '93), in which they proposed to apply the idea of "transactions" (already popular in database systems) to computer memory.
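As one concrete illustration, GCC's experimental TM language extension lets a programmer express the same critical section as before declaratively; this sketch assumes a compiler built with TM support (gcc -fgnu-tm):

```c
/* Sketch using GCC's experimental TM extension (compile with -fgnu-tm).
 * The programmer states *what* must be atomic; the TM system decides
 * *how* atomicity is enforced. */
static long shared_counter = 0;

void increment_counter(void)
{
    __transaction_atomic {
        shared_counter++;
    }
}
```

Nothing in this source says how the atomicity is achieved; that decision is left entirely to the TM system.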
Numerous variations on this theme have emerged in the literature, with some proposing TM be supported entirely in hardware, others entirely in software, and yet others using some combination of the two. Some propose extensions to programming languages, implemented with a combination of compiler support and runtime libraries, while others propose library interfaces, and yet others propose instruction set extensions that are programmed directly.
TM is generally "optimistic." Rather than pessimistically preventing critical sections from executing concurrently in case they might interfere, as locks do, TM allows them to execute concurrently by default. The TM system monitors concurrent critical sections and, when a "conflict" between them does arise, automatically forces one or more of them to wait or to roll back, so that no interference occurs.
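The optimistic pattern is easiest to see on a single memory word. The C11 sketch below (illustrative, not a TM implementation) reads a shared value, computes privately, and attempts to commit with a compare-and-swap; a failed commit signals a conflict, and the work is simply redone. TM generalizes this read/compute/validate/retry shape to arbitrary sets of memory locations:

```c
#include <stdatomic.h>

static _Atomic long shared_counter = 0;

/* Optimistic update of one word: read, compute privately, then try to
 * commit. A failed compare-and-swap means another core changed the
 * value in the meantime (a "conflict"), so the work is redone (a
 * "rollback" of sorts). */
void optimistic_increment(void)
{
    long observed, desired;
    do {
        observed = atomic_load(&shared_counter);  /* optimistic read */
        desired  = observed + 1;                  /* private compute */
    } while (!atomic_compare_exchange_weak(&shared_counter,
                                           &observed, desired));
}
```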
To date, most research has focused on using TM in user programs, and TM proposals have mostly been evaluated using artificial toy applications and benchmarks designed to explore the various trade-offs that arise in TM design.
In contrast, the research described here by Rossbach et al. explores the use of TM in an operating system. Such research is important for a number of reasons. First, it provides significant value to the TM research community by creating TM-based programs (operating systems) that are larger, more complex, and more realistic than the mostly artificial workloads used to evaluate TM systems so far. Second, TM has the potential to improve the performance and scalability of the operating system itself, even while making it simpler. Because all user programs depend on the operating system, its scalability and correctness are critical to the success of multicore systems.
TM research for user programs will not necessarily result in the best TM support for operating systems. On the one hand, the operating system faces some challenges that user programs do not. In particular, the operating system must perform reliably while running a variety of user programs simultaneously and thus cannot exploit knowledge of specific ones to improve performance and scalability. Furthermore, operating systems employ a wider variety of synchronization mechanisms than typical user programs do, and must also deal with a variety of complicated issues on behalf of user programs, such as access to low-level system devices.
However, the operating system can exploit its privileged role in the system in ways that user programs cannot. For example, for security reasons, a user program cannot prevent the operating system from interrupting it, while the operating system has full control of the cores and can thus prevent interruption while finishing an important task.
The authors built two operating systems that exploit TM. They first attempted to use transactions directly in place of existing synchronization mechanisms. This approach entailed a number of challenges, chiefly because locks are used for purposes other than preventing memory conflicts between critical sections, such as protecting system devices for input and output (I/O). TM systems are not yet mature enough to support such functionality, and the lessons learned here are valuable for TM researchers.
The authors then took a more pragmatic approach: they retained existing synchronization mechanisms, and used TM to attempt to improve their performance and scalability. This way, they could use the existing locks when necessary—for example, for I/O—while also exploiting TM in other cases to allow parallelism that would previously have been impossible. Given the thousands of person-years spent carefully engineering modern operating systems, and the limited nature of the first TM systems, this is likely to be the most practical approach to exploiting TM in operating systems for some time.
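One well-known way to combine locks with TM in this spirit is speculative lock elision: run the critical section as a transaction that merely reads the lock word, and fall back to genuinely acquiring the lock only if the transaction aborts. The user-level sketch below uses Intel's TSX RTM intrinsics as a stand-in for whatever TM support a kernel might use; it illustrates the hybrid idea, not the authors' kernel mechanism:

```c
#include <immintrin.h>   /* Intel TSX RTM intrinsics; compile with -mrtm */
#include <stdatomic.h>

static atomic_int lock = 0;        /* 0 = free, 1 = held */
static long shared_counter = 0;

static void lock_acquire(atomic_int *l)
{
    int expected = 0;
    while (!atomic_compare_exchange_weak(l, &expected, 1))
        expected = 0;              /* spin until the lock looks free */
}

static void lock_release(atomic_int *l)
{
    atomic_store(l, 0);
}

void elided_increment(void)
{
    if (_xbegin() == _XBEGIN_STARTED) {
        /* Transactional path: reading the lock word puts it in the
         * transaction's read set, so a real acquisition of the lock
         * by another thread automatically aborts this transaction. */
        if (atomic_load(&lock) != 0)
            _xabort(0xff);         /* a real lock holder exists */
        shared_counter++;          /* the critical section */
        _xend();                   /* commit */
    } else {
        lock_acquire(&lock);       /* fallback: conventional locking */
        shared_counter++;
        lock_release(&lock);
    }
}
```

Threads whose elided critical sections touch disjoint data commit in parallel under the same lock, which is exactly the extra parallelism the pragmatic approach seeks, while the conventional path remains available for cases, such as I/O, where a transaction cannot help.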
By exploring both approaches, the authors not only contribute to our understanding of how TM can be used in large and realistic code bases, but also provide valuable insight into what additional TM features may be useful or necessary for optimal support of such systems in the longer term.