Deconstructing the Bakery to Build a Distributed State Machine

A rigorous journey from the bakery algorithm to a distributed state machine.

Skip to content
# Leslie Lamport

###
Author Archives

###
Shape the Future of Computing

Deconstructing the Bakery to Build a Distributed State Machine

A rigorous journey from the bakery algorithm to a distributed state machine.

Turing Lecture: The Computer Science of Concurrency: The Early Years

Leslie Lamport is the recipient of the 2013 ACM A.M. Turing Award.

Who Builds a House Without Drawing Blueprints?

Finding a better solution by thinking about the problem and its solution, rather than just thinking about the code.

The mutual exclusion problem has been solved

“A common assumption underlying mutual exclusion algorithms in shared memory systems is that: b. A memory reference to an individual word is mutually exclusive.”

A simple approach to specifying concurrent systems

Over the past few years, I have developed an approach to the formal specification of concurrent systems that I now call the transition axiom method. The basic formalism has already been described in [12] and [1], but the formal details tend to obscure the important concepts. Here, I attempt to explain these concepts without discussing the details of the underlying formalism.
Concurrent systems are not easy to specify. Even a simple system can be subtle, and it is often hard to find the appropriate abstractions that make it understandable. Specifying a complex system is a formidable engineering task. We can understand complex structures only if they are composed of simple parts, so a method for specifying complex systems must have a simple conceptual basis. I will try to demonstrate that the transition axiom method provides such a basis. However, I will not address the engineering problems associated with specifying real systems. Instead, the concepts will be illustrated with a series of toy examples that are not meant to be taken seriously as real specifications.
Are you proposing a specification language? No. The transition axiom method provides a conceptual and logical foundation for writing formal specifications; it is not a specification language. The method determines what a specification must say; a language determines in detail how it is said.
What do you mean by a formal specification? I find it helpful to view a specification as a contract between the user of a system and its implementer. The contract should tell the user everything he must know to use the system, and it should tell the implementer everything he must know about the system to implement it. In principle, once this contract has been agreed upon, the user and the implementer have no need for further communication. (This view describes the function of the specification; it is not meant as a paradigm for how systems should be built.)
For a specification to be formal, the question of whether an implementation satisfies the specification must be reducible to the question of whether an assertion is provable in some mathematical system. To demonstrate that he has met the terms of the contract, the implementer should resort to logic rather than contract law. This does not mean that an implementation must be accompanied by a mathematical proof. It does mean that it should be possible, in principle though not necessarily in practice, to provide such a proof for a correct implementation. The existence of a formal basis for the specification method is the only way I know to guarantee that specifications are unambiguous.
Ultimately, the systems we specify are physical objects, and mathematics cannot prove physical properties. We can prove properties only of a mathematical model of the system; whether or not the system correctly implements the model must remain a question of law and not of mathematics.
Just what is a system? By "system," I mean anything that interacts with its environment in a discrete (digital) fashion across a well-defined boundary. An airline reservation system is such a system, where the boundary might be drawn between the agents using the system, who are part of the environment, and the terminals, which are part of the system. A Pascal procedure is a system whose environment is the rest of the program, with which it interacts by responding to procedure calls and accessing global variables. Thus, the system being specified may be just one component of a larger system.
A real system has many properties, ranging from its response time to the color of the cabinet. No formal method can specify all of these properties. Which ones can be specified with the transition axiom method? The transition axiom method specifies the behavior of a System—that is, the sequence of observable actions it performs when interacting with the environment. More precisely, it specifies two classes of behavioral properties: safety and liveness properties. Safety properties assert what the system is allowed to do, or equivalently, what it may not do. Partial correctness is an example of a safety property, asserting that a program may not generate an incorrect answer. Liveness properties assert what the system must do. Termination is an example of a liveness property, asserting that a program must eventually generate an answer. (Alpern and Schneider [2] have formally defined these two classes of properties.) In the transition axiom method, safety and liveness properties are specified separately.
There are important behavioral properties that cannot be specified by the transition axiom method; these include average response time and probability of failure. A transition axiom specification can provide a formal model with which to analyze such properties,1 but it cannot formally specify them.
There are also important nonbehavioral properties of systems that one might want to specify, such as storage requirements and the color of the cabinet. These lie completely outside the realm of the method.
Why specify safety and liveness properties separately? There is a single formalism that underlies a transition axiom specification, so there is no formal separation between the specification of safety and liveness properties. However, experience indicates that different methods are used to reason about the two kinds of properties and it is convenient in practice to separate them. I consider the ability to decompose a specification into liveness and safety properties to be one of the advantages of the method. (One must prove safety properties in order to verify liveness properties, but this is a process of decomposing the proof into smaller lemmas.)
Can the method specify real-time behavior? Worst-case behavior can be specified, since the requirement that the system must respond within a certain length of time can be expressed as a safety property—namely, that the clock is not allowed to reach a certain value without the system having responded. Average response time cannot be expressed as a safety or liveness property.
The transition axiom method can assert that some action either must occur (liveness) or must not occur (safety). Can it also assert that it is possible for the action to occur? No. A specification serves as a contractual constraint on the behavior of the system. An assertion that the system may or may not do something provides no constraint and therefore serves no function as part of the formal specification. Specification methods that include such assertions generally use them as poor substitutes for liveness properties. Some methods cannot specify that a certain input must result in a certain response, specifying instead that it is possible for the input to be followed by the response. Every specification I have encountered that used such assertions was improved by replacing the possibility assertions with liveness properties that more accurately expressed the system's informal requirements.
Imprecise wording can make it appear that a specification contains a possibility assertion when it really doesn't. For example, one sometimes states that it must be possible for a transmission line to lose messages. However, the specification does not require that the loss of messages be possible, since this would prohibit an implementation that guaranteed no messages were lost. The specification might require that something happens (a liveness property) or doesn't happen (a safety property) despite the loss of messages. Or, the statement that messages may be lost might simply be a comment about the specification, observing that it does not require that all messages be delivered, and not part of the actual specification.
If a safety property asserts that some action cannot happen, doesn't its negation assert that the action is possible? In a formal system, one must distinguish the logical formula A from the assertion ⊢ A, which means that A is provable in the logic; ⊢ A is not a formula of the logic itself. In the logic underlying the transition axiom method, if A represents a safety property asserting that some action is impossible, then the negation of A, which is the formula ┐A, asserts that the action must occur. The action's possibility is expressed by the negation of ⊢ A, which is a metaformula and not a formula within the logic. See [10] for more details.

Password authentication with insecure communication

A method of user password authentication is described which is secure even if an intruder can read the system's data, and can tamper with or eavesdrop on the communication between the user and the system. The method assumes a secure one-way encryption function and can be implemented with a microcomputer in the user's terminal.

On-the-fly garbage collection: an exercise in cooperation

As an example of cooperation between sequential processes with very little mutual interference despite frequent manipulations of a large shared data space, a technique is developed which allows nearly all of the activity needed for garbage detection and collection to be performed by an additional processor operating concurrently with the processor devoted to the computation proper. Exclusion and synchronization constraints have been kept as weak as could be achieved; the severe complexities engendered by doing so are illustrated.

Time, clocks, and the ordering of events in a distributed system

The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

Concurrent reading and writing

The problem of sharing data among asynchronous processes is considered. It is assumed that only one process at a time can modify the data, but concurrent reading and writing is permitted. Two general theorems are proved, and some algorithms are presented to illustrate their use. These include a solution to the general problem in which a read is repeated if it might have obtained an incorrect result, and two techniques for transmitting messages between processes. These solutions do not assume any synchronizing mechanism other than data which can be written by one process and read by other processes.

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved