General Agreement

2013 ACM A.M. Turing Award recipient Leslie Lamport

"What comes first?" is critically important when computer processors perform operations that rely on each other’s outputs. Leslie Lamport realized this when reading a 1975 paper by Paul Johnson and Robert Thomas proposing timestamps to track the sequence of events in a distributed algorithm.

"I immediately related that to special relativity," Einstein’s theory of space-time, says Lamport, principal researcher at Microsoft Research and recipient of the 2013 ACM A.M. Turing Award. "The notion of ‘at the same time’ is relative. It’s not an invariant notion. But what is invariant is the notion of causality." Lamport will receive the $250,000 Turing prize for his work in bringing order to the chaos of distributed computing systems that rely on communication between many autonomous computers.

According to Einstein, two events can only be causally related if light from the first event can reach the other before the second event happens. Lamport says the same notion can be applied to messages in a computer system; the output of one processor has to reach a second processor before the second one performs an operation, if the first is to influence the second. To make that happen, you need to define "before."

That is what Lamport did with the notion of logical clocks, which count a sequence of clicks and assign a clock value to each event. For one event to occur before another, it must have a lower clock value. Once the sequence of events is determined, it becomes possible to build an arbitrary state machine. "Then you can implement any type of system you want," Lamport says.

Logical clocks work fine when nothing fails, Lamport says; the problem is how to get such a system to work in the presence of faults. From 1977 to 1985, Lamport worked for research institute SRI International, which had been contracted by the U.S. National Aeronautics and Space Administration (NASA) to build a prototype of a highly reliable system for controlling aircraft. It was assumed a three-processor system could tolerate one faulty processor; if two processors took input from a sensor and calculated similar results, but a third came up with a wildly different value, the two similar results could be considered correct and the third discarded.

The problem, laid out in a paper by Lamport and his SRI colleagues Robert Shostak and Marshall Pease, is the faulty processor might send different values to each of the other two; both processor A and processor B might think processor C was in agreement with it and the other processor was faulty. The group realized another processor was needed; in fact, to tolerate a given number n of faults required three times as many processors as that number plus one more, or 3n + 1. The paper won the 2005 Edsger W. Dijkstra Prize in Distributed Computing.

Lamport credits his colleagues with that solution; he says his contribution was a different algorithm using digital signatures so the processors could identify which one gave inconsistent results, so the problem could be solved with 2n + 1 processors. At the time, cryptographic signatures were too expensive to implement, so his solution was not used.

He takes credit for getting the paper written. "I wrote the first draft, that Shostak thought was so terrible he completely rewrote it," Lamport says.

"The notion of ‘at the same time’ is relative. It’s not an invariant notion. But what is invariant is the notion of causality."

Shostak says Lamport’s most important contribution was naming the problem when, in another paper, he described the processors as Byzantine generals surrounding an enemy encampment, deciding whether to attack. The faulty processor was described as a traitorous general, telling one colleague they should attack and the other they should not. The solution is now called Byzantine agreement.

"That made a big difference in terms of the propagation of our work," says Shostak. Lamport "has a knack, I think, for popularizing results in computing science, in addition to being somebody who has a large corpus of original work."

Lamport says that approach failed with Paxos, an algorithm for a fault-tolerant state machine handling non-Byzantine failures. He illustrated it with the story of a parliament on the Greek island Paxos reaching agreement. Though the algorithm is useful, Lamport says, the story did not help promote it.

Cornell University computer scientist Fred Schneider says Lamport’s strength lies in how he takes a "first principles" approach to computer science problems, laying bare underlying assumptions others might not have noticed. "He doesn’t accept common dogma or common descriptions of problems," Schneider says.

Lamport replaced the older definition of correctness, regarding whether a program terminated correctly, with the notions of safety—that nothing bad happens—and liveness—that something good happens. He also developed LaTeX, a document preparation system standard for publishing documents in some scientific fields. "Every time he put his finger down, he picked up a gem," Schneider says.

Lamport’s advice for those who want to become systems designers: learn to think better. "The way you learn to think better is to think more mathematically," he says, "so they should get themselves a good mathematical education."

General Agreement

DOI

June 2014 Issue

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.

General Agreement

DOI

June 2014 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.