Why Logical Clocks Are Easy

Sometimes all you need is the right language.

Posted Apr 1 2016

Introduction
Happened-Before Relation
Causal Histories
Causality Tracking
Final Remarks
Acknowledgments
References
Authors
Figures

Why Logical Clocks Are Easy, illustration

Any computing system can be described as executing sequences of actions, with an action being any relevant change in the state of the system. For example, reading a file to memory, modifying the contents of the file in memory, or writing the new contents to the file are relevant actions for a text editor. In a distributed system, actions execute in multiple locations; in this context, actions are often called events. Examples of events in distributed systems include sending or receiving messages, or changing some state in a node. Not all events are related, but some events can cause and influence how other, later events occur. For example, a reply to a received email message is influenced by that message, and maybe by prior messages received.

Events in a distributed system can occur in a close location, with different processes running in the same machine, for example; or at nodes inside a datacenter; or geographically spread across the globe; or even at a larger scale in the near future. The relations of potential cause and effect between events are fundamental to the design of distributed algorithms. These days hardly any service can claim not to have some form of distributed algorithm at its core.

To make sense of these cause-and-effect relations, it is necessary to limit their scope to what can be perceived inside the distributed system itself—internal causality. Naturally, a distributed system interacts with the rest of the physical world outside of it, and there are also cause-and-effect relations in that world at large. For example, consider a couple planning a night out using a system that manages reservations for dinner and a movie. One person makes a reservation for dinner and lets the other person know with a phone call. After receiving the phone call, the second person goes to the system and reserves a movie. A distributed system has no way of knowing the first reservation has actually caused the second one.

This external causality cannot be detected by the system and can only be approximated by physical time. (Time, however, totally orders all events, even those unrelated—thus, it is no substitute for causality—and wall clocks are never perfectly synchronized.^11,16) This article focuses instead on internal causality—the type that can be tracked by the system.

Happened-Before Relation

In 1978, Leslie Lamport defined a partial order, referred to as happened before, that connects events of a distributed system that are potentially causally linked.⁸ An event c can be the cause of an event e, or c happened before e, iff (if and only if) both occur in the same node and c executed first, or, being at different nodes, if e could know about the occurrence of c thanks to some message received from some node that knows about c. If neither event can know about the other, then they are said to be concurrent.

Using the example of dinner and movie reservations, Figure 1 shows a distributed system with three nodes. An arrow between nodes represents a message sent and delivered. Both Bob’s positive answer to the dinner suggestion by Alice and Chris’s later request to join the party are influenced by Alice’s initial question about plans for dinner.

In this distributed computation, a simple way to check if an event c could have caused another event e (c happened before e) is to find at least one directed path linking c to e. If such a connection is found, this partial order relation is marked c → e to denote the happened-before relation or potential causality. Figure 1 has a₁ → b₂ and b₂ → c₃ (and, yes, also a₁ → c₃, since causality is transitive). Events a₁ and c₂ are concurrent (denoted a₁ || c₂), because there are no causal paths in either direction. Note x || y if and only if x y and y x. The fact Chris was bored neither influenced Alice’s question about dinner, not the other way around.

Thus, the three possible relations between two events x and y are: (a) x might have influenced y, if x → y; (b) y might have influenced x, if y → x; (c) there is no known influence between x and y, as they occurred concurrently x || y.

Causal Histories

Causality can be tracked in a very simple way by using causal histories.^3,14 The system can locally assign unique names to each event (for example, node name and local increasing counter) and collect and transmit sets of events to capture the known past.

For a new event, the system creates a new unique name, and the causal history consists of the union of this name and the causal history of the previous event in the node. For example, the second event in node C is assigned the name c₂, and its causal history is H_c = {c₁, c₂} (shown in Figure 2). When a node sends a message, the causal history of the send event is sent with the message. When the message is received, the remote causal history is merged (by set union) with the local history. For example, the delivery of the first message from node A to B merges the remote causal history {a₁, a₂} with the local history {b₁} and the new unique name b₂, leading to {a₁, a₂, b₁, b₂}.

Checking causality between two events x and y can be tested simply by set inclusion: x → y iff H_x H_y. This follows from the definition of causal histories, where the causal history of an event will be included in the causal history of the following event. Even better, marking the last local event added to the history (distinguished in bold in the figure) allows the use of a simpler test: x → y iff x ∈ H_y (for example, a₁ → b₂, since a₁ ∈ {a₁, a₂, b₁, b₂}). This follows from the fact a causal history includes all events that (causally) precede a given event.

Causality Tracking

It should be obvious by now that causal histories work but are not very compact. This problem can be addressed by relying on the following observation: the mechanism of building the causal history implies if an event b₃ is present in H_y, then all preceding events from that same node, b₁ and b₂, are also present in H_y. Thus, it suffices to store the most recent event from each node. Causal history {a₁, a₂, b₁, b₂, b₃, c₁, c₂, c₃} is compacted to {a → 2, b → 3, c → 3} or simply a vector [2, 3, 3].

Now the rules used with causal histories can be translated to the new compact vector representation.

Verifying that x → y requires checking if H_x H_y. This can be done, verifying for each node, if the unique names contained in H_x are also contained in H_y and there is at least one unique name in H_y that is not contained in H_x. This is immediately translated to checking if each entry in the vector of x is smaller or equal to the corresponding entry in the vector of y and one is strictly smaller (such as, ∀i: V_x[i] ≤ V_y [i] and ∃j: V_x[j] < V_y [j]). This can be stated more compactly as x → y iff V_x < V_y.

For a new event the creation of a new unique name is equivalent to incrementing the entry in the vector for the node where the event is created. For example, the second event in node C has vector [0, 0, 2], which corresponds to the creation of event c₂ of the causal history.

Finally, creating the union of the two causal histories H_x and H_y is equivalent to taking the pointwise maximum of the corresponding two vectors V_x and V_y (such as, ∀i: V [i] = max(V_x[i], V_y [i])). Logic tells us that, for the unique names generated in each node, only the one with the largest counter needs to be kept.

When a message is received, in addition to merging the causal histories, a new event is created. The vector representation of these steps can be seen, for example, when the first message from a is received in b, where taking the pointwise maximum leads to [2, 1, 0] and the new unique name finally leads to [2, 2, 0], as shown in Figure 3.

This compact representation, known as a vector clock, was introduced around 1988.^5,10 Vector comparison is an immediate translation of set inclusion of causal histories. This equivalence is often forgotten in modern descriptions of vector clocks and can turn what is a simple encoding problem into an unnecessarily complex and arcane set of rules, going against logic.

As shown thus far, when using causal histories, knowing the last event could simplify comparison by simply checking if the last event is included in the causal history. This can still be done with vectors, if you keep track of the node in which the last event has been created. For example, when questioning if x = [2, 0, 0] → y = [2, 3, 0], with boldface indicating the last event in each vector, you can simply test if x[0] ≤ y[0] (2 ≤ 2) since you have marked the last event in x was created in node A (that is, it corresponds to the first entry of the vector). Since marking numbers in bold is not a practical implementation, however, the last event is usually stored outside the vector (and is sometimes called a dot): for example, [2, 2, 0] can be represented as [2, 1, 0] b₂. Notice that now the vector represents the causal past of b2, excluding the event itself.

In an important class of applications there is no need to register causality for all the events in a distributed computation. For example, to modify replicas of data, it often suffices to register only those events that change replicas. In this case, when thinking about causal histories, you need only to assign a new unique name to these relevant events. Still, you need to propagate the causal histories when messages are propagated from one site to another and the remaining rules for comparing causal histories remain unchanged.

Figure 4 presents the same example as before, but now with events that are not registered for causality tracking denoted with ο. If the run represents the updates to replicas of a data object, then after nodes A and B are concurrently modified, the state of replica a is sent to replica b (in a message). When the message is received in node B, it is detected two concurrent updates have occurred, with histories {a₁} and {b₁}, as neither a₁ → b₁ nor b₁ → a₁. In this case, a new version that merges the two updates is created (merge is denoted by the join symbol ), which requires creating a new unique name, leading to {a₁, b₁, b₂}. When the state of replica b is later propagated to replica c, as no concurrent update exists in replica c, no new version is created.

Again, vectors can compact the representation. The result, known as a version vector, was created in 1983,¹² five years before vector clocks. Figure 5 presents the same example as before, represented with version vectors.

In some cases when the state of one replica is propagated to another replica, the two versions are kept by the system as conflicting versions. For example, in Figure 6, when the message from node A is received in node B, the system keeps each causal history {a₁} and {b₁} associated with the respective version. The causal history associated with the node containing both versions is {a₁, b₁}, the union of the causal history of all versions. This approach allows later checking for causality relations between each version and other versions when merging the states of additional nodes. The conflicting versions could also be merged, creating a new unique name, as in the example.

One limitation of causality tracking by vectors is that one entry is needed for each source of concurrency.⁴ You can expect a difference of several orders of magnitude between the number of nodes in a datacenter and the number of clients they handle. Vectors with one entry per client do not scale well when millions of clients are accessing the service.⁷ Again, a look at the foundation of causal histories shows how to overcome this limitation.

The basic requirement in causal histories is each event be assigned a unique identifier. There is no requirement this unique identifier be created locally or immediately. Thus, in systems where nodes can be divided into clients and servers and where clients communicate only with servers, it is possible both to delay the creation of a new unique name until the client communicates with the server and to use a unique name generated in the server. The causal history associated with the new version is the union of the causal history of the client and the newly assigned unique name.

Figure 7 shows an example where clients A and B concurrently update server S. When client B first writes its version, a new unique name, s₁, is created (in the figure this action is denoted by the symbol ) and merged with the causal history read by the client {}, leading to the causal history {s₁}. When client A later writes its version, the causal history assigned to this version is the causal history at the client, {}, merged with the new unique name s₂, leading to {s₂}. Using the normal rules for checking for concurrent updates, these two versions are concurrent. In the example, the system keeps both concurrent updates. For simplicity, the interactions of server T with its own clients were omitted, but as shown in the figure, before receiving data from server S, server T had a single version that depicted three updates it managed—causal history {t₁, t₂, t₃}—and after that it holds two concurrent versions.

One important observation is that in each node, the union of the causal histories of all versions includes all generated unique names until the last known one: for example, in server S, after both clients send their new versions, all unique names generated in S are known. Thus, the causal past of any update can always be represented using a compact vector representation, as it is the union of all versions known at some server when the client read the object. The combination of the causal past represented as a vector and the last event, kept outside the vector, is known as a dotted version vector.^2,13 Figure 8 shows the previous example using this representation, which, as the system keeps running, eventually becomes much more compact than causal histories.

In the condition expressed before (clients communicate only with servers and a new update overwrites all versions previously read), which is common in key-value stores where multiple clients interact with storage nodes via a get/put interface, the dotted version vectors allow causality to be tracked between the written versions with vectors of the size of the number of servers.

Final Remarks

Tracking causality should not be ignored. It is important in the design of many distributed algorithms. And not respecting causality can lead to strange behaviors for users, as reported by multiple authors.^1,9

The mechanisms for tracking causality and the rules used in these mechanisms are often seen as complex,^6,15 and their presentation is not always intuitive. The most commonly used mechanisms for tracking causality—vector clocks and version vectors—are simply optimized representations of causal histories, which are easy to understand.

By building on the notion of causal histories, you can begin to see the logic behind these mechanisms, to identify how they differ, and even consider possible optimizations. When confronted with an unfamiliar causality-tracking mechanism, or when trying to design a new system that requires it, readers should ask two simple questions: Which events need tracking? How does the mechanism translate back to a simple causal history?

Without a simple mental image for guidance, errors and misconceptions become more common. Sometimes, all you need is the right language.

Acknowledgments

We would like to thank Rodrigo Rodrigues, Marc Shapiro, Russell Brown, Sean Cribbs, and Justin Sheehy for their feedback. This work was partially supported by EU FP7 SyncFree project (609551) and FCT/MCT projects UID/CEC/04516/2013 and UID/EEA/50014/2013.

Figures

Figure 1. Happened-before relation.

Figure 2. Causal histories.

Figure 3. Vector clocks.

Figure 4. Causal histories with only some relevant events.

Figure 5. Version vectors with only some relevant events.

Figure 6. Causal histories with versions not immediately merged.

Figure 7. Causal histories in a distributed storage system.

Figure 8. Dotted version vectors in distributed storage system.

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and value to the computing community.

You Just Read

Why Logical Clocks Are Easy

View in the ACM Digital Library

DOI

10.1145/2890782

April 2016 Issue

Published: April 1, 2016

Vol. 59 No. 4

Pages: 43-47

Table of Contents

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Explore More

News Apr 23 2024

Maximizing Power Grid Security

R. Colin Johnson

Security and Privacy

News Apr 18 2024

Keeping AI Out of Elections

Bennie Mols

Artificial Intelligence and Machine Learning

BLOG@CACM Apr 17 2024

Technical Marvels

Herbert Bruderer

Computer History

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More

Happened-Before Relation

Causal Histories

Causality Tracking

Final Remarks

Acknowledgments

Figures

Why Logical Clocks Are Easy

DOI

April 2016 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.