*The properties commonly ascribed to any object are, in last analysis, names for its behavior.*

—Judson Herrick, *An Introduction to Neurology*, 1918

### Key Insights

- Lack of knowledge by a computer system component about other components can formally be captured through the concept of indistinguishability. Whenever abstraction or interaction take place in a computer system, indistinguishability plays a critical role.
- Indistinguishability is the source of many lower bounds and impossibility results in CS. It is also the essence behind abstraction techniques so important in computing theory and in the design of large complex systems.
- Indistinguishability has a topological nature: local states of components that do not distinguish between two system states induce a higher-dimensional simplicial complex, a structure with topological properties that are preserved as the system execution evolves.

Dost thou love me? I know thou wilt say “ay,”

And I will take thy word. Yet if thou swear’st

Thou mayst prove false. At lovers’ perjuries,

They say, Jove laughs.

—Shakespeare’s *Romeo and Juliet*, Act 2

*Abstraction*—allowing the details of lower-level components to be ignored—and *interaction*—allowing individual computing entities to cooperate—are key concepts in computer science. Many would argue that they play a crucial role in the success of computing: abstraction allows separate layers of the computing stack to be improved orthogonally, whereas interaction allows the abundance of computing power to be harnessed. This comes at a significant cost: each component of a computer system has limited knowledge about the state of other components. This happens either by choice, in the case of abstraction, or out of necessity, in the case of interaction.

From the perspective of an individual component, all other components, either other layers within the same computing entity or other computing entities, can be considered as an *environment.* Seen in this way, lack of knowledge about other components can formally be captured through the concept of *indistinguishability*, namely inability to tell apart different behaviors or states of the environment. Indistinguishability is therefore a consequence of the fact that computer systems are built of individual components, each with its own perspective of the system.

This article argues that because of its intimate relation with key issues in computing, indistinguishability, in its various flavors, plays a critical role in many computing areas. We explain this core concept and demonstrate some of its variants and applications through four examples, trying to illustrate different, fundamental aspects of indistinguishability in situations of abstraction and interaction.

Indistinguishability is at the core of the difficulty of constructing theoretical models for the behavior of a physical system. In our first example, we overview the role of indistinguishability in some of the most basic notions in computer science: state, automata, and learning. We will encounter both interaction (as a means to reduce indistinguishability) and abstraction (captured by behavioral equivalence). Here, the environment is seen as a blackbox, implemented by an unknown automaton. What can an experimenter interacting with its environment through input/output symbols infer about the blackbox internals? The experimenter has an evolving mental model of the blackbox as a hypothesis automaton, which is indistinguishable from the actual automaton, given the current state of the interaction. The very notion of “state” is defined in terms of indistinguishability. In this example, indistinguishability has a “semantic” nature, related to computational complexity, namely the number of states in the automaton and the complexity of the learning algorithm.

Our second example demonstrates that indistinguishability is a powerful tool for deriving positive results. Examples abound, such as in artificial intelligence (for example, Turing’s test), cryptography (for example, pseudo-randomness), logic, and others. We consider the example of serializability in concurrent programming, where interaction is through shared variables, and locks permit the set of indistinguishable executions to be reduced. The correctness specification of a program is in terms of requiring that concurrent executions are indistinguishable from appropriate sequential executions. Abstraction is key, and indistinguishability becomes a powerful tool to design concurrent programs and prove their correctness, and in particular, to enable sequential reasoning.

We move in our third example to another very basic form of indistinguishability, related to time, and to the impossibility of observing real-time. An interaction among a set of computing entities can be seen as a partial order, representing causality relations between events happening in the system. Lamport’s seminal paper^{26} can be seen as using indistinguishability in two senses. First, it observed the relation to relativity theory, motivating the idea of reducing concurrent systems by indistinguishability to sequential thinking (by implementing a fault-tolerant distributed system as a replicated state machine). And second, it provided the framework for analyzing time-based algorithms, which depend on quantifying real-time indistinguishability. We illustrate this with a simple example showing how inherent limitations on clock synchronization can be derived through the impossibility of distinguishing the real-time occurrence of events in an execution up to given bounds on message transmission delays and clock drifts.

Prior examples consider a *single* execution, and analyze a set of executions that are indistinguishable from it, from the perspective of all the participating processes. Our final example considers how distributed computation is limited by the global indistinguishability structure of *all* possible executions. This structure is defined by a Kripke graph, where edges are labeled by processes that do not distinguish between the global states of the system represented by the two endpoints of the edge. It turns out that higher dimensional topological properties of this graph (more precisely, its dual, a simplicial complex) determine computability and the amount of interaction needed to distributively solve a problem.

### Automata and Learning

We start with a simple scenario where a *learner* is trying to infer the internal construction of a *blackbox.* The learner knows that the blackbox is a *deterministic finite automaton* (DFA) accepting a language over an alphabet Σ, but does not know which specific automaton it is. Through a conversation, the learner and the blackbox exchange symbols, and there is a set of automata all indistinguishable with respect to the current conversation. As the interaction evolves, this set of indistinguishable automata shrinks. Eventually, the learner would like it to shrink until it captures the language accepted by the blackbox.


**Moore’s theorem.** Indistinguishability is at the core of the difficulty of constructing theoretical models for the behavior of a physical system. Ashby’s Cybernetics book^{3} from 1956 already includes a chapter called “The black-box.” At the same time, Moore^{12} proposed the problem of learning finite automata, and studied indistinguishability of deterministic finite state machines, stating (Theorem 2):

“Given any machine *S* and any multiple experiments performed on *S*, there exist other machines experimentally distinguishable from *S* for which the original experiment would have had the same outcome.”

Moore’s theorem shows an impossibility in the characterization of any physical system as a deterministic state machine on the basis of a finite number of observational outcomes. This is because after a finite interaction with the blackbox, in which all exchanged words have length at most *k*, the learner has explored only paths of length *k* in the automaton *A* of the blackbox.
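Moore’s argument can be made concrete with a toy sketch (the machines and the bound *k* are hypothetical choices, not from the article): two machines that return the same outcome on every experiment of length at most *k*, yet are different.

```python
# Sketch: two "blackboxes" over the alphabet {a}, represented by their
# acceptance functions. They are indistinguishable by any experiment of
# length <= k, yet they are different machines.
def make_counter_dfa(k):
    """A (hypothetical) DFA accepting exactly the string of k+1 a's."""
    return lambda w: w == 'a' * (k + 1)

k = 3
a1 = lambda w: False        # accepts nothing
a2 = make_counter_dfa(k)    # accepts only 'aaaa'

# Every experiment of length <= k has the same outcome on both machines,
assert all(a1('a' * n) == a2('a' * n) for n in range(k + 1))
# yet a longer experiment tells them apart.
assert a1('a' * (k + 1)) != a2('a' * (k + 1))
```

For any bound *k*, a machine with enough extra states can defeat the experimenter in this way, which is the content of Moore’s theorem.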

This does not prevent the construction of theoretical models of the behavior of a system, but it does challenge the assumption that a system has *only* the behaviors that have been characterized by experimental observations, namely the assumption that any theoretical model is complete. Further discussion of the relation between Moore’s theorem and physics appears in Fields.^{16}

**The Myhill-Nerode theorem.** If the interaction with the blackbox is only through input/output symbols, how can the learner know anything at all about its internal construction, or even whether it has any states at all? States are not directly observable, so what *is* a state, from the perspective of the learner? The Myhill-Nerode theorem, “one of the conceptual gems of theoretical computer science” according to Rosenberg,^{33} offers a complete mathematical characterization of the notion of state, via basic algebraic properties defined only on input/output behavior.

A string *t* ∈ Σ^{*} *distinguishes* two strings *u* and *v* in a language *L* if exactly one of *ut* and *vt* belongs to *L.* If there is a string *t* distinguishing *u* and *v*, then the state *s* = δ(*q*_{0}, *u*) must be different from the state *s’* = δ(*q*_{0}, *v*), for any automaton *M* with transition function δ recognizing *L.* Conversely, two strings *x* and *y* are *indistinguishable* (by *L*) if there is no string *t* ∈ Σ^{*} that distinguishes them. This yields an equivalence relation, the *Nerode congruence* ≡_{L} on Σ^{*}, defined by *u* ≡_{L} *v* if and only if no string *t* ∈ Σ^{*} distinguishes *u* and *v.*

Let [*s*]_{L} be the set of all strings that are indistinguishable from *s*, and *Q* be the set of all corresponding equivalence classes. Thus, the essence of the notion of “state” is an indistinguishability equivalence class; define a DFA Z as follows:

- the states *Q* are the equivalence classes of ≡_{L},
- the initial state *q*_{0} is [ε]_{L}, the equivalence class of the empty word,
- δ([*u*]_{L}, *a*) = [*ua*]_{L} for all [*u*]_{L} ∈ *Q* and *a* ∈ Σ, and
- the accepting states are *F* = {[*u*]_{L} : *u* ∈ *L*}.

Selecting a representative for each equivalence class of ≡_{L}, we get a set of *access strings S* ⊂ Σ^{*}. Starting in the initial state, following the transitions as indicated by *u* ∈ *S* leads us to a state *q* that is uniquely identified by *u.* Figure 1 depicts an example of a DFA *A*, which is then explicitly represented by access strings as *H*_{2}.
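The construction of *Z* from indistinguishability classes can be approximated mechanically. A sketch (all names hypothetical): given only a membership oracle, group strings up to a length bound by their behavior on suffixes up to a bound, and keep one access string per class.

```python
from itertools import product

def nerode_classes(member, alphabet, max_len, suffix_len):
    """Approximate Nerode classes: u, v are merged unless some suffix t,
    |t| <= suffix_len, satisfies member(u + t) != member(v + t)."""
    words = lambda k: (''.join(p) for n in range(k + 1)
                       for p in product(alphabet, repeat=n))
    suffixes = list(words(suffix_len))
    classes = {}  # behavior signature -> shortest representative (access string)
    for u in words(max_len):
        sig = tuple(member(u + t) for t in suffixes)
        classes.setdefault(sig, u)
    return sorted(classes.values(), key=lambda s: (len(s), s))

# L = strings over {a, b} ending in 'b': two classes, with access strings '' and 'b'
print(nerode_classes(lambda w: w.endswith('b'), 'ab', 3, 2))  # ['', 'b']
```

With large enough bounds relative to the number of states, the signatures coincide with the true Nerode classes; with small bounds, distinct states may remain merged, echoing Moore’s theorem.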

The Myhill-Nerode theorem states that *L* is recognized by *Z* as defined earlier, and furthermore, *Z* is minimal: if a DFA *M* accepts *L*, then the equivalence relation ≡_{M} is a refinement of the equivalence relation ≡_{L}, where *x* ≡_{M} *y* if and only if δ(*q*_{0}, *x*) = δ(*q*_{0}, *y*), and we say that *x* and *y* are *indistinguishable to M.*

Proofs that a given language cannot be recognized by a finite automaton can be viewed as indistinguishability arguments, based on the Myhill-Nerode theorem. Automata with infinitely many states can be viewed as abstractions of programs that can make infinitely many discriminations regarding the structure of a set of possible input strings.

Let λ_{q}(*v*) = 1 whenever *Z* accepts *v* ∈ Σ^{*} starting at state *q*, and λ_{q}(*v*) = 0 otherwise. If *q* = *q*_{0}, we may omit the sub-index, that is, *L* = {*w* : λ(*w*) = 1}. For learning, we will use the notion of a string *t* being a *witness* that two states are different. Notice that:

- For any pair of distinct states *q*, *q’* of *Z*, there is a distinguishing word *t* ∈ Σ^{*} such that λ_{q}(*t*) ≠ λ_{q’}(*t*).

**Learning automata.** Following the classic approach of *learning finite automata*,^{36} three additional approaches have been studied: *computational learning*,^{25} *model learning*,^{37} and *grammatical inference.*^{34} We next describe automata learning algorithms with a *minimally adequate teacher* (MAT), demonstrating fundamental ideas that are relevant to all four learning branches.

Minimization algorithms related to the Myhill-Nerode theorem work by *merging* indistinguishable states of a DFA. We describe algorithms working in the opposite direction, *splitting* states when discovering a witness string *t* demonstrating they are distinguishable.

The learner poses membership queries to the blackbox to try to learn the language *L* it accepts: does *x* ∈ Σ^{*} belong to *L*? The learner starts with a hypothesis automaton *H*, which it updates during the conversation. The experimenter has no way of knowing when to stop asking questions, because there could be machines with more and more states that return answers consistent with the current experiment. Even if the number of states of the blackbox automaton is known to the experimenter, an exponential number of membership queries is required.^{2} To circumvent this, the MAT framework admits *equivalence queries*:

- Does *H* correctly recognize *L*? If not, give me an example of a string *x* ∈ Σ^{*} such that *x* ∈ *L*(*H*) − *L*(*M*) or *x* ∈ *L*(*M*) − *L*(*H*).

Using membership and equivalence queries, the experimenter can learn *L* with a number of queries that is polynomial in *n*, the number of states in *Z*, the Myhill-Nerode automaton for *L*, and in *m*, the length of the longest counterexample returned by the blackbox. (There are always counterexamples of length at most 2*n.*) The algorithm terminates with a DFA *H* that is isomorphic to *Z.* The MAT framework and the efficient algorithm, called *L*^{*}, were introduced in a seminal paper of Angluin.^{1} We stress that this kind of learning algorithm can be extended to learn other types of blackboxes, for example, logical formulas.
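A compact sketch of an L*-style MAT learner follows. All names are hypothetical; the counterexample handling adds every suffix of the counterexample as a discriminator (one of several known variants), and the equivalence oracle is simulated by bounded exhaustive testing.

```python
from itertools import product

def lstar(member, equiv, alphabet):
    """L*-style learner: `member` answers membership queries,
    `equiv` returns None or a counterexample string."""
    S, D = [''], ['']  # access strings and discriminating suffixes
    row = lambda u: tuple(member(u + t) for t in D)
    while True:
        # closure: every one-letter extension of S must match some row of S
        changed = True
        while changed:
            changed = False
            for u, a in product(list(S), alphabet):
                if row(u + a) not in {row(s) for s in S}:
                    S.append(u + a)
                    changed = True
        reps = {row(s): s for s in S}  # row signature -> access string

        def accepts(w, reps=reps):
            state = ''
            for a in w:
                state = reps[row(state + a)]
            return member(state)  # the empty suffix belongs to D

        cex = equiv(accepts)
        if cex is None:
            return S, accepts
        # add all suffixes of the counterexample as discriminators
        for i in range(len(cex)):
            if cex[i:] not in D:
                D.append(cex[i:])

# target language: strings with an odd number of b's
target = lambda w: w.count('b') % 2 == 1

def equiv(accepts):
    """Simulated equivalence oracle: exhaustive check up to length 6."""
    for n in range(7):
        for p in product('ab', repeat=n):
            w = ''.join(p)
            if accepts(w) != target(w):
                return w
    return None

S, accepts = lstar(target, equiv, 'ab')
print(S)  # ['', 'b'] -- one access string per state of the minimal DFA
```

Each access string in the final *S* names one Nerode class; splitting a state corresponds exactly to appending a new discriminating suffix to *D*.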

We illustrate the ideas behind the MAT framework through an example (inspired by Isberner et al.^{24}), to show how *distinguishing* is the basis of learning. Learning something new means splitting a state into two states (which are different, as evidenced by a new witness *t*).

Assume the blackbox is implemented by the DFA *A* in Figure 1. The learner maintains a set of prefix-closed access strings *S* ⊂ Σ^{*}; recall that access strings are representatives of equivalence classes. Distinct access strings *u*, *u’* correspond to distinct states of *A* that the learner has identified, and the learner has a witness of this fact, through a string *t*, such that λ(*u* · *t*) ≠ λ(*u’* · *t*). The learner maintains this set of *discriminating suffixes D* ⊂ Σ^{*}, that it has found through membership queries.

The basic data structure is the *observation table*, with two types of rows (in the figure, a horizontal line in a table divides the two types). Each row of the first type is identified by an access string *u* ∈ *S*, and each row of the second type identifies a transition of the hypothesis automaton. Each column is identified by a discriminating string *t.* The content of a cell in the table is λ(*u*·*t*), obtained through a membership query. Each time the learner gets a counterexample, it extracts from it a discriminating suffix. Many algorithms have been proposed, differing in how they extract a discriminating suffix from a counterexample. Here we are only concerned with the fact that it is always possible to do so.

The learner initially has as hypothesis the DFA *H*_{0}. It then learns that ε discriminates ε and *b*, and hence splits state [ε], creating state [*b*]. In the table, the new row for access string *b* is added, and the transition for *b* is replaced by the two transitions *ba*, *bb.* Thus, the new hypothesis automaton is *H*_{1}, and by following string *b* in this automaton, one “accesses” state [*b*], an equivalence class of strings indistinguishable from the representative of the class, *b* (for example, *aab* also belongs to [*b*]; it is indistinguishable from *b* and also accesses [*b*]). In *H*_{1}, we have a (single) column identified by ε, witnessing that states [ε] and [*b*] are different, because ε concatenated with ε is in *L*, whereas *b* concatenated with ε is not. Then, *H*_{2} is produced when the learner learns that *b* discriminates ε and *a*, that is, λ(ε · *b*) ≠ λ(*a* · *b*), and hence the state [ε] is split, creating the state [*a*]. More generally, if *w* is a counterexample for *H*, then it has a suffix *at* such that, for two access strings *u*, *u’* ∈ *S*, the strings *ua* and *u’* reach the same state in *H*, but λ(*ua* · *t*) ≠ λ(*u’* · *t*). Thus, *ua* identifies a transition row in the observation table equal to the row of *u’*; adding *t* to the table distinguishes *ua* from *u’*, and *ua* is moved to the upper part of the table.

**Behavioral equivalences.** Behavioral equivalences^{17} are based on the idea that two systems are equivalent whenever no external observation can distinguish between them. They are used to *abstract* from unwanted details; to formalize the idea that it is not the internal structure of a system which is of interest but its behavior with respect to the outside world.

Bisimulation, the strongest form, is a rich concept independently discovered in computer science, modal logic, and set theory, with applications to many areas;^{35} lack of space prevents us from devoting to it the attention it deserves. We touched on it with the Myhill-Nerode theorem example, which is the basis for automata minimization algorithms modulo bisimilarity.^{23} Another typical application is to prove the correctness of an algorithm, given by a big automaton representation *M*, by analyzing a smaller bisimilar model *Z* that captures its essence, as illustrated in Figure 2, where *R* is the bisimulation relation between states of *Z* and *M.* Intuitively, two systems are *bisimilar* if they match each other’s moves. Verifying the algorithm *M* by solving the model checking problem *M* |= ϕ is equivalent to solving the much smaller problem *Z* |= ϕ. From the indistinguishability perspective, it is interesting to consider iterative abstraction-refinement; see Clarke et al.^{9}

**Figure 2. Schematic illustration of bisimulation.**

### Sequential Reductions in Concurrent Programming

A notable example of behavioral equivalence is the notion of *serializability*, utilized in most database systems (in various variants) since their early days in the 1970s. The notion is used in concurrency control of databases and in various transactional systems (processing, management, transactional memory, etc.), both centralized and distributed. A key challenge in the design and analysis of concurrent systems is dealing with all possible interleavings of concurrent processes. Indistinguishability is useful for defining the semantics of a concurrent program, in terms of the notion of serializability. It is also important in verification, as it can be exploited to verify a concurrent program by checking only its sequential executions.^{a}

**Serializability and two-phase locking.** Serializability is studied in a setting where processes interact through shared variables. Two executions α_{1} and α_{2} are *indistinguishable to a specific process*, if the process accesses the same sequence of variables in both executions, and returns the same results. An execution is *serializable*^{8,39} if it is indistinguishable to all processes from a *sequential* execution, in which each process executes its procedure invocation to completion, without interleaving of any other process.
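This per-process view can be stated directly in code. A minimal sketch (the tuple encoding of events and all names are hypothetical): an execution is a list of (process, variable, result) accesses, and a process cannot tell two executions apart if its own subsequence of accesses and results is the same in both.

```python
def view(process, alpha):
    """The subsequence of (variable, result) pairs that `process` observes."""
    return [(var, res) for p, var, res in alpha if p == process]

def indistinguishable_to(process, alpha1, alpha2):
    """True if `process` sees the same accesses and results in both executions."""
    return view(process, alpha1) == view(process, alpha2)

# Two interleavings of the same accesses: p2's read of Y is moved earlier.
a1 = [('p1', 'X', 0), ('p2', 'Y', 1), ('p1', 'Y', 1)]
a2 = [('p2', 'Y', 1), ('p1', 'X', 0), ('p1', 'Y', 1)]
print(indistinguishable_to('p1', a1, a2))  # True
```

An execution is then serializable exactly when some sequential execution is indistinguishable, in this sense, to every process.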

The classic way to ensure serializability is to protect shared variables with locks, using a locking protocol governing how locks are acquired and released. Thus, an execution of the system, α, is a sequence of *events*, each taken by a single process; the events either access shared variables, or acquire and release locks on these shared variables. In *two-phase locking (2PL)*,^{13} each process has a *growing* phase of lock acquisition (in some order), followed by a *shrinking* phase of lock release. Namely, once a process has released a lock, it can no longer acquire any lock, even on another variable. For example, consider shared variables *X*, *Y* and two processes *p*_{1}, *p*_{2}, each acquiring locks on both variables before accessing them.
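As a minimal sketch of 2PL in code (the `transfer` operation and all names are hypothetical, not from the article): each thread acquires both locks in its growing phase, accesses the variables, and only then releases, so no acquire ever follows a release.

```python
import threading

X_lock, Y_lock = threading.Lock(), threading.Lock()
X, Y = 0, 0

def transfer():
    global X, Y
    X_lock.acquire()      # growing phase: acquire all needed locks...
    Y_lock.acquire()
    X, Y = X - 1, Y + 1   # ...access the variables while holding them...
    Y_lock.release()      # shrinking phase: from here on, no acquires
    X_lock.release()

threads = [threading.Thread(target=transfer) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(X, Y)  # -2 2 in every interleaving: the execution is serializable
```

Because both threads here acquire the locks in the same order, deadlock is also avoided, matching the geometric observation about the forbidden region.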

Two-phase locking is a mechanism for enforcing indistinguishability from sequential executions, as demonstrated by the following geometric interpretation. An execution of the processes *p*_{1}, *p*_{2} defines a particular interleaving of the order in which the processes acquire and release the locks. It can be represented as a path in a two-dimensional space (see Figure 3). If a lock is acquired or released by *p*_{1}, the path moves one unit on the horizontal axis; similarly, when a lock is acquired or released by *p*_{2}, the path moves one unit on the vertical axis. All paths start in (0, 0), when no operations have occurred, and they all end in (1, 1), where all operations have occurred, by both processes.

**Figure 3. Geometric interpretation of all interleavings of two processes acquiring and releasing shared variables X, Y.**

Each time two operations of an execution are swapped, in a way that is indistinguishable to both processes, the path is deformed. In Figure 3, two such paths are illustrated: *P*_{1} which is sequential (*p*_{1} then *p*_{2}), and *P*_{2} where *acq*(*Y*) by *p*_{2} is swapped with *rel*(*X*) by *p*_{1}.

There are two forbidden rectangles, where no execution path can go through: in the vertical (blue) one, *Y* would be acquired simultaneously by both, whereas in the horizontal rectangle (red), the same holds for *X.* Their union is the forbidden region where no execution enters. Notice that if both processes acquire *X* and *Y* (in either order), the protocol enters the deadlock region. The main point is that there are two classes *C*_{1}, *C*_{2}, of *homotopic* paths, that is, paths within a class can be deformed to each other. In one class, all paths go above the forbidden region and are indistinguishable from a sequential execution in which *p*_{2} goes first, whereas in the other class, all executions go below the forbidden region and are indistinguishable from a sequential execution where *p*_{1} goes first.

Notice that in a program where both processes acquire the locks in the same order, the forbidden region is a square, and hence no deadlocks can happen. Directed topology and the geometric theory of execution paths homotopy are studied in Fajstrup et al.,^{15} showing a direct representation of indistinguishability as continuous deformation of paths in an *n*-dimensional space (for *n* processes).

**Verifying two-phase locking.** Because indistinguishable executions can be substituted for each other, checking whether one execution satisfies a particular property tells us whether all indistinguishable executions satisfy this property. Therefore, indistinguishability facilitates the verification of concurrent programs. When a program is serializable, certain properties can be verified by considering only sequential (noninterleaved) executions of the program. This is equivalent to reasoning in a sequential setting.

But how can we prove that a program is serializable? Obviously, if we prove that it follows the two-phase locking protocol, then it is serializable. However, in reality, we are not given a single execution, but a program, possibly including conditional and repeat statements. Thus, we need to consider all its possible executions, to see if each one satisfies the two-phase locking regime. It turns out that we can ensure that the program follows 2PL by considering only its sequential executions. The next theorem holds provided the program has no nonterminating loops.

THEOREM 3.1. *If any execution satisfies two-phase locking when events of different processes are not interleaved, then any interleaved execution also satisfies two-phase locking.*

Proving the theorem goes through showing that every execution that violates 2PL is indistinguishable from a noninterleaved execution in which the protocol is also violated. This implies that if we check (manually or mechanically) all noninterleaved executions of the protocol without finding a violation of 2PL, then all executions of the protocol do not violate 2PL.

Toward a contradiction, assume the claim does not hold and let α = α′(*i*_{t}, *e*) be the shortest execution that violates 2PL for which there is no indistinguishable noninterleaved execution; see Figure 4. Note that (*i*_{t}, *e*) is an event of process *p*_{i_t} that violates 2PL, that is, it acquires a lock after releasing a lock, or accesses an unlocked location. As α is the shortest such execution, we know that for the prefix α′ of α there is an indistinguishable noninterleaved execution α_{i_1} … α_{i_t} (where α_{i_j} contains events by *p*_{i_j} only).

**Figure 4. Moving the event (*i*_{t}, *e*) to after *p*_{i_t}’s events.**

We argue that moving the event (*i*_{t}, *e*) to after *p*_{i_t}’s events in the noninterleaved execution α_{i_1} … α_{i_t} will still cause *p*_{i_t} to take the offending event. Intuitively, this happens because the event depends only on information that is local to the process *p*_{i_t} or locked by it, and *p*_{i_t} does not distinguish between the original execution and the noninterleaved execution. Namely, *p*_{i_t} has the same state at the end of α′ and at the end of α_{i_1} … α_{i_t}. Therefore, the event can be moved to appear after the events α_{i_t} of the same process. Hence, *p*_{i_t} will take the same offending event (*i*_{t}, *e*), implying that the noninterleaved execution α_{i_1} … α_{i_t} (*i*_{t}, *e*) also violates 2PL.

The reduction holds for any noncentralized locking protocol, such as commonly used ones like *two-phase*, *hand-over-hand*, *tree*, and *dynamic graph* locking. It allows *sequential reasoning*, whether manual or automated, about concurrent programs, both in verifying that they adhere to a locking protocol and in the development of algorithms for them. The reduction enables simpler and more efficient verification algorithms for a class of properties, called *transaction-local.* It justifies the use of sequential Hoare Logic, sequential type systems, or sequential abstract interpretation to verify that the program adheres to a locking protocol. Programmers wishing to add, for example, a new procedure to swap two adjacent elements in a list to a program that uses hand-over-hand locking do not have to worry about concurrent interleaving with other procedures. More details are in Attiya et al.,^{6} such as the case of nonterminating loops.
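As a small illustration of mechanically checking adherence to a locking protocol, the following sketch (the event encoding and all names are hypothetical) tests whether a single trace, for example a noninterleaved one, obeys the two-phase rule and never touches an unlocked variable; mutual exclusion across processes is assumed rather than checked.

```python
def satisfies_2pl(events):
    """events: list of (process, op, var) with op in {'acq','rel','read','write'}."""
    held = {}          # process -> set of variables it currently locks
    shrinking = set()  # processes that have already released some lock
    for proc, op, var in events:
        locks = held.setdefault(proc, set())
        if op == 'acq':
            if proc in shrinking:
                return False   # acquire after a release: 2PL violated
            locks.add(var)
        elif op == 'rel':
            locks.discard(var)
            shrinking.add(proc)
        else:
            if var not in locks:
                return False   # access to an unlocked variable
    return True

trace = [('p1', 'acq', 'X'), ('p1', 'write', 'X'), ('p1', 'acq', 'Y'),
         ('p1', 'write', 'Y'), ('p1', 'rel', 'Y'), ('p1', 'rel', 'X')]
print(satisfies_2pl(trace))  # True
```

By the reduction above, running such a check over the noninterleaved executions of a program suffices to establish 2PL for all of its interleaved executions.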

Indistinguishability is also used to prove a theorem that shows that if serializability is ensured in a program with two processes and two variables, it is ensured in any program, provided the implementation satisfies certain structural properties, one of them being symmetry.^{19} The proof goes by contradiction, taking an execution of the larger system that violates serializability and perturbing it into a bad execution for a system with two processes and two variables; a key step relies on an indistinguishability argument using symmetry.

### Real-Time Indistinguishability

The previous examples describe asymmetric interactions, where one party interacts with another party, whose semantics (internal details) are hidden or abstracted away. Our next example ignores the semantics of the interactions, concentrating only on their timing.

The fundamental problem is estimating *distant simultaneity*—the time difference between the occurrence of two spatially separated (at different processes) events. This is behind many real-time applications in computer science that depend on clock synchronization, such as synchronizing cellphone communications, positioning systems (for example, GPS), failure detection, efficient use of resources (for example, releasing a connection), timestamping events and timeouts, and so on.

Computer clocks are typically based on inexpensive oscillator circuits and quartz crystals that can easily drift seconds per day. However, atomic clock time, so ubiquitous and integral to modern life, trickles down to the clocks we use daily, distributed through the Network Time Protocol and other means. Atomic clocks are so precise that if such a clock existed when Earth began, about 4.5 billion years ago, it would be off by only 30 s today.

How precisely the time of an atomic clock can be estimated depends on the transmission delay bounds along communication paths from the atomic clock to the local computer, and on the drift bounds of the clocks of the computers along such paths. In other words, when a computer gets a message with the time of some atomic clock, the actual moment when the clock reading took place could have occurred at any moment within some range *B*, and from the computer’s perspective, it is indistinguishable which exact moment within *B* is the actual one. Thus, the computer’s best estimate of the atomic clock time is the midpoint of *B*, with error at most |*B*|/2. Indeed, selecting the midpoint is hedging the bets, because anything else leaves open the possibility of a bigger mistake. We now explain in more detail how to compute *B.*

Consider a process *p*_{1} trying to synchronize its clock with an atomic reference clock, assumed to give real-time exactly, located in *p*_{0}. The basic interaction is when *p*_{1} has a direct link to *p*_{0}, as illustrated in Figure 5. Process *p*_{1} sends a message to *p*_{0} and gets back a response. The send event by *p*_{1} occurs at real-time 1; the event of *p*_{0} receiving it occurs at real-time 6 (to simplify the example, we assume *p*_{0} responds immediately, in the same event); and *p*_{1} receives the response at real-time 12. Real-time is not directly observable; instead, each event occurs at some local time, which the process can observe. The precise meaning of real-time not being observable is through indistinguishability. Namely, suppose that, although the first message delay was 5 time units, it is known only that it must have taken at least 4 time units; also, assume the return message cannot take more than 9 time units. As for the local clock of *p*_{1}, suppose its drift is bounded, such that between the sending and the receiving events, at most 12 time units could have passed.

**Figure 5. *p*_{1} sends and *p*_{0} responds.**

What is the *latest* time that the event at *p*_{0} could have occurred with respect to *p*_{1}’s events? Answering also what is the *earliest* it could have occurred would yield the desired *indistinguishability interval B* in which *p*_{0}’s event could have occurred, and selecting the midpoint would be used to compute the optimal correction to the local clock time of *p*_{1}. The crucial insight is that to compute how late *p*_{0}’s event could have occurred with respect to *p*_{1}’s send event, we have to shift its point of occurrence to the right as much as possible, subject to two constraints: (1) the maximum delay of the second message (9 units) and (2) the minimum delay of the first message plus the minimum length of the time interval from *p*_{1}’s send event to its receive event (the fastest that *p*_{1}’s clock could have been running). In the example, the latest that *p*_{0}’s event can happen is at real-time 14, determined by the fastest delay of the first message and the slowest clock drift of *p*_{1}, and not by the largest delay of the second message (which could have been delivered at 15).

More generally, *p*_{1} may be further away from the process *p*_{0} with an atomic reference clock, and an arbitrary *execution* α is used to synchronize *p*_{1}‘s clock, where many more message exchanges take place, along different paths between *p*_{1} and *p*_{0}. The goal is to estimate the indistinguishability interval of an event *e* at process *p*_{1}, with respect to an event *e*_{0} in *p*_{0}. The previous example hints that the task at hand has to do with computing distances, on paths formed by indistinguishability intervals, formalized as follows.

The execution α is represented by a weighted directed graph *G* = (*V*, *E*, *r*, *l*). Each vertex of *V* is an event of α, either a *send* or a *receive* event; the *i*th event happening in process *j* is denoted *e*^{j}_{i}. The directed edges *E* are causal relationships: there is a directed edge (*e*^{j}_{i}, *e*^{j}_{i+1}) between two consecutive events in the same process, and there is a directed edge (*e*^{j}_{i}, *e*^{j′}_{i′}) whenever *e*^{j}_{i} is a send event and *e*^{j′}_{i′} is the corresponding receive event. The weight functions *r*, *l* timestamp the events: for each *e* ∈ *V*, *real*(*e*) is the real-time occurrence of event *e*, and *local*(*e*) is the time according to the clock of the process where *e* happens. Since the clock of *p*_{0} is perfect, for all events *e* in *p*_{0}, we have *real*(*e*) = *local*(*e*).

For each pair of events *e*_{1}, *e*_{2} joined (in either direction) by a directed edge of *G*, a bound *B*(*e*_{1}, *e*_{2}) on the relative real-time occurrence of the two events can be estimated, *real*(*e*_{1}) − *real*(*e*_{2}) ≤ *B*(*e*_{1}, *e*_{2}), both when the edge represents a message transmission delay, and when it represents the time it takes a process to execute consecutive computational events. Then, define *local*(*e*_{1}, *e*_{2}) = *local*(*e*_{1}) − *local*(*e*_{2}), and let *w*(*e*_{1}, *e*_{2}) = *B*(*e*_{1}, *e*_{2}) − *local*(*e*_{1}, *e*_{2}). These weights *w* can be positive or negative, but summing them along a cycle always gives a nonnegative value (the telescopic sum of *local*(*e*_{i}, *e*_{i+1}) along a cycle is 0). Thus, for a pair of events *e*_{1} and *e*_{2}, the distance *d*(*e*_{1}, *e*_{2}) with respect to these weights is well defined. Interestingly, observe that *d*(*e*, *e*’) = 0 for any two events in *p*_{0}. It is not hard to show^{30} that the indistinguishability interval of an event *e* at some process *p*_{1}, with respect to an event *e*_{0} in *p*_{0}, is as follows.

THEOREM 4.1. *real*(*e*) ∈ [−*d*(*e*_{0}, *e*), *d*(*e*, *e*_{0})]

The meaning of this theorem is that *e* might have occurred at any time in this interval. Furthermore, for each such time, there is an execution in which *e* occurs at that time that is indistinguishable to all processes from the actual execution.
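The distance computation above can be sketched in a few lines. The event names (`a0` through `a3`) and the delay and drift bounds behind the weights are hypothetical, chosen only so that the weights are consistent (every cycle sums to a nonnegative value); Theorem 4.1 is applied with *real*(*e*) measured relative to *p*_{1}'s local reading of *e*. Since the weights *w* can be negative, Floyd-Warshall (or Bellman-Ford) is used rather than Dijkstra.

```python
# Sketch (hypothetical numbers): the indistinguishability interval of an event
# at p1 via shortest-path distances in the synchronization graph.
# Events: a0 = p0 sends, a1 = p1 receives, a2 = p1 responds, a3 = p0 receives.
# Each causal edge contributes weights w(e1, e2) = B(e1, e2) - local(e1, e2),
# where B(e1, e2) bounds real(e1) - real(e2) from above.

INF = float('inf')

# w[x][y] is the weight of the directed edge x -> y (derived from assumed
# message-delay and clock-drift bounds).
w = {
    'a0': {'a1': 2, 'a3': 0},   # p0's perfect clock pins a0 and a3 to each other
    'a1': {'a0': 2, 'a2': 1},
    'a2': {'a1': 1, 'a3': 3},
    'a3': {'a2': 1, 'a0': 0},
}
nodes = list(w)

# Floyd-Warshall: weights may be negative, but every cycle sums to >= 0,
# so the shortest-path distances d are well defined.
d = {x: {y: (0 if x == y else w[x].get(y, INF)) for y in nodes} for x in nodes}
for k in nodes:
    for i in nodes:
        for j in nodes:
            d[i][j] = min(d[i][j], d[i][k] + d[k][j])

# Theorem 4.1, relative to p1's local clock reading of a1:
lo, hi = -d['a0']['a1'], d['a1']['a0']
print(f"offset of a1 lies in [{lo}, {hi}]")   # -> offset of a1 lies in [-2, 2]
```

Note that `d['a0']['a3']` and `d['a3']['a0']` both come out 0, matching the observation that *d*(*e*, *e*’) = 0 for any two events in *p*_{0}.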


These results are based on Patt-Shamir and Rajsbaum,^{30} a follow-up of earlier work,^{5, 20} which studied how closely, in terms of real time, processes can be guaranteed to perform a particular action in a failure-free environment. The possibility of failures affects the size of the indistinguishability interval, a very interesting topic from the indistinguishability perspective. The standard technique is to consider several clock reference values and take the average after disregarding the most extreme values. There are many papers on clock synchronization algorithms; see, for example, Attiya and Ellen^{4} for references from the more theoretical perspective, and the book^{28} for the more practical perspective.

### Global Indistinguishability Structure

The previous examples of indistinguishability have a *local* flavor: we look at a single execution α and the executions indistinguishable from α to *all* processes. It turns out that studying executions that are indistinguishable to a *subset* of processes leads to understanding the global indistinguishability structure of *all* executions. This uncovers an intimate relation between indistinguishability and higher-dimensional topological properties. The overview presented here is very informal; for a more precise description, see Herlihy et al.^{22}

**Initial indistinguishability structure.** Consider three processes *b*, *g*, *w* (black, gray, white) that communicate with each other to solve some task. When the computation begins, each process receives an input value. In the *binary consensus* task, the set of input values is {0, 1}. In certain *renaming* tasks, processes start with distinct input values taken from the set {0, 1, 2, 3}. Initially each process knows only its own input. An *initial state I* is a set of three *initial local states*, each one consisting of a pair of values. Two initial states *I*_{1} and *I*_{2} are indistinguishable to a process if the process has the same input value in both states, that is, if *I*_{1} ∩ *I*_{2} contains its initial local state. If we draw an initial state as a triangle whose vertices are the local initial states, *I*_{1} and *I*_{2} share an edge if they are indistinguishable to two processes, and they share only a vertex if only one process does not distinguish between them. Figure 6 shows that the *input complex* for consensus looks like a triangulated sphere, and the one for renaming looks like a triangulated torus. Each one is a *simplicial complex*, because it consists of a family of sets closed under containment (each edge of a triangle is a set of two local states, and each vertex is a singleton set).

**Figure 6. Consensus and renaming input complexes.**
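The initial indistinguishability structure can be sketched directly. The representation below (triangles as sets of (process, input) vertices) is an illustrative encoding, not code from the article: shared faces of two triangles record exactly which processes cannot tell the corresponding initial states apart.

```python
from itertools import product

# Sketch: the binary-consensus input complex for three processes b, g, w.
# Each initial state is a triangle, i.e., a set of three (process, input) vertices.
processes = ('b', 'g', 'w')
triangles = [frozenset(zip(processes, bits)) for bits in product((0, 1), repeat=3)]
print(len(triangles))                # 8 initial states (2^3)

# Two initial states are indistinguishable to a process iff its local state lies
# in their intersection; the shared face records who cannot tell them apart.
t000 = frozenset({('b', 0), ('g', 0), ('w', 0)})
t001 = frozenset({('b', 0), ('g', 0), ('w', 1)})
print(len(t000 & t001))              # 2: b and g cannot distinguish them (an edge)
```

States differing in one input share an edge (two vertices); states differing in two inputs share only a vertex, matching the triangulated-sphere picture in Figure 6.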

**How indistinguishability evolves.** As processes communicate with each other, learning about each other’s input values, the structure of indistinguishability evolves. Suppose that the processes publicly announce their input values, but each process may miss hearing either or both of the other processes’ announcements, as determined by a *communication pattern*, namely a directed graph *G* on the vertices *b*, *g*, *w*; an arrow *v* → *v*’ signifies that *v*’ hears the input from *v.* Thus, *v*’ hears inputs from the set *N*^{–}(*v*’) of processes that have an arrow toward vertex *v*’. Which input value *v*’ hears from *v* depends on the initial state *I* to which *G* is applied. Applying *G* to an initial state *I* produces a new state, {(*b*, *view*(*b*)), (*g*, *view*(*g*)), (*w*, *view*(*w*))}, where the local state *view*(*p*) of process *p* is the subset of *I* corresponding to the processes in *N*^{–}(*p*).

Figure 7 illustrates the *IS-patterns* (*immediate snapshot* or *block* executions), a subset of all possible communication patterns. An IS-pattern for a set of processes *P* is defined by an ordered partition *S*_{1}, …, *S*_{k} of *P* (1 ≤ *k* ≤ |*P*|), specifying that processes in *S*_{i} hear the values from all processes in *S*_{j}, *j* ≤ *i.* Consider, for instance, the IS-pattern {*b*, *g*, *w*} consisting of the trivial partition of {*b*, *g*, *w*}, which corresponds to the center triangle, where all processes hear from each other. The arrows *g* ↔ *w* belong also to the top triangle, corresponding to the partition {*b*}, {*g*, *w*}, where the only difference is that *b* does not hear from the other two processes.

**Figure 7. IS-communication patterns.**
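The IS-patterns for a set of processes can be enumerated as ordered partitions, and each one applied to an initial state to obtain the views. This is a sketch under the definitions above; the function names and the encoding of blocks as frozensets are illustrative choices.

```python
# Sketch: IS-patterns as ordered partitions of the processes, and the views
# they produce when applied to an initial state.

def ordered_partitions(items):
    """Yield every ordered partition S1, ..., Sk of items as a list of blocks."""
    items = list(items)
    if not items:
        yield []
        return
    n = len(items)
    for mask in range(1, 1 << n):             # choose the first block S1
        block = frozenset(items[i] for i in range(n) if mask >> i & 1)
        rest = [x for x in items if x not in block]
        for tail in ordered_partitions(rest):
            yield [block] + tail

def apply_is_pattern(partition, initial_state):
    """Processes in S_i hear the inputs of all processes in S_j, j <= i."""
    views, heard = {}, set()
    for block in partition:
        heard |= block
        for p in block:
            views[p] = {q: initial_state[q] for q in heard}
    return views

patterns = list(ordered_partitions(['b', 'g', 'w']))
print(len(patterns))                          # 13 IS-patterns for 3 processes

I = {'b': 0, 'g': 1, 'w': 1}
# Partition {b}, {g, w}: b hears only itself; g and w hear everyone.
views = apply_is_pattern([frozenset({'b'}), frozenset({'g', 'w'})], I)
print(views['b'])                             # {'b': 0}
```

The 13 patterns correspond to the 13 triangles of the subdivision in Figure 7: one per ordered partition of {*b*, *g*, *w*}.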

IS-patterns are important because when applied to an input complex *I*, the resulting *protocol complex P* is a subdivision of *I*. In Figure 8, IS-patterns are applied to two consensus input simplexes. One can see that the vertices of *b* and *w* with input 0 belong to two input triangles; the edge between them is subdivided into three edges in *P*, which belong to both the blue and the yellow subdivided triangles, due to IS-patterns where *b* and *w* do not hear from *g* (and hence cannot tell whether its input is 0 or 1).

**Figure 8. Two input triangles, application of IS-patterns on them, and the requirement to produce consensus outputs.**

In the same way that we applied each IS-pattern to each initial state to get *P*, we can again apply each IS-pattern, but now to each state of *P*, obtaining a subdivision of *P*, and so forth. Each time the processes communicate once more through an IS-pattern, the input complex is subdivided more and more finely. Indeed, a fundamental discovery is that there are topological invariants, preserved no matter how many times the processes communicate, and no matter what they tell each other each time they communicate. In the case of any unreliable asynchronous communication by either message passing or read/write shared-memory, *P* “looks like” (is homotopic to) the input complex *I*.

Remarkably, topological invariants determine the computational power of the model. In other, more reliable models of computation (for example, at most *t* out of *n*, *t* < *n* − 1 processes can fail, or synchronous models, or shared-memory primitives stronger than read/write registers), *P* preserves weaker topological invariants, and “holes” are created, giving the model its additional computability power.

**Specifications as indistinguishability requirements.** Suppose that after communicating through IS-patterns, each process produces an output value. Let (*p*, *view*(*p*)) be the local state of a process *p* in the protocol complex *P*, after an IS-pattern. Hence, the output value produced by *p* is a function of its view, δ(*p*, *view*(*p*)). Namely, if *p* does not distinguish between two triangles of *P*, then it must decide the same value in both.

A simplicial complex whose triangles are labeled with output values is used to specify the task the decision values should satisfy. For binary consensus, the *output complex*, in Figure 8, consists of two disjoint triangles, one labeled with 0 output values in all its three vertices, and another labeled with 1 in all its three vertices. Thus, a *task* 〈 *I*, *O*, Δ〉 consists of an input complex *I*, an output complex *O*, and a relation Δ specifying for each input triangle σ ∈ *I* which output triangles of *O*, Δ(σ), represent valid outputs for the task.

Finally, Figure 8 is meant to represent that the decision function δ *solves the task* if, for any triangle σ’ in *P*, δ(σ’) is a triangle τ ∈ *O* such that τ ∈ Δ(σ), where σ is the input triangle for σ’.

To summarize, a new indistinguishability global structure (represented by *P*) is generated after communication, and a task specifies a target indistinguishability structure (represented by *O*). The question is whether *P* can be (simplicially) mapped to *O* respecting Δ. This is a topological question with deep implications to distributed task computability in various models (message-passing and shared memory, synchronous and asynchronous, with crash and Byzantine failures).
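The obstruction can be made concrete with a hypothetical decision rule (not from the article): after one round in which *b* misses the other announcements, deciding the smallest input seen violates agreement, precisely because *b*'s view is indistinguishable from one it would have in an all-1 input state.

```python
# Sketch (hypothetical decision rule delta = "smallest input seen"):
# one communication pattern where b hears only itself while g and w hear everyone.
inputs = {'b': 1, 'g': 0, 'w': 0}
views = {
    'b': {'b': 1},          # b heard no one: same view as in the all-1 input state
    'g': dict(inputs),      # g and w heard everyone
    'w': dict(inputs),
}
delta = {p: min(view.values()) for p, view in views.items()}
print(delta)                # {'b': 1, 'g': 0, 'w': 0}: not a valid output triangle
```

The decided triangle maps to neither the all-0 nor the all-1 triangle of the consensus output complex, illustrating why the simplicial map required by 〈 *I*, *O*, Δ〉 cannot exist for this protocol complex.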

This formalization can be interpreted as a question of gaining knowledge, as explained in Goubault et al.,^{18} where it is described how the simplicial complexes described in this section have an equivalent representation as Kripke models. Roughly speaking, each triangle is a state of the Kripke graph, and if two triangles share a vertex of process *p*, then the two corresponding states are connected by an edge labeled *p.* Indeed, there is an intimate relation between indistinguishability and the theory of reasoning about knowledge for distributed computing described in Fagin et al.^{14}

### Conclusion

Indistinguishability plays a central role in computer science. Examples from different areas (automata theory, learning, specification, verification, distributed computing, and epistemic logic) demonstrate how different levels of abstraction entail distinct notions of indistinguishable observations, and different uses of indistinguishability (to show computability and complexity limitations, and also to design solutions). Some examples deserve to be treated in more depth, and there are many additional application areas.

One application area is *computational learning* and related complexity topics, as recently reviewed in Wigderson.^{40} Many subareas can be viewed through the lenses of *probabilistic indistinguishability*, for example, PAC learning,^{38} cryptography, communication complexity, indistinguishability despite errors,^{32} and coding theory.

Indistinguishability plays a role in artificial intelligence, for example, in Turing’s test, and more generally, Turing-like tests for other applications, such as Go simulators^{10} and writing a program simulating a living organism.^{21}

We discussed formal methods, another area where indistinguishability is a key, notably in behavioral equivalences.^{11} And we discussed logic, where the longstanding connection between modal logic and topology goes back to McKinsey and Tarski,^{27} and up to today, with a topological semantics for belief.^{7} Another interesting example from logic is Ehrenfeucht-Fraïssé games.^{31}

Distributed computing is all about interactions, with abundant instances where indistinguishability is a key. Examples include labeling schemes, synchronizers, mutual exclusion, anonymity and symmetry, and partitioning. Many impossibility results are discussed in Attiya and Ellen.^{4}

Finally, indistinguishability cuts across topics. Multi-agent epistemic logic relies on Kripke models to represent indistinguishability.^{14} These in turn, can be considered as the dual of simplicial complexes,^{18} and we described how the indistinguishability structure evolves as interaction occurs preserving topological properties. Also, having knowledge means being able to distinguish between situations, so the same action must be taken in indistinguishable setups.^{29} We discussed the duality between indistinguishability and knowledge also in the context of learning automata.

**Acknowledgments.** We would like to thank Hans van Ditmarsch, Jérémy Ledent, Arnold Rosenberg, Jennifer Welch, and the reviewers for helpful comments. Supported by grants from UNAM-PAPIIT IN106520 and ISF 380/18.
