Computer scientists have long believed that software is different from physical systems in one fundamental way: while the latter have continuous dynamics, the former do not. In this paper, we argue that notions of continuity from mathematical analysis are relevant and interesting even for software. First, we demonstrate that many everyday programs are *continuous* (i.e., arbitrarily small changes to their inputs only cause arbitrarily small changes to their outputs) or *Lipschitz continuous* (i.e., when their inputs change, their outputs change at most proportionally). Second, we give a mostly-automatic framework for verifying that a program is continuous or Lipschitz, showing that traditional, discrete approaches to proving programs correct can be extended to reason about these properties. An immediate application of our analysis is in reasoning about the *robustness* of programs that execute on uncertain inputs. In the longer run, it raises hopes for a toolkit for reasoning about programs that freely combines logical and analytical mathematics.

### 1. Introduction

It is accepted wisdom in computer science that the dynamics of software systems are inherently discontinuous, and that this fact makes them fundamentally different from physical systems. More than 25 years ago, Parnas^{15} attributed the difficulty of engineering reliable software to the fact that “the mathematical functions that describe the behavior of [software] systems are not continuous.” This meant that the traditional analytical calculus—the mathematics of choice when one is analyzing the dynamics of, say, fluids—did not fit the needs of software engineering too well. Logic, which can reason about discontinuous systems, was better suited to being the mathematics of programs.

In this paper, we argue that this wisdom is to be taken with a grain of salt. First, many everyday programs are continuous in the same sense as in analysis—that is, arbitrarily small changes to their inputs lead to arbitrarily small changes to their outputs. Some of them are even *Lipschitz continuous*—that is, perturbations to the program’s inputs lead to at most proportional changes to its outputs. Second, we show that analytic properties of programs are not at odds with the classical, logical methods for program verification, giving a logic-based, mostly-automated method for formally verifying that a program is continuous or Lipschitz continuous. Among the immediate applications of this analysis is reasoning about the *robustness* of programs that execute under uncertainty. In the longer run, it raises hopes for a unified theory of program analysis that marries logical and analytical approaches.

Now we elaborate on the above remarks. Perhaps the most basic reason why software systems can violate continuity is *conditional branching*—that is, constructs of the form “**if** *B* **then** *P*_{1} **else** *P*_{2}.” Continuous dynamical systems arising in the physical sciences do not typically contain such constructs, but most nontrivial programs do. If a program has a branch, then even the minutest perturbation to its inputs may cause it to evaluate one branch in place of the other. Thus, we could perhaps conclude that any program containing a branch is ipso facto discontinuous.

To see that this conclusion is incorrect, consider the problem of sorting an array of numbers, one of the most basic tasks in computing. Every classic algorithm for sorting contains conditional branches. But let us examine the *specification* of a sorting algorithm: a mathematical function *Sort* that maps arrays to their sorted permutations. This specification is not only continuous but Lipschitz continuous: change any item of an input array *A* by ±ε, and each item of *Sort*(*A*) changes at most by ±ε. For example, suppose *A* and *A’* are two input arrays as below, with *A’* obtained by perturbing each item of *A* at most by ±1. Then *Sort*(*A’*) can be obtained by perturbing each item of *Sort*(*A*) at most by ±1.
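This 1-Lipschitz behavior of the sorting specification is easy to check empirically. The following sketch (our illustration, not part of the paper's formal framework) perturbs each item of a random array by at most ε and confirms that the sorted output moves by at most ε under the L∞ metric on arrays:

```python
import random

def linf(a, b):
    # L-infinity distance between two equal-length arrays
    return max(abs(x - y) for x, y in zip(a, b))

random.seed(0)
eps = 1.0
A = [random.uniform(-100, 100) for _ in range(50)]
# A' is an eps-perturbation of A: each item moves by at most +/- eps
A2 = [x + random.uniform(-eps, eps) for x in A]

# Sort is 1-Lipschitz: d(Sort(A), Sort(A')) <= d(A, A') <= eps
assert linf(sorted(A), sorted(A2)) <= linf(A, A2) <= eps
```

The inequality holds for every perturbation, not just this sampled one; the code merely illustrates the claim.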

Similar observations hold for many of the classic computations in computer science, for example, shortest path and minimum spanning tree algorithms. Our program analysis extends and automates methods from the traditional analytical calculus to prove the continuity or Lipschitz continuity of such computations. For instance, to verify that a conditional statement within a program is continuous, we generalize the sort of insight that a high-school student uses to prove the continuity of a piecewise function like abs(*x*), defined as *x* if *x* ≥ 0 and as −*x* otherwise.

Intuitively, abs(*x*) is continuous because its two “pieces” *x* and −*x* are continuous, and because *x* and −*x* agree on values in the neighborhood of *x* = 0, the point where a small perturbation can cause abs(*x*) to switch from evaluating one piece to evaluating the other. Our analysis uses the same idea to prove that “**if** *B* **then** *P*_{1} **else** *P*_{2}” is continuous: it inductively verifies that *P*_{1} and *P*_{2} are continuous, then checks, often automatically, that *P*_{1} and *P*_{2} become semantically equivalent at states where the value of *B* can flip on a small perturbation.

When operating on a program with loops, our analysis searches for an inductive proof of continuity. To prove that a continuous program is Lipschitz continuous, we inductively compute a collection of *Lipschitz matrices* that contain numerical bounds on the slopes of functions computed along different control paths of the program.

Of course, complete software systems are rarely continuous. However, a verification technique like ours allows us to identify modules of a program that satisfy continuity properties. A benefit of this is that such modules are amenable to analysis by continuous methods. In the longer run, we can imagine a reasoning toolkit for programs that combines continuous analysis techniques, for example, numerical optimization or symbolic integration, with logical methods for analyzing code. Such a framework would expand the classical calculus to functions encoded as programs, a representation worthy of first-class treatment in an era where much of applied mathematics is computational.

A more immediate application of our framework is in the analysis of programs that execute on *uncertain* inputs, for example noisy sensor data or inexact scientific measurements. Unfortunately, traditional notions of functional correctness do not guarantee predictable program execution on uncertain inputs: a program may produce the correct output on each individual input, but even small amounts of noise in the input could change its output radically. Under uncertainty, traditional correctness properties must be supplemented by the property of *robustness*, which says that small perturbations to a program’s inputs do not have much effect on the program’s output. Continuity and Lipschitz continuity can both serve as definitions of robustness, and our analysis can be used to prove that a program is robust.

The rest of the paper is organized as follows. In Section 2, we formally define continuity and Lipschitz continuity of programs and give a few examples of computations that satisfy these properties. In Section 3, we give a method for verifying a program’s continuity, and then extend it to an analysis for Lipschitz continuity. Related work is presented in Section 4; Section 5 concludes the paper with some discussion.

### 2. Continuity, Lipschitz Continuity, and Robustness

In this section, we define continuity^{2} and Lipschitz continuity^{3} of programs and show how they can be used to define robustness. First, however, we fix the programming language IMP whose programs we reason about.

IMP is a “core” language of imperative programs, meaning that it supports only the most central features of imperative programming—assignments, branches, and loops. The language has two discrete data types—integers and arrays of integers—and two continuous data types—reals and arrays of reals. Usual arithmetic and comparisons on these types are supported. In conformance with the model of computation under which algorithms over reals are typically designed, our reals are infinite-precision, and elementary operations on them are assumed to be given by unit-time oracles.

Each data type in IMP is associated with a *metric*.^{a} This metric is our notion of distance between values of a given type. For concreteness, we fix, for the rest of the paper, the following metrics for the IMP types:

- The integer and real types are associated with the Euclidean metric *d*(*x*, *y*) = |*x* − *y*|.
- The metric over arrays (of reals or integers) of the same length is the *L*_{∞}-norm: *d*(*A*_{1}, *A*_{2}) = max_{i}{|*A*_{1}[*i*] − *A*_{2}[*i*]|}. Intuitively, an array changes by ε when its size is kept fixed, but one or more of its items change by ε. We define *d*(*A*_{1}, *A*_{2}) = ∞ if *A*_{1} and *A*_{2} have different sizes.
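For concreteness, these metrics can be transcribed directly into code (our illustration; the function names are ours):

```python
import math

def d_scalar(x, y):
    # Euclidean metric on integers and reals
    return abs(x - y)

def d_array(a1, a2):
    # L-infinity metric on arrays; infinite distance if sizes differ
    if len(a1) != len(a2):
        return math.inf
    if not a1:
        return 0.0
    return max(abs(u - v) for u, v in zip(a1, a2))

assert d_scalar(3, 5) == 2
assert d_array([1.0, 2.0], [1.5, 1.0]) == 1.0
assert d_array([1.0], [1.0, 2.0]) == math.inf
```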

The syntax of arithmetic expressions *E*, Boolean expressions *B*, and programs *Prog* is as follows:

*E* ::= *x* | *c* | *A*[*i*] | *E* + *E* | *E* · *E*
*B* ::= *E* < *E* | *E* = *E* | ¬*B* | *B* ∧ *B* | *B* ∨ *B*
*Prog* ::= **skip** | *x* := *E* | *A*[*i*] := *E* | *Prog* ; *Prog* | **if** *B* **then** *Prog* **else** *Prog* | **while** *B* **do** *Prog*

Here *x* is a typed variable, *c* is a typed constant, *A* is an array variable, *i* an integer variable or constant, + and · respectively represent addition and multiplication over scalars (reals or integers), and the Boolean operators are as usual. We assume an orthogonal type system that ensures that all expressions and assignments in our programs are well-typed. The set of variables appearing in *P* is denoted by *Var*(*P*) = {*x*_{1}, …, *x*_{n}}.

As for semantics, for simplicity, let us restrict our focus to programs that terminate on all inputs. Let *Val* be a universe of typed *values*. A *state* of *P* is a vector σ ∈ *Val*^{n}. Intuitively, for all 1 ≤ *i* ≤ *n*, σ(*i*) is the value of the variable *x*_{i} at state σ. The set of all states of *P* is denoted by Σ(*P*).

The semantics of the program *P*, an arithmetic expression *e* occurring in *P*, and a Boolean expression *b* in *P* are now respectively given by maps ⟦*P*⟧: Σ(*P*) → Σ(*P*), ⟦*e*⟧: Σ(*P*) → *Val*, and ⟦*b*⟧: Σ(*P*) → {*true, false*}. Intuitively, for each state σ of *P*, ⟦*e*⟧(σ) and ⟦*b*⟧(σ) are respectively the values of *e* and *b* at σ, and ⟦*P*⟧(σ) is the state at which *P* terminates after starting execution from σ. We omit the inductive definitions of these maps as they are standard.
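To make the omitted definitions concrete, here is a minimal interpreter sketch for an IMP-like fragment (our own illustration; the encoding of programs as nested tuples and of guards as Python functions is an assumption, not the paper's notation). A state maps variable names to values, and ⟦*P*⟧ is the state transformer obtained by structural recursion:

```python
def run(prog, state):
    """Evaluate ('skip',), ('assign', x, f), ('seq', p1, p2),
    ('if', guard, p1, p2), and ('while', guard, body), where guards
    and right-hand sides f are Python functions of the state."""
    tag = prog[0]
    if tag == 'skip':
        return state
    if tag == 'assign':
        _, x, f = prog
        new = dict(state)      # states are immutable; copy, then update
        new[x] = f(state)
        return new
    if tag == 'seq':
        return run(prog[2], run(prog[1], state))
    if tag == 'if':
        _, b, p1, p2 = prog
        return run(p1 if b(state) else p2, state)
    if tag == 'while':
        _, b, body = prog
        while b(state):
            state = run(body, state)
        return state
    raise ValueError(tag)

# x := x/2 if x > 2, else x := -5x + 11 (the warm-up program of Section 3)
P = ('if', lambda s: s['x'] > 2,
     ('assign', 'x', lambda s: s['x'] / 2),
     ('assign', 'x', lambda s: -5 * s['x'] + 11))
assert run(P, {'x': 8.0})['x'] == 4.0
assert run(P, {'x': 2.0})['x'] == 1.0
```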

Our definition of continuity of programs is an adaptation of the traditional ε-δ definition of continuous functions. As a program can have multiple inputs and outputs, we define continuity with respect to an *input variable x*_{i} and an *output variable x*_{j}. Intuitively, if *P* is continuous with respect to input *x*_{i} and output *x*_{j}, then an arbitrarily small change to the initial value of *x*_{i}, while keeping the remaining variables fixed, must only cause an arbitrarily small change to the final value of *x*_{j}. Variables other than *x*_{j} are allowed to change arbitrarily.

Formally, consider states σ, σ’ of *P* and any ε > 0. Let *x*_{i} be a variable of type τ, and let *d*_{τ} denote the metric over type τ. We say that σ and σ’ are *ε-close* with respect to *x*_{i}, and write σ ≈_{ε,i} σ’, if *d*_{τ}(σ(*i*), σ’(*i*)) < ε. We call σ’ an *ε-perturbation* of σ with respect to *x*_{i}, and write σ ≡_{ε,i} σ’, if σ and σ’ are ε-close with respect to *x*_{i}, and further, for all *j* ≠ *i*, we have σ(*j*) = σ’(*j*). Now we define:

DEFINITION 1 (CONTINUITY). *A program P is* continuous *at a state σ with respect to an input variable x*_{i} *and an output variable x*_{j} *if for all* ε > 0, *there exists a* δ > 0 *such that for all* σ’ ∈ Σ(*P*),

σ ≡_{δ,i} σ’ implies ⟦*P*⟧(σ) ≈_{ε,j} ⟦*P*⟧(σ’).

An issue with continuity is that it only certifies program behavior under *arbitrarily small* perturbations to the inputs. However, we may often want a definition of continuity that establishes a *quantitative* relationship between changes to a program’s inputs and outputs. Of the many properties in analysis that accomplish this, *Lipschitz continuity* is perhaps the most well known. Let *K* > 0. A function *f*: ℝ → ℝ is *K-Lipschitz continuous*, or simply *K-Lipschitz*, if a ±ε-change to *x* can change *f*(*x*) by at most ±*K*ε. The constant *K* is known as the *Lipschitz constant* of *f*. It is easy to see that if *f* is *K*-Lipschitz for some *K*, then *f* is continuous at every input *x*.
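A sampling-based check makes the definition concrete (our illustration; sampling can give evidence for, but never prove, a Lipschitz bound):

```python
import random

def max_observed_slope(f, lo, hi, trials=10000):
    # Largest |f(x) - f(y)| / |x - y| seen over random pairs; this is a
    # lower bound on the true Lipschitz constant of f on [lo, hi]
    random.seed(1)
    worst = 0.0
    for _ in range(trials):
        x, y = random.uniform(lo, hi), random.uniform(lo, hi)
        if x != y:
            worst = max(worst, abs(f(x) - f(y)) / abs(x - y))
    return worst

# |3x + 1| is 3-Lipschitz, so no sampled slope can exceed 3
assert max_observed_slope(lambda x: abs(3 * x + 1), -10, 10) <= 3.0 + 1e-9
```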

We generalize this definition in two ways while adapting it to programs. First, as for continuity, we define Lipschitz continuity with respect to an input variable *x*_{i} and an output variable *x*_{j}. Second, we allow Lipschitz constants that depend on the *size of the input*. For example, suppose *x*_{i} is an array of length *N*, and consider ε-changes to it that do not change its size. We consider *P* to be *N-Lipschitz* with respect to *x*_{i} if on such a change, the output *x*_{j} can change at most by *N* · ε. In general, a Lipschitz constant of *P* is a function *K*: ℕ → ℝ_{≥0} that takes the size of *x*_{i} as an input.

Formally, to each value *v*, let us associate a size ‖*v*‖ > 0. If *v* is an integer or real, then ‖*v*‖ = 1; if *v* is an array of length *N*, then ‖*v*‖ = *N*. We have:

DEFINITION 2 (LIPSCHITZ CONTINUITY). *Let K*: ℕ → ℝ_{≥0}. *The program P is K*-Lipschitz *with respect to an input x*_{i} *and an output x*_{j} *if for all* σ, σ’ ∈ Σ(*P*) *and* ε > 0,

σ ≡_{ε,i} σ’ implies ⟦*P*⟧(σ) ≈_{K(‖σ(i)‖)·ε, j} ⟦*P*⟧(σ’).

More generally, we could define Lipschitz continuity of *P* within a subset Σ’ of its state space. In such a definition, σ and σ’ in Definition 2 are constrained to be in Σ’, and no assertion is made about the effect of perturbations on states outside Σ’. Such a definition is useful because many realistic programs are Lipschitz only within certain regions of their input space, but for brevity, we do not consider it here.

Now we note that many of the classic computations in computer science are in fact continuous or Lipschitz.

EXAMPLE 1 (SORTING). *Let Sort be a sorting program that takes in an array A of reals and returns a sorted permutation of A. As discussed in Section 1, Sort is 1-Lipschitz in input A and output A, because any ε-change to the initial value of A (defined using our metric on arrays) can produce at most an ε-change to its final value. Note that this means that Sort is continuous in input A and output A at every program state*.

*What if A was an array of integers? Continuity still holds in this case, but for more trivial reasons. Since A is of a discrete type, the only* arbitrarily small *perturbation that can be made to A is no perturbation at all. Obviously, the program output does not change in this case. However, reasoning about continuity turns out to be important even under these terms. This is apparent when we try to prove that Sort is 1-Lipschitz when A is an array of integers. The easiest way to do this is to “cast” the input type of the program into an array of reals and prove that the program is 1-Lipschitz even after this modification, and this demands a proof of continuity*.

EXAMPLE 2 (SHORTEST PATHS). *Let SP be a correct implementation of a shortest path algorithm. We view the graph G on which SP operates as an array of reals such that G[i] is the weight of the i-th edge. An ε-change to G thus amounts to a maximum change of ±ε to any edge weight of G, while keeping the node and edge structure intact*.

*The output of SP is the array d of shortest path distances in G—that is, d[i] is the length of the shortest path from the source node to the i-th node u*_{i} *of G. We note that SP is N-Lipschitz in input G and output d (N is the number of edges in G). This is because if each edge weight in G changes by an amount ε, a shortest path weight can change at most by N · ε*.

*On the other hand, suppose SP had a second output: an array π whose i-th element is a sequence of nodes forming a minimal-weight path between src and u*_{i}*. An ε-change to G may add or subtract elements from π—that is, perturb π by the amount ∞. Therefore, SP is not K-Lipschitz with respect to the output π for any K*.
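The contrast between the two outputs can be observed on a small example (our sketch; a brute-force search over paths stands in for any correct SP implementation). Perturbing the edge weights flips which route is optimal, so the path output jumps discontinuously, yet the distance output moves by at most *N* · ε:

```python
from itertools import permutations

def shortest(dists, n, src, dst):
    # Brute-force shortest path length in a complete weighted graph;
    # dists maps ordered node pairs to edge weights
    best = dists[(src, dst)]
    nodes = [v for v in range(n) if v not in (src, dst)]
    for k in range(1, len(nodes) + 1):
        for mid in permutations(nodes, k):
            path = (src,) + mid + (dst,)
            w = sum(dists[(path[i], path[i + 1])] for i in range(len(path) - 1))
            best = min(best, w)
    return best

# Two nearly tied routes from 0 to 2: direct (2.0) or via node 1 (2.05)
G = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 1.05, (2, 1): 1.05,
     (0, 2): 2.0, (2, 0): 2.0}
d0 = shortest(G, 3, 0, 2)
# Perturb every edge weight by at most eps = 0.1: the optimal *route*
# flips to the indirect one, but the *distance* moves by at most N * eps
G2 = {e: (w + 0.1 if e in ((0, 2), (2, 0)) else w - 0.1) for e, w in G.items()}
d1 = shortest(G2, 3, 0, 2)
assert abs(d1 - d0) <= len(G) * 0.1
```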

EXAMPLE 3 (MINIMUM SPANNING TREES). *Consider any algorithm MST for computing a minimum-cost spanning tree in a graph G with real edge weights. Suppose MST has a variable c that holds, on termination, the cost of the minimum spanning tree. MST is N-Lipschitz (hence continuous everywhere) in the input G and the output c*.

EXAMPLE 4 (KNAPSACK). *Consider the Knapsack problem from combinatorial optimization. We have a set of items* {1, …, *N*}, *each item i associated with a real* cost *c[i] and a real* value *v[i]. We also have a nonnegative, real-valued budget Budget. The goal is to identify a subset Used* ⊆ {1, …, *N*} *such that the constraint Σ*_{i ∈ Used} *c[i] ≤ Budget is satisfied, and the value of tot*_{v} *= Σ*_{i ∈ Used} *v[i] is maximized*.

*Let our output variable be tot*_{v}. *As a perturbation can turn a previously feasible solution infeasible, a program Knap solving this problem is discontinuous with respect to the input c and also with respect to the input Budget. At the same time, Knap is N-Lipschitz with respect to the input v: if the value of each item changes by ±ε, the value of tot*_{v} *can change by ±Nε*.
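Both claims can be checked on a toy instance (our sketch; exhaustive search over subsets stands in for *Knap*):

```python
from itertools import combinations

def knap_value(costs, values, budget):
    # Brute force over all subsets: best total value within the budget
    n, best = len(costs), 0.0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if sum(costs[i] for i in S) <= budget:
                best = max(best, sum(values[i] for i in S))
    return best

c, v, budget = [2.0, 3.0], [10.0, 20.0], 4.0
base = knap_value(c, v, budget)                  # only one item fits
# N-Lipschitz in v: raising each value by eps moves the optimum by <= N*eps
eps = 0.5
bumped = knap_value(c, [x + eps for x in v], budget)
assert abs(bumped - base) <= len(v) * eps
# Discontinuous in Budget: a tiny budget increase makes both items feasible
assert knap_value(c, v, 5.0) - knap_value(c, v, 4.999) >= 9.0
```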

Continuity and Lipschitz continuity can be used as definitions of *robustness*, a property that ensures that a program behaves predictably on uncertain inputs. Uncertainty, in this context, can be modeled by either nondeterminism (the value of *x* has a margin of error ±ε) or a probability distribution. For example, a robust embedded controller does not change its control decisions abruptly because of uncertainty in its sensor-derived inputs. The statistics computed by a robust scientific program are not much affected by measurement errors in the input dataset.

Robustness has benefits even in contexts where a program’s relationship with uncertainty is not adversarial. The input space of a robust program does not have isolated “peaks”—that is, points where the program output is very different from outputs on close-by inputs. Therefore, we are more likely to cover a program’s behaviors using random tests if the program is robust. Also, robust computations are more amenable to randomized and approximate program transformations^{20} that explore trade-offs between a program’s quality of results and resource consumption. Transformations of this sort can be seen to deliberately introduce uncertainty into a program’s operational semantics. If the program is robust, then this extra uncertainty does not significantly affect its observable behavior, hence such a transformation is “safer” to perform. More details are available in our conference paper on robustness analysis.^{3}

Now, if a program is Lipschitz, then we can give quantitative upper bounds on the change to its behavior due to uncertainty in its inputs, and further, this bound is small if the inputs are only slightly uncertain. Consequently, Lipschitz continuity is a rather strong robustness property. Continuity is a weaker definition of robustness—a program computing *e*^{x} is continuous, even though it hugely amplifies errors in its inputs. Nonetheless, it captures freedom from a common class of robustness violations: those where uncertainty in the inputs alters a program’s control flow, and this change leads to a significant change in the program’s output.

### 3. Program Verification

Now we present our automated framework for proving a program continuous^{2} or Lipschitz.^{3} Our analysis is *sound*—that is, a program proven continuous or Lipschitz by our analysis is indeed continuous. However, as the analysis targets a Turing-complete language, it is *incomplete*—for example, a program may be continuous even if the analysis does not deem it so.

First we show how to verify the continuity of an IMP program. We use as running examples three of the most well-known algorithms of computing: Bubble Sort, the Bellman-Ford shortest path algorithm, and Dijkstra’s shortest path algorithm (Figure 2). As mentioned in Example 1, a program is always continuous in input variables of discrete types. Therefore, to make the problem more interesting, we assume that the input to Bubble Sort is an array of reals. As before, we model graphs by arrays of reals where each item represents the weight of an edge.

Given a program *P*, our task is to derive a syntactic *continuity judgment* for *P*, defined as a term *b* ⊢ *Cont*(*P*, *In*, *Out*), where *b* is a Boolean formula over *Var*, and *In* and *Out* are sets of variables of *P*. Such a judgment is read as “For each *x*_{i} ∈ *In* and *x*_{j} ∈ *Out* and each state σ where *b* is true, *P* is continuous in input *x*_{i} and output *x*_{j} at σ.” We break down this task into the task of deriving judgments *b* ⊢ *Cont*(*P’*, *In*, *Out*) for programs *P’* that are syntactic substructures of *P*. For example, if *P* is of the form “**if** *b* **then** *P*_{1} **else** *P*_{2},” then we recursively derive continuity judgments for *P*_{1} and *P*_{2}.

Continuity judgments are derived using a set of syntactic proof rules—the rules can be converted in a standard way into an automated program analysis that iteratively assigns continuity judgments to subprograms. Figure 1 shows the most important of our rules; for the full set, see the original reference.^{2} To understand the syntax of the rules, consider the rule BASE. This rule derives a *conclusion* *b* ⊢ *Cont*(*P*, *In*, *Out*), where *b*, *In*, and *Out* are arbitrary, from the *premise* that *P* is either “**skip**” or an assignment.

The rule SEQUENCE addresses sequential composition of programs, generalizing the fact that the composition of two continuous functions is continuous. One of the premises of this rule is a *Hoare triple* of the form {*b*_{1}} *P* {*b*_{2}}. This is to be read as “For any state σ that satisfies *b*_{1}, ⟦*P*⟧(σ) satisfies *b*_{2}.” (A standard program verifier can be used to verify this premise.) The rule IN-OUT allows us to restrict or generalize the set of input and output variables with respect to which a continuity judgment is made.

The next rule—ITE—handles conditional statements, and is perhaps the most interesting of our rules. In a conditional branch, a small perturbation to the input variables can cause control to flow along a different branch, leading to a syntactically divergent behavior. For instance, this happens in Lines 3–4 of the Bubble Sort algorithm in Figure 2—perturbations to items in *A* can lead to either of two behaviors: “swapping *A*[*i*] and *A*[*i* + 1]” or “leaving *A* unchanged.”

The core idea behind the rule ITE is to show that such a divergence does not really matter, because at the program states where arbitrarily small perturbations to the program variables can “flip” the value of the guard *b* of the conditional statement (let us call the set of such states the *boundary* of *b*), the branches of the conditional are arbitrarily close in behavior.

Precisely, let us construct from *b* a formula ∂(*b*), obtained, roughly, by replacing each atomic comparison in *b* by the corresponding equality (for instance, ∂(*x* > 2) is (*x* = 2)). Note that ∂(*b*) represents an overapproximation of the boundary of *b*. Also, for a set of output variables *Out* and a Boolean formula *c*, let us call programs *P*_{1} and *P*_{2} *Out-equivalent under c*, and write *c* ⊨ (*P*_{1} ≡_{Out} *P*_{2}), if for each state σ that satisfies *c*, the states ⟦*P*_{1}⟧(σ) and ⟦*P*_{2}⟧(σ) agree on the values of all variables in *Out*. We assume an oracle that can determine if *c* ⊨ (*P*_{1} ≡_{Out} *P*_{2}) for given *c*, *P*_{1}, *P*_{2}, and *Out*. In practice, such equivalence questions can often be solved fully automatically using modern automatic theorem provers.^{19} Now, to derive a continuity judgment for a program “**if** *b* **then** *P*_{1} **else** *P*_{2}” with respect to the outputs *Out*, ITE shows that *P*_{1} and *P*_{2} become *Out*-equivalent under the condition ∂(*b*).
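In spirit, the ITE check asks whether the two branches agree on the guard's boundary. A testing-based stand-in for the equivalence oracle (our sketch; sampling can refute equivalence but, unlike an SMT solver, never prove it) looks like this, using the abs(*x*) example from Section 1:

```python
def branches_agree_on_boundary(boundary_states, p1, p2, out_vars):
    # Check Out-equivalence of the two branches on (sampled) boundary states
    return all(
        all(p1(dict(s))[x] == p2(dict(s))[x] for x in out_vars)
        for s in boundary_states
    )

# abs: "if x >= 0 then y := x else y := -x"; the boundary of the guard
# is x = 0, where both branches set y to 0
branch1 = lambda s: {**s, 'y': s['x']}
branch2 = lambda s: {**s, 'y': -s['x']}
assert branches_agree_on_boundary([{'x': 0.0}], branch1, branch2, ['y'])
```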

The rule LOOP derives continuity judgments for **while**-loops. The idea here is to prove the body *R* of the loop continuous, then inductively argue that the entire loop is continuous too. In more detail, the rule derives a continuity judgment *c* ⊢ *Cont*(*R*, *X*, *X*), where *c* is a *loop invariant*—a property that is always true at the loop header—and *X* is a set of variables. Now consider any state σ satisfying *c*. An arbitrarily small perturbation to this state leads to an arbitrarily small change in the value of each variable in *X* at the end of the first iteration, which only leads to an arbitrarily small change in the value of each variable in *X* at the end of the second iteration, and so on. Continuity follows.

Some subtleties need to be considered, however. An execution from a slightly perturbed state may terminate earlier or later than it would in the original execution. Even if the loop body is continuous, the extra iterations in either the modified or the original execution may cause the states at the loop exit to be very different. We rule out such scenarios by asserting a premise called *synchronized termination*. A loop “**while** *b* **do** *R*” fulfills this property with respect to a loop invariant *c* and a set of variables *X* if ∂(*b*) ∧ *c* ⊨ (*R* ≡_{X} **skip**). Under this property, even if the loop reaches a state where a small perturbation can cause the loop to terminate earlier (similarly, later), the extra iterations in the original execution have no effect on the program state. We can ignore these iterations in our proof.

Second, even if the loop body is continuous in input *x*_{i} and output *x*_{j} for every *x*_{i}, *x*_{j} ∈ *X*, an iteration may drastically change the value of program variables not in *X*. If there is a data flow from these variables to some variable in *X*, continuity will not hold. We rule out this scenario through an extra condition. Consider executions of *P* whose initial states satisfy a condition *c*. We call a set of variables *X* of *P* *separable* under *c* if the value of each *z* ∈ *X* on termination of any such execution is independent of the initial values of variables not in *X*. We denote the fact that *X* is separable in this way by *c* ⊢ *Sep*(*P*, *X*).

To verify that *P* is continuous in input *x*_{i} and output *x*_{j} at state σ, we derive a judgment *b* ⊢ *Cont*(*P*, {*x*_{i}}, {*x*_{j}}), where *b* is true at σ. The correctness of the method follows from the following *soundness theorem*:

THEOREM 1. *If the rules in Figure 1 can derive the judgment b* ⊢ *Cont(P, In, Out), then for all x*_{i} ∈ *In, x*_{j} ∈ *Out, and σ such that* ⟦*b*⟧(σ) = *true, P is continuous in input x*_{i} *and output x*_{j} *at σ*.

EXAMPLE 5 (WARMUP). *Consider the program* “**if** (*x* > 2) **then** *x* := *x*/2 **else** *x* := −5*x* + 11.” *Here* ∂(*x* > 2) *equals* (*x* = 2), *and* (*x* = 2) ⊨ (*x* := *x*/2) ≡_{{x}} (*x* := −5*x* + 11), *since both branches map x = 2 to x = 1. By* ITE, *the program is continuous in input x and output x*.

Let us now use our rules on the algorithms in Figure 2.

EXAMPLE 6 (BUBBLE SORT). *Consider the implementation of Bubble Sort in Figure 2. (We assume it to be rewritten as a* **while**-*program in the obvious way.) Our goal is to derive the judgment* true ⊢ *Cont*(*BubbleSort*, {*A*}, {*A*}).

*Let X* = {*A, i, j*}, *and let us write R*_{<p,q>} *to denote the code fragment from line p to line q (both inclusive). Also, let us write c* ⊢ *Term*(**while** *b* **do** *R*, *X*) *as an abbreviation for* ∂(*b*) ∧ *c* ⊨ (*R* ≡_{X} **skip**).

*It is easy to show that* true ⊢ *Sep(BubbleSort, X) and* true ⊢ *Sep(R*_{<2,4>}, *X). Each loop guard only involves discrete variables, hence we derive* true ⊢ *Term(BubbleSort, X) and* true ⊢ *Term(R*_{<2,4>}, *X)*.

*Now consider R*_{<3,4>}. *As* ∂(*A*[*i*] > *A*[*i* + 1]) *equals* (*A*[*i*] = *A*[*i* + 1]) *and* (*A*[*i*] = *A*[*i* + 1]) ⊨ (**skip** ≡_{X} *R*_{<4,4>}), *we have* true ⊢ *Cont(R*_{<3,4>}, *X, X), then* true ⊢ *Cont(R*_{<2,4>}, *X, X), and then* true ⊢ *Cont(BubbleSort, X, X). Now the* IN-OUT *rule derives the judgment we are after*.

EXAMPLE 7 (BELLMAN-FORD). *Take the Bellman-Ford algorithm. On termination, d[u] contains the shortest path distance from the source node src to the node u. We want to prove that* true ⊢ *Cont(BellmanFord, {G}, {d})*.

*We use the symbols R*_{<p,q>} *and Term as before. Clearly, we have* true ⊢ *Sep(R*_{<3,5>}, *X) and* true ⊢ *Term(R*_{<3,5>}, *X), where X = {G, d, v, w}. The two branches of the conditional in Line 4 are X-equivalent at* ∂(*d*[*v*] + *G*(*v*, *w*) < *d*[*w*]), *hence we have* true ⊢ *Cont(R*_{<4,5>}, *X, X), and from this judgment,* true ⊢ *Cont(R*_{<3,5>}, *X, X). Similar arguments can be made about the outer loop. Now we can derive the fact* true ⊢ *Cont(BellmanFord, X, X); weakening, we get the judgment we seek*.

Unfortunately, the rule LOOP is not always enough for continuity proofs. Consider states σ and σ’ of a continuous program *P*, where σ’ is obtained by slightly perturbing σ. For LOOP to apply, executions from σ and σ’ must converge to close-by states at the end of each loop iteration. However, this need not be so. For example, think of Dijkstra’s algorithm. As a shortest path computation, this program is continuous in the input graph *G* and the output *d*—the array of shortest path distances. But let us look at its main loop in Figure 2.

Note that in any iteration, there may be several items *w* for which *d*[*w*] is minimal. But then, a slightly perturbed initial value of *d* may cause a loop iteration to choose a different *w*, leading to a drastic change in the value of *d* at the end of the iteration. Thus, individual iterations of this loop are not continuous, and we cannot apply LOOP.

In prior work,^{2} we gave a more powerful rule, called *epoch induction*, for proving the continuity of programs like the one above. The key insight here is that if we group some loop iterations together, then continuity becomes an inductive property of the groupings. For example, in Dijkstra’s algorithm, a “grouping” is a maximal set *S* of successive loop iterations that are tied on the initial value of *d*[*w*]. Let σ_{0} be the program state before the first iteration in *S* is executed. Owing to arbitrarily small perturbations to σ_{0}, we may execute iterations in *S* in a very different order. However, an iteration that ran after the iterations in *S* in the original execution will still run after the iterations in *S*. Moreover, for a fixed σ_{0}, the program state, once all iterations in *S* have been executed, is the same, no matter what order these iterations were executed in. Thus, a small perturbation cannot significantly change the state at the end of *S*, and the set of iterations *S* forms a continuous computation.
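The phenomenon, and the claim that grouped iterations absorb the perturbation, can be observed concretely. The sketch below (ours, not the paper's analysis) runs a textbook Dijkstra on a graph where two nodes are tied in distance; an ε-perturbation breaks the tie and reorders the iterations, yet the final distance array moves by at most *N* · ε:

```python
import heapq

def dijkstra(n, edges, src):
    # edges maps undirected pairs (u, v) to weights; returns distances d
    adj = {u: [] for u in range(n)}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    d = [float('inf')] * n
    d[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:
            continue                      # stale queue entry
        for v, w in adj[u]:
            if du + w < d[v]:
                d[v] = du + w
                heapq.heappush(pq, (d[v], v))
    return d

# Nodes 1 and 2 are tied at distance 1.0 from the source; perturbing by
# eps = 0.01 breaks the tie and reorders the loop iterations, yet every
# distance moves by at most N * eps (N = number of edges)
E  = {(0, 1): 1.0, (0, 2): 1.0, (1, 3): 1.0, (2, 3): 1.0}
E2 = {(0, 1): 1.01, (0, 2): 0.99, (1, 3): 0.99, (2, 3): 1.01}
d0, d1 = dijkstra(4, E, 0), dijkstra(4, E2, 0)
assert max(abs(a - b) for a, b in zip(d0, d1)) <= len(E) * 0.01
```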

We have implemented the rules in Figure 1, as well as the epoch induction rule, in the form of a mostly-automatic program analysis. Given a program, the analysis iterates through its control points, assigning continuity judgments to subprograms until convergence is reached. Auxiliary tasks such as checking the equivalence of two straight-line program fragments (needed by rule ITE) are performed automatically using the Z3^{8} SMT-solver. Human intervention is expected in two forms. First, in applications of the epoch induction rule, we sometimes expect the programmer to write annotations that define appropriate “groupings” of iterations. Second, in case a complex loop invariant is needed for the proof (e.g., when one of the programs in a program equivalence query is a nested loop), the programmer is expected to supply it. There are heuristics and auxiliary tools that can be used to automate these steps, but our current system does not employ them.

Given the incompleteness of our proof rules, a natural empirical question for us was whether our system can verify the continuity of the continuous computing tasks described in Section 2. To answer this question, we chose 13 continuous algorithms over real and real-array data types. Our system was able to verify the continuity of 11 of these algorithms, including the shortest path algorithms of Bellman-Ford, Dijkstra, and Floyd-Warshall; Merge Sort and Selection Sort in addition to Bubble Sort; and the minimum spanning tree algorithms of Prim and Kruskal. Among the algorithms we could not verify was Quick Sort. Please see Chaudhuri et al.^{2} for more details.

**3.2. Verifying Lipschitz continuity**

Now we extend the above verification framework to one for Lipschitz continuity. Let us fix variables *x*_{i} and *x*_{j} of the program *P* respectively as the input and the output variable. To start with, we assume that *x*_{i} and *x*_{j} are of continuous data types—reals or arrays of reals.

Let us define a *control flow path* of a program *P* as the sequence of assignment or **skip**-statements that *P* executes on some input (we omit a more formal definition). We note that since our arithmetic expressions are built from additions and multiplications, each control flow path of *P* encodes a continuous—in fact differentiable—function of the inputs. Now suppose we can show that each control flow path of *P* is a *K*-Lipschitz computation, for some *K*, in input *x*_{i} and output *x*_{j}. This does not mean that *P* is *K*-Lipschitz in this input and output: a perturbation to the initial value of *x*_{i} can cause *P* to execute along a different control flow path, leading to a drastically different final state. However, if *P* is continuous *and* the above condition holds, then *P* is *K*-Lipschitz in input *x*_{i} and output *x*_{j}.
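A minimal illustration of this observation (our own example, not from the paper): in the program below, each control flow path is a 1-Lipschitz function of *x*, yet the program as a whole is discontinuous at *x* = 0 and therefore not *K*-Lipschitz for any *K*. It is exactly the missing continuity condition that fails here.

```python
def p(x):
    # Each path, taken by itself, is 1-Lipschitz in x:
    if x > 0:
        return x + 1   # path 1: slope 1
    else:
        return x       # path 2: slope 1

# Crossing the branch boundary breaks Lipschitz continuity:
# inputs 0 and 0.001 differ by 0.001, but outputs differ by just over 1.
```

If the two branches agreed at the boundary, continuity would hold and the path-wise constant 1 would be a Lipschitz constant for the whole program.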

Our analysis exploits the above observation. To prove that *P* is *K*-Lipschitz in input *x*_{i} and output *x*_{j}, we establish that (1) *P* is continuous at all states in input *x*_{i} and output *x*_{j} and (2) each control flow path of *P* is *K*-Lipschitz in input *x*_{i} and output *x*_{j}. Of these, the first task is accomplished using the analysis from Section 3.1. To accomplish the second task, we compute a data structure—a set of *Lipschitz matrices*—that contains upper bounds on the slopes of any computation that can be carried out in a control flow path of *P*.

More precisely, let *P* have *n* variables named *x*_{1}, …, *x*_{n}, as before. A *Lipschitz matrix J* of *P* is an *n × n* matrix, each of whose elements is a value in ℝ_{≥0}, represented either as a numeric constant or as a symbolic expression (for example, *N* + 5); the element in the *i*-th row and *j*-th column of *J* is denoted by *J*(*i, j*). Our analysis associates *P* with sets 𝒥 of such matrices via judgments *P*: 𝒥. Such a judgment is read as follows: “For each control flow path *C* in *P* and each *x*_{i}, *x*_{j}, there is a *J* ∈ 𝒥 such that *C* is *J*(*j, i*)-Lipschitz in input *x*_{i} and output *x*_{j}.”

The Lipschitz matrix data structure can be seen as a generalization of the *Jacobian* from vector calculus. Recall that the Jacobian of a function *f*: ℝ^{n} → ℝ^{n} with inputs *x*_{1}, …, *x*_{n} and outputs *x’*_{1}, …, *x’*_{n} is the matrix whose (*i, j*)-th entry is ∂*x’*_{i}/∂*x*_{j}. If *f* is differentiable, then for each *x’*_{i} and *x*_{j}, *f* is *K*-Lipschitz with respect to input *x*_{j} and output *x’*_{i}, where *K* is any upper bound on |∂*x’*_{i}/∂*x*_{j}|. In our setting, each control flow path represents a differentiable function, and we can verify the Lipschitz continuity of this function by propagating a Jacobian along the path. On the other hand, owing to branches, the program *P* may not represent a differentiable, or even continuous, function.

However, note that it is correct to associate a conditional statement “**if** *b* **then** *P*_{1} **else** *P*_{2}” with the set of matrices (𝒥_{1} ∪ 𝒥_{2}), where the judgments *P*_{1}: 𝒥_{1} and *P*_{2}: 𝒥_{2} have been made inductively. Of course, this increases the number of matrices that we have to track for a subprogram. But the proliferation of such matrices can be curtailed using an approximation that *merges* two or more of them.

This merge operation ⊔ is defined as (*J*_{1} ⊔ *J*_{2})(*i, j*) = max(*J*_{1}(*i, j*), *J*_{2}(*i, j*)) for all *J*_{1}, *J*_{2}, *i, j*. Suppose we can correctly derive the judgment *P*: 𝒥. Then for any *J*_{1}, *J*_{2} ∈ 𝒥, it is also correct to derive the judgment *P*: (𝒥 \ {*J*_{1}, *J*_{2}}) ∪ {*J*_{1} ⊔ *J*_{2}}. Note that this overapproximation may *overestimate* the Lipschitz constants for some of the control flow paths in *P*, but this is acceptable as we are not seeking the most precise Lipschitz constant for *P* anyway.
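The merge operation is straightforward to implement; below is a minimal sketch (our own encoding, with matrices as lists of lists), applied to the two branch matrices of the warmup program "**if** (*x* > 2) **then** *x* := *x*/2 **else** *x* := −5*x* + 11":

```python
def merge(J1, J2):
    """Pointwise max of two Lipschitz matrices: a sound overapproximation
    that lets us track one matrix instead of two."""
    return [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(J1, J2)]

# One-variable program: the matrices are 1x1.
J_then = [[0.5]]   # x := x/2      has slope 1/2
J_else = [[5.0]]   # x := -5x + 11 has slope 5
J_merged = merge(J_then, J_else)   # [[5.0]] covers both branches
```

The merged matrix overestimates the **then**-branch (5 instead of 1/2), which is exactly the acceptable loss of precision described above.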

Figure 3 shows our rules for associating a set 𝒥 of Lipschitz matrices with a program *P*. In the first rule SKIP, **I** is the identity matrix. The rule is correct because **skip** is 1-Lipschitz in input *x*_{i} and output *x*_{i} for all *i*, and 0-Lipschitz in input *x*_{i} and output *x*_{j}, where *i ≠ j*.

To obtain Lipschitz constants for assignments to variables (rule ASSIGN), we must quantify the way the value of an arithmetic expression *e* changes when its inputs are changed. This is done by computing a vector ∇_{e} whose *i*-th element is an upper bound on |∂*e*/∂*x*_{i}|. In more detail, ∇_{e} is computed by structural recursion on *e*: it is the zero vector when *e* is a constant, the *i*-th unit vector when *e* is the variable *x*_{i}, the sum of the vectors for *e*_{1} and *e*_{2} when *e* = *e*_{1} + *e*_{2}, and, by the product rule, |*e*_{1}| times the vector for *e*_{2} plus |*e*_{2}| times the vector for *e*_{1} (using suitable upper bounds on |*e*_{1}| and |*e*_{2}|) when *e* = *e*_{1} · *e*_{2}.
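One way to compute such a bound vector can be sketched as follows, under the assumption that expressions are built only from constants, variables, sums, and products; the expression encoding and the helper `bound` are ours, not the paper's:

```python
def grad_bound(e, n, bound):
    """Vector of upper bounds on |de/dx_i| for i = 0..n-1.
    e is ('const', c), ('var', i), ('add', e1, e2), or ('mul', e1, e2);
    bound(sub) returns an upper bound on |sub| over reachable states."""
    tag = e[0]
    if tag == 'const':
        return [0.0] * n
    if tag == 'var':
        v = [0.0] * n
        v[e[1]] = 1.0
        return v
    g1, g2 = grad_bound(e[1], n, bound), grad_bound(e[2], n, bound)
    if tag == 'add':
        return [a + b for a, b in zip(g1, g2)]
    # product rule: |d(e1*e2)/dx| <= |e1|*|de2/dx| + |e2|*|de1/dx|
    b1, b2 = bound(e[1]), bound(e[2])
    return [b1 * db + b2 * da for da, db in zip(g1, g2)]

# Example: e = -5*x0 + 11.
e = ('add', ('mul', ('const', -5.0), ('var', 0)), ('const', 11.0))
def bound(sub):  # crude bound: exact for constants; assume |x0| <= 100 here
    return abs(sub[1]) if sub[0] == 'const' else 100.0
```

For this *e*, `grad_bound(e, 1, bound)` yields the single bound 5 on |∂*e*/∂*x*_{0}|.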

Assignments *x*_{i}[*m*]:= *e* to array locations are slightly trickier. The location *x*_{i}[*m*] is affected by changes to variables appearing in *e*; if the variable *x*_{i} does not appear in *e*, then perturbations to the initial value of *x*_{i} have no effect on *x*_{i}[*m*]. However, the *remaining locations* in *x*_{i} are affected by, and only by, changes to the initial value of *x*_{i}. Thus, we can view *x*_{i} as being split into two “regions”—one consisting of *x*_{i}[*m*] and the other of every other location—with possibly different Lipschitz constants. We track these constants using two different Lipschitz matrices *J* and *J’*. Here *J* is as in the rule ASSIGN, while *J’* is identical to the Lipschitz matrix for a hypothetical assignment *x*_{i} *:= x*_{i}.

Sequential composition is handled by matrix multiplication (rule SEQUENCE)—the insight here is essentially the chain rule of differentiation. As mentioned earlier, the rule for conditional statements merges the Lipschitz matrices computed along the two branches. The WEAKEN rule allows us to overestimate a Lipschitz constant at any point.
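The chain-rule insight behind the SEQUENCE rule can be seen in a small sketch (our notation: rows are outputs, columns are inputs, so the second statement's matrix multiplies on the left):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

# Variables ordered (x, y).
J1 = [[2.0, 0.0], [0.0, 1.0]]   # matrix for "x := 2*x"
J2 = [[1.0, 0.0], [1.0, 1.0]]   # matrix for "y := x + y"
J_seq = matmul(J2, J1)          # composition "x := 2*x; y := x + y"
# J_seq[1][0] == 2.0: after the sequence, y is 2-Lipschitz in the input x,
# exactly as the chain rule predicts (slope 1 through y := x + y, times 2).
```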

The rule WHILE derives Lipschitz matrices for while-loops. Here *Bound*^{+}(*P, M*) is a premise that states that the symbolic or numeric constant *M* is an upper bound on the number of iterations of *P*—it is assumed to be inferred via an auxiliary checker.^{11} Finally, 𝒥^{M} is shorthand for the set of matrix products {*J*_{1} ⋯ *J*_{M} : *J*_{i} ∈ 𝒥}. In cases where *M* is a symbolic bound, we will not be able to compute this product explicitly. However, in many practical cases, one can reduce it to a simpler, more manageable form using algebraic identities.

The WHILE rule combines the rules for **if**-statements and sequential composition. Consider a loop *P* whose body *R* has Lipschitz matrix *J*. If the loop terminates in exactly *M* iterations, *J*^{M} is a correct Lipschitz matrix for it. However, if the loop may terminate after *M’ < M* iterations, we require an extra property for *J*^{M} to be a correct Lipschitz matrix: *J*^{i} ≤ *J*^{i+1} for all *i < M*. This property is ensured by the condition ∀*i, j*: *J*(*i, j*) = 0 ∨ *J*(*i, j*) ≥ 1. Note that in the course of a proof, we can weaken *any* Lipschitz matrix for the loop body to a matrix *J* of this form.

We can prove the following soundness theorem:

THEOREM 2. *Let P be continuous in input x*_{i} *and output x*_{j}. *If the rules in Figure 3 derive the judgment P: {J}, then P is J(j, i)-Lipschitz in input x*_{i} *and output x*_{j}.

EXAMPLE 8 (WARMUP). *Recall the program* “**if** (*x* > 2) **then** *x:= x*/2 **else** *x*:= −5*x* + 11” *from Example 5 (x is a real). Our rules can associate the left branch with a single Lipschitz matrix containing a single entry* 1/2, *and the right branch with a single matrix containing a single entry 5. Given the continuity of the program, we conclude that the program is 5-Lipschitz in input x and output x*.
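The constants in this example can be checked by brute force (a numerical sanity check of our own, not part of the formal system): the two branches agree at the boundary *x* = 2, so the program is continuous there, and the worst observed slope is the else-branch's 5.

```python
def p(x):
    return x / 2 if x > 2 else -5 * x + 11

# Sample [0, 5] on a grid and measure the steepest secant slope.
pts = [i / 100 for i in range(0, 501)]
worst = max(abs(p(a) - p(b)) / abs(a - b)
            for a in pts for b in pts if a != b)
assert worst <= 5 + 1e-9   # no pair of points exceeds slope 5
```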

EXAMPLE 9 (BUBBLE SORT). *Consider the Bubble Sort algorithm (Figure 2) once again, and as before, let R*_{<p,q>} *denote the code fragment from line p to line q. Let us set x*_{0} *to be A and x*_{1} *to be t*.

Now, let *J* be the matrix whose (*A, A*)- and (*t, A*)-entries are 1 and whose other entries are 0 (indexing rows by output variable and columns by input variable). *From the rules in Figure 3, we can derive (t:= A[i]): {J}, (A[i]:= A[i* + 1]): {**I**}, *and (A[i* + 1]:= *t): {J*, **I**}.

*Carrying out the requisite matrix multiplications, we get R*_{<4,4>}: {*J*}. *Using the rule* ITE, *we have R*_{<3,4>}: {**I**, *J*}. *Now, it is easy to show that R*_{<3,4>} *gets executed N times, where N is the size of A. From this we have R*_{<2,4>}: {**I**, *J*}^{N}. *Given that J*^{2} = **I***J = J***I** = *J, this is equivalent to the judgment R*_{<2,4>}: {**I**, *J*}. *From this, we derive BubbleSort*: {*J*, **I**}. *Given the proof of continuity carried out in Example 1, Bubble Sort is 1-Lipschitz in input A and output A*.

*Intuitively, the reason why Bubble Sort is so robust is that here, (1) there is no data flow from program points where arithmetic operations are carried out to points where values are assigned to the output variable and (2) continuity holds everywhere. In fact, one can formally prove that any program that meets the above two criteria is 1-Lipschitz. However, we do not develop this argument here*.
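The 1-Lipschitz property of Bubble Sort is easy to observe empirically (a numerical check of our own, using a standard formulation of the algorithm): perturbing each element of the input by at most ε moves each element of the sorted output by at most ε.

```python
def bubble_sort(a):
    a = list(a)
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]   # swap via comparisons only
    return a

A  = [3.0, 1.0, 2.0, 5.0, 4.0]
A2 = [x + e for x, e in zip(A, [0.05, -0.03, 0.01, 0.02, -0.04])]
out, out2 = bubble_sort(A), bubble_sort(A2)
eps = max(abs(x - y) for x, y in zip(A, A2))       # size of the perturbation
assert max(abs(x - y) for x, y in zip(out, out2)) <= eps
```

Note how the code matches the intuition above: values are only moved, never arithmetically combined, on their way to the output.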

EXAMPLE 10 (BELLMAN-FORD; DIJKSTRA). *Let us consider the Bellman-Ford algorithm (Figure 2) once again, and let x*_{0} *be G and x*_{1} *be d. Consider line 5 (i.e., the program R*_{<5,5>}); *our rules can assign to this program the Lipschitz matrix J whose (G, G)-, (d, G)-, and (d, d)-entries are 1 and whose (G, d)-entry is 0. With a few more derivations, we obtain R*_{<4,5>}: {*J*}. *Using the rule for loops, we have R*_{<3,5>}: {*J*^{N}}, *where N is the number of edges in G, and then BellmanFord*: {(*J*^{N})^{N}}. *But note that the (d, G)-entry of this matrix is N*^{2}. *Combining the above with the continuity proof in Example 7, we conclude that the Bellman-Ford algorithm is N*^{2}-*Lipschitz in input G and output d*.

*Note that the Lipschitz constant obtained in the above proof is not the optimal one—that would be N. This is an instance of the gap between truth and provability that is the norm in program analysis. Interestingly, our rules can derive the optimal Lipschitz constant for Dijkstra’s algorithm. Using the same reasoning as above, we assign to the main loop of the algorithm the single Lipschitz matrix J. Applying the* LOOP *rule, we derive Dijkstra*: {*J*^{N}}, *whose (d, G)-entry is N. Given that the algorithm is continuous in input G and output d, it is N-Lipschitz in input G and output d*.
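The matrix powers in these derivations are easy to compute concretely. Below is a sketch (our own, with variable order (*G*, *d*)), assuming the relaxation step gives *d* a dependence of 1 on each of *G* and the old *d*; the powers of the resulting lower-triangular matrix grow linearly in the exponent, which is where the *N*^{2} estimate comes from.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def matpow(J, k):
    P = [[1.0, 0.0], [0.0, 1.0]]  # identity
    for _ in range(k):
        P = matmul(P, J)
    return P

J = [[1.0, 0.0],     # G is unchanged
     [1.0, 1.0]]     # new d depends on G and on old d, each with slope 1
N = 6
# The (d, G) entry of J^(N*N) is N*N: the derived Lipschitz constant.
assert matpow(J, N * N)[1][0] == N * N
```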

Let us now briefly consider the case when the input and output variables in our program are of discrete type. As a program is trivially continuous in every discrete input, continuity is not a meaningful notion in such a setting. Therefore, we focus on the problem of verifying Lipschitz continuity—for example, showing that the Bubble Sort algorithm is 1-Lipschitz even when the input array *A* is an array of integers.

An easy solution to this problem is to cast the array *A* into an array *A** of reals, and then to prove 1-Lipschitz continuity of the resultant program in input *A** and output *A**. As any integer is also a real, the above implies that the original algorithm is 1-Lipschitz in input *A* and output *A*. Thus, reals are used here as an *abstraction* of integers, just as (unbounded) integers are often used in program verification as abstractions of bounded-length machine numbers.

Unsurprisingly, this strategy does not always work. Consider the program “**if** (*x* > 0) **then** *x:= x* + 1 **else skip**,” where *x* is an integer. This program is 2-Lipschitz. Its “slope” is the highest around initial states where *x* = 0: if the initial value of *x* changes from 0 to 1, the final value of *x* changes from 0 to 2. At the same time, if we cast *x* into a real, the resultant program is discontinuous and thus not *K*-Lipschitz for any *K*.
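The 2-Lipschitz claim for the integer version can be checked exhaustively on a window of inputs (a brute-force sketch of our own):

```python
def p(x):
    return x + 1 if x > 0 else x   # integer version of the program

xs = range(-10, 11)
worst = max(abs(p(a) - p(b)) / abs(a - b)
            for a in xs for b in xs if a != b)
assert worst == 2   # attained only at the pair (0, 1): outputs 0 and 2
```

Because consecutive integers are at distance 1, the jump at the branch boundary contributes a finite slope here, whereas over the reals the same jump makes the program discontinuous.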

It is possible to give an analysis of Lipschitz continuity that does not suffer from the above issue. This analysis casts the integers into reals as mentioned above, then calculates a Lipschitz matrix of the resultant program; however, it checks a property that is slightly weaker than continuity. For lack of space, we do not go into the details of the analysis here.

We have extended our implementation of continuity analysis with the verification method for Lipschitz continuity presented above, and applied the resulting system to the suite of 13 algorithms mentioned at the end of Section 3.1. All these algorithms were either 1-Lipschitz or *N*-Lipschitz. Our system was able to compute the optimal Lipschitz constant for 9 of the 11 algorithms where continuity could be verified. In one case (Bellman-Ford), it certified an *N*-Lipschitz computation as *N*^{2}-Lipschitz. The one example on which it fared poorly was the Floyd-Warshall shortest path algorithm, where the best Lipschitz constant that it could compute was exponential in *N*^{3}.

### 4. Related Work

So far as we know, we were the first^{2} to propose a framework for continuity analysis of programs. Before us, Hamlet^{12} advocated notions of continuity of software; however, he concluded that “it is not possible in practice to mechanically test for continuity” in the presence of loops. Soon after our first paper on this topic (and before our subsequent work on Lipschitz continuity of programs), Reed and Pierce^{18} gave a type system that can verify the Lipschitz continuity of functional programs. This system can seamlessly handle functional data structures such as lists and maps; however, unlike our method, it cannot reason about discontinuous control flow, and would consider any program with a conditional branch to have a Lipschitz constant of ∞.

More recently, Jha and Raskhodnikova^{13} have taken a *property testing* approach to estimating the Lipschitz constant of a program. Given a program, this method determines, with a level of probabilistic certainty, whether it is either 1-Lipschitz or ε-far (defined in a suitable way) from being 1-Lipschitz. While the class of programs allowed by the method is significantly more restricted than what is investigated here or by Reed and Pierce, the appeal of the method lies in its crisp completeness guarantees, and also in that it only requires blackbox access to the program.

Robustness is a standard correctness property in control theory,^{16, 17} and there is an entire subfield of control theory studying the design and analysis of robust controllers. However, the systems studied in this literature are abstractly defined using differential equations and hybrid automata rather than programs. The systematic modeling and analysis of robustness of *programs* was first proposed by us in the context of general software, and by Majumdar and Saha^{14} in the context of control software.

In addition, there are many efforts in the abstract interpretation literature that, while not verifying continuity or robustness explicitly, reason about the uncertainty in a program’s behavior due to floating-point rounding and sensor errors.^{6, 7, 10} Other related literature includes work on automatic differentiation (AD),^{1} where the goal is to transform a program *P* into a program that returns the derivative of *P* where it exists. Unlike the work described here, AD does not attempt verification—no attempt is made to certify a program as differentiable or Lipschitz.

### 5. Conclusion

In this paper, we have argued for the adoption of analytical properties like continuity and Lipschitz continuity as correctness properties of programs. These properties are relevant as they can serve as useful definitions of *robustness* of programs to uncertainty. Also, they raise some fascinating technical issues. Perhaps counterintuitively, some of the classic algorithms of computer science satisfy continuity or Lipschitz continuity, and the problem of systematic reasoning about these properties demands a nontrivial combination of analytical and logical insights.

We believe that the work described here is a first step toward an extension of the classical calculus to a symbolic mathematics where programs form a first-class representation of functions and dynamical systems. From a practical perspective, this is important as physical systems are increasingly controlled by software, and as even applied mathematicians increasingly reason about functions that are not written in the mathematical notation of textbooks, but as *code*. Speaking more philosophically, the classical calculus focuses on the computational aspects of real analysis, and the notation of calculus texts has evolved primarily to facilitate symbolic computation by hand. However, in our era, most mathematical computations are carried out by computers, and a calculus for our age should not ignore the notation that computers can process most easily: programs. This statement has serious implications—it opens the door not only to the study of continuity or derivatives but also to, say, Fourier transforms, differential equations, and mathematical optimization of code. Some efforts in these directions^{4, 5, 9} are already under way; others will no doubt appear in the years to come.

This research was supported by NSF CAREER Award #1156059 (“Robustness Analysis of Uncertain Programs: Theory, Algorithms, and Tools”).
