Home/Magazine Archive/February 2015 (Vol. 58, No. 2)/Verifying Computations Without Reexecuting Them/Full Text

Review articles
## Verifying Computations Without Reexecuting Them

In this setup, a single reliable PC can monitor the operation of a herd of supercomputers working with possibly extremely powerful but unreliable software and untested hardware.

—Babai, Fortnow, Levin, Szegedy, 1991^{4}

How *can* a single PC check a herd of supercomputers with unreliable software and untested hardware?

This classic problem is particularly relevant today, as much computation is now outsourced: it is performed by machines that are rented, remote, or both. For example, service providers (SPs) now offer storage, computation, managed desktops, and more. As a result, relatively weak devices (phones, tablets, laptops, and PCs) can run computations (storage, image processing, data analysis, video encoding, and so on) on banks of machines controlled by someone else.

This arrangement is known as cloud computing, and its promise is enormous. A lone graduate student with an intensive analysis of genome data can now rent a hundred computers for 12 hours for less than $200. And many companies now run their core computing tasks (websites, application logic, storage) on machines owned by SPs, which automatically replicate applications to meet demand. Without cloud computing, these examples would require buying hundreds of physical machines when demand spikes ... and then selling them back the next day.

But with this promise comes risk. SPs are complex and large-scale, making it unlikely that execution is always correct. Moreover, SPs do not necessarily have strong incentives to ensure correctness. Finally, SPs are black boxes, so faults—which can include misconfigurations, corruption of data in storage or transit, hardware problems, malicious operation, and more^{33}—are unlikely to be detectable. This raises a central question, which goes beyond cloud computing: *How can we ever trust results computed by a third-party, or the integrity of data stored by such a party?*

A common answer is to replicate computations.^{15,16,34} However, replication assumes that failures are uncorrelated, which may not be a valid assumption: the hardware and software platforms in cloud computing are often homogeneous. Another answer is auditing—checking the responses in a small sample—but this assumes that incorrect outputs, if they occur, are relatively frequent. Still other solutions involve trusted hardware^{39} or attestation,^{37} but these mechanisms require a chain of trust and assumptions that the hardware or a hypervisor works correctly.

But what if the third party returned its results along with a proof that the results were computed correctly? And what if the proof were inexpensive to check, compared to the cost of redoing the computation? Then few assumptions would be needed about the kinds of faults that can occur: either the proof would check, or not. We call this vision *proof-based verifiable computation*, and the question now becomes: *Can this vision be realized for a wide class of computations?*

Deep results in complexity theory and cryptography tell us that in principle the answer is "yes." Probabilistic proof systems^{24,44}—which include interactive proofs (IPs),^{3,26,32} probabilistically checkable proofs (PCPs),^{1,2,44} and argument systems^{13} (PCPs coupled with cryptographic commitments^{30})—consist of two parties: a *verifier* and a *prover*. In these protocols, the prover can efficiently convince the verifier of a mathematical assertion. In fact, the acclaimed PCP theorem,^{1,2} together with refinements, ^{27} implies that a verifier only has to check three randomly chosen bits in a suitably encoded proof!

Meanwhile, the claim "*this* program, when executed on *this* input, produces *that* output" can be represented as a mathematical assertion of the necessary form. The only requirement is the verifier knows the program, the input (or at least a digest, or fingerprint, of the input), and the purported output. And this requirement is met in many uses of outsourced computing; examples include Map Reduce-style text processing, scientific computing and simulations, database queries, and Web request-response. ^{a} Indeed, although the modern significance of PCPs lies elsewhere, an original motivation was verifying the correctness of remotely executed computations: the paper quoted in our epigraph^{4} was one of the seminal works that led to the PCP theorem.

However, for decades these approaches to verifiable computation were purely theoretical. Interactive protocols were prohibitive (exponential-time) for the prover and did not appear to save the verifier work. The proofs arising from the PCP theorem (despite asymptotic improvements^{10,20}) were so long and complicated that it would have taken thousands of years to generate and check them, and would have needed more storage bits than there are atoms in the universe.

But beginning around 2007, a number of theoretical works achieved results that were especially relevant to the problem of verifying cloud computations. Goldwasser et al., in their influential Muggles paper,^{25} refocused the theory community's attention on verifying outsourced computations, in the context of an interactive proof system that required only *polynomial* work from the prover, and that applied to computations expressed as certain kinds of circuits. Ishai et al.^{29} proposed a novel cryptographic commitment to an entire linear function, and used this primitive to apply *simple* PCP constructions to verifying general-purpose outsourced computations. A couple of years later, Gentry's breakthrough protocol for fully homomorphic encryption (FHE)^{23} led to work (GGP) on *non-interactive* protocols for general-purpose computations.^{17,21} These developments were exciting, but, as with the earlier work, implementations were thought to be out of the question. So the theory continued to remain theory—until recently.

The last few years have seen a number of projects overturn the conventional wisdom about the hopeless impracticality of proof-based verifiable computation. These projects aim squarely at building real systems based on the theory mentioned earlier, specifically PCPs and Muggles (FHE-based protocols still seem too expensive). The improvements over naive theoretical protocols are dramatic; it is not uncommon to read about factor-of-a-trillion speedups. The projects take different approaches, but broadly speaking, they apply both refinements of the theory and systems techniques. Some projects include a full pipeline: a programmer specifies a computation in a high-level language, and then a compiler (a) transforms the computation to the formalism that the verification machinery uses and (b) outputs executables that implement the verifier and prover. As a result, achieving verifiability is no harder for the programmer than writing the code in the first place.

The goal of this article is to survey this blossoming area of research. This is an exciting time for work on verifiable computation: while none of the works we discuss here is practical enough for its inventors to raise venture capital for a startup, they merit being referred to as "systems." Moreover, many of the open problems cut across subdisciplines of computer science: programming languages, parallel computing, systems, complexity theory, and cryptography. The pace of progress has been rapid, and we believe real applications of these techniques will appear in the next few years.

*A note about scope.* We focus on solutions that provide integrity to, and in principle are very efficient for, the verifier.^{b} Thus, we do not cover exciting work on efficient implementations of secure multiparty protocols.^{28,31} We also exclude FHE-based approaches based on GGP^{21} (as noted earlier, these techniques seem too expensive) and the vast body of domain-specific solutions (surveyed elsewhere^{36,42,47}).

*The problem statement and some observations about it.* A *verifier* sends the specification of a computation *p* (for example, the text of a program) and input *x* to a *prover*. The prover computes an output *y* and returns it to the verifier. If *y* = *p*(*x*), then a correct prover should be able to convince the verifier of *y*'s correctness, either by answering some questions or by providing a certificate of correctness. Otherwise, the verifier should reject *y* with high probability.

In any protocol that solves this problem, we desire three things. First, the protocol should provide some advantage to the verifier: either the protocol should be cheaper for the verifier than executing *p*(*x*) locally, or else the protocol should handle computations *p* that the verifier could not execute itself (for example, operations on state private to the prover). Second, we do not want to make any assumptions that the prover follows the protocol. Third, *p* should be general; later, we will have to make some compromises, but for now, *p* should be seen as encompassing all C programs whose running time can be statically bounded given the input size.

Some reflections about this setup are in order. To begin with, we are willing to accept some overhead for the prover, as we expect assurance to have a price. Something else to note is that whereas some approaches to computer security attempt to reason about what *incorrect* behavior looks like (think of spam detection, for instance), we will specify *correct* behavior and ensure anything other than this behavior is visible as such; this frees us from having to enumerate, or reason about, the possible failures of the prover.

Finally, one might wonder: How does our problem statement relate to NP-complete problems, which are easy to check but believed to be hard to solve? The answer is that the "check" of an NP solution requires the checking entity to do work polynomial in the length of the solution, whereas our verifier will do far less work than that! (Randomness makes this possible.) Another answer is that many computations (for example, those that run in deterministic polynomial time) do not admit an asymmetric checking structure—unless one invokes the fascinating body of theory that we turn to now.

*A framework for solving the problem in theory.* A framework for solving the problem in theory is depicted in Figure 1. Because *Boolean circuits* (networks of AND, OR, NOT gates) work naturally with the verification machinery, the first step is for the verifier and prover to transform the computation to such a circuit. This transformation is possible because any of our computations *p* is naturally modeled by a Turing Machine (TM); meanwhile, a TM can be "unrolled" into a Boolean circuit that is not much larger than the number of steps in the computation.

Thus, from now on, we will talk only about the circuit *C* that represents our computation *p* (Figure 1, step 1). Consistent with the problem statement earlier, the verifier supplies the input *x*, and the prover executes the circuit *C* on input *x* and claims the output is *y*.^{c} In performing this step, the prover is expected to obtain a *valid* transcript for {*C*, *x*, *y*}(Figure 1, step 2). A *transcript* is an assignment of values to the circuit wires; in a valid transcript for {*C*, *x*, *y*}, the values assigned to the input wires are those of *x*, the intermediate values correspond to the correct operation of each gate in C, and the values assigned to the output wires are *y*. Notice that if the claimed output is incorrect—that is, if *y* ≠ *p*(*x*)—then a valid transcript for {*C*, *x*, *y*} simply does not exist.

Therefore, if the prover could establish a valid transcript exists for {*C*, *x*, *y*}, this would convince the verifier of the correctness of the execution. Of course, there is a simple proof that a valid transcript exists: the transcript itself. However, the verifier can check the transcript only by examining all of it, which would be as much work as having executed *p* in the first place.

Instead, the prover will *encode* the transcript (Figure 1, step 3) into a longer string. The encoding lets the verifier detect a transcript's validity by inspecting a small number of randomly chosen locations in the encoded string and then applying efficient tests to the contents found therein. The machinery of PCPs, for example, allows exactly this (see the three accompanying sidebars).

However, we still have a problem. The verifier cannot get its hands on the entire encoded transcript; it is longer—astronomically longer, in some cases—than the plain transcript, so reading in the whole thing would again require too much work from the verifier. Furthermore, we do not want the prover to have to write out the whole encoded transcript: that would also be too much work, much of it wasteful, since the verifier looks at only small pieces of the encoding. And unfortunately, we cannot have the verifier simply ask the prover what the encoding holds at particular locations, as the protocols depend on the element of surprise. That is, if the verifier's queries are known in advance, then the prover can arrange its answers to fool the verifier.

This is an exciting time for work on verifiable computation.

As a result, the verifier must issue its queries about the encoding carefully (Figure 1, step 4). The literature describes three separate techniques for this purpose. They draw on a richly varied set of tools from complexity theory and cryptography, and are summarized next. Afterward, we discuss their relative merits.

*Use the power of interaction.*One set of protocols proceeds in rounds: the verifier queries the prover about the contents of the encoding at a particular location, the prover responds, the verifier makes another query, the prover responds, and so on. Just as a lawyer's questions to a witness restrict the answers the witness can give to the next question, until a lying witness is caught in a contradiction, the prover's answers in each round about what the encoding holds limit the space of valid answers in the next round. This continues until the last round, at which point a prover that has answered perfidiously at any point—by answering based on an invalid transcript or by giving answers that are untethered to any transcript—simply has no valid answers. This approach relies on interactive proof protocols,^{3,26,32}most notably Muggles,^{25}which was refined and implemented.^{18,45–47}*Extract a commitment.*These protocols proceed in two rounds. The verifier first requires the prover to*commit*to the full contents of the encoded transcript; the commitment relies on standard cryptographic primitives, and we call the commited-to contents a*proof*. In the second round, the verifier generates queries—locations in the proof the verifier is interested in—and then asks the prover what values the proof contains at those locations; the prover is forced to respond consistent with the commitment. To generate queries and validate answers, the verifier uses PCPs (they enable probabilistic checking, as described in the sidebar entitled "Probabilistically Checkable Proofs"). This approach was outlined in theory by Kilian,^{30}building on the PCP theorem.^{1,2}Later, Ishai et al.^{29}(IKO) gave a drastic simplification, in which the prover can commit to a proof without materializing the whole thing. IKO led to a series of refinements, and implementation in a system.^{40–43,47}*Hide the queries.*Instead of extracting a commitment and then revealing its queries, the verifier pre-encrypts its queries—as with the prior technique, the queries describe locations where the verifier wants to inspect an eventual proof, and these locations are chosen by PCP machinery—and sends this description to the prover prior to their interaction. Then, during the verification phase, powerful cryptography achieves the following: the prover answers the queries without being able to tell which locations in the proof are being queried, and the verifier recovers the prover's answers. The verifier then uses PCP machinery to check the answers, as in the commitment-based protocols. The approach was described in theory by Gennaro et al.^{22}(see also Bitansky et al.^{12}), and refined and implemented in two projects.^{7,9,36}

The three techniques described are elegant and powerful, but as with the prior technique, naive implementations would result in preposterous costs. The research projects that implemented these techniques have applied theoretical innovations and serious systems work to achieve *near* practical performance. Here, we explain the structure of the design space, survey the various efforts, and explore their performance (in doing this, we will illustrate what "near practical" means).

We restrict our attention to *implemented systems* with published experimental results. By "system," we mean code (preferably publically released) that takes some kind of representation of a computation and produces executables for the verifier and the prover that run on stock hardware. Ideally, this code is a compiler toolchain, and the representation is a program in a high-level language.

**The landscape.** As depicted in Figure 2, we organize the design space in terms of a three-way trade-off among cost, expressiveness, and functionality. ^{d} Here, cost mainly refers to *setup costs* for the verifier; as we will see, this cost is the verifier's largest expense, and affects whether a system meets the goal of saving the verifier work. (This setup cost also correlates with the prover's cost for most of the systems discussed.) By expressiveness, we mean the class of computations the system can handle while providing a benefit to the verifier. By functionality, we mean whether the works provide properties like non-interactivity (setup costs amortize indefinitely), *zero knowledge*^{24,26} (the computation transcript is hidden from the verifier, giving the prover some privacy), and public verifiability (anyone, not just a particular verifier, can check a proof, provided that the party who generated the queries is trusted).

**CMT, Allspice, and Thaler.** One line of work uses "the power of interaction;" it starts from Muggles,^{25} the interactive proof protocol mentioned earlier. CMT^{18,46} exploits an algebraic insight to save orders of magnitude for the prover, versus a naive implementation of Muggles.

For circuits to which CMT applies, performance is very good, in part because Muggles and CMT do not use cryptographic operations. In fact, refinements by Thaler^{45} provide a prover that is optimal for certain classes of computations: the costs are only a constant factor (10–100×, depending on choice of baseline) over executing the computation locally. Moreover, CMT applies in (and was originally designed for) a streaming model, in which the verifier processes and discards input as it comes in.

However, CMT's expressiveness is limited. First, it imposes requirements on the circuit's geometry: the circuit must have structurally similar parallel blocks. Of course, not all computations can be expressed in that form. Second, the computation cannot use order comparisons (less-than, and so on).

Allspice^{47} has CMT's low costs but achieves greater expressiveness (under the amortization model described next).

**Pepper, Ginger, and Zaatar.** Another line of work builds on the "extract a commitment" technique (called an "efficient argument" in the theory literature.^{13,30}). Pepper^{42} and Ginger^{43} refine the protocol of IKO. To begin with, they represent computations as arithmetic constraints (that is, a set of equations over a finite field); a solution to the constraints corresponds to a valid transcript of the computation. This representation is often far more concise than Boolean circuits (used by IKO and in the proof of the PCP theorem^{1}) or arithmetic circuits (used by CMT). Pepper and Ginger also strengthen IKO's commitment primitive, explore low-overhead PCP encodings for certain computations, and apply a number of systems techniques (such as parallelization on distributed systems).

Pepper and Ginger dramatically reduce costs for the verifier and prover, compared to IKO. However, as in IKO, the verifier incurs setup costs. Both systems address this issue via amortization, reusing the setup work over a *batch*: multiple *instances* of the same computation, on different inputs, verified simultaneously.

Pepper requires describing constraints manually. Ginger has a compiler that targets a larger class of computations; also, the constraints can have auxiliary variables set by the prover, allowing for efficient representation of not-equal-to checks and order comparisons. Still, both handle only straight line computations with repeated structure, and both require special-purpose PCP encodings.

Zaatar^{41} composes the commitment protocol of Pepper and Ginger with a new linear PCP;^{1,29} this PCP adapts an ingenious algebraic encoding of computations from GGPR^{22} (which we return to shortly). The PCP applies to all pure computations; as a result, Zaatar achieves Ginger's performance but with far greater generality.

**Pinocchio.** Pinocchio^{36} instantiates the technique of hiding the queries. Pinocchio is an implementation of GGPR, which is a *noninteractive* argument. GGPR can be viewed as a probabilistically checkable encoding of computations that is akin to a PCP (this is the piece that Zaatar adapts) plus a layer of sophisticated cryptography.^{12,41} GGPR's encoding is substantially more concise than prior approaches, yielding major reductions in overhead.

The cryptography also provides many benefits. It hides the queries, which allows them to be reused. The result is a protocol with minimal interaction (after a per-computation setup phase, the verifier sends only an instance's input to the prover) and, thus, qualitatively better amortization behavior. Specifically, Pinocchio amortizes per-computation setup costs over all future instances of a given computation; by contrast, recall that Zaatar and Allspice amortize their per-computation costs only over a batch. GGPR's and Pinocchio's cryptography also yield zero knowledge and public verifiability.

Compared to Zaatar, Pinocchio brings some additional expense in the prover's costs and the verifier's setup costs. Pinocchio's compiler initiated the use of C syntax in this area, and includes some program constructs not present in prior work. The underlying computational model (unrolled executions) is essentially the same as Ginger's and Zaatar's.^{41,43}

Although the systems we have described so far have made tremendous progress, they have done so within a programming model that is not reflective of real-world computations. First, these systems require loop bounds to be known at compile time. Second, they do not support indirect memory references scalably and efficiently, ruling out RAM and thus general-purpose programming. Third, the verifier must handle all inputs and outputs, a limitation that is at odds with common uses of the cloud. For example, it is unreasonable to insist that the verifier materialize the entire (massive) input to a MapReduce job. The projects described next address these issues.

**TinyRAM (BCGTV and BCTV).** BCGTV^{7} compiles programs in C (not just a subset) to an innovative circuit representation.^{6} Applying prior insights, ^{12,22,41} BCGTV combines this circuit with proof machinery (including transcript encoding and queries) from Pinocchio and GGPR.

BCGTV's circuit representation consists of the unrolled execution of a general-purpose MIPS-like CPU, called TinyRAM (and for convenience we sometimes use "TinyRAM" to refer to BCGTV and its successors). The circuit-as-unrolled-processor provides a natural representation for language features like data-dependent looping, control flow, and self-modifying code. BCGTV's circuit also includes a permutation network that elegantly and efficiently enforces the correctness of RAM operations.

BCTV^{9} improves on BCGTV by retrieving program instructions from RAM (instead of hard-coding them in the circuit). As a result, all executions with the same number of steps use the same circuits, yielding the best amortization behavior in the literature: setup costs amortize over all computations of a given length. BCTV also includes an optimized implementation of Pinocchio's protocol that cuts costs by constant factors.

Despite these advantages, the general approach brings a steep price: BCTV's circuits are orders of magnitude larger than Pinocchio's and Zaatar's for the same high-level programs. As a result, the verifier's setup work and the prover's costs are orders of magnitude higher, and BCTV is restricted to very short executions. Nevertheless, BCGTV and BCTV introduce important tools.

**Pantry and Buffet.** Pantry^{14} extends the computational model of Zaatar and Pinocchio, and works with both systems. Pantry provides a general-purpose approach to state, yielding a RAM abstraction, verifiable Map Reduce, verifiable queries on remote databases, and—using Pinocchio's zero knowledge variant—computations that keep the prover's state private. To date, Pantry is the only system to extensively use the capability of argument protocols (the "extract a commitment" and "hide the queries" techniques) to handle computations for which the verifier does not have all the input. In Pantry's approach—which instantiates folklore techniques—the verifier's explicit input includes a digest of the full input or state, and the prover is obliged to work over state that matches this digest.

Under Pantry, every operation against state compiles into the evaluation of a cryptographic hash function. As a result, a memory access is tens of thousands of times more costly than a basic arithmetic operation.

Buffet^{48} combines the best features of Pantry and TinyRAM. It slashes the cost of memory operations in Pantry by adapting TinyRAM's RAM abstraction. Buffet also brings data-dependent looping and control flow to Pantry (without TinyRAM's expense), using a loop flattening technique inspired by the compilers literature. As a result, Buffet supports an expansive subset of C (disallowing only function pointers and goto) at costs orders of magnitude lower than both Pantry and TinyRAM. As of this writing, Buffet appears to achieve the best mix of performance and generality in the literature.

**A brief look at performance.** We will answer three questions:

*How do the verifier's variable (per-instance) costs compare to the baseline of local, native execution?*For some computations, this baseline is an alternative to verifiable outsourcing.*What are the verifier's setup costs, and how do they amortize?*In many of the systems, setup costs are significant and are paid for only over multiple instances of the same circuit.*What is the prover's overhead?*

We focus only on CPU costs. On the one hand, this focus is conservative: verifiable outsourcing is motivated by more than CPU savings for the verifier. For example, if inputs are large or inaccessible, verifiable outsourcing saves network costs (the naive alternative is to download the inputs and locally execute); in this case, the CPU cost of local execution is irrelevant. On the other hand, CPU costs provide a good sense of the overall expense of the protocols. (For evaluations that take additional resources into account, see Braun et al.^{14})

The data we present is from re-implementations of the various systems by members of our research group, and essentially match the published results. All experiments are run on the same hardware (Intel Xeon E5-2680 processor, 2.7Ghz, 32GB RAM), with the prover on one machine and the verifier on another. We perform three runs per experiment; experimental variation is minor, so we just report the average. Our benchmarks are 128 × 128 matrix multiplication (of 64-bit quantities, with full precision arithmetic) and PAM clustering of 20 vectors, each of dimension 128. We do not include data for Pantry and Buffet since their innovations do not apply to these benchmarks (their performance would be the same as Zaatar or Pinocchio, depending on which machinery they were configured to run with). For TinyRAM, we report extrapolated results since, as noted earlier, TinyRAM on current hardware is restricted to executions much smaller than these benchmarks.

Figure 3 depicts per-instance verification costs, for matrix multiplication, compared to two baselines. The first is a native execution of the standard algorithm, implemented with floating-point operations; it costs 3.5ms, and beats all of the systems at the given input size.^{e} (At larger input sizes, the verifier would do better than native execution: the verifier's costs grow linearly in the input size, which is only *O*(*m*^{2}); local execution is *O*(*m*^{3}).) The second is an implementation of the algorithm using a multi-precision library; this baseline models a situation in which complete precision is required.

We evaluate setup costs by asking about the *cross-over point*: how many instances of a computation are required to amortize the setup cost in the sense the verifier spends fewer CPU cycles on outsourcing versus executing locally? Figure 4 plots total cost lines and cross-over points versus the second baseline.

To evaluate prover overhead, Figure 5 normalizes the prover's cost to the floating-point baseline.

**Summary and discussion.** Performance differences among the systems are overshadowed by the general nature of costs in this area. The verifier is practical if its computation is amenable to one of the less expensive (but more restricted) protocols, or if there are a large number of instances that will be run (on different inputs). And when state is remote, the verifier does not need to be faster than local computation because it would be difficult—or impossible, if the remote state is private—for the verifier to perform the computation itself (such applications are evaluated elsewhere^{14}).

The prover, of course, has terrible overhead: several orders of magnitude (though as noted previously, this still represents tremendous progress versus the prior costs). The prover's practicality thus depends on your ability to construct appropriate scenarios. Maybe you're sending Will Smith and Jeff Goldblum into space to save Earth; then you care a lot more about correctness than costs (a calculus that applies to ordinary satellites, too). More prosaically, there is a scenario with an abundance of server CPU cycles, many instances of the same computation to verify, and remotely stored inputs: data-parallel cloud computing. Verifiable MapReduce^{14} is therefore an encouraging application.

The main issue in this area is performance, and the biggest problem is the prover's overhead. The verifier's per-instance costs are also too high. And the verifier's setup costs would ideally be controlled while retaining expressivity. (This is theoretically possible,^{8,11} but overhead is very high: in a recent implementation,^{8} the prover's computational costs are orders of magnitude higher than in TinyRAM.)

The computational model is a critical area of focus. Can we identify or develop programming languages that are expressive yet compile efficiently to the circuit or constraint formalism? More generally, can we move beyond this intrinsically costly formalism?

There are also questions in systems. For example, can we develop a realistic database application, including concurrency, relational structures, and so on? More generally, an important test for this area—so far unmet—is to run experiments at realistic scale.

Another interesting area of investigation concerns privacy. By leveraging Pinocchio, Pantry has experimented with simple applications that hide the prover's state from the verifier, but there is more work to be done here and other notions of privacy that are worth providing. For example, we can provide verifiability while concealing the program that is executed (by composing techniques from Pantry, Pinocchio, and TinyRAM). A speculative application is to produce versions of Bitcoin in which transactions can be conducted anonymously, in contrast to the status quo.^{5,19}

It is worth recalling that the intellectual foundations of this research area really had nothing to do with practice. For example, the PCP theorem is a landmark achievement of complexity theory, but if we were to implement the theory as proposed, generating the proofs, even for simple computations, would have taken longer than the age of the universe. In contrast, the projects described in this article have not only built systems from this theory but also performed experimental evaluations that terminate before publication deadlines.

So that is the encouraging news. The sobering news, of course, is these systems are basically toys. Part of the reason we are willing to label them near practical is painful experience with what the theory *used* to cost. (As a rough analogy, imagine a graduate student's delight in discovering hexadecimal machine code after years spent programming one-tape Turing machines.)

Still, these systems are arguably useful in some scenarios. In high-assurance contexts, we might be willing to pay a lot to know that a remotely deployed machine is executing correctly. In the streaming context, the verifier may not have space to compute locally, so we could use CMT^{18} to check the outputs are correct, in concert with Thaler's refinements^{45} to make the prover truly inexpensive. Finally, data-parallel cloud computations (like MapReduce jobs) perfectly match the regimes in which the general-purpose schemes perform well: abundant CPU cycles for the prover and many instances of the same computation with different inputs.

Moreover, the gap separating the performance of the current research prototypes and plausible deployment in the cloud is a few orders of magnitude—which is certainly daunting, but, given the current pace of improvement, might be bridged in a few years.

More speculatively, if the machinery becomes truly low overhead, the effects will go far beyond verifying cloud computations: we will have new ways of building systems. In any situation in which one module performs a task for another, the delegating module will be able to check the answers. This could apply at the micro level (if the CPU could check the results of the GPU, this would expose hardware errors) and the macro level (distributed systems could be built under very different trust assumptions).

But even if none of this comes to pass, there are exciting intellectual currents here. Across computer systems, we are starting to see a new style of work: reducing sophisticated cryptography and other achievements of theoretical computer science to practice.^{28,35,38,49} These developments are likely a product of our times: the preoccupation with strong security of various kinds, and the computers powerful enough to run previously "paper-only" algorithms. Whatever the cause, proof-based verifiable computation is an excellent example of this tendency: not only does it compose theoretical refinements with systems techniques, it also raises research questions in other sub-disciplines of computer science. This cross-pollination is the best news of all.

**Acknowledgments.** We thank Srinath Setty, Justin Thaler, Riad Wahby, Alexis Gallagher, the anonymous *Communications* reviewers, Boaz Barak, William Blumberg, Oded Goldreich, Yuval Ishai and Guy Rothblum.

1. Arora, S., Lund, C., Motwani, R., Sudan, M. and Szegedy, M. Proof verification and the hardness of approximation problems. *JACM 45*, 3 (May 1998), 501–555, (Prelim. version FOCS 1992).

2. Arora, S. and Safra, S. Probabilistic checking of proofs: A new characterization of NP. *JACM 45*, 1 (Jan. 1998), 70–122. (Prelim. version FOCS 1992).

3. Babai, L. Trading group theory for randomness. In *Proceedings of STOC*, 1985.

4. Babai, L., Fortnow, L., Levin, A. and Szegedy, M. Checking computations in polylogarithmic time. In *Proceedings of STOC*, 1991.

5. Ben-Sasson, E. et al. Decentralized anonymous payments from Bitcoin. *IEEE Symposium on Security and Privacy*, 2014.

6. Ben-Sasson, E., Chiesa, A., Genkin, D. and Tromer, E. Fast reductions from RAMs to delegatable succinct constraint satisfaction problems. In *Proceedings of ITCS*, Jan. 2013.

7. Ben-Sasson, E., Chiesa, A., Genkin, D. and Tromer, E. SNARKs for C: Verifying program executions succinctly and in zero knowledge. In *Proceedings of CRYPTO*, Aug. 2013.

8. Ben-Sasson, E., Chiesa, A., Tromer, E. and Virza, M. Scalable zero knowledge via cycles of elliptic curves. In *Proceedings of CRYPTO*, Aug. 2014.

9. Ben-Sasson, E., Chiesa, A., Tromer, E. and Virza, M. Succinct non-interactive zero knowledge for a von Neumann architecture. *USENIX Security*, (Aug. 2014).

10. Ben-Sasson, E., Goldreich, O., Harsha, P., Sudan, M. and S. Vadhan, S. Robust PCPs of proximity, shorter PCPs and applications to coding. *SIAM J. on Comp. 36*, 4 (Dec. 2006), 889–974.

11. Bitansky, N., Canetti, R., Chiesa, A. and Tromer, E. Recursive composition and bootstrapping for SNARKs and proof-carrying data. In *Proceedings of STOC*, June 2013.

12. Bitansky, N., Chiesa, A., Ishai, Y., Ostrovsky, R. and Paneth, O. Succinct non-interactive arguments via linear interactive proofs. In *Proceedings of IACR TCC*, Mar. 2013.

13. Brassard, G., Chaum, D. and Crépeau, C. Minimum disclosure proofs of knowledge. *J. Comp. and Sys. Sciences 37*, 2 (Oct. 1988), 156–189.

14. Braun, B., Feldman, A.J., Ren, Z., Setty, S., Blumberg, A.J., and Walfish, M. Verifying computations with state. In *Proceedings of SOSP*, Nov. 2013.

15. Canetti, R., Riva, B. and Rothblum, G. Practical delegation of computation using multiple servers. *ACM CCS*, 2011.

16. Castro, M. and Liskov, B. Practical Byzantine fault tolerance and proactive recovery. *ACM Trans. on Comp. Sys. 20*, 4 (Nov. 2002), 398–461.

17. Chung, K.M., Kalai, Y. and Vadhan, S. Improved delegation of computation using fully homomorphic encryption. In *Proceedings of CRYPTO*, 2010.

18. Cormode, G., Mitzenmacher, M. and Thaler, J. Practical verified computation with streaming interactive proofs. In *Proceedings of ITCS*, 2012.

19. Danezis, G., Fournet, C., Kohlweiss, M. and Parno, B. Pinocchio coin: Building zerocoin from a succinct pairing-based proof system. In *Proceedings of the Workshop on Language Support for Privacy-enhancing Technologies*, Nov. 2013.

20. Dinur. I. The PCP theorem by gap amplification. *JACM 54*, 3 (June 2007), 2:1–12:44.

21. Gennaro, R., Gentry, C. and Parno, B. Non-interactive verifiable computing: Outsourcing computation to untrusted workers. In *Proceedings of CRYPTO*, 2010.

22. Gennaro, R., Gentry, C. and Parno, B. and Raykova, M. Quadratic span programs and succinct NIZKs without PCPs. In *Proceedings of EUROCRYPT*, May 2013.

23. Gentry, C. A fully homomorphic encryption scheme. PhD thesis, Stanford University, 2009.

24. Goldreich, O. Probabilistic proof systems—A primer. *Foundations and Trends in Theoretical Computer Science 3*, 1 (2007), 1–91.

25. Goldwasser, S., Kalai, Y.T. and Rothblum, G.N. Delegating computation: Interactive proofs for muggles. In *Proceedings of STOC*, May 2008.

26. Goldwasser, S., Micali, S. and Rackoff, C. The knowledge complexity of interactive proof systems. *SIAM J. on Comp. 18*, 1 (1989), 186–208.

27. Håstad, J. Some optimal inapproximability results. *JACM 48*, 4 (July 2001), 798–859. (Prelim. version STOC 1997).

28. Huang, Y., Evans, D., Katz, J. and Malka, L. Faster secure two-party computation using garbled circuits. In *USENIX Security*, 2011.

29. Ishai, Y., Kushilevitz, E., and Ostrovsky, R. Efficient arguments without short PCPs. In *Proceedings of the Conference on Computational Complexity* (CCC), 2007.

30. Kilian, J. A note on efficient zero-knowledge proofs and arguments (extended abstract). In *Proceedings of STOC*, 1992.

31. Kreuter, B., shelat, a. and Shen, C.H. Billion-gate secure computation with malicious adversaries. *USENIX Security* (Aug. 2012).

32. Lund, C., Fortnow, L., Karloff, H.J., and Nisan, N. Algebraic methods for interactive proof systems. *JACM 39*, 4 (1992), 859–868.

33. Mahajan, P. et al. Depot: Cloud storage with minimal trust. *ACM Trans. on Comp. Sys. 29*, 4 (Dec. 2011).

34. Malkhi, D. and Reiter, M. Byzantine quorum systems. *Distributed Computing 11*, 4 (Oct. 1998), 203–213. (Prelim. version *Proceedings of STOC* 1997).

35. Narayan, A. and Haeberlen, A. DJoin: Differentially private join queries over distributed databases. In *Proceedings of OSDI*, 2012.

36. Parno, B., Gentry, C., Howell, J. and Raykova, M. Pinocchio: Nearly practical verifiable computation. *IEEE Symposium on Security and Privacy*, (May 2013).

37. Parno, B., McCune, J.M. and Perrig, A. *Bootstrapping Trust in Modern Computers.* Springer, 2011.

38. Popa, R.A., Redfield, C.M.S., Zeldovich, N. and Balakrishnan, H. CryptDB: Protecting confidentiality with encrypted query processing. In *Proceedings of SOSP*, 2011.

39. Sadeghi, A.R., Schneider, T., and Winandy, M. Token-based cloud computing: Secure outsourcing of data and arbitrary computations with lower latency. In *Proceedings of TRUST*, 2010.

40. Setty, S., Blumberg, A.J. and Walfish, M. Toward practical and unconditional verification of remote computations. In *Proceedings of HotOS*, May 2011.

41. Setty, S., Braun, B., Vu, V., Blumberg, A.J., Parno, B. and Walfish, M. Resolving the conflict between generality and plausibility in verified computation. In *Proceedings of EuroSys*, Apr. 2013.

42. Setty, S., McPherson, R., Blumberg, A.J., and Walfish, M. Making argument systems for outsourced computation practical (sometimes). In *Proceedings of NDSS*, 2012.

43. Setty, S., Vu, V., Panpalia, N., Braun, B., Blumberg, A.J. and Walfish, M. Taking proof-based verified computation a few steps closer to practicality. *USENIX Security*, Aug. 2012.

44. Sudan, M. Probabilistically checkable proofs. *Commun. ACM 52*, 3 (Mar. 2009), 76–84.

45. Thaler, J. Time-optimal interactive proofs for circuit evaluation. In *Proceedings of CRYPTO*, Aug. 2013.

46. Thaler, J., Roberts, M., Mitzenmacher, M. and Pfister, H. Verifiable computation with massively parallel interactive proofs. In *Proceedings of USENIX HotCloud Workshop*, (June 2012).

47. Vu, V., Setty, S., Blumberg, A.J. and Walfish, M. A hybrid architecture for interactive verifiable computation. *IEEE Symposium on Security and Privacy*, (May 2013).

48. Wahby, R.S., Setty, S., Ren, Z., Blumberg, A.J. and Walfish, M. Efficient RAM and control flow in verifiable outsourced computation. In *Proceedings of NDSS*, Feb. 2015.

49. Wolinsky, D.I., Corrigan-Gibbs, H., Ford, B. and Johnson, A. Dissent in numbers: Making strong anonymity scale. In *Proceedings of OSDI*, 2012.

a. The condition does not hold for "proprietary" computations whose logic is concealed from the verifier. However, the theory can be adapted to this case too, as we discuss near the end of the article.

b. Some of the systems (those known as *zero knowledge SNARKs*) also keep the prover's input private.

c. The framework also handles circuits where the prover supplies some of the inputs and receives some of the outputs (enabling computations over remote state inaccessible to the verifier). However, the accompanying techniques are mostly beyond our scope (we will briefly mention them later). For simplicity we are treating *p* as a pure computation.

d. For space, we omit recent work that is optimized for specific classes of computations; these works can be found with this article in the ACM Digital Library under Supplemental Material.

e. Systems that report verification costs beating local execution choose very expensive baselines for local computation.^{36,41–43}

Figure 1. A framework for solving the problem in theory.

Figure 2. Design space of implemented systems for proof-based verifiable computation.

Figure 3. Per-instance verification costs applied to 128 × 128 matrix multiplication of 64-bit numbers.

Figure 4. Total costs and cross-over points (extrapolated), for 128 × 128 matrix multiplication.

Figure 5. Prover overhead normalized to native execution cost for two computations. Prover overheads are generally enormous.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2015 ACM, Inc.