Sign In

Communications of the ACM

Research highlights

Hiding Secrets in Software: A Cryptographic Approach to Program Obfuscation


View as: Print Mobile App ACM Digital Library Full Text (PDF) In the Digital Edition Share: Send by email Share on reddit Share on StumbleUpon Share on Hacker News Share on Tweeter Share on Facebook
Hiding Secrets in Software, illustration

Credit: iStockPhoto.com

Can we hide secrets in software? Can we obfuscate programsthat is, make programs unintelligible while preserving their functionality? What exactly do we mean by "unintelligible"? Why would we even want to do this?

In this article, we describe some rigorous cryptographic answers to these quasi-philosophical questions. We also discuss our recent "candidate indistinguishability obfuscation" scheme and its implications.

Back to Top

1. Introduction

Informally, the goal of program obfuscation is deceptively easy to state: to make a program unintelligible while preserving its functionality. By "preserving its functionality," we just mean that the obfuscated program should be fully executable and have the same input-output behavior as the original program. Defining "unintelligible" is more tricky.

* 1.1. Making sense of the "unintelligible"

Of course, most program code is already "unintelligible" in the colloquial sense of the word. "Indeed, any programmer knows that total unintelligibility is the natural state of computer programs (and one must work hard in order to keep a program from deteriorating into this state)."1

But suppose we want a stronger notion of unintelligibility, one that applies not only to sleep-deprived programmers but also to automated "deobfuscators" and determined, resourceful adversaries. Previous work on obfuscation has focused on making code resistant/unintelligible to particular attacks, such as known techniques for static and dynamic analysis.6 However, it is a recurring theme in security engineering that resistance to one set of attacks is no guarantee that a system cannot be broken by other means. And so, many of these obfuscation schemes have been followed by corresponding schemes for deobfuscation.6

Modern cryptography, especially following the work of Goldwasser and Micali,18 has offered a rigorous and expansive approach to defining security, offering the best way to escape the break-and-repair cycle of security engineering: Define security (in our case, the word "unintelligible") in a mathematically rigorous way, and then prove that if there exists any adversary that can efficiently break security (according to the definition), then there is an efficient algorithm to solve a "hard" mathematical problem, such as factoring large integers. This approach makes a computational assumptionthat the underlying mathematical problem is hard (i.e., cannot be solved in time polynomial in a security parameter), and that the adversary is computationally bounded to run in polynomial time. Under this computational assumption the scheme is provably secure. Crucially, this approach makes no other assumption or guess about what type of attack the adversary might deploy to break security. Of course, the security definition might not completely capture the adversary's access to the system: for example, early definitions of security for encryption assumed the secret key could be stored and used securely, while "side channel" attacks have challenged this assumption and led to refined definitions. And the "hard" problem might turn out not to be as hard as initially thought: for example, algorithms for factoring large integers have improved immensely since the invention of RSA. But history has shown that the rigorous cryptographic approach to security is the best way to filter out schemes that break in "uninteresting" ways.

So, how does cryptography rigorously define the unintelligibility of obfuscation? The natural conceptual way to view the unintelligibility of obfuscation is as a black box that allows oracle access to a program (you give it inputs and get back outputs) but which completely hides the program's internal workings. Hada20 and Barak et al.1 were able to make this definition of obfuscation mathematically rigorous, but then Barak et al. showed that even a weaker notion called virtual black box (VBB) obfuscation is impossible to achieve: some programs are VBB-unobfuscatable. Even stronger negative results were obtained for other models.17 This profound negative result, and our inability to find a clear line between obfuscatable and unobfuscatable programs, essentially crushed cryptographic research on obfuscation for over a decade.

Barak et al.1 also presented a second way of viewing program obfuscation: as program pseudo-canonicalization. The idea is that if programs P0 and P1 have the same input-output behavior, and I obfuscate one of them, it should be hard for an adversary to tell which program I obfuscated. Thus, the obfuscation "canonicalizes" functionally equivalent programs by making them hard to tell apart. This notion is called indistinguishability obfuscation (iO). Importantly, there is no proof suggesting that iO is impossible. Nevertheless, this notion received much less attention, since it seemed too weak to be useful. This view changed somewhat when Goldwasser and Rothblum proved a strong relative guaranteenamely, that an indistinguishability obfuscator is in some sense a best-possible obfuscator: it is as good as any other obfuscator of roughly the same complexity.19 But still, in the absence of intuitive security guarantees and without a plausibly secure construction of iO for general programs, there was not much interest in seeing what could be built from iO.

* 1.2. Indistinguishability obfuscation: a candidate construction

This article reviews the first plausibly secure construction of iO for general programs.11 Since this proposal in 2013, interest in iO has exploded, and researchers have found that iO is actually very powerful. Surprisingly, it enables many of the "pie-in-the-sky" applications that motivated research on VBB obfuscation.

Our construction has two parts. First, we show how any method for iO that works for "simple programs"in particular, for log-depth Boolean circuits, denoted by the complexity class NC1can be used to get iO for general programs (modeled as Boolean circuits). Second, we give our "core obfuscator," namely our iO construction for NC1 circuits. The bootstrapping transformation from NC1 circuits to general programs is "cryptographically kosher," in that we rigorously prove that our iO construction for general programs is secure assuming that the iO construction for NC1 circuits is secure, and assuming cryptographic tools that by now have become widely accepted, such as cryptographic proofs with certain properties and fully homomorphic encryption (FHE)13, 14, 23 with a decryption algorithm in NC1.

The second part, our core iO construction for NC1 circuits, is less cryptographically kosher, and this is the reason we call our iO construction a "candidate." While the security of our NC1 construction is based on a well-defined mathematical assumption, the assumption is highly non-standard and complicated, having none of the simplicity and elegance of standard cryptographic assumptions like the hardness of factoring large integers. Moreover, our assumption is stated in terms of cryptographic multilinear maps,10 for which constructions have been proposed only recently, and which themselves rely on their own non-standard assumptions. Cryptographers do not intend to tolerate this situation very long, and currently are working furiously and simultaneously on trying to break these assumptions and to base iO on more natural assumptions.

Another caveat is that our iO construction is quite impractical (though we can expect substantial performance improvements in the future). Our construction is a proof of concept that has helped to transform our understanding of obfuscation, not something that should be deployed any time soon.

How does our approach to general-purpose program obfuscation differ from previous approaches? While many diverse approaches have appeared in the literature, a common thread connecting most previous approaches was the idea of obfuscation by "hiding a needle in a haystack"that is, by hiding the actually important instructions and conditions within the original program among many extraneous commands and tests.6 Unfortunately, computers are excellent at finding needles in haystacks, and these approaches, while natural, do not work very well.

Our approach is very different. At a high level, our core obfuscator creates a globally randomized representation of the program. Intuitively, this globally randomized representation of the program takes each individual instruction from the original program and converts them into highly unnatural randomized instructions. These instructions appear random and useless to an attacker. In particular, each instruction has a random-looking effect on the entire obfuscated program's state. However, the global randomization that produces these instructions is designed in such a way that the randomization "cancels out" only and exactly when the entire computation is finished. That is, once every one of the weird instructions have been carried out precisely in the specified order, the randomization within these instructions is eliminated and only the true output of the original computation remains. Thus, in our approach, there is neither a haystack nor a needle: every instruction in the obfuscated version of the program is essential; each one moves the computation forward. However, the randomized mathematical representation of the program ensures that until the very end of the computation, every step looks random and uninteresting.

More specifically, our core obfuscator for simple programs combines cryptographic multilinear maps,7, 10 a tool somewhat similar to FHE, with variants of tools introduced by Barrington2 and Kilian21 that allow NC1 circuits to be efficiently re-expressed as matrix branching programs, which can be randomized in a strong sense. Then, we bootstrap iO from simple programs to complicated programs by encrypting the entire complicated program using FHE, and then conditionally decrypting the result using a simple program that has been obfuscated using our core obfuscator.

* 1.3. Hiding secrets in software

Why is iO interesting? Although iO is less versatile and trickier to use than VBB obfuscation, it is powerful enough to let us hide secrets in software (in some cases).

First, we should mention that there are limits to what any obfuscation can accomplish. For example, suppose that a program P prints its own code. Then, any obfuscation O(P) of P also prints P. If a program is determined to be "exhibitionist," there is nothing obfuscation can do to prevent it. Even if a program is not so exhibitionist, it might be learnable. Roughly, a function is learnable if, given only oracle access to the program P, an adversary can efficiently construct a program P' that has the same input-output behavior as P. (Of course, any program is learnable inefficiently, but we only aim to defeat polynomial-time adversaries.) For example, consider a program Ps(x) that outputs 1 if x < s, and 0 otherwise. The class of programs {Ps}s is learnable, since a simple binary search would allow an adversary to learn the secret s. Even a black box does not hide P if P is learnable from oracle access. So, we will be interested in obfuscating programs that are unlearnable: programs that at least can keep secrets secure against efficient adversaries that are given only oracle access.

Cryptography offers many examples of functions that are unlearnable, based on widely believed assumptions such as the existence of one-way functions or the hardness of the discrete logarithm problem. For example, given any one-way function, one can construct a pseudorandom function (PRF), a function that no adversary can efficiently distinguish from a truly random function. A truly random function is unlearnable, and therefore so is a PRF, since otherwise it could be distinguished from a truly random function. Assuming the discrete logarithm problem is hard, one can construct an encryption scheme that is secure against adaptive chosen-ciphertext attack (CCA-secure encryption). An encryption scheme is CCA-secure if the adversary cannot guess anything about what a new ciphertext c encrypts, even after it asks for and obtains the decryptions of other ciphertexts c1, ..., ct of its choice. For a CCA-secure encryption scheme, the decryption function is unlearnable.

The unlearnable functions above have some secret inside thema PRF key or a decryption key. If we had black-box obfuscation, we could certainly prevent this secret from being extracted; we could hide secrets in software.

Of course, we do not have general black box obfuscation. We need to figure out how to hide secrets in software without black boxes, using just iO. We start to address this issue below.

Shell Games with Secrets. Here is a typical "shell game" we play when we want to prove that iO obfuscation hides a secret. Suppose that (sk0, pk0) and (sk1, pk1) are independent secret/public key pairsfor example, for a public-key encryption scheme. Suppose that we have two programs P0 and P1 that have the same input-output behavior, but P0 only uses sk0 (but not sk1) and P1 only uses sk1 (but not sk0). The definition of iO guarantees that the obfuscation of P0 would be indistinguishable from an obfuscation of P1, but can we leverage this to show that sk0 remains hidden?

Indeed we can: Clearly, the obfuscation of P0 reveals nothing about sk1, since P0 does not know anything about the secret sk1. Similarly, an iO-obfuscation of P1 would reveal nothing about sk0. However, by the security of iO, since P0 and P1 are functionally equivalent, it is hard for an adversary to distinguish whether P0 or P1 was obfuscated. Therefore, since P1 could have been obfuscated (as far as the adversary knows), the iO-obfuscation of P0 also reveals nothing about sk0. In other words, since the output of the programs is the same regardless of whether sk0 or sk1 is inside, we can use iO to play a "shell game" to switch the secret inside, and thus argue that neither secret is revealed. Similar "two-key" techniques have been used many times in cryptography, for example to construct CCA-secure encryption schemes.22,24

As an example of how we use this shell game, take our method of bootstrapping iO for NC1 circuits to iO for general programs (modeled as boolean circuits).

To construct iO for general programs, our natural starting point is fully homomorphic encryption (FHE).13, 14, 23 FHE is a special kind of encryption that allows anyone (e.g., the "cloud") to process or compute on encrypted data while it remains encrypted, without decrypting it. In other words, FHE-encrypted data allows anyone to compute an encryption of any function of that data. Interpreting data as a program, FHE allows the creation of encrypted programs. An encrypted program, however, is not the same thing as an obfuscated program, since FHE-encrypting a program changes its functionality: given x as input, the FHE-encrypted program's output is not P(x), but an encryption of P(x). However, FHE still seems like a good place to start, since we already have FHE schemes capable of evaluating general programs (modeled as boolean or arithmetic circuits),13 even schemes based on simple well-studied assumptions.4, 5, 16 To get obfuscation we need to figure out how to securely decrypt the output of the FHE-encrypted program. But for this to work, we also need to make sure that the user has properly applied FHE to produce an encryption of P(x). Indeed, this is critical, since giving away unrestricted access to the FHE decryption function would give away the farm!

For this conditional decryption step, we use our shell game. The obfuscation of a boolean circuit P0 consists of an FHE encryption of P0 under two different independent secret/public FHE key pairs (sk0, pk0) and (sk1, pk1), and a conditional decryption circuit CDec (to be described momentarily) that is obfuscated using our iO-obfuscator for NC1 circuits. (To hide the topology of P0, we use a universal circuit U such that U(P0, x) = P0(x) and encrypt the input P0 to U.) The evaluator runs both encrypted versions of P0 on its input, and feeds the resulting ciphertexts, along with a proof that they were computed correctly, as input to iO(CDec), which decrypts the first ciphertext using sk0 if the proof verifies, and outputs otherwise. The proof here is straightforward: it consists of the input and a step-by-step transcript of the FHE evaluation of the encrypted program on that input. Fortunately, using existing constructions, both the proof verification and FHE decryption can be computed using a NC1 circuit, as required.

We now need to argue that our construction really does yield a way of bootstrapping iO for NC1 to iO for general programs. Suppose P1 is a program with the same input-output behavior as P0. We transform the obfuscation of P0 into an obfuscation of P1 via a sequence of steps. First, we replace the encryption of P0 under pk1 with an encryption of P1 under pk1; this change is indistinguishable to an adversary since sk1 is never used, and therefore encryptions of any two strings under pk1 must be indistinguishable from each other. Next, and this is the crucial step, we change CDec so that it contains only sk1 (not sk0) and decrypts the second FHE ciphertext using sk1 (instead of the first ciphertext using sk0) after verifying the proof. The key point is that input-output behaviors of the old and new versions of CDec are the same, since (by the proof and the equivalence of P0 and P1) if the output of CDec is not , then the two FHE ciphertexts must have the same decryption. Consequently, by the security of iO, this change is indistinguishable to the adversary. We can then proceed in two similar steps, changing the other encryption of P0 to P1 and switching the secret key back to sk0, to arrive at an obfuscation of P1 without the adversary being any wiser.

Punctured Programs. Another way to use iO to hide secrets in software is the punctured program technique of Sahai and Waters.26 We illustrate the technique in Section 4.2, where it used to convert secret key encryption into public key encryption by obfuscating the encryption function.

* 1.4. Not quite obfutopia

While we have managed to squeeze useful security guarantees out of iO (with more positive results based on iO appearing regularly in the cryptographic literature), nevertheless cryptographers remain somewhat unsatisfied. One problem is that we do not understand what absolute guarantees we can make about obfuscation itself, or even what sort of security current iO constructions may give us. The "right" definition for program obfuscation is far from resolved, and researchers have begun to investigate stronger notionssuch as differing inputs obfuscation1that also seem to escape the impossibility of VBB.

Also, while VBB obfuscation is impossible in general, it remains possible that a broad class of programs, including many "real-world" programs, can be VBB obfuscated. But what is this class of programs?

The reason that VBB obfuscation remains appealing as a notion, despite its problems, is that we have a long list of glorious applications we could build in a worldcall it "Obfutopia"where general VBB program obfuscation is somehow possible. For example, a classic motivation for obfuscation was to build secure autonomous mobile agents that perform transactions on your behalf to your specifications using your secret signing key. The agent may run on untrusted platforms that could try to "torture" it to extract your secrets. While VBB obfuscation would solve this problem, trying to prove that this is secure with a weaker notion of obfuscation is a challenge.

As a more speculative application, suppose that in the distant future we want to upload our minds to cyberspace. Assuming this happens, we might as well do it securely. Can a mind programmed in software have secrets, innermost thoughts? With iO, we can provide some assurance: if a mind can compartmentalize its secrets so that they do not affect its behavior, the secret can remain hidden. But although digital minds presumably would have impressive powers of compartmentalization, it would be more convenient if obfuscation's assurances were less restrictive.

Like all utopias, Obfutopia has echoes of dystopia. Just as access to secure encryption makes it easier for terrorists to hide their plans, obfuscation can be used for nefarious purposesfor example, by malware designers to hide malicious content. Privacy-enhancing technologies often come with a cost. However, given the ever-increasing list of positive applications of secure obfuscation, we argue that the potential benefits outweigh the potential costs.

Back to Top

2. Obfuscation Basics

Barak et al.1 formally define an obfuscator as a probabilistic (randomized) algorithm O that takes a program P as input and meets the following three conditions:

  1. Functional equivalence: For every program P, the string O(P) describes a program with the same input-output behavior as P.
  2. Polynomial slowdown: The description length and running time of O(P) are polynomially related to P.
  3. Unintelligibility.

We will define the security of obfuscation ("unintelligibility") in Section 2.2. For now, let us focus on obfuscation's functional aspects.

* 2.1. What is a program?

To make the definition of program obfuscation meaningful, we first need to define what we mean by a "program." Obviously, we do not want a definition of programs that is language-dependent, so we use theoretical models of computation. In this article, we will use Boolean circuits as our model of computation.

A Boolean circuit is a composition of Boolean gates, such as AND, OR, and NOT gates. The gates are typically arranged into levels, so that the outputs of gates at level i are inputs to gates at level i + 1 unless i is the last level of the circuit. A circuit does not contain any loops; it is a directed acyclic graph. The number of gates is called the size of the circuit, and the number of levels is called the depth. For a circuit C, we denote by |C| the size of C.

In a circuit, all loops are unrolled and all branches are evaluated. Unlike a RAM program, which may touch only input that is relevant for a particular execution, a circuit ingests all the input at its input level. Consequently, translating some RAM programs into circuits may add a lot (but still polynomial amount) of overhead. We will ignore this issue here, as efficiency (beyond achieving polynomial overhead) is not our concern here.

An NC1 family of circuits is one where the depth of each circuit in the family is at most logarithmic in its size.

With these preliminaries, we can make precise the functionality and polynomial slowdown requirements for obfuscation. Namely,

  1. Functional equivalence: For every Boolean circuit C, O(C) describes a circuit C' such that for all inputs x, we have C(x) = C'(x).
  2. Polynomial slowdown: There exists a polynomial p such that for all Boolean circuits C, the size of the circuit described by O(C) is at most p(|C|).

* 2.2. What does "unintelligible" mean?

As mentioned in the Introduction, we take indistinguishability obfuscation (iO)1 as our notion of unintelligibility. In the formal definition of iO, security is modeled as a game between a challenger and an adversary. The adversary picks two circuits P0 and P1 that are the same size (padding one of the circuits if necessary) and that have the same input-output behavior but which may differ internally. It gives these programs to the challenger. The challenger obfuscates one of them, producing an obfuscation O(Pb) for random b {0, 1}, which it gives to the adversary. Finally, the adversary tries to guess b. The obfuscation is considered secure if no efficient adversary can win this game with probability non-negligibly greater than 1/2.

In principle, one way to iO-obfuscate would be to canonicalize Pb by outputting the lexicographically first program with the same input-output behavior as Pb, in which case O(P0) = O(P1) and the adversary cannot win with probability greater than 1/2. But true canonicalization is inefficient in general unless P = NP, so we must settle for pseudo-canonicalization and require that the distributions of O(P0) and O(P1) are computationally indistinguishable to efficient adversaries, in the sense described above: no efficient adversary should be able to guess the bit b with probability non-negligibly greater than 1/2.

As mentioned in the Introduction, the work of Goldwasser and Rothblum19 showed that an (efficient) indistinguishability obfuscator is a "best-possible" obfuscation: it is as good as any other obfuscation of roughly the same complexity.19 The proof is simple: Suppose BO(·) is the actual best obfuscator of a certain class of programs (modeled as boolean circuits), while Pad(·) merely increases the size of the circuits the same amount as BO(·). Then, for any circuit C, the circuits BO(C) and Pad(C) are the same size and functionality equivalent. Therefore, if iO is an indistinguishability obfuscator, iO(BO(C)) and iO(Pad(C)) are indistinguishable to efficient adversaries. Since they are indistinguishable, iO(Pad(C)) obfuscates C as well as iO(BO(C)), which obfuscates C as well as BO(C). Unfortunately, however, "best possible" is only a relative guarantee that does not directly imply any secrets in the obfuscated program remain hidden.

Another justification for iO is that it turns out to be surprisingly useful. In Section 4, we add to the applications of iO mentioned in the Introduction.

Back to Top

3. Our Core Obfuscator

Recall that our obfuscation scheme11 works in two steps. In the Introduction, we already sketched how to bootstrap obfuscation for NC1 circuits to obfuscation for general circuits. Now we will sketch how our "core" obfuscator obfuscates NC1 circuits. Our core obfuscator builds on a new cryptographic tool called multilinear maps,7, 10, 15 which we review below.

* 3.1. Cryptographic multilinear maps

As a first approximation, one can say that a cryptographic multilinear map system encodes a value x p (where p is some large prime) by encrypting it using a FHE scheme. The multilinear map system inherits the homomorphism of FHE: given encodings of x and y, one can compute encodings of x + y and x · y. But the multilinear map system comes with additional constraints, and additional functionality. Each encoding is associated to some level i k for parameter k. Encodings can only be added if they are at the same level: Enci(x) Enci(y) Enci(x + y). Encodings can be multiplied, subject to restrictions, and the levels are added: Enci(x) Encj(y) Enci+j.(x · y) if i + j k but is meaningless when i + j > k. Unlike FHE, the multilinear map system comes equipped with a zero test: the system gives an efficient procedure to test whether an encoding at level k encodes the value 0.

Thus, unlike an encryption scheme, the multilinear map system leaks some information about what is encoded (whether the value is zero). Roughly, we carefully construct our core obfuscator so that the leakage of the multilinear map system is positioned precisely to become the leakage of the overall obfuscated program output.

* 3.2. Our core obfuscator for NC1 circuits

We now sketch how our "core" obfuscation scheme for NC1 circuits (circuits of logarithmic depth) works. We confess that many details will be lacking in this short article; see Garg et al.11 for more details. Instead, we will try to give you a glimpse into the main insights that make our approach work, and a very rough outline of how these insights can be extended to yield an actual obfuscation.

Barrington: A more friendly mathematical representation of NC1 circuits. Boolean circuits are already an unrealistic representation of general computer programs. Few of us would write code as a Boolean circuit. But to get where we want to go, we move to a "programming language" that is even further removed from reality, but that does not lose any generality. We are not doing this for kicks, though. Our objective is to represent our NC1 circuit in a way that plays nicely with our cryptographic multilinear maps and allows strong randomization.

The specific programming language that we choose is a variant of a branching program considered by Barrington,2 that we call a Matrix Branching Program. In a matrix branching program, the program is described by a number n, and a sequence of pairs of full-rank square matrices

ueq01.gif

The number n, which must be at most k, describes the length of inputs that the program can take as input. Given an input x {0, 1}n, to "run" the program M on x, we compute the matrix product:

ueq02.gif

where here x[j] denotes the jth bit of x, where j can range from 0 to n - 1.

A matrix branching program is valid if this product always yields either the identity matrix I, or some other fixed matrix B I. If the product is I, then we say that the output of the program M(x) is 1. If the product is B, then we say that the output of the program M(x) is 0.

This representation is useful because Barrington showed that any NC1 circuit can be efficiently re-expressed as a polynomial-size matrix branching program. Also, matrix product is a multilinear operation: every entry in each matrix occurs exactly once in every monomial of the product, so that we can use the multilinear map system to encode every entry of each matrix, and compute the product correctly using the multilinear map operations described above. Lastly, as we describe next, matrix branching programs can be globally randomized in a strong sense.

Kilian: Global Randomization of Branching Programs. The entire global randomization that we actually use to obtain our obfuscation, and why it works, is frankly too complex to describe at length here. Instead, we will ask you to suspend disbelief as we describe how to obfuscate a special case of programs: namely, programs that have no input at all. Now, the thoughtful reader will complain that this special case is utterly trivial to obfuscate, and indeed it is: One could simply pre-compute the output of the program, and provide this output as the obfuscation. This would absolutely be a secure obfuscationalthough this is not what we are going to do. We ask you to put this thought aside. We will sketch a different way to deal with this special case, and we assure you that our method for dealing with this special case will provide insight into why our actual global randomization works in the general case.

What does a matrix branching program with no input look like? Instead of a sequence of pairs of matrices, it would just be a sequence of matrices:

ueq03.gif

and its output would be the product

ueq04.gif

How can we globally randomize these matrices, so that the output is preserved, but the "steps" of the computation reveal nothing more than the output? The answer is to apply a technique of Kilian,21 as follows: Choose completely random invertible matrices R1, ..., Rk-1 over Zp, and set R0 = Rk = I. Now, define

ueq05.gif

Then, we can observe that

ueq06.gif

However, a simple inductive argument shows that the entire collection of matrices cacm5905_d.gif contain no information beyond the product i Mi. For example, suppose k = 2. In other words, we only have two matrices cacm5905_e.gif and cacm5905_f.gif. Then, by letting Q = M21 R1 and algebraic rearrangement, we see that equivalently, we could set cacm5905_g.gif and cacm5905_h.gif. Note that if R is a random invertible matrix, then so is Q. Thus, cacm5905_i.gif and cacm5905_j.gif can be created knowing only the product M1M2, and nothing else about the original "steps" of the computation M1 and M2. This proof generalizes to any k.

This insight shows that it is in fact possible to globally randomize a program, using only local randomizations with a set of correlated random variables, in a way that preserves the output, but completely randomizes the individual steps.

Our actual core obfuscatorwhich must deal with an exponential number of potential inputsbegins with this idea. In essence, this idea is used to argue security on an input-by-input basis. Additional global randomizations are invoked to thwart other algebraic attacks that can arise when inputs are present (e.g., input-mixing attacks, where part of the computation is done with respect to one input, and another part with respect to another input). But the main idea remains the same: to globally randomize the program in a way that preserves the output, but hides all other useful information about the original individual steps of the program and how they combined with each other.

Back to Top

4. Applications

Indistinguishability obfuscation has enabled a wide variety of exciting applications that has become so vast that it is impossible to do them justice here. Here, we cover just a few applications to help the reader appreciate the possibilities.

* 4.1. Witness encryption

Traditionally, encryption is a tool for conveying confidential messages to specific recipients that have secret keys.

Departing from tradition, suppose we would like to encrypt a message not to a fixed recipient who has a secret key, but to anyone who knows a proof for the Riemann Hypothesis (or, more generally, a witness for some NP statement). We want any recipient that has any proof (of some bounded size) of the RH to be able to decrypt the message. We also want to generate this encryption without knowing ourselves whether the RH is true or not. Regarding security, we want that if the RH is false, then the ciphertext should hide the encrypted message. This problem of witness encryption12 (WE) was first posed by Rudich in 1987.

More formally, given an NP Language L, a WE scheme for L is an encryption scheme that takes as input an instance x and a message bit b, and outputs a ciphertext c. If x L and w is a valid witness for x, then a decryptor can use w to decrypt c and recover b. However, if x L, then an encryption of 0 should be computationally indistinguishable from an encryption of 1.

Using iO, we can easily construct a WE scheme for an NP-Complete language such as SATISFIABILITY. Specifically, consider the circuit Cx,b(w) (with x and b hard-coded in it) defined as follows: on input w, Cx,b outputs b if w is a valid witness for x L, and otherwise just outputs (indicating "undefined"). To encrypt b, the encrypter simply outputs an iO obfuscation of Cx,b. With a legitimate witness w for x L, anyone can evaluate the obfuscated circuit to recover b. On the other hand, if x L, then both Cx,0 and Cx,1 are constant functions that always output , and iO guarantees that the obfuscation of these equivalent functions are computationally indistinguishable.

* 4.2. Punctured programming: secret-key to public-key encryption

Diffie and Hellman8 already described some of the potential of obfuscation back in 1976 in their "New Directions" paper. They noticed that obfuscation could potentially be used to transform a secret key encryption scheme (where sender and receiver use the same secret key) into a public key encryption (PKE) scheme (where the sender uses a "public" key for encryption and only the receiver knows the secret key for decryption). Specifically, take the encryption procedure or program, with the secret key embedded inside it, and publish its obfuscation as the public key. Senders could use this public obfuscated program to encrypt messages, but (hopefully) not to decrypt ciphertexts, as the secret key remains hidden inside the obfuscated program.

Given ideal black box obfuscation, this technique works well for some secret key encryption schemes. (Beware that it fails completely for some others, like stream ciphers, where encryption is an involution.) Can we make it work for iO? Indeed, we can: in fact, we have two quite different ways to construct public-key encryption from iO and any one-way function (OWF).

The first technique combines WE with a length-doubling pseudorandom generator (PRG) G. Such a PRG can be built from any OWF, and has the property that G(x) {0, 1}2n is computationally indistinguishable from a random string when x {0, 1}n is sampled uniformly. In the PKE scheme, the secret key is a uniform w {0, 1}n, and the public key is W = G(w). To encrypt a bit b, the sender generates a witness encryption of b that can be decrypted by any "witness" w' such that W = G(w'). The legitimate decrypter can decrypt using w. Security follows from three observations: (1) By the security of the PRG, an adversary cannot distinguish whether W is a uniform 2n-bit string, and (2) If W were a uniform 2n-bit string, then (with high probability) it is not in the image of the PRG, and thus there is no valid witness for W, and (3) If there is no valid witness, the security of WE guarantees that b remains hidden.

Sahai and Waters26 found a way of building PKE from iO that is quite close to what Diffie and Hellman had in mind. To do so, they introduced a general technique called punctured programming, that has turned out to have many applications. The high-level idea is "to alter a program (which is to be obfuscated) by surgically removing a key element of the program, without which the adversary cannot win the security game it must play, but in a way that does not alter the functionality of the program."26 This is perhaps best explained by jumping into the technical details. The PKE scheme uses a length-doubling pseudorandom generator (PRG) G, as described above. The scheme also uses a pseudorandom function (PRF) F, which takes a key and a 2n-bit string, and outputs a bit that is computationally indistinguishable from uniform. However, we want our PRF to have a special property: we want it to be "puncturable." A puncturable PRF has the property that for any key K and input cacm5905_k.gif, there is a compact punctured key cacm5905_l.gif that allows one to evaluate F(K, x) for any x cacm5905_k.gif, but keeps the value of F(K, cacm5905_k.gif) pseudorandom. (Puncturable PRFs can also be built from any OWF.) Now, consider a randomized secret key encryption scheme, where the secret key is a key K for the PRF, one encrypts m with randomness r as EncK(r, m) = (G(r), F(K, G(r)) m), and the ciphertext is decrypted in the obvious way using K. Sahai and Waters show that they can use iO to obfuscate the encryption function while leaving decryption unchanged. In this PKE scheme, decryption amounts to evaluating the (puncturable) PRF. No other PKE scheme has such a fast decryption algorithm!

To prove security, Sahai and Waters use their punctured program technique. For any cacm5905_k.gif, we define a punctured encryption cacm5905_m.gif, which uses the punctured key cacm5905_l.gif to evaluate the PRF except at cacm5905_k.gif, where it outputs (indicating "undefined"). Sahai and Waters observe that the encryption function will never evaluate the PRF on any cacm5905_k.gif outside the range of the PRG. Therefore, for any such cacm5905_k.gif, the original encryption function has the same functionality as cacm5905_n.gif, and iO will make them indistinguishable. Now, by the security of the PRG, the adversary cannot distinguish a legitimate ciphertext from a fake ciphertext (cacm5905_k.gif, F(K, cacm5905_k.gif) m) where cacm5905_k.gif is chosen uniformly, and is therefore with high probability outside the range of the PRG. But since the punctured public key (the obfuscation of cacm5905_n.gif) contains no usable information about F(K, cacm5905_k.gif) (even the secret key cacm5905_l.gif inside provides no help), m appears pseudorandom.

* 4.3. Functional encryption

Consider a hospital administrator who has access to sensitive patient medical informationspecifically medical history and gene sequences of the patients. In the same hospital a medical researcher is working on establishing a link between a specific set of genes and a type of cancer. In order to validate his findings the researcher needs access to the medical records of the patients in the hospital. The hospital administrator would like to help the researcher but is legally obliged to keep patient records private. In such scenarios, patient data is typically collected before the queries that the medical researcher might be interested in are known. Functional encryption (FE)3, 25 enables an elegant solution to this problem. The administrator could encrypt the data as it is created and later provide a specialized secret key to the medical researcher allowing him to learn only the output of his analysis algorithm on the patient data, and nothing more.

More formally, in FE, ciphertexts encrypt inputs x and keys are issued for functions f. The striking feature of this system is that using the key SKf to decrypt a ciphertext Enc(x), yields the value f(x) but does not reveal anything else about x. Furthermore, any arbitrary collusion of key holders relative to many functions fi does not yield any more information about x beyond what is "naturally revealed" by the functionality (i.e., fi(x) for all i). More precisely, the security definition of FE requires that the adversary cannot distinguish between encryptions of x0 and x1, even when it is provided with many secret keys SKf for which f(x0) = f(x1). Note that keys not satisfying this constraint allow the adversary to trivially distinguish between encryptions of x0 and x1.

We next sketch our construction of a FE for all circuits11 from iO. In order to build intuition we start with a solution assuming ideal black box obfuscation. We start by selecting (PK, SK), a public-secret key pair for a regular public-key encryption scheme. An encryption of a value x under the FE scheme will simply be an encryption of x using the public key PK. The secret key SKf corresponding to a circuit f is an obfuscation of a circuit that takes a ciphertext c as input, uses SK (hardcoded inside the obfuscated circuit) to decrypt c obtaining x, and then outputs f(x). Intuitively the obfuscation will hide the secret key SK, achieving the desired security.

To adapt the above construction to iO, we use the two-key shell game described in the Introduction. In particular, we set the public key for our FE scheme to be two public keys PK0 and PK1. An encryption of x consists of encryptions of x under both keys, together with a special zero knowledge proof that the two encryptions encrypt the same value. The secret key SKf is an obfuscated program that checks the proof, uses one of the secret keys to obtain x, and then outputs f(x).

As in the shell game in the Introduction, if the proof is true, then the output of SKf is the same whether it uses SK0 or SK1, and the strategy is to use iO to argue that these cases are indistinguishable. However, ultimately we need to argue that the adversary cannot distinguish an encryption of x0 from an encryption of x1, and for this our proof needs to use an intermediate hybrid experiment where a "fake" simulated cryptographic proof is generated, and the two ciphertexts are actually encryptions of x0 and x1 respectively. But here we encounter a new problem: if it is possible to generate "fake" cryptographic proofs that "prove" the ciphertexts encrypt the same value when they do not, then we cannot use iO to transition from SK0 to SK1 as described above.

We address this problem by introducing the notion of statistically simulation sound cryptographic proofs, which require that valid proofs only exist for true statements, with exactly one exception: there is a single false statement * such that there exists a valid proof * for *. However, for every other statement , if a valid proof exists for , then must be true. We show how to achieve statistical simulation soundness using witness-indistinguishable proofs.9 Once we have it, then a similar two-key alternation argument to the one given in the Introduction allows us to establish security of our FE scheme. We note that full collusion-resistance follows from a standard hybrid argument using the iO definition for each secret key given out, one at a time.

* 4.4. Restricted-use software

Here we discuss an application of iO to a question that does not deal with encryption or authentication, but rather restricted-use software.

Software developers will often want to release a demo or restricted use version of their software that limits the features that are available in a full version. In some cases a commercial software developer will do this to demonstrate their product; in other cases the developer will want to make multiple tiers of a product with different price points. In other domains, the software might be given to a partner that is only partially trusted and the developer only wants to release the features needed for the task.

Ideally, a developer could create a downgraded version of software simply by starting with the full version and then turning off certain features at the interface levelrequiring minimal additional effort. However, if this is all that is done, it could be easy for an attacker to bypass these controls and gain access to the full version or the code behind it. The other alternative is for a software development team to carefully excise all unused functionality from the core of the software. Removing functionality can become a very time consuming task that could itself lead to the introduction of software bugs. In addition, in many applications it might be unclear what code can and cannot remain for a restricted use version. For instance, a subroutine could be written in a highly general form, where only a subset of its functionality is needed for several applications. In this case, the subroutine would have to be re-designed to "cripple" it, a time-consuming and potentially difficult task.

iO provides an immediate solution: the developer can restrict the use at the interface level and then release an obfuscated version of the program. For this application iO suffices, since by definition a version restricted in the interface is indistinguishable from an obfuscated program with equivalent behavior that has its parts removed or redesigned at the start.

Back to Top

5. Conclusion

We described a candidate indistinguishability obfuscation scheme, and some of the exciting applications it makes possible. However, a lot remains to be doneto base obfuscation on well-established assumptions, to build a practical scheme, to truly understand what obfuscation meansbefore cryptographic obfuscation can move from being a proof of concept to a widely-deployed privacy enhancing technology.

Back to Top

References

1. Barak, B., Goldreich, O., Impagliazzo, R., Rudich, S., Sahai, A., Vadhan, S., Yang, K. On the (im)possibility of obfuscating programs. In CRYPTO (2001). Springer, 118.

2. Barrington, D.A. Bounded-width polynomial-size branching programs recognize exactly those languages in nc1. In STOC (1986). ACM, 15.

3. Boneh, D., Sahai, A., Waters, B. Functional encryption: Definitions and challenges. In Theory of Cryptography Conference (TCC) (2011). Springer, 253273.

4. Brakerski, Z., Vaikuntanathan, V. Efficient fully homomorphic encryption from (standard) lwe. SIAM J. Comput. 43, 2 (2014), 831871.

5. Brakerski, Z., Vaikuntanathan, V. Lattice-based FHE as secure as PKE. In Innovations in Theoretical Computer Science (ITCS) (2014), ACM, 112.

6. Collberg, C., Nagra, J. Surreptitious Software: Obfuscation, Watermarking, and Tamperproofing for Software Protection, 1st ed. Addison-Wesley Professional, Boston, MA, 2009.

7. Coron, J.-S., Lepoint, T., Tibouchi, M. Practical multilinear maps over the integers. In CRYPTO (2013). Springer, 476493.

8. Diffie, W., Hellman, M.E. New directions in cryptography. IEEE Trans. Inform. Theory 22, 6 (1976), 644654.

9. Feige, U., Shamir, A. Zero knowledge proofs of knowledge in two rounds. In CRYPTO'89 (1990). Springer 526544.

10. Garg, S., Gentry, C., Halevi, S. Candidate multilinear maps from ideal lattices. In EUROCRYPT (2013). Springer, 117.

11. Garg, S., Gentry, C., Halevi, S., Raykova, M., Sahai, A., Waters, B. Candidate indistinguishability obfuscation and functional encryption for all circuits. In FOCS (2013). IEEE, 4049.

12. Garg, S., Gentry, C., Sahai, A., Waters, B. Witness encryption and its applications. In STOC (2013). ACM, 467476.

13. Gentry, C. A fully homomorphic encryption scheme. PhD thesis, Stanford University (2009). crypto. stanford.edu/craig.

14. Gentry, C. Computing arbitrary functions of encrypted data. Commun. ACM53, 3 (2010), 97105.

15. Gentry, C., Gorbunov, S., Halevi, S. Graph-induced multilinear maps from lattices. In Theory of Cryptography Conference (TCC) (2015). Springer 498527.

16. Gentry, C., Sahai, A., Waters, B. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based. In CRYPTO (2013). Springer, 7592.

17. Goldwasser, S., Kalai, Y.T. On the impossibility of obfuscation with auxiliary input. In FOCS (2005). IEEE, 553562.

18. Goldwasser, S., Micali, S. Probabilistic encryption. J. Comput. Syst. Sci. 28, 2 (1984), 270299.

19. Goldwasser, S., Rothblum, G.N. On best-possible obfuscation. In Theory of Cryptography Conference (TCC) (2007). Springer, 194213.

20. Hada, S. Zero-knowledge and code obfuscation. In ASIACRYPT (2000). Springer, 443457.

21. Kilian, J. Founding cryptography on oblivious transfer. In STOC (1988). ACM, 2031.

22. Naor, M., Yung, M. Public-key cryptosystems provably secure against chosen ciphertext attacks. In STOC (1990). ACM, 427437.

23. Rivest, R., Adleman, L., Dertouzos, M.L. On data banks and privacy homomorphisms. In Foundations of Secure Computation (1978), 169180.

24. Sahai, A. Non-malleable non-interactive zero knowledge and adaptive chosen-ciphertext security. In FOCS (1999). IEEE, 543553.

25. Sahai, A., Waters, B. Fuzzy identity-based encryption. In EUROCRYPT (2005). Springer, 457473.

26. Sahai, A., Waters, B. How to use indistinguishability obfuscation: Deniable encryption, and more. In STOC (2014). ACM, 475484.

Back to Top

Authors

Sanjam Garg (sanjamg@berkeley.edu), Department of EECS, UC Berkeley, Berkeley, CA.

Craig Gentry (cbgentry@us.ibm.com), Cryptography Research Group, IBM TJ Watson Research Center, Yorktown Heights, NY.

Shai Halevi (shaih@us.ibm.com), Cryptography Research Group, IBM TJ Watson Research Center, Yorktown Heights, NY.

Mariana Raykova (mariana@cs.columbia.edu), Computer Science Laboratory, SRI International, Menlo Park, CA.

Amit Sahai (sahai@cs.ucla.edu), Department of Computer Science, UCLA, Los Angeles, CA.

Brent Waters (bwaters@cs.utexas.edu), Department of Computer Science, UT Austin, Austin, TX.

Back to Top

Footnotes

The original version of this paper is entitled "Candidate Indistinguishability Obfuscation and Functional Encryption for All Circuits" and was published in FOCS 2013.


©2016 ACM  0001-0782/16/05

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2016 ACM, Inc.


Comments


Asankhaya Sharma

The scope of today's cybersecurity problem is somewhat mind-boggling. Security breaches due to malware cost victims more than $500 billion each year worldwide, according to a 2014 study by market research firm IDC and the University of Singapore.

Can you please add a citation for this study and how National University of Singapore was involved?


Displaying 1 comment