Security and Privacy Research highlights

Hiding Secrets in Software: A Cryptographic Approach to Program Obfuscation

By Sanjam Garg, Craig Gentry, Shai Halevi, Mariana Raykova, Amit Sahai, and Brent Waters

Posted May 1 2016

Abstract
1. Introduction
2. Obfuscation Basics
3. Our Core Obfuscator
4. Applications
5. Conclusion
References
Authors
Footnotes

Hiding Secrets in Software, illustration

Can we hide secrets in software? Can we obfuscate programs—that is, make programs unintelligible while preserving their functionality? What exactly do we mean by “unintelligible”? Why would we even want to do this?

In this article, we describe some rigorous cryptographic answers to these quasi-philosophical questions. We also discuss our recent “candidate indistinguishability obfuscation” scheme and its implications.

1. Introduction

Informally, the goal of program obfuscation is deceptively easy to state: to make a program unintelligible while preserving its functionality. By “preserving its functionality,” we just mean that the obfuscated program should be fully executable and have the same input-output behavior as the original program. Defining “unintelligible” is more tricky.

1.1. Making sense of the “unintelligible”

Of course, most program code is already “unintelligible” in the colloquial sense of the word. “Indeed, any programmer knows that total unintelligibility is the natural state of computer programs (and one must work hard in order to keep a program from deteriorating into this state).”¹

But suppose we want a stronger notion of unintelligibility, one that applies not only to sleep-deprived programmers but also to automated “deobfuscators” and determined, resourceful adversaries. Previous work on obfuscation has focused on making code resistant/unintelligible to particular attacks, such as known techniques for static and dynamic analysis.⁶ However, it is a recurring theme in security engineering that resistance to one set of attacks is no guarantee that a system cannot be broken by other means. And so, many of these obfuscation schemes have been followed by corresponding schemes for deobfuscation.⁶

Modern cryptography, especially following the work of Goldwasser and Micali,¹⁸ has offered a rigorous and expansive approach to defining security, offering the best way to escape the break-and-repair cycle of security engineering: Define security (in our case, the word “unintelligible”) in a mathematically rigorous way, and then prove that if there exists any adversary that can efficiently break security (according to the definition), then there is an efficient algorithm to solve a “hard” mathematical problem, such as factoring large integers. This approach makes a computational assumption—that the underlying mathematical problem is hard (i.e., cannot be solved in time polynomial in a security parameter), and that the adversary is computationally bounded to run in polynomial time. Under this computational assumption the scheme is provably secure. Crucially, this approach makes no other assumption or guess about what type of attack the adversary might deploy to break security. Of course, the security definition might not completely capture the adversary’s access to the system: for example, early definitions of security for encryption assumed the secret key could be stored and used securely, while “side channel” attacks have challenged this assumption and led to refined definitions. And the “hard” problem might turn out not to be as hard as initially thought: for example, algorithms for factoring large integers have improved immensely since the invention of RSA. But history has shown that the rigorous cryptographic approach to security is the best way to filter out schemes that break in “uninteresting” ways.

So, how does cryptography rigorously define the unintelligibility of obfuscation? The natural conceptual way to view the unintelligibility of obfuscation is as a black box that allows oracle access to a program (you give it inputs and get back outputs) but which completely hides the program’s internal workings. Hada²⁰ and Barak et al.¹ were able to make this definition of obfuscation mathematically rigorous, but then Barak et al. showed that even a weaker notion called virtual black box (VBB) obfuscation is impossible to achieve: some programs are VBB-unobfuscatable. Even stronger negative results were obtained for other models.¹⁷ This profound negative result, and our inability to find a clear line between obfuscatable and unobfuscatable programs, essentially crushed cryptographic research on obfuscation for over a decade.

Barak et al.¹ also presented a second way of viewing program obfuscation: as program pseudo-canonicalization. The idea is that if programs P₀ and P₁ have the same input-output behavior, and I obfuscate one of them, it should be hard for an adversary to tell which program I obfuscated. Thus, the obfuscation “canonicalizes” functionally equivalent programs by making them hard to tell apart. This notion is called indistinguishability obfuscation (iO). Importantly, there is no proof suggesting that iO is impossible. Nevertheless, this notion received much less attention, since it seemed too weak to be useful. This view changed somewhat when Goldwasser and Rothblum proved a strong relative guarantee—namely, that an indistinguishability obfuscator is in some sense a best-possible obfuscator: it is as good as any other obfuscator of roughly the same complexity.¹⁹ But still, in the absence of intuitive security guarantees and without a plausibly secure construction of iO for general programs, there was not much interest in seeing what could be built from iO.

1.2. Indistinguishability obfuscation: a candidate construction

This article reviews the first plausibly secure construction of iO for general programs.¹¹ Since this proposal in 2013, interest in iO has exploded, and researchers have found that iO is actually very powerful. Surprisingly, it enables many of the “pie-in-the-sky” applications that motivated research on VBB obfuscation.

Our construction has two parts. First, we show how any method for iO that works for “simple programs”—in particular, for log-depth Boolean circuits, denoted by the complexity class NC¹—can be used to get iO for general programs (modeled as Boolean circuits). Second, we give our “core obfuscator,” namely our iO construction for NC¹ circuits. The bootstrapping transformation from NC¹ circuits to general programs is “cryptographically kosher,” in that we rigorously prove that our iO construction for general programs is secure assuming that the iO construction for NC¹ circuits is secure, and assuming cryptographic tools that by now have become widely accepted, such as cryptographic proofs with certain properties and fully homomorphic encryption (FHE)^{13, 14, 23} with a decryption algorithm in NC¹.

The second part, our core iO construction for NC¹ circuits, is less cryptographically kosher, and this is the reason we call our iO construction a “candidate.” While the security of our NC¹ construction is based on a well-defined mathematical assumption, the assumption is highly non-standard and complicated, having none of the simplicity and elegance of standard cryptographic assumptions like the hardness of factoring large integers. Moreover, our assumption is stated in terms of cryptographic multilinear maps,¹⁰ for which constructions have been proposed only recently, and which themselves rely on their own non-standard assumptions. Cryptographers do not intend to tolerate this situation very long, and currently are working furiously and simultaneously on trying to break these assumptions and to base iO on more natural assumptions.

Another caveat is that our iO construction is quite impractical (though we can expect substantial performance improvements in the future). Our construction is a proof of concept that has helped to transform our understanding of obfuscation, not something that should be deployed any time soon.

How does our approach to general-purpose program obfuscation differ from previous approaches? While many diverse approaches have appeared in the literature, a common thread connecting most previous approaches was the idea of obfuscation by “hiding a needle in a haystack”—that is, by hiding the actually important instructions and conditions within the original program among many extraneous commands and tests.⁶ Unfortunately, computers are excellent at finding needles in haystacks, and these approaches, while natural, do not work very well.

Our approach is very different. At a high level, our core obfuscator creates a globally randomized representation of the program. Intuitively, this globally randomized representation of the program takes each individual instruction from the original program and converts them into highly unnatural randomized instructions. These instructions appear random and useless to an attacker. In particular, each instruction has a random-looking effect on the entire obfuscated program’s state. However, the global randomization that produces these instructions is designed in such a way that the randomization “cancels out” only and exactly when the entire computation is finished. That is, once every one of the weird instructions have been carried out precisely in the specified order, the randomization within these instructions is eliminated and only the true output of the original computation remains. Thus, in our approach, there is neither a haystack nor a needle: every instruction in the obfuscated version of the program is essential; each one moves the computation forward. However, the randomized mathematical representation of the program ensures that until the very end of the computation, every step looks random and uninteresting.

More specifically, our core obfuscator for simple programs combines cryptographic multilinear maps,^{7, 10} a tool somewhat similar to FHE, with variants of tools introduced by Barrington² and Kilian²¹ that allow NC¹ circuits to be efficiently re-expressed as matrix branching programs, which can be randomized in a strong sense. Then, we bootstrap iO from simple programs to complicated programs by encrypting the entire complicated program using FHE, and then conditionally decrypting the result using a simple program that has been obfuscated using our core obfuscator.

1.3. Hiding secrets in software

Why is iO interesting? Although iO is less versatile and trickier to use than VBB obfuscation, it is powerful enough to let us hide secrets in software (in some cases).

First, we should mention that there are limits to what any obfuscation can accomplish. For example, suppose that a program P prints its own code. Then, any obfuscation O(P) of P also prints P. If a program is determined to be “exhibitionist,” there is nothing obfuscation can do to prevent it. Even if a program is not so exhibitionist, it might be learnable. Roughly, a function is learnable if, given only oracle access to the program P, an adversary can efficiently construct a program P’ that has the same input-output behavior as P. (Of course, any program is learnable inefficiently, but we only aim to defeat polynomial-time adversaries.) For example, consider a program P_s(x) that outputs 1 if x < s, and 0 otherwise. The class of programs {P_s}_s is learnable, since a simple binary search would allow an adversary to learn the secret s. Even a black box does not hide P if P is learnable from oracle access. So, we will be interested in obfuscating programs that are unlearnable: programs that at least can keep secrets secure against efficient adversaries that are given only oracle access.

Cryptography offers many examples of functions that are unlearnable, based on widely believed assumptions such as the existence of one-way functions or the hardness of the discrete logarithm problem. For example, given any one-way function, one can construct a pseudorandom function (PRF), a function that no adversary can efficiently distinguish from a truly random function. A truly random function is unlearnable, and therefore so is a PRF, since otherwise it could be distinguished from a truly random function. Assuming the discrete logarithm problem is hard, one can construct an encryption scheme that is secure against adaptive chosen-ciphertext attack (CCA-secure encryption). An encryption scheme is CCA-secure if the adversary cannot guess anything about what a new ciphertext c encrypts, even after it asks for and obtains the decryptions of other ciphertexts c₁, …, c_t of its choice. For a CCA-secure encryption scheme, the decryption function is unlearnable.

The unlearnable functions above have some secret inside them—a PRF key or a decryption key. If we had black-box obfuscation, we could certainly prevent this secret from being extracted; we could hide secrets in software.

Of course, we do not have general black box obfuscation. We need to figure out how to hide secrets in software without black boxes, using just iO. We start to address this issue below.

Shell Games with Secrets. Here is a typical “shell game” we play when we want to prove that iO obfuscation hides a secret. Suppose that (sk₀, pk₀) and (sk₁, pk₁) are independent secret/public key pairs—for example, for a public-key encryption scheme. Suppose that we have two programs P₀ and P₁ that have the same input-output behavior, but P₀ only uses sk₀ (but not sk₁) and P₁ only uses sk₁ (but not sk₀). The definition of iO guarantees that the obfuscation of P₀ would be indistinguishable from an obfuscation of P₁, but can we leverage this to show that sk₀ remains hidden?

Indeed we can: Clearly, the obfuscation of P₀ reveals nothing about sk₁, since P₀ does not know anything about the secret sk₁. Similarly, an iO-obfuscation of P₁ would reveal nothing about sk_0. However, by the security of iO, since P₀ and P₁ are functionally equivalent, it is hard for an adversary to distinguish whether P₀ or P₁ was obfuscated. Therefore, since P₁ could have been obfuscated (as far as the adversary knows), the iO-obfuscation of P₀ also reveals nothing about sk₀. In other words, since the output of the programs is the same regardless of whether sk₀ or sk₁ is inside, we can use iO to play a “shell game” to switch the secret inside, and thus argue that neither secret is revealed. Similar “two-key” techniques have been used many times in cryptography, for example to construct CCA-secure encryption schemes.^22,24

As an example of how we use this shell game, take our method of bootstrapping iO for NC¹ circuits to iO for general programs (modeled as boolean circuits).

To construct iO for general programs, our natural starting point is fully homomorphic encryption (FHE).^{13, 14, 23} FHE is a special kind of encryption that allows anyone (e.g., the “cloud”) to process or compute on encrypted data while it remains encrypted, without decrypting it. In other words, FHE-encrypted data allows anyone to compute an encryption of any function of that data. Interpreting data as a program, FHE allows the creation of encrypted programs. An encrypted program, however, is not the same thing as an obfuscated program, since FHE-encrypting a program changes its functionality: given x as input, the FHE-encrypted program’s output is not P(x), but an encryption of P(x). However, FHE still seems like a good place to start, since we already have FHE schemes capable of evaluating general programs (modeled as boolean or arithmetic circuits),¹³ even schemes based on simple well-studied assumptions.^{4, 5, 16} To get obfuscation we need to figure out how to securely decrypt the output of the FHE-encrypted program. But for this to work, we also need to make sure that the user has properly applied FHE to produce an encryption of P(x). Indeed, this is critical, since giving away unrestricted access to the FHE decryption function would give away the farm!

For this conditional decryption step, we use our shell game. The obfuscation of a boolean circuit P₀ consists of an FHE encryption of P₀ under two different independent secret/public FHE key pairs (sk₀, pk₀) and (sk₁, pk₁), and a conditional decryption circuit CDec (to be described momentarily) that is obfuscated using our iO-obfuscator for NC¹ circuits. (To hide the topology of P₀, we use a universal circuit U such that U(P₀, x) = P₀(x) and encrypt the input P₀ to U.) The evaluator runs both encrypted versions of P₀ on its input, and feeds the resulting ciphertexts, along with a proof that they were computed correctly, as input to iO(CDec), which decrypts the first ciphertext using sk₀ if the proof verifies, and outputs ⊥ otherwise. The proof here is straightforward: it consists of the input and a step-by-step transcript of the FHE evaluation of the encrypted program on that input. Fortunately, using existing constructions, both the proof verification and FHE decryption can be computed using a NC¹ circuit, as required.

We now need to argue that our construction really does yield a way of bootstrapping iO for NC¹ to iO for general programs. Suppose P₁ is a program with the same input-output behavior as P₀. We transform the obfuscation of P₀ into an obfuscation of P₁ via a sequence of steps. First, we replace the encryption of P₀ under pk₁ with an encryption of P₁ under pk₁; this change is indistinguishable to an adversary since sk₁ is never used, and therefore encryptions of any two strings under pk₁ must be indistinguishable from each other. Next, and this is the crucial step, we change CDec so that it contains only sk₁ (not sk₀) and decrypts the second FHE ciphertext using sk₁ (instead of the first ciphertext using sk₀) after verifying the proof. The key point is that input-output behaviors of the old and new versions of CDec are the same, since (by the proof and the equivalence of P₀ and P₁) if the output of CDec is not ⊥, then the two FHE ciphertexts must have the same decryption. Consequently, by the security of iO, this change is indistinguishable to the adversary. We can then proceed in two similar steps, changing the other encryption of P₀ to P₁ and switching the secret key back to sk₀, to arrive at an obfuscation of P₁ without the adversary being any wiser.

Punctured Programs. Another way to use iO to hide secrets in software is the punctured program technique of Sahai and Waters.²⁶ We illustrate the technique in Section 4.2, where it used to convert secret key encryption into public key encryption by obfuscating the encryption function.

1.4. Not quite obfutopia

While we have managed to squeeze useful security guarantees out of iO (with more positive results based on iO appearing regularly in the cryptographic literature), nevertheless cryptographers remain somewhat unsatisfied. One problem is that we do not understand what absolute guarantees we can make about obfuscation itself, or even what sort of security current iO constructions may give us. The “right” definition for program obfuscation is far from resolved, and researchers have begun to investigate stronger notions—such as differing inputs obfuscation¹—that also seem to escape the impossibility of VBB.

Also, while VBB obfuscation is impossible in general, it remains possible that a broad class of programs, including many “real-world” programs, can be VBB obfuscated. But what is this class of programs?

The reason that VBB obfuscation remains appealing as a notion, despite its problems, is that we have a long list of glorious applications we could build in a world—call it “Obfutopia”—where general VBB program obfuscation is somehow possible. For example, a classic motivation for obfuscation was to build secure autonomous mobile agents that perform transactions on your behalf to your specifications using your secret signing key. The agent may run on untrusted platforms that could try to “torture” it to extract your secrets. While VBB obfuscation would solve this problem, trying to prove that this is secure with a weaker notion of obfuscation is a challenge.

As a more speculative application, suppose that in the distant future we want to upload our minds to cyberspace. Assuming this happens, we might as well do it securely. Can a mind programmed in software have secrets, innermost thoughts? With iO, we can provide some assurance: if a mind can compartmentalize its secrets so that they do not affect its behavior, the secret can remain hidden. But although digital minds presumably would have impressive powers of compartmentalization, it would be more convenient if obfuscation’s assurances were less restrictive.

Like all utopias, Obfutopia has echoes of dystopia. Just as access to secure encryption makes it easier for terrorists to hide their plans, obfuscation can be used for nefarious purposes—for example, by malware designers to hide malicious content. Privacy-enhancing technologies often come with a cost. However, given the ever-increasing list of positive applications of secure obfuscation, we argue that the potential benefits outweigh the potential costs.

2. Obfuscation Basics

Barak et al.¹ formally define an obfuscator as a probabilistic (randomized) algorithm O that takes a program P as input and meets the following three conditions:

Functional equivalence: For every program P, the string O(P) describes a program with the same input-output behavior as P.
Polynomial slowdown: The description length and running time of O(P) are polynomially related to P.
Unintelligibility.

We will define the security of obfuscation (“unintelligibility”) in Section 2.2. For now, let us focus on obfuscation’s functional aspects.

2.1. What is a program?

To make the definition of program obfuscation meaningful, we first need to define what we mean by a “program.” Obviously, we do not want a definition of programs that is language-dependent, so we use theoretical models of computation. In this article, we will use Boolean circuits as our model of computation.

A Boolean circuit is a composition of Boolean gates, such as AND, OR, and NOT gates. The gates are typically arranged into levels, so that the outputs of gates at level i are inputs to gates at level i + 1 unless i is the last level of the circuit. A circuit does not contain any loops; it is a directed acyclic graph. The number of gates is called the size of the circuit, and the number of levels is called the depth. For a circuit C, we denote by |C| the size of C.

In a circuit, all loops are unrolled and all branches are evaluated. Unlike a RAM program, which may touch only input that is relevant for a particular execution, a circuit ingests all the input at its input level. Consequently, translating some RAM programs into circuits may add a lot (but still polynomial amount) of overhead. We will ignore this issue here, as efficiency (beyond achieving polynomial overhead) is not our concern here.

An NC¹ family of circuits is one where the depth of each circuit in the family is at most logarithmic in its size.

With these preliminaries, we can make precise the functionality and polynomial slowdown requirements for obfuscation. Namely,

Functional equivalence: For every Boolean circuit C, O(C) describes a circuit C‘ such that for all inputs x, we have C(x) = C‘(x).
Polynomial slowdown: There exists a polynomial p such that for all Boolean circuits C, the size of the circuit described by O(C) is at most p(|C|).

2.2. What does “unintelligible” mean?

As mentioned in the Introduction, we take indistinguishability obfuscation (iO)¹ as our notion of unintelligibility. In the formal definition of iO, security is modeled as a game between a challenger and an adversary. The adversary picks two circuits P₀ and P₁ that are the same size (padding one of the circuits if necessary) and that have the same input-output behavior but which may differ internally. It gives these programs to the challenger. The challenger obfuscates one of them, producing an obfuscation O(P_b) for random b ∈ {0, 1}, which it gives to the adversary. Finally, the adversary tries to guess b. The obfuscation is considered secure if no efficient adversary can win this game with probability non-negligibly greater than 1/2.

In principle, one way to iO-obfuscate would be to canonicalize P_b by outputting the lexicographically first program with the same input-output behavior as P_b, in which case O(P₀) = O(P₁) and the adversary cannot win with probability greater than 1/2. But true canonicalization is inefficient in general unless P = NP, so we must settle for pseudo-canonicalization and require that the distributions of O(P₀) and O(P₁) are computationally indistinguishable to efficient adversaries, in the sense described above: no efficient adversary should be able to guess the bit b with probability non-negligibly greater than 1/2.

As mentioned in the Introduction, the work of Goldwasser and Rothblum¹⁹ showed that an (efficient) indistinguishability obfuscator is a “best-possible” obfuscation: it is as good as any other obfuscation of roughly the same complexity.¹⁹ The proof is simple: Suppose BO(·) is the actual best obfuscator of a certain class of programs (modeled as boolean circuits), while Pad(·) merely increases the size of the circuits the same amount as BO(·). Then, for any circuit C, the circuits BO(C) and Pad(C) are the same size and functionality equivalent. Therefore, if iO is an indistinguishability obfuscator, iO(BO(C)) and iO(Pad(C)) are indistinguishable to efficient adversaries. Since they are indistinguishable, iO(Pad(C)) obfuscates C as well as iO(BO(C)), which obfuscates C as well as BO(C). Unfortunately, however, “best possible” is only a relative guarantee that does not directly imply any secrets in the obfuscated program remain hidden.

Another justification for iO is that it turns out to be surprisingly useful. In Section 4, we add to the applications of iO mentioned in the Introduction.

3. Our Core Obfuscator

Recall that our obfuscation scheme¹¹ works in two steps. In the Introduction, we already sketched how to bootstrap obfuscation for NC¹ circuits to obfuscation for general circuits. Now we will sketch how our “core” obfuscator obfuscates NC¹ circuits. Our core obfuscator builds on a new cryptographic tool called multilinear maps,^{7, 10, 15} which we review below.

3.1. Cryptographic multilinear maps

As a first approximation, one can say that a cryptographic multilinear map system encodes a value x ∈ ℤ_p (where p is some large prime) by encrypting it using a FHE scheme. The multilinear map system inherits the homomorphism of FHE: given encodings of x and y, one can compute encodings of x + y and x · y. But the multilinear map system comes with additional constraints, and additional functionality. Each encoding is associated to some level i ≤ k for parameter k. Encodings can only be added if they are at the same level: Enc_i(x) ⊕ Enc_i(y) → Enc_i(x + y). Encodings can be multiplied, subject to restrictions, and the levels are added: Enc_i(x) ⊗ Enc_j(y) → Enc_i+j.(x · y) if i + j ≤ k but is meaningless when i + j > k. Unlike FHE, the multilinear map system comes equipped with a zero test: the system gives an efficient procedure to test whether an encoding at level k encodes the value 0.

Thus, unlike an encryption scheme, the multilinear map system leaks some information about what is encoded (whether the value is zero). Roughly, we carefully construct our core obfuscator so that the leakage of the multilinear map system is positioned precisely to become the leakage of the overall obfuscated program output.

3.2. Our core obfuscator for NC¹ circuits

We now sketch how our “core” obfuscation scheme for NC¹ circuits (circuits of logarithmic depth) works. We confess that many details will be lacking in this short article; see Garg et al.¹¹ for more details. Instead, we will try to give you a glimpse into the main insights that make our approach work, and a very rough outline of how these insights can be extended to yield an actual obfuscation.

Barrington: A more friendly mathematical representation of NC¹ circuits. Boolean circuits are already an unrealistic representation of general computer programs. Few of us would write code as a Boolean circuit. But to get where we want to go, we move to a “programming language” that is even further removed from reality, but that does not lose any generality. We are not doing this for kicks, though. Our objective is to represent our NC¹ circuit in a way that plays nicely with our cryptographic multilinear maps and allows strong randomization.

The specific programming language that we choose is a variant of a branching program considered by Barrington,² that we call a Matrix Branching Program. In a matrix branching program, the program is described by a number n, and a sequence of pairs of full-rank square matrices

The number n, which must be at most k, describes the length of inputs that the program can take as input. Given an input x ∈ {0, 1}ⁿ, to “run” the program M on x, we compute the matrix product:

where here x[j] denotes the jth bit of x, where j can range from 0 to n – 1.

A matrix branching program is valid if this product always yields either the identity matrix I, or some other fixed matrix B ≠ I. If the product is I, then we say that the output of the program M(x) is 1. If the product is B, then we say that the output of the program M(x) is 0.

This representation is useful because Barrington showed that any NC¹ circuit can be efficiently re-expressed as a polynomial-size matrix branching program. Also, matrix product is a multilinear operation: every entry in each matrix occurs exactly once in every monomial of the product, so that we can use the multilinear map system to encode every entry of each matrix, and compute the product correctly using the multilinear map operations described above. Lastly, as we describe next, matrix branching programs can be globally randomized in a strong sense.

Kilian: Global Randomization of Branching Programs. The entire global randomization that we actually use to obtain our obfuscation, and why it works, is frankly too complex to describe at length here. Instead, we will ask you to suspend disbelief as we describe how to obfuscate a special case of programs: namely, programs that have no input at all. Now, the thoughtful reader will complain that this special case is utterly trivial to obfuscate, and indeed it is: One could simply pre-compute the output of the program, and provide this output as the obfuscation. This would absolutely be a secure obfuscation—although this is not what we are going to do. We ask you to put this thought aside. We will sketch a different way to deal with this special case, and we assure you that our method for dealing with this special case will provide insight into why our actual global randomization works in the general case.

What does a matrix branching program with no input look like? Instead of a sequence of pairs of matrices, it would just be a sequence of matrices:

and its output would be the product

How can we globally randomize these matrices, so that the output is preserved, but the “steps” of the computation reveal nothing more than the output? The answer is to apply a technique of Kilian,²¹ as follows: Choose completely random invertible matrices R₁, …, R_k_-1 over Z_p, and set R₀ = R_k = I. Now, define

Then, we can observe that

However, a simple inductive argument shows that the entire collection of matrices contain no information beyond the product Π_i M_i. For example, suppose k = 2. In other words, we only have two matrices and . Then, by letting Q = M₂⁻¹ R⁻¹ and algebraic rearrangement, we see that equivalently, we could set and . Note that if R is a random invertible matrix, then so is Q. Thus, and can be created knowing only the product M₁M₂, and nothing else about the original “steps” of the computation M₁ and M₂. This proof generalizes to any k.

This insight shows that it is in fact possible to globally randomize a program, using only local randomizations with a set of correlated random variables, in a way that preserves the output, but completely randomizes the individual steps.

Our actual core obfuscator—which must deal with an exponential number of potential inputs—begins with this idea. In essence, this idea is used to argue security on an input-by-input basis. Additional global randomizations are invoked to thwart other algebraic attacks that can arise when inputs are present (e.g., input-mixing attacks, where part of the computation is done with respect to one input, and another part with respect to another input). But the main idea remains the same: to globally randomize the program in a way that preserves the output, but hides all other useful information about the original individual steps of the program and how they combined with each other.

4. Applications

Indistinguishability obfuscation has enabled a wide variety of exciting applications that has become so vast that it is impossible to do them justice here. Here, we cover just a few applications to help the reader appreciate the possibilities.

4.1. Witness encryption

Traditionally, encryption is a tool for conveying confidential messages to specific recipients that have secret keys.

Departing from tradition, suppose we would like to encrypt a message not to a fixed recipient who has a secret key, but to anyone who knows a proof for the Riemann Hypothesis (or, more generally, a witness for some NP statement). We want any recipient that has any proof (of some bounded size) of the RH to be able to decrypt the message. We also want to generate this encryption without knowing ourselves whether the RH is true or not. Regarding security, we want that if the RH is false, then the ciphertext should hide the encrypted message. This problem of witness encryption¹² (WE) was first posed by Rudich in 1987.

More formally, given an NP Language L, a WE scheme for L is an encryption scheme that takes as input an instance x and a message bit b, and outputs a ciphertext c. If x ∈ L and w is a valid witness for x, then a decryptor can use w to decrypt c and recover b. However, if x ∉ L, then an encryption of 0 should be computationally indistinguishable from an encryption of 1.

Using iO, we can easily construct a WE scheme for an NP-Complete language such as SATISFIABILITY. Specifically, consider the circuit C_x,b(w) (with x and b hard-coded in it) defined as follows: on input w, C_x,b outputs b if w is a valid witness for x ∉ L, and otherwise just outputs ⊥ (indicating “undefined”). To encrypt b, the encrypter simply outputs an iO obfuscation of C_x,b. With a legitimate witness w for x ∈ L, anyone can evaluate the obfuscated circuit to recover b. On the other hand, if x ∉ L, then both C_x,₀ and C_x_,1 are constant functions that always output ⊥, and iO guarantees that the obfuscation of these equivalent functions are computationally indistinguishable.

4.2. Punctured programming: secret-key to public-key encryption

Diffie and Hellman⁸ already described some of the potential of obfuscation back in 1976 in their “New Directions” paper. They noticed that obfuscation could potentially be used to transform a secret key encryption scheme (where sender and receiver use the same secret key) into a public key encryption (PKE) scheme (where the sender uses a “public” key for encryption and only the receiver knows the secret key for decryption). Specifically, take the encryption procedure or program, with the secret key embedded inside it, and publish its obfuscation as the public key. Senders could use this public obfuscated program to encrypt messages, but (hopefully) not to decrypt ciphertexts, as the secret key remains hidden inside the obfuscated program.

Given ideal black box obfuscation, this technique works well for some secret key encryption schemes. (Beware that it fails completely for some others, like stream ciphers, where encryption is an involution.) Can we make it work for iO? Indeed, we can: in fact, we have two quite different ways to construct public-key encryption from iO and any one-way function (OWF).

The first technique combines WE with a length-doubling pseudorandom generator (PRG) G. Such a PRG can be built from any OWF, and has the property that G(x) ∈ {0, 1}²ⁿ is computationally indistinguishable from a random string when x ∈ {0, 1}ⁿ is sampled uniformly. In the PKE scheme, the secret key is a uniform w ∈ {0, 1}ⁿ, and the public key is W = G(w). To encrypt a bit b, the sender generates a witness encryption of b that can be decrypted by any “witness” w‘ such that W = G(w’). The legitimate decrypter can decrypt using w. Security follows from three observations: (1) By the security of the PRG, an adversary cannot distinguish whether W is a uniform 2n-bit string, and (2) If W were a uniform 2n-bit string, then (with high probability) it is not in the image of the PRG, and thus there is no valid witness for W, and (3) If there is no valid witness, the security of WE guarantees that b remains hidden.

Sahai and Waters²⁶ found a way of building PKE from iO that is quite close to what Diffie and Hellman had in mind. To do so, they introduced a general technique called punctured programming, that has turned out to have many applications. The high-level idea is “to alter a program (which is to be obfuscated) by surgically removing a key element of the program, without which the adversary cannot win the security game it must play, but in a way that does not alter the functionality of the program.”²⁶ This is perhaps best explained by jumping into the technical details. The PKE scheme uses a length-doubling pseudorandom generator (PRG) G, as described above. The scheme also uses a pseudorandom function (PRF) F, which takes a key and a 2n-bit string, and outputs a bit that is computationally indistinguishable from uniform. However, we want our PRF to have a special property: we want it to be “puncturable.” A puncturable PRF has the property that for any key K and input , there is a compact punctured key that allows one to evaluate F(K, x) for any x ≠ , but keeps the value of F(K, ) pseudorandom. (Puncturable PRFs can also be built from any OWF.) Now, consider a randomized secret key encryption scheme, where the secret key is a key K for the PRF, one encrypts m with randomness r as Enc_K(r, m) = (G(r), F(K, G(r)) ⊕ m), and the ciphertext is decrypted in the obvious way using K. Sahai and Waters show that they can use iO to obfuscate the encryption function while leaving decryption unchanged. In this PKE scheme, decryption amounts to evaluating the (puncturable) PRF. No other PKE scheme has such a fast decryption algorithm!

To prove security, Sahai and Waters use their punctured program technique. For any , we define a punctured encryption , which uses the punctured key to evaluate the PRF except at , where it outputs ⊥ (indicating “undefined”). Sahai and Waters observe that the encryption function will never evaluate the PRF on any outside the range of the PRG. Therefore, for any such , the original encryption function has the same functionality as , and iO will make them indistinguishable. Now, by the security of the PRG, the adversary cannot distinguish a legitimate ciphertext from a fake ciphertext ( , F(K, ) ⊕ m) where is chosen uniformly, and is therefore with high probability outside the range of the PRG. But since the punctured public key (the obfuscation of ) contains no usable information about F(K, ) (even the secret key inside provides no help), m appears pseudorandom.

4.3. Functional encryption

Consider a hospital administrator who has access to sensitive patient medical information—specifically medical history and gene sequences of the patients. In the same hospital a medical researcher is working on establishing a link between a specific set of genes and a type of cancer. In order to validate his findings the researcher needs access to the medical records of the patients in the hospital. The hospital administrator would like to help the researcher but is legally obliged to keep patient records private. In such scenarios, patient data is typically collected before the queries that the medical researcher might be interested in are known. Functional encryption (FE)^{3, 25} enables an elegant solution to this problem. The administrator could encrypt the data as it is created and later provide a specialized secret key to the medical researcher allowing him to learn only the output of his analysis algorithm on the patient data, and nothing more.

More formally, in FE, ciphertexts encrypt inputs x and keys are issued for functions f. The striking feature of this system is that using the key SK_f to decrypt a ciphertext Enc(x), yields the value f(x) but does not reveal anything else about x. Furthermore, any arbitrary collusion of key holders relative to many functions f_i does not yield any more information about x beyond what is “naturally revealed” by the functionality (i.e., f_i(x) for all i). More precisely, the security definition of FE requires that the adversary cannot distinguish between encryptions of x₀ and x₁, even when it is provided with many secret keys SK_f for which f(x₀) = f(x₁). Note that keys not satisfying this constraint allow the adversary to trivially distinguish between encryptions of x₀ and x₁.

We next sketch our construction of a FE for all circuits¹¹ from iO. In order to build intuition we start with a solution assuming ideal black box obfuscation. We start by selecting (PK, SK), a public-secret key pair for a regular public-key encryption scheme. An encryption of a value x under the FE scheme will simply be an encryption of x using the public key PK. The secret key SK_f corresponding to a circuit f is an obfuscation of a circuit that takes a ciphertext c as input, uses SK (hardcoded inside the obfuscated circuit) to decrypt c obtaining x, and then outputs f(x). Intuitively the obfuscation will hide the secret key SK, achieving the desired security.

To adapt the above construction to iO, we use the two-key shell game described in the Introduction. In particular, we set the public key for our FE scheme to be two public keys PK₀ and PK₁. An encryption of x consists of encryptions of x under both keys, together with a special zero knowledge proof that the two encryptions encrypt the same value. The secret key SK_f is an obfuscated program that checks the proof, uses one of the secret keys to obtain x, and then outputs f(x).

As in the shell game in the Introduction, if the proof is true, then the output of SK_f is the same whether it uses SK₀ or SK₁, and the strategy is to use iO to argue that these cases are indistinguishable. However, ultimately we need to argue that the adversary cannot distinguish an encryption of x₀ from an encryption of x₁, and for this our proof needs to use an intermediate hybrid experiment where a “fake” simulated cryptographic proof is generated, and the two ciphertexts are actually encryptions of x₀ and x₁ respectively. But here we encounter a new problem: if it is possible to generate “fake” cryptographic proofs that “prove” the ciphertexts encrypt the same value when they do not, then we cannot use iO to transition from SK₀ to SK₁ as described above.

We address this problem by introducing the notion of statistically simulation sound cryptographic proofs, which require that valid proofs only exist for true statements, with exactly one exception: there is a single false statement Ψ* such that there exists a valid proof π* for Ψ*. However, for every other statement Ψ, if a valid proof π exists for Ψ, then Ψ must be true. We show how to achieve statistical simulation soundness using witness-indistinguishable proofs.⁹ Once we have it, then a similar two-key alternation argument to the one given in the Introduction allows us to establish security of our FE scheme. We note that full collusion-resistance follows from a standard hybrid argument using the iO definition for each secret key given out, one at a time.

4.4. Restricted-use software

Here we discuss an application of iO to a question that does not deal with encryption or authentication, but rather restricted-use software.

Software developers will often want to release a demo or restricted use version of their software that limits the features that are available in a full version. In some cases a commercial software developer will do this to demonstrate their product; in other cases the developer will want to make multiple tiers of a product with different price points. In other domains, the software might be given to a partner that is only partially trusted and the developer only wants to release the features needed for the task.

Ideally, a developer could create a downgraded version of software simply by starting with the full version and then turning off certain features at the interface level—requiring minimal additional effort. However, if this is all that is done, it could be easy for an attacker to bypass these controls and gain access to the full version or the code behind it. The other alternative is for a software development team to carefully excise all unused functionality from the core of the software. Removing functionality can become a very time consuming task that could itself lead to the introduction of software bugs. In addition, in many applications it might be unclear what code can and cannot remain for a restricted use version. For instance, a subroutine could be written in a highly general form, where only a subset of its functionality is needed for several applications. In this case, the subroutine would have to be re-designed to “cripple” it, a time-consuming and potentially difficult task.

iO provides an immediate solution: the developer can restrict the use at the interface level and then release an obfuscated version of the program. For this application iO suffices, since by definition a version restricted in the interface is indistinguishable from an obfuscated program with equivalent behavior that has its parts removed or redesigned at the start.

5. Conclusion

We described a candidate indistinguishability obfuscation scheme, and some of the exciting applications it makes possible. However, a lot remains to be done—to base obfuscation on well-established assumptions, to build a practical scheme, to truly understand what obfuscation means—before cryptographic obfuscation can move from being a proof of concept to a widely-deployed privacy enhancing technology.

Submit an Article to CACM

CACM welcomes unsolicited submissions on topics of relevance and value to the computing community.

You Just Read

Hiding Secrets in Software: A Cryptographic Approach to Program Obfuscation

View in the ACM Digital Library

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

DOI

10.1145/2902313

May 2016 Issue

Published: May 1, 2016

Vol. 59 No. 5

Pages: 113-120

Table of Contents

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Explore More

BLOG@CACM Aug 30 2024

Everything You Always Wanted to Know About PCs, But Were Afraid to Ask

Saurabh Bagchi

Computing Profession

individuals at a conference table, illustration

News Aug 30 2024

How CrowdStrike Stopped Everything

David Geer

Security and Privacy

BLOG@CACM Aug 29 2024

Leveraging Computational Thinking in the Era of Generative AI

Yael Erez, Koby Mike, and Orit Hazzan

Education

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More

1. Introduction

2. Obfuscation Basics

3. Our Core Obfuscator

4. Applications

5. Conclusion

Hiding Secrets in Software: A Cryptographic Approach to Program Obfuscation

DOI

May 2016 Issue

Related Reading

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

Shape the Future of Computing

Communications of the ACM (CACM) is now a fully Open Access publication.