Research and Advances
Security and Privacy

Indistinguishability Obfuscation from Well-Founded Assumptions

We examine a formalization of the “one-way compiler" concept with the notion of indistinguishability obfuscation.

Posted
binary code amid bursts of light
Read the related Technical Perspective

Abstract

At least since the initial public proposal of public-key cryptography based on computational hardness conjectures, cryptographers have contemplated the possibility of a “one-way compiler” that translates computer programs into “incomprehensible” but equivalent forms. And yet, the search for such a “one-way compiler” remained elusive for decades. We examine a formalization of this concept with the notion of indistinguishability obfuscation (i𝒪). Roughly speaking, i𝒪 requires that the compiled versions of any two equivalent programs (with the same size and running time) be indistinguishable from any efficient adversary. Finally, we show how to construct i𝒪 in such a way that we can prove the security of our i𝒪 scheme based on well-studied computational hardness conjectures in cryptography.

 

1. Introduction

Consider the polynomial f1(x,y)ɛℤ[x,y] that is computed as follows:

f 1 ( x , y ) = ( x + y ) 16 ( x y ) 16

Alternatively, contemplate the polynomial f2(x,y)ɛℤ[x,y] that is computed via:

f 2 ( x , y ) = 32 x 15 y + 1120 x 13 y 3 + 8736 x 11 y 5 + 22880 x 9 y 7 + 22880 x 7 y 9 + 8736 x 5 y 11 + 1120 x 3 y 13 + 32 x y 15

A calculation shows that f1 and f2 are in fact the same polynomial, computed in two different ways. Indeed, the expressions f1 and f2 above are special cases of arithmetic circuits, which precisely represent “ways to compute a polynomial.”

What if we wanted to hide all implementation choices made when creating such an arithmetic circuit for a particular polynomial? An easy way to do that would be to first convert our polynomial into a canonical form and then implement the canonical form as an arithmetic circuit. Indeed, the description of f2 above can be seen as a canonical representation of the polynomial as a sum of monomials with regard to a natural monomial ordering. However, as this example illustrates, canonical forms can be substantially more complex than other implementations of the same polynomial. For polynomials in n variables, the loss in efficiency can be exponential in n. This would often make computing the canonical form – or indeed, even writing it down – infeasible.

1.1. A pseudo-canonical form

Given that computing, canonical forms can be infeasible, what is there to do? Here, following,7 we draw an analogy to the notion of pseudo-randomness. When truly random values are not available, we can instead aim to produce values that “look random” by means of a pseudo-random generator. That is, we require that no efficient algorithm can distinguish between truly random values and the output of our pseudo-random generator.

Now, for two arithmetic circuits g1 and g2 that compute the same underlying polynomial, a true canonical form Canonical(g1) would be identical to the canonical form of Canonical(g2). Instead, we would ask that a pseudo-canonical form PseudoCanonical(g1) would simply be indistinguishable from the pseudo-canonical form PseudoCanonical(g2), to all efficient algorithms that were given g1 and g2 as well. Observe that unless there are actual efficiently computable canonical forms for all arithmetic circuits – which we do not believe to be true – it must be that such a PseudoCanonical operator is randomized, and outputs a probability distribution over arithmetic circuits computing the same polynomial.

1.2. The computing lens

The classic theory of computation tells us that general computer programs can be converted into equivalent polynomials (albeit over finite fields, which we will focus on implicitly in the sequel). So the pseudo-canonicalization question posed above is equivalent to the pseudo-canonicalization question for general computer programs. Indeed, the question of hiding implementation details within a computer program has a long history, dating at least as far back as the groundbreaking 1976 work of Diffie and Hellman13 introducing the concept of public-key cryptography. Historically, this problem has been called “program obfuscation,” albeit it was typically discussed in an ill-defined form. Discussed in these vague terms, it was folklore that truly secure program obfuscation would have revolutionary applications to computing, especially for securing intellectual property. The work of Barak et al.7 gave a formal treatment of this problem and proved the impossibility of strong forms of generalpurpose program obfuscation. This work also formalized the pseudo-canonicalization problem discussed above via the notion of indistinguishability obfuscation (i𝒪). Writing now in the language of Boolean circuits, we define the problem below:

Definition 1. (Indistinguishability Obfuscator (iO) for Circuits.7)

A probabilistic polynomial-time algorithm (i𝒪) is called a secure indistinguishability obfuscator for polynomial-sized circuits if the following holds:

  • Completeness: For every ⋋ɛℕ every circuit C with input length n, every input x ɛ {0,1}n we have that Pr[C˜(x)=C(x):C˜iO(1λ,C)]=1.

  • Indistinguishability: For every two ensembles {C0,λ}λZ+ and {C1,λ}λZ+ of polynomial-sized circuits that have the same size, input length, and output length, and are functionally equivalent, that is, ∀ ⋋ɛℤ+, C0,⋋(x) = C1,⋋(x) for every input x, the distributions i𝒪(1, C0,⋋) and are computationally indistinguishable, denoted as:

    [ i O ( 1 λ , C 0 , λ ) } c { i O ( 1 λ , C 1 , λ ) } .

    It means or every efficient polynomial-time algorithm D, for every constant c > 0, there exists a constant ⋋0ɛℤ+, such that for all ⋋ > ⋋0, we have: |Pr[D(iO(1λ,C0,λ))=1]Pr[D(iO(1λ,C1,λ))=1]|1λc

As we discuss in Section 1.4, indeed i𝒪 as a formalization of pseudo-canonicalization lived up to the folklore promise of software obfuscation: there was, and still is, a large research community studying novel applications of i𝒪.

In contrast, demonstrating the feasibility of constructing i𝒪 proved far more challenging. Often one expects that theory will lag behind the practice, and given the folklore promise of software obfuscation, one might expect that over the years perhaps clever programmers had come up with heuristic approaches to software obfuscation that resisted the attack. The reality is the opposite. Indeed, in 2021 the third annual White Box Cryptography contest was held to evaluate heuristic methods for software obfuscation, and every one of the 97 submitted obfuscations was broken before the contest ended.1

A large body of theoretical work, starting with the pioneering work of Garg et al.14 has attempted to construct i𝒪 using mathematical tools. However, before the result18 by the authors of this article, all previous mathematical approaches to constructing i𝒪 relied on new, unproven mathematical assumptions, many of which turned out to be false (see Jain et al.18 for a list of prior constructions.)

We would like to build i𝒪 whose security rests upon cryptographic hardness assumptions that have stood the test of time, have a long history of study, and are widely believed to be true. The main result of our works18,19 is the construction of an i𝒪 scheme from three well-studied assumptions. We discuss this in more detail next.

Theorem 1

Under the following assumptions18,19,a:

  • the Learning Parity with Noise (LPN) assumption over general prime fields ℤp with polynomially many LPN samples and error rate 1/kδ, where k is the dimension of the LPN secret, and δ > 0 is any constant;

  • the existence of a Boolean Pseudo-Random Generator ( PRG) in NC0 with stretch n1+τ, where n is the length of the PRG seed, and τ > 0 is any constant;

  • the Decision Linear (DLIN) assumption on symmetric bilinear groups of prime order.

Indistinguishability obfuscation for all polynomial-size circuits exists.

The three assumptions above (discussed further in Section 1.3) are based on computational problems with a long history of study, rooted in complexity, coding, and number theory. Further, they were introduced for building basic cryptographic primitives (such as public key encryption), and have been used for realizing a variety of cryptographic goals that have nothing to do with i𝒪.

1.3. Assumptions in more detail

We now describe each of the assumptions we need in more detail and briefly survey their history.

TheDLINAssumption: The Decisional Linear assumption (DLIN) is stated as follows: For an appropriate -bit prime p, two groups 𝔾 and 𝔾T are chosen of order p such that there exists an efficiently computable nontrivial symmetric bilinear map e:𝔾×𝔾→𝔾T. A canonical generator g for 𝔾 is also computed. Following the tradition of cryptography, we describe the groups above using multiplicative notation, even though they are cyclic. The DLIN assumption requires that the following computational indistinguishability holds:

{ ( g x , g y , g x r , g y s , g r + s ) | x , y , r , s Z p } c { ( g X , g y , g x r , g y s , g z ) | x , y , r , s , z Z p }

This assumption was first introduced in the 2004 work of Boneh et al.10 and instantiated using appropriate elliptic curves. Since then DLIN and assumptions implied by DLIN have seen extensive use in a wide variety of applications throughout cryptography, such as Identity-Based Encryption, Attribute-Based Encryption, Functional Encryption for degree 2 polynomials, Non-Interactive Zero-Knowledge, etc.

The Existence ofPRGs inNC0: The assumption of the existence of a Boolean Pseudo-Random Generator PRG in NC0 states that there exists a Boolean function G:{0,1}n→{0,1}m where m = n1+τ for some constant τ > 0, and where each output bit computed by G depends on a constant number of input bits, such that the following computational indistinguishability holds:

{ G ( σ ) | σ { 0 , 1 } n } c { y | y { 0 , 1 } m ]

Pseudorandom generators are a fundamental primitive in their own right and have vast applications throughout cryptography. PRGs in NC0 are tightly connected to the fundamental topic of Constraint Satisfaction Problems (CSPs) in complexity theory and were first proposed for cryptographic use by Goldreich16 20 years ago. The complexity theory and cryptography communities have jointly developed a rich body of literature on the cryptanalysis and theory of constant-locality Boolean PRGs.

LPNover large fields: The Learning Parity with Noise LPN assumption over finite fields ℤp is a decoding problem. The standard LPN assumption with respect to subexponential size modulus p, dimension ℓ, sample complexity n, and a noise rate r = 1/ℓδ for some δɛ(0,1), states that the following computational indistinguishability holds:

{ [ A , s A + e m o d p | A Z p × n , s Z p 1 × , e D r 1 × } c { A , u | A Z p × n , u Z p 1 × n }

Above e ← 𝒟r is a generalized Bernoulli distribution, i.e. e is sampled randomly from ℤp with probability 1/ℓδ and set to be 0 with probability 1 − 1/ℓδ. We consider polynomial sample complexity n(ℓ), and the modulus p is an arbitrary subexponential function in ℓ.

The origins of the LPN assumption date all the way back to the 1950s (e.g. Gilbert15): it has long been known that random linear codes possessed remarkably strong minimum distance properties. However, since then, very little progress has been made inefficiently decoding random linear codes under random errors. The LPN over fields assumption above formalizes this, and was introduced over ℤ2 for cryptographic uses in 1994,9 and formally defined for general finite fields and parameters in 2009,17 under the name “Assumption 2.”

While in Ishai et al.17 the assumption was used when the error rate was assumed to be a constant, in fact, polynomially low error (in fact δ= 1/2) has an even longer history in the LPN literature: it was used by Alekhnovitch2 to construct public-key encryption with the field ℤ2, and used to build public-key encryption over ℤp in 2015.5 The exact parameter settings that we describe above, with both general fields and inverse polynomial error rate corresponding to an arbitrarily small constant δ > 0 was explicitly posed by Boyle,11 in the context of building efficient secure two-party and multi-party protocols for arithmetic computations.

Recently, the LPN assumption has led to a wide variety of applications. A comprehensive review of known attacks on LPN over large fields, for the parameter settings we are interested in, was given in Boyle.11,12 For our parameter setting, the running time of all known attacks is Ω(21δ), for any choice of the constant δɛ(0,1) and for any polynomial number of samples n(ℓ).

Finally, we mention that except for the DLIN assumption, the other two assumptions, LPN and PRG in NC0, have search-to-decision reductions.

1.4. Applications of i𝒪

The notion of i𝒪 occupies an intriguing and influential position in complexity theory and cryptography. Interestingly, if NPBPP then i𝒪 exists for the class of all polynomial size circuits, because if NPBPP, then it is possible to efficiently compute a canonical form for any function computable by a polynomial-size circuit. This is in contrast to most powerful cryptographic objects, which typically imply the existence of one-way functions. A large body of work has shown that i𝒪 plus one-way functions imply a vast array of cryptographic objects, so much so that iO has been conjectured to be a “central hub”24 for cryptography. In addition, an impressive list of fascinating new cryptographic objects is only known under i𝒪 or related objects.

We highlight a small subset of these implications (see Jain et al.18 and the references therein): multiparty non-interactive key exchange in the plain model, zero-knowledge succinct noninteractive arguments (ZK-SNARGs) for any NP language, multilinear maps with bounded polynomial multilinear degrees, witness encryption for any NP language, functional encryption supporting all polynomial-sized circuits, and fully homomorphic encryption schemes (FHE) for unbounded-depth polynomial-size circuits.

1.5. Open problems

Our workplaces i𝒪 on firm foundations with respect to the assumptions is based on, thereby answering the main feasibility question for the primitive (until we resolve the P vs. NP question). However, there are many important open questions that remain to be answered. One of the main questions is whether there is a construction that is very simple to describe and efficient to implement in practice. Our construction is quite complex, relies on a number of steps, and is recursive in nature. As a consequence, the actual asymptotic complexity is some large polynomial. Simplifying and finding more efficient paths to building i𝒪 from well-studied assumptions is one of the most important problems that must be solved, toward the vision of using i𝒪 as a universal tool.

2. Overview: How to Construct i𝒪?

In the remaining of the article, we describe a very high level overview of the main technical ideas implying i𝒪. For simplicity of exposition, we choose the simplest path to i𝒪 that we are aware of. This overview is based on a combination of ideas from Jain et al.,18,19 which, however, would require one additional assumption to the three stated above (See Theorem 1) – namely, the Learning With Errors (LWE)23 assumption. Since the exact technical reasons for needing LWE is not important for this overview, and this assumption is actually unnecessary,19 we do not discuss it in detail here.

2.1. Preliminaries

Let’s start with introducing some basic notation. Let size(X) indicates the length of the binary description of an object X (e.g., a string, a circuit, or truth table). Throughout, we consider Boolean functions or circuits or algorithms mapping n-bit binary strings to m-bit binary strings, for some n, m ɛℤ+. Let time(A, x) denotes the running time of an algorithm (or circuit) A on an input x (in the case of a circuit C, time(C, x) is the same as size(C)). We say that an algorithm or circuit is efficient if its running time is bounded by a (fixed) polynomial in the length of the input, that is, time(C, x) = size(x)c for some positive integer cɛℤ+. When we only care about the existence of a constant and the concrete value is not important, we write O(1) in place of that constant, e.g., time(C, x) = size(x)O(1) (following the big-O notation in complexity theory).

Restating our goal, we want to design an efficient randomized algorithm, called the obfuscator 𝒪, that given a Boolean circuit C: {0, 1}n → {0, 1} with size snO(1), referred to as the original circuit, outputs another Boolean circuit Ĉ:{0,1}n → {0,1}, called the obfuscated circuit, such that the following three properties hold:

  • Correctness: the obfuscated circuit Ĉ is functionally equivalent to the original circuit C, denoted as Ĉ ≡, meaning that for every xɛ{0,1}n, Ĉ(x) = C(x) Correctness must hold no matter what random coins the obfuscator 𝒪 used.

  • Efficiency: the obfuscator is efficient, meaning 𝒪 runs in time polynomial in the size of the original circuit, namely, time(𝒪, C)=size(C)o(1)

  • Security: the obfuscated circuit Ĉ hides the implementation details in the original circuit C. That is, for every two, equally-sized and functionally-equivalent circuits C0 and C1 (i.e., size(C0) = size(C1) and C0C1), the obfuscated circuits Ĉ0 and Ĉ1 are computationally hard to distinguish (as defined in Definition 1; see also Jain et al.18).

Next, we give an informal overview of how to construct i𝒪 from well-studied assumptions.

2.2. Simplifying the task of i𝒪

Perhaps the simplest starting point is the following: If there is no restriction on the time the obfuscator 𝒪 can take, then there is an extremely intuitive obfuscator: the obfuscated circuit is the truth table of the original circuit. The truth table TTC of a Boolean circuit C is an array indexed by inputs, where TTC[x] = C(x). It can be computed in time 2n · s if the input length is n and the circuit size is s. Perfect security comes from the fact that for any two functionally equivalent circuits C0C1, their truth tables are identical TTC0=TTC1 and hence impossible to distinguish.

To put simply, a truth table is a canonical form of all circuits producing it. While outputting the truth table satisfies the correctness and security requirement of i𝒪, the obvious flaw with this is that the obfuscator is far from efficient: The running time of an i𝒪 scheme should be sO(1), rather than 2n · s. The input length n of a Boolean circuit can be close to its size s, and hence the time to compute a truth table is exponentially large! This inefficiency is likely inherent, as efficient methods of finding canonical forms of circuits imply the collapse of the polynomial hierarchy, which is widely conjectured to be false.

Therefore, to improve efficiency, we seek canonical forms that fool computationally limited adversaries. Naturally, we start with a more humble goal:

Can we improve efficiency slightly to, say 2n(1−ɛ) · so(1) for some small ɛ > 0? What does i𝒪 with such non-trivial efficiency imply?

Simplification 1: Obfuscation with non-trivial exponential efficiency.  i𝒪 with non-trivial efficiency was studied in Lin et al.21 Surprisingly, they showed that a very modest improvement on efficiency – captured in the notion of exponential-efficiency i𝒪 or xi𝒪 for short – is actually enough to construct completely efficient (polynomial time) i𝒪b.

  • xi𝒪 is an obfuscator 𝒪 whose running time is still “trivial” (2n · s)O(1), but outputs an obfuscated circuit Ĉ of “non-trivial” size 2n(1−ɛ) · so(1) for some ɛ > 0.

We can think of xi𝒪 (as well as i𝒪) as a special kind of encryption method, where the obfuscated circuit Ĉ is a “ciphertext” of the original circuit C, also denoted as spCT(C)=Ĉ, satisfying that

  • the ciphertext spCT(C) hides all information about C, except that it lets anyone with access to it learn the truth table of C. This is unlike normal encryption which reveals no information about the encrypted message.

  • The size of the ciphertext spCT(C) is 2(1−ɛ)·n for some 0 < ɛ < 1 So it can be viewed as a (slightly) compressed version of the truth table (that reveals no other information of C to computationally limited adversaries).

The such special encryption scheme is known as functional encryption, which controls precisely which information of the encrypted message is revealed, and hides all other information. This notion is tightly connected with xi𝒪 and i𝒪, and, in fact, the implication of xi𝒪 to i𝒪 goes via the notion of functional encryption.

When viewing spCT(C) as a compressed version of the truth table. It becomes clear why even slight compression is powerful enough to imply i𝒪: The idea is keeping compressing iteratively until the size of the special ciphertext becomes polynomial. The works of Ananth and Jain,3 Bitansky and Vaikuntanathan,8 and Lin et al.21 turn this high-level idea into an actual proof that xi𝒪 implies i𝒪c and allows us to focus on constructing xi𝒪, or equivalently the special encryption described above.

Simplification 2: It suffices to obfuscate simple circuits.  Unfortunately, despite the efficiency relaxation, it is still unclear how to obfuscate general Boolean circuits, which can be complex. Naturally, we ask:

Can we obfuscate simple sub-classes of circuits? What does xi𝒪 for simple circuits imply?

It turns out that it suffices to focus on an extremely simple class of circuits C = NC0, where NC0 is the set of all circuits with constant output locality, meaning every output bit depends on a constant number of input bits. To do so, we will rely on two cryptographic tools, randomized encodings and Pseudo-Random Generators (PRGs).

Randomized encoding in NC0.  A Randomized Encoding (RE) scheme consists of two efficient algorithms (RE, Decode). It gives a way to represent a complex circuit C(·), by a much simpler randomized circuit REC(·; ·) := RE(C, ·; ·), such that,

  • Correctness: for every input x, the output πx of RE(C, x; r) produced using uniformly random coins r encodes the correct output: In other words, there exists an efficient decoding algorithm Decode such that Decode(π) = C(x).

  • Security: πx reveals no information of C beyond the output of x to efficient adversaries.

  • Simplicity: RE is a simple circuit by some measure of simplicity. Classic works6,25 showed that RE can be simply a NC0 circuit, when assuming the existence of PRGs in NC0 like we are.

The correctness and security of the randomized encoding suggest that, instead of directly obfuscating a general circuit C, we can alternatively obfuscate a circuit D that on input x outputs an encoding πx, which reveals only C(x). The potential benefit is that D depending on RE and C should be simply NC0 circuit. Hence, it would suffice to construct xi𝒪 for simple NC0 circuits!

To make the above idea goes through, there are, however, a few wrinkles to be ironed out. The issue is that the security of randomized encoding only holds if an encoding πx is generated using fresh random coins. There are concrete attacks that learn information of C beyond the outputs, when πx is generated using non-uniform random coins, or when two encoding πx and πx′ are generated using correlated random coins. If the truth table of the circuit D contains an encoding πx for every input x (i.e., TTD[x] = πx) the random coins for generating these encodings must be embedded in D, that is,

D = D C , R E , r 1 , r 2 , , r 2 n , s.t. D ( x ) = R E ( C , x ; r x )

Such a circuit D has size at least 2n. In particular, we cannot hope to “compress” the random coins r1, r2, …, r2 into 2n(1−ɛ) space, which is the target size of the obfuscated circuit. To resolve this problem, we will use a Pseudo-Random Generator.

Pseudo-random generator in NC0.  Recall that a pseudo-random generator is a Boolean function PRG: {0, 1}n → {0, 1}m that takes as input a uniformly random seed sdɛ{0,1}n, and produces a polynomially longer output rɛ{0,1}m, where m = n1+ρ for some constant ρ > 0, such that, r is indistinguishable to a uniformly random m-bit string to computationally limited adversaries. There are several proposals of pseudo-random generators that can be computed by an NC0 function.

Equipped with a pseudo-random generator in NC0, we can replace uniformly random coins r1,r2,,r2n with pseudorandom coins expanded from a much shorter seed sd of length roughly 2n/(1+ρ)=2n(1ϵ) for some ε′ɛ(0,1)d. This gives a variant of the circuit D above:

D = D C , R E , P R G , s d , s . t . D ( x ) = R E ( C , x ; ( r x = P R G x ( s d ) ) ) .

where PRG(sd) expands to output r1,,r2n and rx = PRGx(sd) is the x‘th chunk of the output. Thanks to the fact that both PRG and RE are in NC0, so is D′.

Moving forward, it suffices to devise a way to encrypt C, sd into a ciphertext spCT(C, sd), from which we can expand out the truth table of D′, while hiding all other information of C and sd.

spCT ( C , sd ) G RE,PRG TT D C , sd G RE,PRG TT D

Note that the special encryption only needs to hide (C, sd), instead of the entire description of circuit D′ since RE and PRG are public algorithms. Moreover, the truth table of D′ can be computed from (C, sd) via a function GRE, PRG in NC0.

2.3. Special encryption for NC0 mappings

In our works,18,19 we constructed the needed special encryption for general NC0 mappings G: {0, 1}l → {0, 1}m, where ciphertext spCT(X) reveals the output of G on X while hiding all other information of X, such that size(spCT(X)) ~ size(X) + m1−δ for some δ > 0. In this subsection, we describe the first half of the ideas behind our construction, which connects the special encryption with bilinear pairing groups and a new object called a structured-seed pseudo-random generator. The second half of the ideas are explained in Section 2.5, describing the construction of a structured-seed pseudorandom generator.

Connection with bilinear pairing groups.  As a starting point, suppose that the function G is really simple: Simple enough so that it can be computed by a degree-2 polynomial mapping Q:ZplZpm over the finite field ℤp for some prime modulus p. Namely,

X { 0 , 1 } l , G ( X ) = Q ( X ) , where, Q i ( X ) = ( Σ j , k α j , k X j X k + Σ k β k X k + γ ) m o d p .

Then, the special encryption can be implemented using bilinear pairing groups, as shown in Ananth and Sahai,4 and Lin.20

Bilinear pairing groups.  At a high-level, pairing groups allows for computing quadratic polynomials over secret encoded values, and reveal only whether the output is zero or not. More specifically, it consists of cyclic groups 𝔾1, 𝔾2, and 𝔾T with generators g1, g2, and gT, respectively, and all of order p. 𝔾1 and 𝔾2 are referred to as the source groups and 𝔾T the target group. (In some instantiations, the two source groups are the same group, which is called symmetric pairing groups.) They support the following operations:

  • Encode: For every group 𝔾i, one can compute gix for xɛℤp. The group element gix is viewed as an encoding of x in the group 𝔾i.

  • Group Operation: In each group 𝔾i, one can perform the group operation to get gix1gix2=gix1+x2, corresponding to “homomorphic” addition modulo p in the exponent. Following the tradition of cryptography, we write the group operation multiplicatively.

  • Bilinear Pairing: Given two source group elements gx1 and gx2, one can efficiently compute a target group element gTx1x2, using the so-called pairing operation e(g1x1,g2x2)=gTx1x2. This corresponds to “homomorphic” multiplication modulo p in the exponent. However, after multiplication, we obtain an element in the target group that cannot be paired anymore.

  • Zero Testing: Given a group element gix, one can always tell if the encoded value is x = 0, by comparing the group element with the identity in Gi. Similarly, one can do equality testing to see if gix=gic for any c.

Combining the above abilities gives a “rudimentary” special encryption scheme that supports the evaluation of degree-2 polynomials. A ciphertext of XZpl includes encodings of every element Xl in both source groups, ((g1x1,,g1xl),(g2x1,,g2xl)). Given these, one can “homomorphically” compute a quadratic mapping Q to obtain encoding of the output y = Q(X) in the target group (gTy1,,gTym) (without knowing the encoded input X at all); finally, if the output y is a bit string, one can learn y in the clear via zerotesting. In summary,

( g 1 X 1 , g 1 X l ) , ( g 2 X 1 , g 2 X l ) ) E x p a n d Q ( x ) , if Q ( x ) { 0 , 1 } m .

This fulfills the correctness of the special encryption. What about security? The ciphertext must hide all information about X beyond what’s revealed by Q(x), for which we resort to the security of pairing groups. For simplicity of this overview, let’s gain some security intuition by assuming the strongest hardness assumptions pertaining to the pairing groups, known as the generic group model. Think of encoding as black boxes and the only way of extracting information (of the encoded values) is by performing (a combination of) the above four operations. In this model, given gx for a secret and random xɛℤp no efficient adversary can learn x. Extending further, if X were random (in ℤp) subject to Q(X) = y, then the encoding would reveal only y and hide all other information of X. The only issue is that X in our example is not random – it is binary. Nevertheless, the works of Ananth and Sahai,4 and Lin20 designed clever ways of randomizing an arbitrary input X, to a random input subject to the only condition that an appropriate quadratic mapping reveals the right output ()=Q¯(X). Hence, security is attained. We refer the reader to Ananth and Sahai,4 and Lin20 for details about how such randomization works.

Challenges beyond Degree 2.  Unfortunately, the mapping GRE,PRG we care about can only be computed by polynomials of a degree much larger than 2. It is known that a Boolean function with output locality l can be computed by a multilinear degree l polynomial over ℤp. However, the locality of Boolean PRG we need is at least 5,22 and known randomized encodings have locality at least 4.6

Key idea: The preprocessing model.  To overcome this challenge, our first idea is to use preprocessing of inputs to help reduce the degree of computation. Instead of directly computing G(X) in one shot, we separate it into two steps: First, the input is preprocessed X′ = pre(X; r) in a randomized way using fresh random coins r, then the output is computed from the preprocessed input y = Q(X′). The idea is that the preprocessing can perform complex transformations on the input in order to help the computation later. The only constraint is that the preprocessing should not increase the size of its input too much, that is, size(X′) ~ size(X)+m1−δ, for some δ > 0. As such, it suffices to encrypt the preprocessed input spCT(X′), from which one can recover the desired output y = G(X), by evaluating a (hopefully) simpler function Q. Unfortunately, because of the restriction on the size of X′, it is unclear how preprocessing alone can help.

Our second idea is to further relax the preprocessing model to allow the preprocessed input to contain a public part and a secret part X′ = (P, S). Importantly, the public part P should reveal no information about X to computationally limited adversaries. (In contrast, P, S together reveals X completely.) Moreover, we allow the second stage computation Q(P, S) to have an arbitrary constant degree in P and only restricts its degree on S to 2, that is,

Q i ( P , S ) = Σ j , k α j , k ( P ) S j S k + Σ k β k ( P ) S k + γ ( P ) ,

where αj, k, βk, γ are constant degree polynomials over ℤp. It turns out that the techniques alluded to above for special encryption for degree 2 computations can be extended, so that given ciphertext spCT(S) and P in the clear, one can homomorphically compute Q and thereby learn the output G(X).

In Jain et al.18,19 we show how to compute any NC0 Boolean function G in such a preprocessing model, assuming the Learning Parity with Noises assumption over general fields, which completes the puzzle of i𝒪. When applied to specific cryptographic tools, our techniques give interesting new objects. For instance, it converts any PRG in NC0 into what we call structured-seed PRG. Given a preprocessed seed (P, S) = PRG(sd; r′), the structured-seed PRG expands out a polynomially longer output r = sPRG(P, S), where the computation has only degree-2 in the private seed S, and the output r is pseudo-random given the public seed P. In the next section, we describe how to do preprocessing in the context of constructing structured-seed PRG. The same ideas can be extended to handle general NC0 functions.

2.4. Definition of structured-seed PRG

Definition 2. (Syntax of Structured-Seed Pseudo-Random Generators (sPRG).)

Let τ > 1. A structured seed Boolean PRG (sPRG) with polynomial stretch τ, is defined by the following PPT algorithms:

  • IdSamp(1n, p): The index sampling algorithm takes as input the input length parameter n, and a prime p. It samples a function index I.

  • SdSamp(I) jointly samples two strings, a public seed and a private seed, sd = (P, S) which are both vectors of dimension ℓsd = O(n) over ℤp

  • Eval(I, sd) computes a string in {0, 1}m, m = nτ.

Looking ahead, the prime p, that we choose is set to the order of the bilinear group which has a bit length of nθ(1).

Definition 3. (Security of sPRG.)

A structured-seed Boolean PRG, sPRG, satisfies security if for any constant ρ > 0, any function p : ℕ → ℤ that takes as input a number k ɛ ℕ and outputs a kρ bit prime ρ(k), any nɛℕ, with probability 1 − o(1) over IdSamp(1n, p = p(n)) → I it holds that the following distributions are computationally indistinguishable.

{ I , P , Eval ( I , P , S ) } c { I , P , r }

where I is sampled by IdSamp(1n, p), sd by SdSamp(I), and r sampled randomly from {0, 1}m(n).

Definition 4. (Complexity and degree of sPRG.)

Let ρ > 0, d1, d2 ɛ ℕ be any constant, and p : ℕ → ℤ be any function that maps an integer k into a kρ bit prime p(k). An sPRG has degree d1 in public seed P and degree d2 in S over ℤp, denoted as, sPRGɛ(deg d1, deg d2), if for every I in the support of IdSamp(1n, p = p(n)), there exists efficiently generatable polynomials gI,1, …, gI,m over ℤp such that:

  • Eval(I, (P, S)) = (gI,1(P, S), …, gI,m(P, S)), and,

  • The maximum degree of each gI,j over P is d1, and the maximum degree of gI,j over S is d2.

We remark that the above definition generalizes the standard notion of families of PRGs in two aspects: 1) the seed consists of a public part and a private part, jointly sampled and arbitrarily correlated, and 2) the seed may not be uniform. Therefore, we obtain the standard notion as a special case.

Definition 5. (Pseudo-Random Generators, degree, and locality.)

A (uniform-seed) Boolean PRG (PRG) is an sPRG with a seed sampling algorithm SdSamp(I) that outputs a public seed P that is an empty string and a uniformly random private seed S ← {0, 1}n. Let d, c ɛ ℕ. The PRG has multilinear degree d if for every n ɛ ℕ and I in the support of IdSamp(1n), we have that Eval(I, sd) can be written as an m(n)-tuple of degree-d polynomials over ℤ in S. It has constant locality c if for every nɛℕ and I in the support of IdSamp(1n), every output bit of Eval(I, sd) depends on at most c bits of S.

In what follows next we will give an overview of our construction of sPRG from the LPN assumption and the existence of PRG in NC0. For ease of exposition, we will actually use Goldreich’s PRG candidate16 which is the most wellknown conjectured PRG in NC0. See Jain et al.18 for the construction based on any PRG in NC0.

Definition 6. (Goldreich’s PRG.)

A Goldreich PRG of locality c mapping n bits to m bits is described using a predicate f : {0, 1}c → {0, 1} and a hypergraph H = {Q1, …, Qm} where each Qi is a randomly chosen ordered subset of [n] of size c. The index I consists of f and H. Further, on input xɛ{0,1}n, PRG.Eval(I, x) = y, where y = (y1, …, ym). Here each yi=f(xi1,,xic) where Qi = (i1, …, ic).

2.5. SPRG overview

We start with a Goldreich PRG PRG = (IdSamp, Eval) with stretch τ, associated with a d-local predicate f : {0, 1}d → {0, 1} and a randomly sampled hypergraph H, i.e., I = (f, H). Observe that on any input σɛ{0,1}n, y = PRG.Eval(I, σ) can be computed by degree d multilinear polynomials over ℤ as f is a predicate evaluated on d bits. Our high-level strategy is to somehow “preprocess” σ into two vectors (P, S) of small dimension (preferably O(n), but anything sublinear in m works) such that yɛ{0,1}m can be computed in (deg d, deg 2). Thereby this will have the effect of transferring the complexity of computation to the public input. To achieve this, we pre-process σ into appropriate public and private seeds (P, S) and leverage the LPN assumption over ℤp (which is the input to sPRG.IdSamp) to show that the seed is hidden.

Our first idea toward this is that we can “encrypt” the seed σ using LPN samples over ℤp as follows:

Sample:   A Z p , s Z p 1 × , e D r 1 × n ( p ) Add to the function index I : A Add to public seed p : b = s A + e + σ

where ℓ is set to be the dimension for the LPN samples. It is set so that d2=n.Dr(p) is a distribution that samples randomly from p with probability r, and 0 otherwise. Finally, r = ℓδ.

It follows directly from LPN assumption that (A, b) is pseudorandom and hides σ. Furthermore, due to the sparsity of LPN noises, the vector σ + e differs from σ only at a ℓδ fraction of components – thus it is a sparsely erroneous version of the seed. For iɛ[m], let fi(σ) be the locality d, degree d polynomial over ℤ such that yi = fi(σ). Then, for iɛ[m] consider the following polynomial:

h i , ( b , s ¯ d 2 ) = f i ( b s A ) s ¯ = s | | 1.

Above we first interpret fi over ℤ into the field ℤp by simply reducing coefficients mod p, and then compute as given. Observe that fi is a degree d polynomial in b and s, therefore its degree over b is d and over s¯d2 is two. Thus hi is a (deg d, deg 2) polynomial. Observe that hi(b,s¯d2)=fi(σ+e). The main point of this is this is that if we set the polynomial map G(1)=(G1(1),,Gm(1)) by letting each Gi(1)=hi, and setting the private seed S=s¯d2:

G ( 1 ) ( P , S ) = PRG. Eval I ( σ + e )

The reason G(1) is interesting because e is sparse. With probability, 12nΩ(1), it is non-zero at O(nδ) locations. As a consequence of this, for any given iɛ[m], fi(σ) = fi(σ + e) with all but O(ℓδ) probability as fi is a d local function depending on d randomly chosen inputs. Since in the hypergraph H of the Goldreich PRG, each set Qi is chosen independently, every output is independently error-prone with probability O(ℓδ). Because of this due to Chernoff style concentration bounds, out of m outputs, with probability 12nΩ(1), all but T = O(mδ) outputs are error-prone.

This gives us a nice candidate for sPRG that satisfies almost all properties! The dimension of S and P is O(n) which is sublinear in m, and it can be computed by a degree (deg d, deg 2) polynomial G(1). We would be done if we could somehow force the output to be correct on all the m coordinates. For the rest of the overview, we refer to the indices iɛ[m], such that fi(σ) ≠ fi(σ + e), as bad indices/outputs.

To correct errors, we further modify the polynomial and include more pre-processed information in the private seeds. We describe a sequence of ideas that lead to the final correction method, starting with two wrong ideas that illustrate the difficulties we will overcome.

  • The first wrong idea is correcting by adding the difference Corr = yy between the correct and erroneous outputs, y = EvalI (σ) and y = EvalI (σ +e); we refer to Corr as the correction vector. To obtain the correct output, evaluation can compute the polynomial map G(1)(b,(s¯d2))+Corr. The problem is that Corr must be included in the seed, but it is as long as the output and would destroy expansion. Thus, we have to make use of the fact that Corr is sparse.

  • To fix expansion, the second wrong idea is adding correction only for bad outputs, so that the seed only stores non-zero entries in Corr. Recall that, Corr is sparse with at most T non-zero elements. More precisely, the j‘th output can be computed as Gj(1)(b,(s¯d2))+Corrj if output j is bad and without adding Corrj otherwise. This fixes expansion, but now the evaluation polynomial depends on the location of bad outputs. If these locations are included in public information, this would leak information about the location of LPN noises, and jeopardize security. If on the other hand the locations are included in the private seed, then it is unclear how to maintain the requirement that the polynomial map computes only a degree two polynomial in the private seed.

These two wrong ideas illustrate the tension between the expansion and security of our sPRG. Our construction takes care of both, by compressing the correction vector Corr to be polynomially shorter than the output and stored in the seed, and expanding it back during evaluation in a way that is oblivious of the location of bad output bits. This is possible thanks to the sparsity of the correction vector and the allowed degree two computation on the private seed. We first illustrate our idea with the help of a simple case.

Simple Case 1: Much fewer than m bad outputs.  Suppose hypothetically that the number of bad outputs is bounded by z which is much smaller than m. Thus, if we convert Corr into a m×m matrixe, it has low-rank z. We can then factorize Corr into two matrixes U and V of dimensions m×z and z×m, respectively, such that Corr = UV, and compute the correct output as follows: ∀jɛ[m],

G j ( 2 ) ( b , ( s ¯ d 2 , U , V ) ) = G j ( 1 ) ( b , ( s ¯ d 2 ) ) + ( U V ) k j , l j ,

where (kj, lj) is the corresponding index of the output bit j, in the m×m matrix. When zm the matrices U, V have 2zm field elements, which are polynomially smaller than m = nτ. As such, G(2) is expanding. Moreover, observe that G(2) has only degree 2 in the private seed and is completely oblivious to where the bad outputs are.

While the idea above works for fewer than m bad outputs, it does not work for the case we are in. We have T = Θ(mδ) bad outputs. Nevertheless, we show that a similar idea works for this case.

T bad outputs. The above method however cannot handle more than m bad outputs, whereas the actual number of bad outputs can be up to T = Ω(m/ℓδ), much larger than m since δ is an arbitrarily small constant. Consider another hypothetical case where the bad outputs are evenly spread in the following sense: suppose that if we divide the matrix Corr into m/ℓδ blocks, each of dimension ℓδ/2 × ℓδ/2, there are at most ℓρ bad outputs in each block where ρ > 0 is a really small constant (say δ/10). In this case, we can “compress” each block of Corr separately using the idea from case 1. More specifically, for every block iɛ[m/ℓδ] we factor it into UiVi, with dimensions ℓd/2 × ℓρ and ℓρ × ℓδ/2, respectively, and correct bad outputs as follows: ∀jɛ[m],

G j ( 2 ) ( b , ( s ¯ d 2 , ( U i , V i ) i [ m δ ] ) ) = G j ( 1 ) ( b , ( s ¯ d 2 ) ) + ( U i j V i j ) k j , l j ,

where ij is the block that output j belongs to, and (kj, lj) ɛ [ℓδ/2]×[ℓδ/2] is its index within this block. We observe that G(2) is expanding, since each matrix Ui or Vi has ℓδ/2+ρ field elements, and the total number of elements is δ/2+ρmδ which is polynomially smaller than m as long as δ is positive and m is polynomially related to ℓ. Moreover, G(2) is oblivious to the location of bad outputs just as in case 1.

This completely solves our problem except that we need to ensure that the bad outputs are well spread out in the manner described above. Our main observation here is that this is ensured due to the fact that in a Goldreich’s PRG candidate the input dependence hypergraph Q1, …, Qm is randomly chosen. Therefore, once we fix the location of the non-zero errors inside e (where with high probability O(nδ) locations are non-zero), in every block of ℓδ output bits, each entry j is independently non-zero with probability O(ℓδ). Thus, in expectation, each block has a constant number of bad output bits. More so, due to the Chernoff bound, it can be seen that with probability 12Ω(1), each has at most ℓρ, non-zero elements. Thus, our construction can be summarized as follows:

  • Step 1: Assign outputs. We partition the outputs into B buckets, via a mapping øbkt : [m] → [B]. The number of buckets is set to B = m/ℓδ and the number of elements in each bucket is set to be ℓδ so that they exactly form a partition of m. The mapping øbkt simply divides m by B, and outputs the remainder. Since the error e is chosen to be from the LPN error distribution and the hypergraph H of the PRG is randomly chosen, by a Chernoffstyle argument, we can show that in each bucket out of δ output bits, at most t of them are bad, except with probability 12tΩ(1). We will set t = ℓδ for a tiny constant ρ > 0.

  • Step 2: Compress the buckets. Next, we organize each bucket i into a matrix Mi of dimension ℓd/2 × ℓd/2 and then compute its factorization Mi = UiVi, where Ui, Vi are matrices of dimensions ℓδ/2 × t and t × ℓδ/2, respectively. To form matrix Mi, we use another mapping øind : [m] → [ℓδ/2] × [ℓδ/2] to assign each output bit j to an index (kj, lj) in the matrix of the bucket ij it is assigned to. This assignment must guarantee that no two output bits in the same bucket (assigned according to øbkt) have the same index. One such way to compute øind( j) is to divide jɛ[m] by B. The remainder is set as øbkt( j), and the quotient is divided further by ℓδ/2. The quotient and the remainder from this division are set as the resulting indices (kj, lj). Once we have this, (Mi)k,l is set to Corrj if there is j such that øbkt( j) = i and øind(j) = (k, l). Since every matrix Mi has at most t non-zero entries, we can factor them and compute the correct output as: ∀ j ɛ [m],

    G j ( 2 ) ( b , ( s ¯ d 2 , ( U i , V i ) i [ m δ ] ) S ) = G j ( 1 ) ( b , ( s ¯ d 2 ) ) + ( U φ bkt ( j ) V φ bkt ( j ) ) φ ind ( j ) ,

    G(2) is expanding, because the number of field elements in Ui‘s and Vi‘s are much smaller than m, namely: 2tℓδ/2B = O(mℓδ/2+ρ)≪m. We set I′ = (I, A, øbkt, øind).

  • Step 3: Zeroize if uneven buckets. Finally, to deal with the low probability event that some bucket contains more than t bad outputs, we introduce a new variable called flag. If this occurs, our sPRG sets P and S as all zero vectors. In this case, the evaluation always outputs 0. This gives us our candidate sPRG. For security, observe that the polynomial map G(2) is independent of the location of LPN noises. With probability 12nΩ(1), the evaluation results in output y. Therefore, by the LPN over ℤp assumption, the seed σ of PRG is hidden and the security of PRG ensures that the output is pseudorandom when it is not all zero (which occurs with a subexponentially small probability). We now proceed to the formal construction and proof.

This completes the overview of our construction of structured seed PRG, as well as i𝒪; see Jain et al.18,19 for the formal constructions and security proofs. In conclusion, i𝒪 is an extremely powerful cryptographic tool and our works have established its feasibility based on well-studied assumptions.

    References

    • 1. WhibOxContest21—CHES 2021 Challenge. (2021); https://contest2021.whibox.io/.
    • 2. Alekhnovich, M.  More on average case vs approximation complexity. In 44th FOCS. IEEE Computer Society Press, NY (Oct. 2003), 298307.
    • 3. Ananth, P., and Jain, A.  Indistinguishability obfuscation from compact functional encryption. In CRYPTO 2015, Part I . R.Gennaro, and M. J. B.Robshaw (eds). Volume 9215 of LNCS. Springer, Heidelberg (Aug. 2015), 308326.
    • 4. Ananth, P., and Sahai, A.  Projective arithmetic functional encryption and indistinguishability obfuscation from degree-5 multilinear maps. In EUROCRYPT 2017, Part I . J.-S.Coron, and J. B.Nielsen (eds). Volume 10210 of LNCS. Springer, Heidelberg (Apr./May 2017), 152181.
    • 5. Applebaum, B. et al.  Arithmetic cryptography: Extended abstract. In ITCS. 2015. T.Roughgarden (ed). ACM, NY (Jan. 2015), 143151.
    • 6. Applebaum, B., et al.  Cryptography in NC. In FOCS . IEEE Computer Society, NY (2004), 166175.
    • 7. Barak, B. et al.  On the (im)possibility of obfuscating programs. In CRYPTO. 2001. J.Kilian (ed). Volume 2139 of LNCS. Springer, Heidelberg (Aug. 2001), 118.
    • 8. Bitansky, N., and Vaikuntanathan, V.  Indistinguishability obfuscation from functional encryption. In 56th FOCS. V.Guruswami (ed). IEEE Computer Society Press, NY (Oct. 2015), 171190.
    • 9. Blum, A. et al.  Cryptographic primitives based on hard learning problems. In CRYPTO'93. D. R.Stinson (ed). Volume 773 of LNCS. Springer, Heidelberg (Aug. 1994), 278291.
    • 10. Boneh, D. et al.  Short group signatures. In CRYPTO. 2004. M.Franklin (ed). Volume 3152 of LNCS. Springer, Heidelberg (Aug. 2004), 4155.
    • 11. Boyle, E. et al.  Compressing vector OLE. In ACM CCS . 2018. D.Lie et al.  eds. ACM Press, NY (Oct. 2018), 896912.
    • 12. Boyle, E. et al.  Correlated pseudorandom functions from variable-density LPN. In 61st FOCS. IEEE Computer Society Press, NY (Nov. 2020), 10691080.
    • 13. Diffie, W., and Hellman, M.E.  New directions in cryptography. IEEE Trans. Inform. Theory . 22, 6 (1976), 644654.
    • 14. Garg, S. et al.  Candidate indistinguishability obfuscation and functional encryption for all circuits. In 54th FOCS. IEEE Computer Society Press, NY (Oct. 2013), 4049.
    • 15. Gilbert, E.N.  A comparison of signalling alphabets. Bell Syst. Tech. J. 31, 3 (1952), 504522.
    • 16. Goldreich, O.  Candidate one-way functions based on expander graphs. Electron. Colloq. Comput. Complexity (ECCC). 7, 90 (2000).
    • 17. Ishai, Y. et al.  Founding cryptography on oblivious transfer—efficiently. In CRYPTO. 2008. D.Wagner (ed). Volume 5157 of LNCS. Springer, Heidelberg (Aug. 2008), 572591.
    • 18. Jain, A. et al.  Indistinguishability obfuscation from well-founded assumptions. In STOC'21: 53rd Annual ACM SIGACT Symposium on Theory of Computing, Virtual Event, Italy, June 21–25 , 2021. S.Khuller, and V. V.Williams (eds). ACM, NY (2021), 6073.
    • 19. Jain, A. et al.  Indistinguishability obfuscation from LPN over F, DLIN, and PRGs in NC. In EUROCRYPT 2022, Part I . O.Dunkelman, and S.Dziembowski (eds). Volume 13275 of LNCS. Springer, Heidelberg (May/June 2022), 670699.
    • 20. Lin, H.  Indistinguishability obfuscation from SXDH on 5-linear maps and locality-5 PRGs. In CRYPTO 2017, Part I . J.Katz, and H.Shacham (eds). Volume 10401 of LNCS. Springer, Heidelberg (Aug. 2017), 599629.
    • 21. Lin, H. et al.  Indistinguishability obfuscation with non-trivial efficiency. In PKC 2016, Part II. C.-M.Cheng et al.  eds. Volume 9615 of LNCS. Springer, Heidelberg (Mar. 2016), 447462.
    • 22. Mossel, E. et al.  On e-biased generators in NC0. In 44th FOCS . IEEE Computer Society Press, NY (Oct. 2003), 136145.
    • 23. Regev, O.  On lattices, learning with errors, random linear codes, and cryptography. In STOC. (2005), 8493.
    • 24. Sahai, A., and Waters, B.  How to use indistinguishability obfuscation: Deniable encryption, and more. In 46th ACM STOC. D. B.Shmoys (ed). ACM Press, NY (May/June 2014), 475484.
    • 25. Yao, A.C.-C.  How to generate and exchange secrets (extended abstract). In FOCS. (1986), 162167.
    • For technical reasons, we need to hardness of these assumptions to be such that no polynomial-time adversaries have beyond sub-exponentially small advantage in breaking the hardness of the underlying problems.
    • This is the step where the additional assumption of Learning With Errors (LWE) is used. However, as we showed in Jain et al.19 this assumption is unnecessary for i𝒪.
    • This implication, however, comes with a quantitative weakening in the security.
    • More precisely, the length of sd is (2nsO(1)1/(1+ρ)), since each rx is a sO(1)-bit string instead of a single bit. However, the dominant term is 2n/(1+ ρ) as s is only polynomial in n.
    • Any injective mapping from a vector to a matrix that is efficient to compute and invert will do.

Join the Discussion (0)

Become a Member or Sign In to Post a Comment

The Latest from CACM

Shape the Future of Computing

ACM encourages its members to take a direct hand in shaping the future of the association. There are more ways than ever to get involved.

Get Involved

Communications of the ACM (CACM) is now a fully Open Access publication.

By opening CACM to the world, we hope to increase engagement among the broader computer science community and encourage non-members to discover the rich resources ACM has to offer.

Learn More