The question of determining which (weak) forms of randomness “fool” (or seem totally random to) a given algorithm is a basic and fundamental question in the modern theory of computer science. In this work we report progress on this question by showing that any “*k*-wise independent” collection of random bits, for some *k* = (log *n*)^{O(1)}, fools algorithms computable by bounded-depth circuits. In the process we also survey known tight connections between bounded-depth circuits and low-degree multivariate polynomials, and we establish a new such connection to prove our main result.

In the rest of this section, we introduce the basic concepts in greater detail so as to present a precise version of our main result.

A boolean circuit *C* consists of boolean inputs, boolean outputs, and gates that perform operations on the intermediate values within the circuit. The circuit is not allowed to have loops: in other words, the gates of the circuit form a directed acyclic graph. A circuit *C* with *n* input bits and one output naturally gives rise to a boolean function *F*_{C}: {0, 1}^{n} → {0, 1}. Depending on the type of the circuit, various restrictions may be placed on the size of the circuit, its shape, and the types of gates that it may use. In this paper, we will focus on circuits where unbounded fan-in AND and OR gates, as well as NOT gates, are allowed.

Since any given circuit accepts a fixed number *n* of inputs, it can only compute a function over the set {0, 1}^{n} of strings of length *n.* If we want to discuss computing a function *F:* {0, 1}* → {0, 1} that takes strings of arbitrary length using circuits, we need to consider *families* of circuits parameterized by input size. A family of circuits {*C*_{n}} computes the function *F* if each circuit *C*_{n} computes the restriction of *F* to strings of length *n.*

The two most important measures of a circuit’s “power” are *size* and *depth.* Circuit size is the total number of gates used in constructing the circuit. For circuits with AND, OR, and NOT gates the depth is defined as the largest number of AND and OR gates a signal needs to traverse from any input to the output. The same measures can be applied to families of circuits. A family {*C*_{n}} is of polynomial size if the size of each *C*_{n} is bounded by *n*^{c} for some constant *c. Circuit complexity* studies the amount of resources (such as size, depth) required to compute various boolean functions.
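To make these definitions concrete, here is a minimal sketch (in Python, using a dictionary-based circuit representation of our own invention, not anything from the article) that evaluates a circuit with unbounded fan-in AND/OR and NOT gates, and computes its depth as defined above:

```python
def evaluate(circuit, output, assignment):
    """Evaluate the gate named `output` given a dict of input bits."""
    memo = dict(assignment)
    def val(name):
        if name in memo:
            return memo[name]
        op, args = circuit[name]
        if op == "NOT":
            memo[name] = 1 - val(args[0])
        elif op == "AND":                       # unbounded fan-in AND
            memo[name] = int(all(val(a) for a in args))
        else:                                   # "OR", unbounded fan-in
            memo[name] = int(any(val(a) for a in args))
        return memo[name]
    return val(output)

def depth(circuit, output):
    """Largest number of AND/OR gates on any input-to-output path."""
    def d(name):
        if name not in circuit:                 # an input variable
            return 0
        op, args = circuit[name]
        below = max(d(a) for a in args)
        return below + (1 if op in ("AND", "OR") else 0)
    return d(output)

# a size-3, depth-2 circuit computing (x0 AND x1) OR (NOT x2)
circuit = {
    "g1": ("AND", ["x0", "x1"]),
    "g2": ("NOT", ["x2"]),
    "out": ("OR", ["g1", "g2"]),
}
assert evaluate(circuit, "out", {"x0": 1, "x1": 1, "x2": 1}) == 1
assert evaluate(circuit, "out", {"x0": 0, "x1": 1, "x2": 1}) == 0
assert depth(circuit, "out") == 2               # NOT gates do not count
```

Note that, as in the text, only AND and OR gates contribute to the depth count.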

In this context, studying circuits of bounded depth—i.e., circuits with at most a constant number *d* of layers—is of particular interest. The complexity class capturing functions computed by boolean circuits of bounded depth and polynomial size is denoted by **AC**^{0}. Thus **AC**^{0} captures what can be computed by polynomial-size circuits in constant time. The class **AC**^{0} has been studied extensively in the past three decades.

There are several reasons why **AC**^{0} circuits have been studied so extensively. Firstly, **AC**^{0} is essentially the only class of circuits for which strong unconditional lower bounds have been proved: it is known, for example, that any **AC**^{0} circuit computing the parity function *PARITY* (*x*_{1},…,*x*_{n}) = Σ*x*_{i} mod 2 must be of size exponential in *n.*^{5,6} Secondly, there is a very tight connection between computations performed by the polynomial hierarchy (PH) complexity class relative to an oracle and computations by **AC**^{0} circuits. Thus a better understanding of **AC**^{0} translates into a better understanding of the polynomial hierarchy. Finally, the class of **AC**^{0} functions, and its subclass of *DNF* formulas, has been the target of numerous efficient machine learning algorithms. It is actually *because* **AC**^{0} is a relatively weak class that functions in this class are amenable to efficient learning: it can be shown that learning functions from larger circuit complexity classes would allow one to break cryptographic primitives based on problems such as integer factorization.

**1.2 Pseudorandomness: fooling bounded depth circuits**

Randomness is one of the most important computational resources. A randomized algorithm *A* may make use of a string of random bits *r* ∈ {0, 1}^{n} as part of its computation. The randomness may be used, for example, for Monte Carlo simulations, or for generating random hashes. Denote by *U* the uniform distribution on boolean strings of length *n.* Then our randomized algorithm *A* is executed with *r* sampled from *U.* In reality, truly random samples are virtually impossible to obtain, and we resort to *pseudorandom* distributions and functions that generate pseudorandom bits, such as the `rand` function in C.

What makes a distribution μ over *n*-bit strings pseudorandom? It turns out that the answer to this question depends on the algorithm *A.* The distribution μ is pseudorandom for *A* if the behavior of *A* on samples from μ is indistinguishable from its behavior on truly uniformly random *r.* In particular, if *A* was likely to work correctly with truly uniform samples, it will be likely to work correctly with samples drawn from μ.

For simplicity, suppose that *A* outputs a single bit. For a distribution μ on length-*n* strings {0, 1}^{n}, we denote by 0 ≤ **E**_{μ}[*A*] ≤ 1 the expected value of *A* on inputs drawn according to μ. When the distribution under consideration is the uniform distribution *U* on {0, 1}^{n}, we suppress the subscript and let **E**[*A*] denote the expected value of *A.* A distribution μ is ε-*pseudorandom*, or ε-*fools A*, if the expected output when *A* is fed samples from μ is ε-close to the expected output when it is fed truly uniform samples:

|**E**_{μ}[*A*] – **E**[*A*]| ≤ ε.

Thus, when we use the `rand` function in our code, we implicitly hope that its output ε-fools our program.

Similarly to fooling one algorithm, we can define fooling a class of functions. In particular, a distribution μ ε-fools **AC**^{0} functions if it ε-fools every function in the class.

The smaller (and weaker) the class of functions is, the easier it is to fool. Since **AC**^{0} is a relatively weak class of functions, there is hope of fooling it using a distribution μ of relatively low entropy. In the late 1980s Nisan^{10} demonstrated a family of such distributions. In the present paper we give a very general class of distributions that fool **AC**^{0} circuits, showing that all *k*-wise independent distributions with *k* = log^{O(1)} *n* fool **AC**^{0} circuits.

**1.3 k-wise independence and the main result**

A distribution μ on {0,1}^{n} is *k*-wise independent if every restriction of μ to *k* coordinates is uniform on {0, 1}^{k}. Clearly, the uniform distribution on {0, 1}^{n} is *n*-wise independent. A simple example of a distribution that is (*n* – 1)-wise independent but not uniform is

μ_{⊕}: *x* = (*x*_{1},…, *x*_{n−1}, *x*_{1} ⊕ *x*_{2} ⊕ … ⊕ *x*_{n−1}),

where the bits *x*_{1},…,*x*_{n−1} are selected uniformly at random and ⊕ is the XOR operator. Equivalently, μ_{⊕} selects a uniformly random string in {0, 1}^{n} subject to the condition that the number of 1’s selected is even.
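For small *n*, these properties of μ_{⊕} can be verified by brute force. The following sketch (the function names and the choice *n* = 4 are ours, for illustration only) enumerates the support of μ_{⊕} and checks that every projection onto *n* − 1 coordinates is exactly uniform, while the full distribution clearly is not:

```python
from itertools import product, combinations
from collections import Counter

def mu_xor_support(n):
    """Support of mu_xor: x_1..x_{n-1} free, x_n = their XOR (even total parity)."""
    return [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=n - 1)]

n = 4
sup = mu_xor_support(n)                      # 2^{n-1} = 8 equally likely strings
assert all(sum(x) % 2 == 0 for x in sup)     # every string has an even number of 1s

# (n-1)-wise independence: each projection onto n-1 coordinates is exactly
# uniform on {0,1}^{n-1} (every pattern appears equally often)
for coords in combinations(range(n), n - 1):
    counts = Counter(tuple(x[i] for i in coords) for x in sup)
    assert len(counts) == 2 ** (n - 1)       # all patterns occur...
    assert len(set(counts.values())) == 1    # ...the same number of times

# ...but the full distribution is not uniform: odd-parity strings never occur
assert (1, 0, 0, 0) not in sup
```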

*k*-Wise independent distributions can sometimes be used as pseudorandom generators. If μ is a *k*-wise independent distribution, it may require fewer than *n* bits of true randomness to sample from. For example, the distribution μ_{⊕} only requires *n* – 1 truly random bits to sample from. Alon et al.,^{1} building on ideas from Joffe,^{7} give a simple construction of a *k*-wise independent distribution which only requires *O*(*k* log *n*) truly random bits, using univariate polynomials.
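The idea behind such polynomial-based constructions can be sketched as follows: a uniformly random polynomial of degree < *k* over a prime field **F**_{p}, evaluated at *n* fixed points, produces field values any *k* of which are jointly uniform (by interpolation, *k* evaluations determine the polynomial uniquely). Choosing the polynomial takes only *O*(*k* log *p*) random bits. The sketch below stops at field values; converting them to nearly unbiased *bits* is where the additional care in the actual construction of Alon et al. comes in. We enumerate all polynomials for tiny parameters of our own choosing:

```python
from itertools import product
from collections import Counter

def kwise_samples(p, k, n):
    """All outputs x -> (f(0), ..., f(n-1)) over F_p, for every degree-<k poly f.
    Any k coordinates are jointly uniform on F_p^k (interpolation argument)."""
    assert n <= p
    out = []
    for coeffs in product(range(p), repeat=k):   # a random poly = k random coeffs
        out.append(tuple(sum(c * pow(x, i, p) for i, c in enumerate(coeffs)) % p
                         for x in range(n)))
    return out

p, k, n = 5, 2, 4
samples = kwise_samples(p, k, n)                 # p^k = 25 equally likely samples
# pairwise (k = 2) independence: every pair of coordinates takes each of the
# p*p possible value-pairs equally often
for i in range(n):
    for j in range(i + 1, n):
        counts = Counter((s[i], s[j]) for s in samples)
        assert len(counts) == p * p and len(set(counts.values())) == 1
```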

Another way of looking at and constructing *k*-wise independent distributions is through coding theory. Simple linear codes give rise to a large family of *k*-wise independent distributions. For any binary linear code *C* with distance > *k*, the uniform distribution on the dual code

*C*^{⊥} = {*x* ∈ {0, 1}^{n} : *x* · *c* = 0 mod 2 for all *c* ∈ *C*}

is a *k*-wise independent distribution. The example μ_{⊕} above corresponds to the *n*-times repetition code *C* = {00 … 0, 11 … 1}, whose distance is *n*, again showing that μ_{⊕} is (*n* – 1)-wise independent.

For which *k*’s does *k*-wise independence fool **AC**^{0} circuits? It is not hard to see that the problem of distinguishing μ_{⊕} from the uniform distribution is as hard as computing *PARITY*, and thus it is not surprising that μ_{⊕} fools **AC**^{0} circuits. If a distribution μ is *k*-wise independent, it means that “locally” the distribution looks uniform, and a circuit distinguishing μ from the uniform distribution needs to be able to “process” more than *k* input bits together. Thus, intuitively, *k*-wise independence should be able to fool **AC**^{0} circuits even for fairly small values of *k.* In 1990, Linial and Nisan^{9} conjectured that *k* = (log *m*)^{d – 1} (i.e., poly-logarithmic in *m*) is sufficient to fool **AC**^{0} circuits of size *m* and depth *d.* The main object of this paper is to give a proof of the following theorem, which was first published in 2009^{3}:

THEOREM 1. *r*(*m, d*, ε)-*independence* ε-*fools depth-d* **AC**^{0} *circuits of size m, where*

*r*(*m, d*, ε) = (log(*m*/ε))^{O(d^{2})}.

Theorem 1 shows that poly-logarithmic independence suffices to fool all **AC**^{0} circuits, thus settling the Linial-Nisan conjecture. A gap remains in the exact dependence of the exponent of the log on *d.* Prior to Theorem 1, the conjecture had been proved by Bazzi^{2} in 2007 for the special case of *DNF* formulas. Bazzi’s original proof was quite involved, and was later greatly simplified by Razborov.^{12} No nontrivial bound on *r*(*m, d*, ε) was known for *d* > 2.

### 2. Proof Overview

The main technical ingredient in our work is presenting a new connection between low-degree polynomials and **AC**^{0} circuits. Connections between polynomials and circuits, especially **AC**^{0} circuits, have been explored in the past in various contexts, and remain perhaps the most powerful tool for analyzing these functions.

We begin by observing that *k*-wise independent distributions perfectly fool degree-*k* polynomials. Let μ be any *k*-wise independent distribution over {0, 1}^{n}. As a first step, let *m* = Π_{i∈S} *x*_{i}, with |*S*| ≤ *k*, be a degree-*k* monomial. *m* may depend on at most *k* different variables, on which μ (being *k*-wise independent) appears to be perfectly uniformly random. Thus

**E**_{μ}[*m*] = **E**[*m*].
Now, if *f* is a degree-*k* polynomial over *x*_{1},…, *x*_{n}, it can be written as a sum of degree-*k* monomials: *f* = Σ^{N}_{j=1}*m*_{j}. Each monomial is completely fooled by μ and thus

**E**_{μ}[*f*] = Σ^{N}_{j=1} **E**_{μ}[*m*_{j}] = Σ^{N}_{j=1} **E**[*m*_{j}] = **E**[*f*].    (1)
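This exact fooling is easy to verify by brute force. In the sketch below (a toy example of our own), the (*n* − 1)-wise independent distribution μ_{⊕} gives *exactly* the right expectation for a polynomial of degree ≤ *n* − 1, while the degree-*n* monomial *x*_{1}*x*_{2}⋯*x*_{n} (which is sensitive to the parity constraint) is visibly not fooled:

```python
from itertools import product

def expectation(points, f):
    """Average of f over a list of equally likely points."""
    return sum(f(x) for x in points) / len(points)

n = 4
uniform = list(product((0, 1), repeat=n))
# mu_xor: an (n-1)-wise independent, non-uniform distribution (uniform on its support)
mu = [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=n - 1)]

# a degree-3 polynomial (degree <= n-1): fooled exactly, per equation (1)
f = lambda x: 2.0 * x[0] * x[1] * x[3] - x[2] + 0.5 * x[1] * x[2]
assert abs(expectation(mu, f) - expectation(uniform, f)) < 1e-12

# the degree-n monomial x1*x2*x3*x4: E_uniform = 1/16, but E_mu = 1/8
g = lambda x: x[0] * x[1] * x[2] * x[3]
assert abs(expectation(mu, g) - expectation(uniform, g)) > 0.05
```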
In light of (1), a reasonable attack strategy to prove Theorem 1 would be as follows. For a function *F* computed by an **AC**^{0} circuit, find a polynomial *f* that approximates *F* such that:

1. **E**[*F*] ≈ **E**[*f*], i.e., *f* approximates *F* well on average;
2. **E**_{μ}[*F*] ≈ **E**_{μ}[*f*], i.e., *f* approximates *F* well on average even when we restrict ourselves to inputs drawn according to μ.

Then use (1) to conclude that

**E**_{μ}[*F*] ≈ **E**_{μ}[*f*] = **E**[*f*] ≈ **E**[*F*],
i.e., that μ ε-fools *F.* Of course, it is not clear at all that such a polynomial must exist (and we do not know whether it does). However, a carefully crafted attack along these lines actually does go through. The proof combines two previous constructions of approximating polynomials for **AC**^{0} circuits.

The first construction is due to Linial, Mansour, and Nisan^{8} and is discussed extensively in Section 3. It is an analytic construction that for each **AC**^{0} function *F* provides a low-degree polynomial *f̃* that approximates it well on average. Such a polynomial easily fulfills requirement 1 above. However, it is not helpful for fulfilling requirement 2. Since the support of μ is quite small, being close on average does not rule out the possibility that *F* and *f̃* deviate wildly from each other on the points of the support of the distribution μ. This type of uniform average-case approximation is inherently useless when we want *F* to be close to *f̃* on samples drawn according to μ.

The second construction is a combinatorial one, and is due to Razborov^{11} and Smolensky.^{13} It is discussed in Section 4. For an **AC**^{0} function *F*, and *any* distribution of inputs *v*, the Razborov-Smolensky construction gives a low-degree polynomial *f* such that *F* = *f* with high probability with respect to *v.* In other words,

**P**_{v}[*f*(*x*) ≠ *F*(*x*)] is small.

Note that here *f* is actually *equal* to *F* most of the time (and is not just close to it). Since *v* is allowed to be any distribution, we can take

*v* = (μ + *U*)/2

to be the hybrid between μ and the uniform distribution, thus guaranteeing that the probability that *f*(*x*) ≠ *F*(*x*) is small simultaneously with respect to both distributions. This seems to be a promising improvement over the Linial-Mansour-Nisan polynomials, one that may allow us to fulfill both requirements 1 and 2 above.

The caveat here is that on the few locations where *f(x) ≠ F(x)* the deviation |*f(x) – F(x)*| may be huge. In particular, while *f* is close to *F* (more precisely, it is perfectly equal) almost everywhere, it is not necessarily the case that they are close on average. Thus, for example, **E**[*f*] may differ wildly from **E**[*F*].

We see that both polynomial approximations get us “part-way” toward the goals, but do not quite work. To make the proof work we need to combine the two approaches as described in Section 5. In a nutshell, we first take a Razborov-Smolensky polynomial *f* that agrees with *F* almost everywhere. It turns out that an “error” function ε(*x*) that “flags” the problematic locations, i.e., ε(*x*) = 1 whenever *f*(*x*) ≠ *F*(*x*), can be computed by a (slightly bigger) **AC**^{0} circuit. Thus, if the function *f**(*x*) = (1 – ε(*x*)) · *f*(*x*) were a polynomial, we would be in great shape: *f**(*x*) = *F*(*x*) most of the time, while even when they disagree, the difference |*f**(*x*) – *F*(*x*)| ≤ 1, and thus in fact **E**[*f**] ≈ **E**[*F*] and **E**_{μ}[*f**] ≈ **E**_{μ}[*F*].

Of course, *f** is not a polynomial, since ε is not a polynomial. However, it turns out, with a little bit of work, that the polynomial *f*′ = (1 – ε̃) · *f*, where ε̃ is the Linial-Mansour-Nisan approximation of ε, actually allows us to prove the main theorem. Thus the proof comes out of combining the two approximation techniques in one polynomial.

### 3. Low-Depth Circuits and Polynomials—The Analytic Connection

Approximately a decade after the first **AC**^{0} lower bounds were published, Linial, Mansour, and Nisan proved the following remarkable lemma:

LEMMA 2.^{8} *If F:* {0, 1}^{n} → {0, 1} *is a boolean function computable by a depth-d circuit of size m, then for every t there is a degree-t polynomial f̃ with*

**E**[(*F* – *f̃*)^{2}] ≤ *m* · 2^{–Ω(t^{1/d})}.

The lemma states that any low-depth circuit of size *m* can be approximated very well by a polynomial of degree (log *m*)^{O(d)}—a degree that is poly-logarithmic in the size of the circuit. The approximation error here is the average-square error: *f̃* doesn’t have to agree with *F* on any point, but the average value of |*F*(*x*) – *f̃*(*x*)|^{2} taken over the Hamming cube is small. An illustration of the statement of the lemma can be found in Figure 1.

To understand the context of Lemma 2 we need to consider one of the most fundamental tools in the analysis of boolean functions: the Fourier transform, which we will briefly introduce here. An introductory survey on the Fourier transform over the boolean cube can be found in de Wolf.^{4} For the remainder of this section we will represent the boolean function *F:* {0, 1}^{n} → {0, 1} using a function *G:* {-1, +1}^{n} → {-1, +1}, with 0 corresponding to −1 and 1 to +1. Thus

*G*(*x*_{1},…, *x*_{n}) = 2*F*((*x*_{1} + 1)/2,…, (*x*_{n} + 1)/2) – 1.
This will not affect any of the results, but will make all the calculations much cleaner. Note that a degree-*t* polynomial approximation of *F* corresponds to a degree-*t* polynomial approximation for *G* and vice-versa.

Each function *G:* {-1, +1}^{n} → {-1, +1} can be viewed as a 2^{n}-dimensional vector in **R**^{2^{n}} specified by its truth table. Consider the special *parity functions* χ_{S}: {-1, +1}^{n} → {-1, +1}. For each set *S* ⊂ {1,…, *n*} of coordinates, let

χ_{S}(*x*) := Π_{i∈S} *x*_{i}.

In other words, χ_{S} is the parity of the *x*’s that correspond to coordinates in the set *S.* There is a total of 2^{n} different functions χ_{S}—one corresponding to each subset *S* of coordinates. The function χ_{∅} is the constant function χ_{∅}(*x*) = 1, while the function χ_{{1,…, n}}(*x*) outputs the parity of all the coordinates of *x.*

Similarly to the function *G*, each function χ_{S} can also be viewed as a vector in **R**^{2^{n}}, specified by the values it takes on all possible inputs. By abuse of notation we will denote these vectors by χ_{S} as well. Moreover, for each two sets *S*_{1} ≠ *S*_{2} the vectors χ_{S_{1}} and χ_{S_{2}} are orthogonal to each other. That is,

⟨χ_{S_{1}}, χ_{S_{2}}⟩ = 2^{–n} Σ_{x∈{-1,+1}^{n}} χ_{S_{1}}(*x*) · χ_{S_{2}}(*x*) = 0.

Thus we have 2^{n} orthogonal vectors {χ_{S}} in **R**^{2^{n}}, and when properly normalized they yield an orthonormal basis of the space. In other words, each vector in **R**^{2^{n}} can be uniquely represented as a linear combination of the functions χ_{S}. In particular, the function *G* can be uniquely written as

*G* = Σ_{S⊂{1,…,n}} *Ĝ*(*S*) · χ_{S},    (2)

where the numerical coefficients are given by the inner product

*Ĝ*(*S*) = ⟨*G*, χ_{S}⟩ = 2^{–n} Σ_{x∈{-1,+1}^{n}} *G*(*x*) · χ_{S}(*x*).
The representation of *G* given by (2) is called the *Fourier transform* of *G.* The transform converts 2^{n} numbers (the truth table of *G*) into 2^{n} numbers (the coefficients *Ĝ*(*S*)), thus preserving all the information about *G.* Another way to view the representation (2) is as a canonical way of representing *G* as a multilinear polynomial with real-valued coefficients:

*G*(*x*_{1},…, *x*_{n}) = Σ_{S} *Ĝ*(*S*) · Π_{i∈S} *x*_{i}.    (3)
Thus each function *G* can be uniquely represented as a degree-*n* multilinear polynomial (a simple fact that can be proved directly, without using the Fourier transform). More importantly, when we view the functions χ_{S} as vectors, the space *H*_{t} of polynomials of degree ≤ *t*, for any *t*, is spanned by the functions {χ_{S}: |*S*| ≤ *t*}. This means that to get the best low-degree approximation for *G* we should simply project it onto *H*_{t}, to obtain

*G̃* = Σ_{|S|≤t} *Ĝ*(*S*) · χ_{S}.    (4)

Note that *G̃* is no longer a boolean function, and may take arbitrary real values, even on inputs from {-1, +1}^{n}. The *l*_{2}-error—which we need to bound to prove Lemma 2—is given by

||*G* – *G̃*||^{2}_{2} = Σ_{|S|>t} *Ĝ*(*S*)^{2}.    (5)
To bound (5) one needs to show that all but a very small fraction of weight in the Fourier representation of *G* is concentrated on low-degree coefficients. This is where the magic of Linial et al.^{8} happens, and this is where the fact that *G* is computed by an **AC**^{0} circuit is being used.

At a very high level, the proof uses *random restrictions.* A random restriction ρ assigns the majority of *G*’s inputs a random value 0 or 1, while leaving some inputs unset. A tool called the *Switching Lemma* is then used to show that when a random restriction ρ is applied to an **AC**^{0} function *G*, the resulting boolean function *G*|_{ρ} is highly likely to be very “simple”—depending on only a few of the inputs. On the other hand, a random restriction applied to χ_{S} where |*S*| is large is likely to leave some of the inputs in *S* unset. Thus if *G* had a lot of weight on the coefficients *Ĝ*(*S*) with large |*S*|, then *G*|_{ρ} would be likely to remain “complicated” even after the application of ρ.

Lemma 2 immediately implies a lower bound for an **AC**^{0} circuit computing *PARITY*, since the Fourier representation of *PARITY* consists of the single coefficient *Ĝ*(*S*) = 1 for *S* = {1,…, *n*}, with all other coefficients 0. Thus *PARITY* cannot be approximated well by a polynomial even of degree *t* = *n* – 1.
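The Fourier machinery of this section is small enough to run directly. The sketch below (the choice of the AND function and *n* = 3 are ours, for illustration) computes the coefficients *Ĝ*(*S*) by inner products, and checks the reconstruction (2), Parseval's identity, and the fact that *PARITY*'s spectrum sits entirely on the single top coefficient:

```python
from itertools import product, combinations, chain

def chi(S, x):
    """Parity character: the product of x_i over i in S (x_i in {-1,+1})."""
    out = 1
    for i in S:
        out *= x[i]
    return out

def fourier(G, n):
    """All coefficients G^(S) = <G, chi_S> = 2^{-n} sum_x G(x) chi_S(x)."""
    cube = list(product((-1, 1), repeat=n))
    subsets = chain.from_iterable(combinations(range(n), r) for r in range(n + 1))
    return {S: sum(G(x) * chi(S, x) for x in cube) / 2 ** n for S in subsets}

n = 3
AND = lambda x: 1 if all(b == 1 for b in x) else -1   # +1 plays the role of "true"
coeffs = fourier(AND, n)

# the representation (2) reconstructs G pointwise
for x in product((-1, 1), repeat=n):
    assert abs(sum(c * chi(S, x) for S, c in coeffs.items()) - AND(x)) < 1e-9

# Parseval: a +/-1-valued function has total squared Fourier weight exactly 1
assert abs(sum(c * c for c in coeffs.values()) - 1.0) < 1e-9

# PARITY's weight sits entirely on the top coefficient S = {1,2,3}
par = fourier(lambda x: x[0] * x[1] * x[2], n)
assert abs(par[(0, 1, 2)] - 1.0) < 1e-9
assert all(abs(c) < 1e-9 for S, c in par.items() if S != (0, 1, 2))
```

In particular, truncating *PARITY* to any degree *t* < *n* drops all of its Fourier weight, which is the lower bound noted above.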

One interesting application of Lemma 2 that is outside the scope of the present article is to learning **AC**^{0} functions. Suppose that *G* is an unknown **AC**^{0} function of size *m*, and we are given access to examples of the form (*x, G*(*x*)), where *x* is chosen uniformly from {-1, +1}^{n}. Then we know that *G* has a good polynomial approximation of the form (4). Thus to (approximately) learn *G*, all we have to do is learn the coefficients *Ĝ*(*S*), where *S* is small. This can be done in time proportional to 2^{|S|}, immediately yielding a quasi-polynomial-time algorithm for learning **AC**^{0} circuits.
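A minimal version of this “low-degree” learning algorithm might look as follows (the toy target function, sample size, and degree cutoff are all our own choices, not from the article):

```python
import random
from itertools import product, combinations, chain

def chi(S, x):
    out = 1
    for i in S:
        out *= x[i]
    return out

def learn_low_degree(examples, n, t):
    """Estimate G^(S) for |S| <= t from (x, G(x)) samples; the hypothesis is the
    sign of the estimated low-degree polynomial."""
    subsets = list(chain.from_iterable(combinations(range(n), r) for r in range(t + 1)))
    est = {S: sum(y * chi(S, x) for x, y in examples) / len(examples) for S in subsets}
    def hypothesis(x):
        v = sum(c * chi(S, x) for S, c in est.items())
        return 1 if v >= 0 else -1
    return hypothesis

random.seed(0)
n = 4
# toy target in the +/-1 convention: true iff (x0 AND x1) OR x3
G = lambda x: 1 if (x[0] == 1 and x[1] == 1) or x[3] == 1 else -1
examples = []
for _ in range(3000):
    x = tuple(random.choice((-1, 1)) for _ in range(n))
    examples.append((x, G(x)))

h = learn_low_degree(examples, n, t=2)
agree = sum(h(x) == G(x) for x in product((-1, 1), repeat=n)) / 2 ** n
assert agree >= 0.9
```

Here the degree-2 truncation of this particular target already sign-agrees with it everywhere, so the sampling error is the only obstacle; larger circuits need the higher (but still poly-logarithmic) degree promised by Lemma 2.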

### 4. Low-Depth Circuits and Polynomials—The Algebraic Connection

It turns out that average-error approximations are not the only useful way in which **AC**^{0} circuits can be approximated by low-degree polynomials. We have seen in the previous section that the representation of any boolean function by a multilinear polynomial is unique, and thus given an **AC**^{0} function *F* we cannot hope to obtain a low-degree polynomial *f* that agrees with *F* everywhere. We will see, however, that it is possible to construct a low-degree polynomial *f* that agrees with *F almost everywhere.* Specifically, the following statement holds:

LEMMA 3. *Let v be any probability distribution on* {0, 1}^{n}. *For a circuit of depth d and size m computing a function F, for any s, there is a degree r* = (*s* · log *m*)^{d} *polynomial f such that*

**P**_{v}[*f*(*x*) ≠ *F*(*x*)] < 0.82^{s} · *m*.

Note that Lemma 3 promises an approximating polynomial, again of degree (log *m*)^{O(d)}, that has a low error against an *arbitrary distribution v* on inputs. The statement of the lemma is illustrated on Figure 2(a), where the probability according to *v* of the region of disagreement between *f* and *F* is small.

Lemma 3 (or a slight variation of it) was proved by Razborov^{11} and Smolensky^{13} in the late 1980s. The tools for the proof of the lemma were developed in Valiant and Vazirani.^{14} Razborov and Smolensky used the lemma to give stronger lower bounds on bounded-depth circuits. Let **AC**^{0}[*p*] ⊃ **AC**^{0} be the class of functions computable using a bounded-depth circuit that in addition to the AND and OR gates is allowed to use the *MOD*_{p} gate: the gate outputs 1 if and only if the number of 1’s among its inputs is divisible by *p.* We already know that *PARITY* ∉ **AC**^{0}, and thus **AC**^{0}[2] ⊋ **AC**^{0}. Razborov and Smolensky showed that the *MOD*_{p} function cannot be computed efficiently by an **AC**^{0}[*q*] circuit where *q* ≠ *p*. In particular, this means that *PARITY* ∉ **AC**^{0}[3].

The proof of Lemma 3 is combinatorial, and is much simpler than the proof of Lemma 2. In fact, we will be able to present its entire proof below. To obtain the results in Section 5 we will need a slight strengthening of the lemma.

LEMMA 4. *Let v be any probability distribution on* {0, 1}^{n}. *For a circuit of depth d and size m computing a function F, for any s, there is a degree r* = (*s* · log *m*)^{d} *polynomial f and a boolean function* ε_{v}, *computable by a circuit of depth* ≤ *d* + 3 *and size O*(*m*^{2}*r*), *such that*

1. **P**_{v}[ε_{v}(*x*) = 1] < 0.82^{s}*m*, *and*
2. *whenever* ε_{v}(*x*) = 0, *f*(*x*) = *F*(*x*).

Thus, not only does the polynomial from Lemma 3 exist, but there is a simple **AC**^{0} “error circuit” ε_{v} that given an input can tell us whether *f* will make an error on the input. The function ε_{v} is shown on Figure 2(b).

We will now prove Lemma 4. A curious property of the construction is that it gives a probabilistic algorithm for producing the low degree polynomial *f* that approximates *F* almost everywhere.

PROOF. (of Lemma 4) We construct the polynomial *f* by induction on the depth *d* of *F*, and show that with high probability *f = F.* The function ε_{v} follows from the construction. Note that we do not know anything about the measure *v* and thus cannot give an explicit construction for *f.* Instead, we will construct a distribution on polynomials *f* that succeeds with high probability on any given input. Thus the distribution is expected to have a low error with respect to *v*, which implies that there is a specific *f* that has a small error with respect to *v.*

We will show how to perform the inductive step when the output gate of *F* is an *AND* gate (see Figure 3). Since the whole construction is symmetric with respect to 0 and 1, the same step also works for an *OR* gate. Let

*F* = *G*_{1} ∧ *G*_{2} ∧ … ∧ *G*_{k},

where *k* < *m.* For convenience, let us assume that *k* = 2^{l} is a power of 2. We take a collection of *t* = *s* · *l* random subsets of {1, 2,…, *k*}, where each element is included with probability *p* independently of the others: *s* subsets for each of the densities *p* = 2^{−1}, 2^{−2},…, 2^{−l} = 1/*k.* Denote the sets by *S*_{1},…, *S*_{t}—we ignore empty sets. In addition, we make sure to include {1,…, *k*} as one of the sets. Let *g*_{1},…, *g*_{k} be the approximating polynomials for *G*_{1},…, *G*_{k} that are guaranteed by the induction hypothesis applied to *G*_{1},…, *G*_{k} with depth *d* – 1. We set

*f* := Π^{t}_{i=1} (1 – Σ_{j∈S_{i}} (1 – *g*_{j})).
By the induction assumption, the degrees of the *g*_{j} are *deg*_{g} ≤ (*s* · log *m*)^{d−1}, hence the degree of *f* is bounded by *deg*_{f} ≤ *t* · *deg*_{g} ≤ (*s* · log *m*)^{d}. Next we bound the error **P**[*f* ≠ *F*]. It consists of two terms:

**P**[*f*(*x*) ≠ *F*(*x*)] ≤ Σ^{k}_{j=1} **P**[*g*_{j}(*x*) ≠ *G*_{j}(*x*)] + **P**[*f*(*x*) ≠ *F*(*x*) | *g*_{j}(*x*) = *G*_{j}(*x*) for all *j*].

In other words, to make a mistake, either one of the inputs to the final *AND* gate has to be wrong, or the approximating polynomial for the AND has to make a mistake. The first term is bounded by a union bound; we will focus on the second term. We fix a vector of specific values *G*_{1}(*x*),…, *G*_{k}(*x*), and calculate the probability of an error over the possible choices of the random sets *S*_{i}.

If all the *G*_{j}(*x*)’s are 1 then the value of *F(x)* = 1 is calculated correctly with probability 1.

Suppose that *F*(*x*) = 0 (and thus at least one of the *G*_{j}(*x*)’s is 0). Let 1 ≤ *z* ≤ *k* be the number of zeros among *G*_{1}(*x*),…, *G*_{k}(*x*), and let α be such that 2^{α} ≤ *z* < 2^{α+1}. Our formula will work correctly if one of the sets *S*_{i} hits *exactly* one 0 among the *z* zeros of *G*_{1}(*x*),…, *G*_{k}(*x*). We will consider only the sets *S*_{i} above that are likely to hit exactly one zero. Specifically, let *S* be a random set as above with *p* = 2^{-α-1}. The probability of *S* hitting exactly one zero is exactly

*z* · *p* · (1 – *p*)^{z–1} > 0.18.

Hence the probability of the formula being wrong after *s* such sets is bounded by 0.82^{s}. Since this is true for any value of *x*, we can find a collection of sets *S*_{i} such that the probability of error as measured by *v* is at most 0.82^{s}. By making the same probabilistic argument at every node and applying the union bound, we get that the condition “if the inputs are correct then the output is correct” is satisfied by all nodes except with probability < 0.82^{s}*m*. Thus the error of the polynomial is < 0.82^{s}*m*.

Finally, if we know the sets *S*_{i} at every node, it is easy to check whether there is a mistake by checking that no set contains exactly one 0, thus yielding the depth-(*d* + 3) function ε_{v}. The blowup in size is at most *O*(*mr*), since at each node we take a disjunction over all the possible pairs (*S*_{i}, *G*_{j}) with *j* ∈ *S*_{i} of whether *G*_{j} is the only 0 in the set *S*_{i}.
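The base case of this induction (a single AND gate whose inputs are the variables themselves, so that *g*_{j} = *x*_{j}) is easy to simulate. The sketch below (the parameters *k* and *s* are our own choices) builds the random-subset polynomial for one AND gate and checks empirically that it agrees with AND on most uniformly random inputs:

```python
import random

def rs_and_polynomial(k, s, rng):
    """One Razborov-Smolensky step for a single AND gate on k input bits:
    s random subsets for each density p = 1/2, 1/4, ..., 1/k, plus the full
    set; the polynomial is prod_i (1 - sum_{j in S_i} (1 - x_j))."""
    sets = [tuple(range(k))]                  # always include {1, ..., k}
    p = 0.5
    while p >= 1.0 / k:
        for _ in range(s):
            S = tuple(j for j in range(k) if rng.random() < p)
            if S:                             # ignore empty sets
                sets.append(S)
        p /= 2
    def f(x):
        val = 1
        for S in sets:
            # a set hitting exactly one zero makes this factor (and f) zero
            val *= 1 - sum(1 - x[j] for j in S)
        return val
    return f

rng = random.Random(1)
k, s = 8, 10
f = rs_and_polynomial(k, s, rng)
AND = lambda x: int(all(x))

assert f((1,) * k) == 1                       # the all-ones input is always correct
# on F(x) = 0 inputs, some set almost surely hits exactly one zero, so f = 0 = F
trials = [tuple(rng.randint(0, 1) for _ in range(k)) for _ in range(500)]
agree = sum(f(x) == AND(x) for x in trials) / len(trials)
assert agree >= 0.8
```

The degree here is only (number of sets), i.e., about *s* log *k*, while the error per input is bounded by 0.82^{s} as in the analysis above.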

### 5. *k*-Wise Independent Distributions and Low-Depth Circuits

Next we turn our attention to proving the main result, Theorem 1. The result will give another way in which **AC**^{0} circuits can be approximated using low-degree polynomials.

THEOREM 1. *r*(*m, d*, ε)-*independence* ε-*fools depth-d* **AC**^{0} *circuits of size m, where*

*r*(*m, d*, ε) = (log(*m*/ε))^{O(d^{2})}.

To prove Theorem 1 we will show:

LEMMA 5. *Let s* ≥ log *m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution, where*

*r* = *r*(*s, d*) = 2((*s* · log *m*)^{d} + *s*_{2})  *with*  *s*_{2} ≈ 60^{d+3} · (log *m*)^{(d+1)(d+3)} · *s*^{d(d+3)};

*then*

|**E**_{μ}[*F*] – **E**[*F*]| ≤ ε(*s, d*),

*where* ε(*s, d*) = 0.82^{s} · (10*m*).

Theorem 1 follows from Lemma 5 by taking *s* = *O*(log(*m*/ε)), so that ε(*s, d*) = 0.82^{s} · (10*m*) ≤ ε.

As mentioned in the overview, one can prove that *k*-wise independence fools a function *F* by constructing an appropriate approximation of *F* by low-degree polynomials. Bazzi^{2} gave the following equivalent characterization of fooling in terms of polynomial approximations, using linear-programming duality:

LEMMA 6.^{2} *Let F:* {0, 1}^{n} → {0, 1} *be a boolean function, k* ≥ 0 *an integer, and* ε > 0. *Then the following are equivalent:*

1. *Any k-wise independent distribution* ε-*fools F.*
2. *There exists a pair of “sandwiching polynomials” f*_{l}, *f*_{u}: {0, 1}^{n} → **R** *such that:*
   - **low degree**: *deg*(*f*_{l}), *deg*(*f*_{u}) ≤ *k*;
   - **sandwiching**: *f*_{l} ≤ *F* ≤ *f*_{u} *on* {0, 1}^{n};
   - **small error**: **E**[*F* – *f*_{l}], **E**[*f*_{u} – *F*] ≤ ε, *where the expectations are with respect to the uniform distribution on* {0, 1}^{n}.

The sandwiching polynomials are illustrated on Figure 4(a). Since part (1) in Lemma 6 is what we need to prove, we will only be interested in the “(2) ⇒ (1)” direction of the lemma, which is the “easy” direction, and which we can prove here. Suppose (2) holds, and let μ be any *k*-wise independent distribution. The polynomial *f*_{l} is of degree ≤ *k*, which, as observed in Section 2, implies that **E**_{μ}[*f*_{l}] = **E**[*f*_{l}]. Thus,

**E**_{μ}[*F*] ≥ **E**_{μ}[*f*_{l}] = **E**[*f*_{l}] ≥ **E**[*F*] – ε.

The first inequality uses the sandwiching property, and the last inequality uses the small error property. Similarly,

**E**_{μ}[*F*] ≤ **E**_{μ}[*f*_{u}] = **E**[*f*_{u}] ≤ **E**[*F*] + ε,
implying that μ ε-fools *F*. Thus a problem about fooling **AC**^{0} circuits is actually another problem on approximating circuits with polynomials!

Our actual proof of Lemma 5 will not produce a pair of sandwiching polynomials, but will “almost” produce such a pair. By the “(1) ⇒ (2)” direction of Lemma 6 we know that Theorem 1 implies that such a pair must exist for each **AC**^{0} function.

The proof will combine the two types of polynomial approximations that we discussed in Sections 3 and 4. When we produced the almost-everywhere-correct polynomials in Section 4 we noted that when the polynomial *f* does disagree with *F*, the disagreement might be very large. It is still possible to give the following very crude bound on the maximum value ||*f*||_{∞} that |*f*| may attain on {0,1}^{n}:

CLAIM 7. *In Lemma 4, for s* ≥ log *m*,

||*f*||_{∞} ≤ 2^{(s · log m)^{O(d)}}.

The claim follows by an easy induction from the proof of Lemma 4. We omit the proof here.

A low-degree polynomial *f*_{0} is a *one-sided* approximation of a boolean *F* (see Figure 4(b)) if:

1. *f*_{0} **is a good approximation**: ||*F* – *f*_{0}||^{2}_{2} is small;
2. *f*_{0}**’s error is one-sided**: *f*_{0}(*x*) = 0 whenever *F*(*x*) = 0.

If *F* had such an approximation, then the polynomial *f*_{l} := 1 – (1 – *f*_{0})^{2} would be a (lower) sandwiching polynomial for *F.* Indeed, *f*_{l} = 0 = *F* wherever *F* = 0, and *f*_{l} ≤ 1 everywhere, so *f*_{l} ≤ *F*; moreover, *f*_{l} still has low degree, and

**E**[*F* – *f*_{l}] = **E**[(1 – *f*_{0})^{2} · *F*] ≤ ||*F* – *f*_{0}||^{2}_{2}

is small. This process is illustrated in Figure 4(c).
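The squaring trick can be checked mechanically. In the sketch below, *f*_{0} is a hypothetical one-sided approximation of OR on two bits (the coefficients are ours, chosen only to satisfy one-sidedness), and the assertions verify the sandwiching and error properties claimed above:

```python
from itertools import product

n = 2
F = lambda x: int(x[0] or x[1])                  # boolean OR
# a hypothetical one-sided approximation: f0 = 0 wherever F = 0
f0 = lambda x: 0.8 * x[0] + 0.8 * x[1] - 0.6 * x[0] * x[1]
fl = lambda x: 1 - (1 - f0(x)) ** 2              # the squared lower sandwich

cube = list(product((0, 1), repeat=n))
assert all(f0(x) == 0 for x in cube if F(x) == 0)     # one-sidedness
assert all(fl(x) <= F(x) + 1e-12 for x in cube)       # fl <= F everywhere

l2 = sum((F(x) - f0(x)) ** 2 for x in cube) / len(cube)
err = sum(F(x) - fl(x) for x in cube) / len(cube)
assert err <= l2 + 1e-12                              # E[F - fl] <= ||F - f0||_2^2
```

Note that squaring doubles the degree, which is why the final degree bounds below carry a factor of 2.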

Thus, being able to produce one-sided approximations (which combine characteristics from both the Section 3 and the Section 4 approximations) would be enough. Unfortunately, we are unable to construct such polynomials. Instead, we show how to modify *F* just a little bit to obtain a boolean function *F*′. The change is slight enough with respect to both μ and the uniform measure that fooling *F*′ and fooling *F* are almost equivalent. We then show that the modified *F*′ does have a one-sided approximation *f*′. The properties of *F*′ and *f*′ are summarized in the following lemma:

LEMMA 8. *Let F be computed by a circuit of depth d and size m. Let s*_{1}, *s*_{2} ≥ log *m be two parameters. Let μ be any probability distribution on* {0, 1}^{n}, *and let U be the uniform distribution on* {0, 1}^{n}. *Set*

*v* = (μ + *U*)/2.

*Let* ε_{v} *be the function from Lemma 4 with s* = *s*_{1}. *Set F*′ = *F* ∨ ε_{v}. *Then there is a polynomial f*′ *of degree r*_{f} ≤ (*s*_{1} · log *m*)^{d} + *s*_{2} *such that*

1. *F* ≤ *F*′ *on* {0, 1}^{n};
2. **P**_{μ}[*F* ≠ *F*′] < 2 · 0.82^{s_{1}}*m*;
3. ||*F*′ – *f*′||^{2}_{2} ≤ 8 · 0.82^{s_{1}} · *m, provided s*_{2} *is sufficiently large, and*
4. *f*′(*x*) = 0 *whenever F*′(*x*) = 0.

PROOF IDEA: The proof is illustrated on Figure 5. We start with a polynomial approximation *f* for *F* that works “almost everywhere”. By Lemma 4 we know that there is an **AC**^{0} function ε_{v} that “flags” the locations of all the errors. We set *F*′ = *F* ∨ ε_{v}, so that *f* = 0 whenever *F*′ = 0. The problem is that the error |*F*′ – *f*|^{2} may be large in the area where ε_{v} = 1 (and thus *f* behaves poorly). If we could take *f*′ = *f* · (1 – ε_{v}), we would be done, as this would “kill” all the locations where ε_{v} is 1. Unfortunately, we cannot do this, since ε_{v} is not a low-degree polynomial. Here we use the fact that ε_{v} is itself an **AC**^{0} function, and thus can be approximated well by a polynomial ε̃ via the Linial-Mansour-Nisan approximation (Lemma 2). A brief calculation indeed shows that *f*′ = *f* · (1 – ε̃), for an appropriately chosen ε̃, satisfies all the conditions.
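The damping effect of multiplying by (1 − ε̃) can be seen on a toy numeric example. Here *f*, ε, and its approximation ε̃ are all synthetic stand-ins of our own (not actual Razborov-Smolensky or Linial-Mansour-Nisan constructions): a single huge error of *f* is flagged by ε and killed by the multiplication, so the damped polynomial tracks *F* on average while *f* alone does not:

```python
from itertools import product

cube = list(product((0, 1), repeat=3))
F = lambda x: int(all(x))                    # toy function: AND of 3 bits
bad = (0, 1, 1)                              # the one point where f errs, hugely
f = lambda x: 1000.0 if x == bad else float(F(x))
eps = lambda x: 1.0 if x == bad else 0.0     # error-flag function (eps = 1 iff f != F)
epst = lambda x: eps(x) + 0.001              # synthetic stand-in for its approximation
fp = lambda x: f(x) * (1 - epst(x))          # the damped polynomial f' = f * (1 - eps~)

E = lambda g: sum(g(x) for x in cube) / len(cube)
assert abs(E(f) - E(F)) > 100                # f alone is useless on average...
assert abs(E(fp) - E(F)) < 0.2               # ...but the damped f' tracks F closely
```

The point is exactly the one made in the proof idea: near the flagged point, 1 − ε̃ ≈ 0 suppresses the huge value of *f*, while elsewhere 1 − ε̃ ≈ 1 leaves *f* essentially unchanged.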

PROOF. The first property follows from the definition of *F*′. The second one follows from Lemma 4 directly, since

**P**_{μ}[*F* ≠ *F*′] ≤ **P**_{μ}[ε_{v} = 1] ≤ 2 · **P**_{v}[ε_{v} = 1] < 2 · 0.82^{s_{1}}*m*.

Note also that, by the same reasoning,

**P**[*F* ≠ *F*′] ≤ 2 · **P**_{v}[ε_{v} = 1] < 2 · 0.82^{s_{1}}*m*.
Let *f* be the approximating polynomial for *F* from that lemma, so that *F* = *F*′ = *f* whenever ε_{v} = 0, and thus *f* = 0 whenever *F*′ = 0. By Claim 7 we have

||*f*||_{∞} ≤ 2^{(s_{1} · log m)^{O(d)}}.
We let ε̃ be the low-degree approximation of ε_{v} of degree *s*_{2}. Since ε_{v} is computable by a circuit of depth ≤ *d* + 3 and size *O*(*m*^{2}*r*), by Linial et al.^{8} (Lemma 2 above) we have

||ε_{v} – ε̃||^{2}_{2} ≤ *O*(*m*^{2}*r*) · 2^{–Ω(s_{2}^{1/(d+3)})}.
Let

*f*′ := *f* · (1 – ε̃).
Then *f*′ = 0 whenever *F*′ = 0. It remains to estimate ||*F*′ – *f*′||^{2}_{2}. Where ε_{v} = 0 we have *F*′ = *f*, so |*F*′ – *f*′| = *F*′ · |ε̃ – ε_{v}|; where ε_{v} = 1 we have *F*′ = 1, so |*F*′ – *f*′| ≤ 1 + ||*f*||_{∞} · |ε̃ – ε_{v}|. Combining the two regions,

||*F*′ – *f*′||_{2} ≤ (1 + ||*f*||_{∞}) · ||ε_{v} – ε̃||_{2} + √**P**[ε_{v} = 1],

which, for a sufficiently large choice of *s*_{2}, is at most the bound claimed in property 3. This completes the proof.

We can now use Lemma 8 to give each **AC**^{0} function *F* a lower sandwiching polynomial, at least after a little “massaging”:

LEMMA 9. *For every boolean circuit F of depth d and size m, any s* ≥ log *m, and any probability distribution μ on* {0, 1}^{n}, *there is a boolean function F*′ *and a polynomial f*′_{l} *of degree at most*

*r*(*s, d*) = 2((*s* · log *m*)^{d} + *s*_{2}), *where s*_{2} ≈ 60^{d+3} · (log *m*)^{(d+1)(d+3)} · *s*^{d(d+3)},

*such that F* ≤ *F*′ *and f*′_{l} ≤ *F*′ *on* {0, 1}^{n}, *and*

**P**_{μ}[*F* ≠ *F*′] + **E**[*F*′ – *f*′_{l}] ≤ ε(*s, d*),

*where* ε(*s, d*) = 0.82^{s} · (10*m*).

PROOF. Let *F*′ be the boolean function and let *f*′ be the polynomial from Lemma 8 with *s*_{1} = *s* and *s*_{2} ≈ 60^{d+3} · (log *m*)^{(d+1)(d+3)} · *s*^{d(d+3)}. The bounds *F* ≤ *F*′ and **P**_{μ}[*F* ≠ *F*′] < 2 · 0.82^{s}*m* follow directly from that lemma. Set

*f*′_{l} := 1 – (1 – *f*′)^{2}.
It is clear that *f*′_{l} ≤ 1, and moreover *f*′_{l} = 0 whenever *F*′ = 0, hence *f*′_{l} ≤ *F*′. Finally, *F*′(*x*) – *f*′_{l}(*x*) = 0 when *F*′(*x*) = 0, and is equal to

(1 – *f*′(*x*))^{2}

when *F*′(*x*) = 1, thus

**E**[*F*′ – *f*′_{l}] ≤ ||*F*′ – *f*′||^{2}_{2} ≤ 8 · 0.82^{s}*m*
by Lemma 8. Together with the bound on **P**_{μ}[*F* ≠ *F*′], this gives the claimed total of ε(*s, d*) = 0.82^{s} · (10*m*). To finish the proof we note that the degree of *f*′_{l} is bounded by

2 · *deg*(*f*′) ≤ 2((*s* · log *m*)^{d} + *s*_{2}) = *r*(*s, d*).
Lemma 9 implies the following:

LEMMA 10. *Let s* ≥ log *m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution, with r* = *r*(*s, d*) *as in Lemma 9; then*

**E**_{μ}[*F*] ≥ **E**[*F*] – ε(*s, d*),

*where* ε(*s, d*) = 0.82^{s} · (10*m*).

PROOF. Let *F*′ be the boolean function and let *f*′_{l} be the polynomial from Lemma 9. The degree of *f*′_{l} is ≤ *r.* We use the fact that since μ is *r*-independent, **E**_{μ}[*f*′_{l}] = **E**[*f*′_{l}] (since *k*-wise independence fools degree-*k* polynomials, as discussed above):

**E**_{μ}[*F*] ≥ **E**_{μ}[*F*′] – **P**_{μ}[*F* ≠ *F*′] ≥ **E**_{μ}[*f*′_{l}] – **P**_{μ}[*F* ≠ *F*′] = **E**[*f*′_{l}] – **P**_{μ}[*F* ≠ *F*′] ≥ **E**[*F*′] – **E**[*F*′ – *f*′_{l}] – **P**_{μ}[*F* ≠ *F*′] ≥ **E**[*F*] – ε(*s, d*).
The dual inequality to Lemma 10 follows immediately by applying the lemma to the negation *F̄* = 1 – *F* of *F.* We have **E**_{μ}[*F̄*] ≥ **E**[*F̄*] – ε(*s, d*), and thus

**E**_{μ}[*F*] ≤ **E**[*F*] + ε(*s, d*).
Together, these two statements yield Lemma 5.

### Figures

Figure 1. An illustration of the statement of Lemma 2. For convenience, the boolean cube is represented by the real line. The function *F* (in gray) is an **AC**^{0} boolean function. The function *f̃* (in black) is a real-valued low-degree polynomial approximating *F* well *on average*.

Figure 2. An illustration of the statement of Lemma 3 (a), and Lemma 4 (b). Note that when the low degree polynomial *f* does disagree with *F* we have no good guarantee on the error. This means that *f* may be a good approximant almost everywhere but not on average.

Figure 3. An illustration of the inductive step where an approximating polynomial *f* is constructed from the approximating polynomials *g*_{1}, *g*_{2},…, *g*_{k}.

Figure 4. An illustration of sandwiching polynomials (a) and one-sided approximations (b)(c).
