The question of determining which (weak) forms of randomness “fool” (or seem totally random to) a given algorithm is a basic and fundamental question in the modern theory of computer science. In this work we report progress on this question by showing that any “k-wise independent” collection of random bits, for some k = (log n)^O(1), fools algorithms computable by bounded-depth circuits. In the process we also survey known tight connections between bounded-depth circuits and low-degree multivariate polynomials, and we establish a new such connection to prove our main result.
In the rest of this section, we introduce the basic concepts in greater detail so as to present a precise version of our main result.
A boolean circuit C is a circuit that is comprised of boolean inputs, boolean outputs, and gates that perform operations on the intermediate values within the circuit. The circuit is not allowed to have loops. In other words, the gates of the circuit form a directed acyclic graph. A circuit C with n input bits and one output naturally gives rise to a boolean function F_C: {0, 1}^n → {0, 1}. Depending on the type of the circuit, various restrictions may be placed on the size of the circuit, its shape, and the types of gates that it may use. In this paper, we will focus on circuits where unbounded fan-in AND and OR gates, as well as NOT gates, are allowed.
Since any given circuit accepts a fixed number n of inputs, it can only compute a function over the set {0, 1}^n of strings of length n. If we want to discuss computing a function F: {0, 1}* → {0, 1} that takes strings of arbitrary length using circuits, we need to consider families of circuits parameterized by input size. A family of circuits computes the function F if each circuit Cn computes the restriction of F to strings of length n.
The two most important measures of a circuit’s “power” are size and depth. Circuit size is the total number of gates used in constructing the circuit. For circuits with AND, OR, and NOT gates the depth is defined as the largest number of AND and OR gates a signal needs to traverse from any input to the output. The same measures can be applied to families of circuits. A family {Cn} is of polynomial size if the size of each Cn is bounded by nc for some constant c. Circuit complexity studies the amount of resources (such as size, depth) required to compute various boolean functions.
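As a concrete illustration (not from the article), the definitions above can be sketched in a few lines of Python: a circuit is a DAG of unbounded fan-in AND/OR gates plus NOT gates, and depth counts only the AND/OR gates along a path.

```python
def evaluate(gate, x):
    """Evaluate a gate (a node of the DAG) on input bits x."""
    kind = gate[0]
    if kind == "input":
        return x[gate[1]]
    if kind == "not":
        return 1 - evaluate(gate[1], x)
    vals = [evaluate(child, x) for child in gate[1]]
    return max(vals) if kind == "or" else min(vals)  # unbounded fan-in OR/AND

def depth(gate):
    """Largest number of AND/OR gates on any input-to-output path."""
    kind = gate[0]
    if kind == "input":
        return 0
    if kind == "not":
        return depth(gate[1])  # NOT gates do not count toward depth
    return 1 + max(depth(child) for child in gate[1])

# a depth-2, size-3 circuit (a DNF): (x0 AND x1) OR (x2 AND NOT x3)
x0, x1, x2, x3 = [("input", i) for i in range(4)]
C = ("or", [("and", [x0, x1]), ("and", [x2, ("not", x3)])])

assert evaluate(C, [1, 1, 0, 0]) == 1
assert evaluate(C, [0, 1, 1, 1]) == 0
assert depth(C) == 2
```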
In this context, studying circuits of bounded depth—i.e., using at most a constant d number of layers—is of particular interest. The complexity class capturing functions computed by boolean circuits of bounded depth and polynomial size is denoted by AC0. Thus AC0 captures what can be computed by polynomial size circuits in constant time. The class AC0 has been studied extensively in the past three decades.
There are several reasons why AC0 circuits have been studied so extensively. Firstly, AC0 is essentially the only class of circuits for which strong unconditional lower bounds have been proved: it is known, for example, that any AC0 circuit computing the parity function PARITY(x1, …, xn) = Σxi mod 2 must be of size exponential in n.5,6 Secondly, there is a very tight connection between computations performed by the polynomial hierarchy (PH) complexity class relative to an oracle and computations by AC0 circuits. Thus a better understanding of AC0 translates into a better understanding of the polynomial hierarchy. Finally, the class of AC0 functions, and its subclass of DNF formulas, has been the target of numerous efficient machine learning algorithms. It is actually because AC0 is a relatively weak class that functions in this class are amenable to efficient learning: it can be shown that learning functions from larger circuit complexity classes would allow one to break cryptographic primitives, such as those based on integer factoring.
1.2 Pseudorandomness: fooling bounded depth circuits
Randomness is one of the most important computational resources. A randomized algorithm A may make use of a string of random bits r ∈ {0, 1}^n as part of the computation. The randomness may be used, for example, for Monte Carlo simulations, or for generating random hashes. Denote by U the uniform distribution on boolean strings of length n. Then our randomized algorithm A is executed with r being sampled from U. In reality, truly random samples are virtually impossible to obtain, and we resort to pseudorandom distributions and functions that generate pseudorandom bits, such as the rand function in C. What makes a distribution μ over n-bit strings pseudorandom? It turns out that the answer to this question depends on the algorithm A. The distribution μ is pseudorandom for A if the behavior of A on samples from μ is indistinguishable from its behavior on truly uniformly random r. In particular, if A was likely to work correctly with truly uniform samples, it will be likely to work correctly with samples drawn from μ.
For simplicity, suppose that A outputs a single bit. For a distribution μ on length-n strings {0, 1}^n, we denote by 0 ≤ Eμ[A] ≤ 1 the expected value of A on inputs drawn according to μ. When the distribution under consideration is the uniform distribution U on {0, 1}^n, we suppress the subscript and let E[A] denote the expected value of A. A distribution μ is ε-pseudorandom, or ε-fools A, if the expected output when A is fed samples from μ is ε-close to the expected output when it is fed truly uniform samples:
|Eμ[A] − E[A]| ≤ ε.
Thus, when we use the rand function in our code, we implicitly hope that its output ε-fools our program.
Similarly to fooling one algorithm, we can define fooling a class of functions. In particular, a distribution μ ε-fools AC0 functions if it ε-fools every function in the class.
The smaller (and weaker) the class of functions, the easier it is to fool. Since AC0 is a relatively weak class of functions, there is hope of fooling it using a distribution μ of relatively low entropy. In the late 1980s Nisan10 demonstrated a family of such distributions. In the present paper we will give a very general class of distributions that fool AC0 circuits, showing that all k-wise independent distributions with k = log^O(1) n fool AC0 circuits.
1.3 k-wise independence and the main result
A distribution μ on {0, 1}^n is k-wise independent if every restriction of μ to k coordinates is uniform on {0, 1}^k. Clearly, the uniform distribution on {0, 1}^n is n-wise independent. A simple example of a distribution that is (n − 1)-wise independent but not uniform is
μ: (x1, …, xn−1, x1 ⊕ x2 ⊕ … ⊕ xn−1),
where the bits x1, …, xn−1 are selected uniformly at random and ⊕ is the XOR operator. Equivalently, μ selects a uniformly random string in {0, 1}^n subject to the condition that the number of 1’s selected is even.
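The example μ can be checked by brute force. The following Python sketch (an illustration, with n = 4) verifies that the even-parity distribution is (n − 1)-wise independent but not uniform.

```python
from itertools import product, combinations

n = 4
# mu's support: all n-bit strings with an even number of 1's
support = [x for x in product([0, 1], repeat=n) if sum(x) % 2 == 0]

# every restriction to k = n - 1 coordinates is uniform on {0,1}^(n-1):
k = n - 1
for coords in combinations(range(n), k):
    counts = {}
    for x in support:
        proj = tuple(x[i] for i in coords)
        counts[proj] = counts.get(proj, 0) + 1
    # each of the 2^k patterns appears equally often
    assert len(counts) == 2 ** k
    assert all(c == len(support) // 2 ** k for c in counts.values())

# ...but mu itself is far from uniform: odd-parity strings have probability 0
assert (1, 0, 0, 0) not in support
```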
k-Wise independent distributions can sometimes be used as pseudorandom generators. If μ is a k-wise independent distribution, it may require fewer than n bits of true randomness to sample from. For example, the distribution μ above only requires n − 1 truly random bits to sample from. Alon et al.,1 building on ideas from Joffe,7 give a simple construction of a k-wise independent distribution which only requires O(k log n) truly random bits, using univariate polynomials.
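A hedged sketch of the polynomial-evaluation idea behind such constructions: the evaluations of a random degree-(k − 1) polynomial over a prime field at n fixed points are k-wise independent, and the seed is just the k coefficients, i.e., O(k log n) random bits. The function names and parameters below are illustrative, and extracting unbiased bits from the field elements takes additional care, as in Alon et al.1

```python
import random
from itertools import product

def kwise_sample(n, k, p=101):
    """n values over GF(p) that are k-wise independent: evaluations of a
    random degree-(k-1) polynomial at points 0, 1, ..., n-1 (needs p >= n)."""
    coeffs = [random.randrange(p) for _ in range(k)]  # the O(k log n)-bit seed
    return [sum(c * pow(a, j, p) for j, c in enumerate(coeffs)) % p
            for a in range(n)]

# brute-force check of pairwise (k = 2) independence over a tiny field:
p, n, k = 5, 4, 2
counts = {}
for coeffs in product(range(p), repeat=k):  # enumerate every possible seed
    vals = [sum(c * pow(a, j, p) for j, c in enumerate(coeffs)) % p
            for a in range(n)]
    pair = (vals[0], vals[2])  # restrict to coordinates 0 and 2
    counts[pair] = counts.get(pair, 0) + 1
# every value pair occurs equally often, i.e., the restriction is uniform:
assert len(counts) == p ** 2
assert all(c == p ** k // p ** 2 for c in counts.values())
```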
Another way of looking at and constructing k-wise independent distributions is through coding theory. Simple linear codes give rise to a large family of k-wise independent distributions. For any linear binary code C with distance > k, the uniform distribution on the dual code
C⊥ = {x ∈ {0, 1}^n : x · c = 0 mod 2 for all c ∈ C}
gives rise to a k-wise independent distribution. The example μ above corresponds to the n-times repetition code C = {00…0, 11…1}, whose distance is n, again showing that μ is (n − 1)-wise independent.
For which k’s does k-wise independence fool AC0 circuits? It is not hard to see that the problem of distinguishing μ from the uniform distribution is as hard as computing PARITY, and thus it is not surprising that μ fools AC0 circuits. If a distribution μ is k-wise independent, it means that “locally” the distribution looks uniform, and a circuit distinguishing μ from the uniform distribution needs to be able to “process” more than k input bits together. Thus, intuitively, k-wise independence should be able to fool AC0 circuits even for fairly small values of k. In 1990, Linial and Nisan9 conjectured that k = (log m)^(d−1) (i.e., poly-logarithmic in m) is sufficient to fool AC0 circuits of size m and depth d. The main object of this paper is to give a proof of the following theorem, first published in 2009:3
THEOREM 1. r(m, d, ε)-independence ε-fools depth-d AC0 circuits of size m, where
r(m, d, ε) = (log(m/ε))^O(d^2).
Theorem 1 shows that poly-logarithmic independence suffices to fool all AC0 circuits, thus settling the Linial-Nisan conjecture. A gap remains in the exact dependence of the exponent of the log on d. Prior to Theorem 1, the conjecture had been proved by Bazzi2 in 2007 for the special case of DNF formulas. Bazzi’s original proof was quite involved, and was later greatly simplified by Razborov.12 No nontrivial bound on r(m, d, ε) was known for d > 2.
2. Proof Overview
The main technical ingredient in our work is presenting a new connection between low-degree polynomials and AC0 circuits. Connections between polynomials and circuits, especially AC0 circuits, have been explored in the past in various contexts, and remain perhaps the most powerful tool for analyzing these functions.
We begin by observing that k-wise independent distributions perfectly fool degree-k polynomials. Let μ be any k-wise independent distribution over {0, 1}^n. As a first step, let m = Π_{i∈S} x_i, with |S| ≤ k, be a degree-k monomial. m may depend on at most k different variables, on which μ (being k-wise independent) appears to be perfectly uniformly random. Thus
Eμ[m] = E[m].
Now, if f is a degree-k polynomial over x1, …, xn, it can be written as a sum of degree-k monomials: f = Σ_{j=1}^N m_j. Each monomial is completely fooled by μ and thus
Eμ[f] = Σ_{j=1}^N Eμ[m_j] = Σ_{j=1}^N E[m_j] = E[f].   (1)
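This observation is easy to verify by brute force. The following Python sketch (an illustration, not from the article) checks that the even-parity distribution from Section 1.3 fools a degree-(n − 1) polynomial exactly, as in equation (1), while failing to fool the degree-n monomial.

```python
from itertools import product

n = 4
uniform = list(product([0, 1], repeat=n))
mu = [x for x in uniform if sum(x) % 2 == 0]  # the even-parity distribution

def E(f, support):
    """Expectation of f under the uniform distribution on the given support."""
    return sum(f(x) for x in support) / len(support)

# a degree-(n-1) polynomial: fooled *exactly*, as in equation (1)
f = lambda x: x[0] * x[1] * x[2] - 2 * x[1] * x[2] * x[3] + x[0]
assert E(f, mu) == E(f, uniform)

# the degree-n monomial x1...xn distinguishes mu from uniform
g = lambda x: x[0] * x[1] * x[2] * x[3]
assert E(g, mu) == 2 * E(g, uniform)  # 1/8 under mu vs. 1/16 under uniform
```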
In light of (1), a reasonable attack strategy to prove Theorem 1 would be as follows. For a function F computed by an AC0 circuit, find a polynomial f that approximates F such that:
- E[F] ≈ E[f], i.e., f approximates F well on average;
- Eμ[F] ≈ Eμ[f], i.e., f approximates F well on average, even when we restrict ourselves to inputs drawn according to μ.
Then use (1) to conclude that
Eμ[F] ≈ Eμ[f] = E[f] ≈ E[F],
i.e., that μ ε-fools F. Of course, it is not clear at all that such a polynomial must exist (and we do not know whether it does). However, a carefully crafted attack along these lines actually does go through. The proof combines two previous constructions of approximating polynomials for AC0 circuits.
The first construction is due to Linial, Mansour, and Nisan8 and is discussed extensively in Section 3. It is an analytic construction that for each AC0 function F provides a low-degree polynomial f̃ that approximates it well on average. Such a polynomial would easily fulfill requirement 1 above. However, it would not be helpful at fulfilling requirement 2. Since the support of μ is quite small, being close on average does not rule out the possibility that F and f̃ deviate wildly from each other on the points of the support of the distribution μ. This type of uniform average-case approximation is inherently useless when we want F to be close to f̃ on samples drawn according to μ.
The second construction is a combinatorial one, and is due to Razborov11 and Smolensky.13 It is discussed in Section 4. For an AC0 function F, and any distribution of inputs v, the Razborov-Smolensky construction gives a low-degree polynomial f such that F = f with high probability with respect to v. In other words, P_v[f(x) ≠ F(x)] is small.
Note that here f is actually equal to F most of the time (and is not just close to it). Since v is allowed to be any distribution, we can take v = (μ + U)/2, the hybrid between μ and the uniform distribution, thus guaranteeing that the probability that f(x) ≠ F(x) is small simultaneously with respect to both distributions. This seems to be a promising improvement over the Linial-Mansour-Nisan polynomials, that may allow us to fulfill both requirements 1 and 2 above.
The caveat here is that on the few locations where f(x) ≠ F(x) the deviation |f(x) – F(x)| may be huge. In particular, while f is close to F (more precisely, it is perfectly equal) almost everywhere, it is not necessarily the case that they are close on average. Thus, for example, E[f] may differ wildly from E[F].
We see that both polynomial approximations get us “part-way” toward the goals, but do not quite work. To make the proof work we need to combine the two approaches, as described in Section 5. In a nutshell, we first take a Razborov-Smolensky polynomial f that agrees with F almost everywhere. It turns out that an “error” function ε(x) that “flags” the problematic locations, i.e., ε(x) = 1 whenever f(x) ≠ F(x), can be computed by a (slightly bigger) AC0 circuit. Thus, if the function f*(x) = (1 − ε(x)) · f(x) were a polynomial, we would be in great shape: f*(x) = F(x) most of the time, while even when they disagree, the difference |f*(x) − F(x)| ≤ 1, and thus in fact E[f*] ≈ E[F] and Eμ[f*] ≈ Eμ[F].
Of course, f* is not a polynomial, since ε is not a polynomial. However, it turns out with a little bit of work that the polynomial f′ = (1 − ε̃) · f, where ε̃ is the Linial-Mansour-Nisan approximation of ε, actually allows us to prove the main theorem. Thus the proof comes out of combining the two approximation techniques in one polynomial.
3. Low-Depth Circuits and Polynomials—The Analytic Connection
Approximately a decade after the first AC0 lower bounds were published, Linial, Mansour, and Nisan proved the following remarkable lemma:
LEMMA 2. If F: {0, 1}^n → {0, 1} is a boolean function computable by a depth-d circuit of size m, then for every t there is a degree-t polynomial f̃ with8
E_x[|F(x) − f̃(x)|^2] ≤ m · 2^(−Ω(t^(1/d))).
The lemma states that any low-depth circuit of size m can be approximated very well by a polynomial of degree (log m)^O(d)—a degree that is poly-logarithmic in the size of the circuit. The approximation error here is the average-square error: f̃ doesn’t have to agree with F at any point, but the average value of |F(x) − f̃(x)|^2, taken over the Hamming cube, is small. An illustration of the statement of the lemma can be found in Figure 1.
To understand the context of Lemma 2 we need to consider one of the most fundamental tools in the analysis of boolean functions: the Fourier transform, which we will briefly introduce here. An introductory survey on the Fourier transform over the boolean cube can be found in de Wolf.4 For the remainder of this section we will represent the boolean function F: {0, 1}^n → {0, 1} using a function G: {−1, +1}^n → {−1, +1}, with 0 corresponding to −1 and 1 to +1. Thus
G(y1, …, yn) = 2F((y1 + 1)/2, …, (yn + 1)/2) − 1.
This will not affect any of the results, but will make all the calculations much cleaner. Note that a degree-t polynomial approximation of F corresponds to a degree-t polynomial approximation for G and vice-versa.
Each function G: {−1, +1}^n → {−1, +1} can be viewed as a 2^n-dimensional vector in ℝ^(2^n) specified by its truth table. Consider the special parity functions χ_S: {−1, +1}^n → {−1, +1}. For each set S ⊆ {1, …, n} of coordinates, let
χ_S(x) = Π_{i∈S} x_i.
In other words, χ_S is the parity of the x_i’s that correspond to coordinates in the set S. There is a total of 2^n different functions χ_S—one corresponding to each subset S of coordinates. The function χ_∅ is the constant function χ_∅(x) = 1, while the function χ_{1,…,n}(x) outputs the parity of all the coordinates of x.
Similarly to the function G, each function χ_S can also be viewed as a vector in ℝ^(2^n) specified by the values it takes on all possible inputs. By abuse of notation we will denote these vectors by χ_S as well. Moreover, for each two sets S1 ≠ S2 the vectors χ_S1 and χ_S2 are orthogonal to each other. That is,
⟨χ_S1, χ_S2⟩ = E_x[χ_S1(x) · χ_S2(x)] = 0.
Thus we have 2^n orthogonal vectors {χ_S} in ℝ^(2^n), and when properly normalized they yield an orthonormal basis of the space. In other words, each vector in ℝ^(2^n) can be uniquely represented as a linear combination of the functions χ_S. In particular, the function G can be uniquely written as:
G = Σ_S Ĝ(S) · χ_S,   (2)
where the numerical coefficients Ĝ(S) are given by the inner product
Ĝ(S) = ⟨G, χ_S⟩ = E_x[G(x) · χ_S(x)].
The representation of G given by (2) is called the Fourier transform of G. The transform converts 2^n numbers (the truth table of G) into 2^n numbers (the coefficients Ĝ(S)), thus preserving all the information about G. Another way to view the representation (2) is as a canonical way of representing G as a multilinear polynomial with real-valued coefficients:
G(x1, …, xn) = Σ_{S⊆{1,…,n}} Ĝ(S) · Π_{i∈S} x_i.   (3)
Thus each function G can be uniquely represented as a degree-n multilinear polynomial (a simple fact that can be proved directly, without using the Fourier transform). More importantly, when we view the functions χ_S as vectors, the space H_t of polynomials of degree ≤ t, for any t, is spanned by the functions {χ_S: |S| ≤ t}. This means that to get the best low-degree approximation for G we should simply project it onto H_t, to obtain
G̃ = Σ_{|S|≤t} Ĝ(S) · χ_S.   (4)
Note that G̃ is no longer a boolean function, and may take arbitrary real values, even on inputs from {−1, +1}^n. The l2-error—which we need to bound to prove Lemma 2—is given by
||G − G̃||_2^2 = Σ_{|S|>t} Ĝ(S)^2.   (5)
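Equations (2)-(5) can be verified by brute force on a small example. The sketch below (illustrative Python, with G the OR function in the ±1 convention) computes all Fourier coefficients, projects onto degree ≤ t, and checks that the l2 error equals the Fourier weight above degree t.

```python
from itertools import product, combinations

n = 3
cube = list(product([-1, 1], repeat=n))
G = lambda x: max(x)  # OR in the +-1 convention: +1 unless all inputs are -1

def chi(S, x):
    out = 1
    for i in S:
        out *= x[i]
    return out

subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]

# Fourier coefficients Ghat(S) = E_x[G(x) * chi_S(x)], computed by brute force
Ghat = {S: sum(G(x) * chi(S, x) for x in cube) / 2 ** n for S in subsets}

t = 1
Gt = lambda x: sum(Ghat[S] * chi(S, x) for S in subsets if len(S) <= t)  # (4)

l2_error = sum((G(x) - Gt(x)) ** 2 for x in cube) / 2 ** n
high_weight = sum(Ghat[S] ** 2 for S in subsets if len(S) > t)
assert abs(l2_error - high_weight) < 1e-9  # equation (5)
```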
To bound (5) one needs to show that all but a very small fraction of weight in the Fourier representation of G is concentrated on low-degree coefficients. This is where the magic of Linial et al.8 happens, and this is where the fact that G is computed by an AC0 circuit is being used.
At a very high level, the proof uses random restrictions. A random restriction ρ assigns the majority of G’s inputs a random value 0 or 1, while leaving some inputs unset. A tool called the Switching Lemma is then used to show that when a random restriction ρ is applied to an AC0 function G, the resulting boolean function G|ρ is highly likely to be very “simple”—depending on only a few of the inputs. On the other hand, a random restriction applied to χ_S where |S| is large is likely to leave some of the inputs in S unset. Thus if G had a lot of weight on the coefficients Ĝ(S) with large |S|, G|ρ would be likely to remain “complicated” even after the application of ρ.
Lemma 2 immediately implies a lower bound for an AC0 circuit computing PARITY, since the Fourier representation of PARITY has Ĝ(S) = 1 for S = {1, …, n} and Ĝ(S) = 0 otherwise. Thus PARITY cannot be approximated well by a polynomial even of degree t = n − 1.
One interesting application of Lemma 2 that is outside the scope of the present article is for learning AC0 functions. Suppose that G is an unknown AC0 function of size m, and we are given access to examples of the form (x, G(x)), where x is chosen uniformly from {−1, +1}^n. Then we know that G has a good polynomial approximation of the form (4). Thus to (approximately) learn G, all we have to do is learn the coefficients Ĝ(S) with |S| ≤ t. Each such coefficient can be estimated from the examples, and there are only n^O(t) of them, immediately yielding a quasi-polynomial-time algorithm for learning AC0 circuits.
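The learning algorithm sketched above can be written in a few lines. This is a simplified illustration (small n, brute-force sums, names of our own choosing), not the actual algorithm from the literature: estimate each low-degree coefficient empirically from examples, then predict with the sign of the resulting polynomial.

```python
import random
from itertools import product, combinations
random.seed(0)

n, t, samples = 4, 2, 20000
G = lambda x: max(x)  # the "unknown" target: OR in the +-1 convention

def chi(S, x):
    out = 1
    for i in S:
        out *= x[i]
    return out

low_sets = [S for r in range(t + 1) for S in combinations(range(n), r)]
examples = [tuple(random.choice([-1, 1]) for _ in range(n)) for _ in range(samples)]

# empirical estimate of each low-degree coefficient Ghat(S) = E[G(x) * chi_S(x)]
est = {S: sum(G(x) * chi(S, x) for x in examples) / samples for S in low_sets}

def predict(x):
    return 1 if sum(est[S] * chi(S, x) for S in low_sets) >= 0 else -1

errors = sum(predict(x) != G(x) for x in product([-1, 1], repeat=n))
assert errors <= 2  # the degree-2 estimate already predicts OR well
```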
4. Low-Depth Circuits and Polynomials—The Algebraic Connection
It turns out that average-error approximations are not the only useful way in which AC0 circuits can be approximated by low-degree polynomials. We have seen in the previous section that the representation of any boolean function by a multilinear polynomial is unique, and thus given an AC0 function F we cannot hope to obtain a low-degree polynomial f that agrees with F everywhere. We will see, however, that it is possible to construct a low-degree polynomial f that agrees with F almost everywhere. Specifically, the following statement holds:
LEMMA 3. Let v be any probability distribution on {0, 1}^n. For a circuit of depth d and size m computing a function F, for any s, there is a degree r = (s · log m)^d polynomial f such that P_v[f(x) ≠ F(x)] < 0.82^s · m.
Note that Lemma 3 promises an approximating polynomial, again of degree (log m)^O(d), that has a low error against an arbitrary distribution v on inputs. The statement of the lemma is illustrated on Figure 2(a), where the probability according to v of the region of disagreement between f and F is small.
Lemma 3 (or a slight variation of it) was proved by Razborov11 and Smolensky13 in the late 1980s. The tools for the proof of the lemma were developed in Valiant and Vazirani.14 Razborov and Smolensky used the lemma to give stronger lower bounds on bounded-depth circuits. Let AC0[p] ⊃ AC0 be the class of functions computable using a bounded-depth circuit that in addition to the AND and OR gates is allowed to use the MODp gate: the gate outputs 1 if and only if the number of 1’s among its inputs is divisible by p. We already know that PARITY ∉ AC0, and thus AC0[2] ⊋ AC0. Razborov and Smolensky showed that the MODp function cannot be computed efficiently by an AC0[q] circuit where q ≠ p. In particular, this means that PARITY ∉ AC0[3].
The proof of Lemma 3 is combinatorial, and is much simpler than the proof of Lemma 2. In fact, we will be able to present its entire proof below. To obtain the results in Section 5 we will need a slight strengthening of the lemma.
LEMMA 4. Let v be any probability distribution on {0, 1}^n. For a circuit of depth d and size m computing a function F, for any s, there is a degree r = (s · log m)^d polynomial f and a boolean function εv computable by a circuit of depth ≤ d + 3 and size O(m^2 r) such that
- P_v[εv(x) = 1] < 0.82^s · m, and
- whenever εv(x) = 0, f(x) = F(x).
Thus, not only does the polynomial from Lemma 3 exist, but there is a simple AC0 “error circuit” εv that given an input can tell us whether f will make an error on the input. The function εv is shown on Figure 2(b).
We will now prove Lemma 4. A curious property of the construction is that it gives a probabilistic algorithm for producing the low degree polynomial f that approximates F almost everywhere.
PROOF. (of Lemma 4) We construct the polynomial f by induction on the depth d of F, and show that with high probability f = F. The function εv follows from the construction. Note that we do not know anything about the measure v and thus cannot give an explicit construction for f. Instead, we will construct a distribution on polynomials f that succeeds with high probability on any given input. Thus the distribution is expected to have a low error with respect to v, which implies that there is a specific f that has a small error with respect to v.
We will show how to make a step when the output gate of F is an AND gate (see Figure 3). Since the whole construction is symmetric with respect to 0 and 1, the step also holds for an OR gate. Let
F = G1 ∧ G2 ∧ … ∧ Gk,
where k < m. For convenience, let us assume that k = 2^c is a power of 2. We take a collection of t = O(s · c) random subsets of {1, 2, …, k}, where each element is included with probability p independently of the others: at least s subsets for each of p = 2^(−1), 2^(−2), …, 2^(−c) = 1/k. Denote the sets by S1, …, St—we ignore empty sets. In addition, we make sure to include {1, …, k} as one of the sets. Let g1, …, gk be the approximating polynomials for G1, …, Gk that are guaranteed by the induction hypothesis applied to G1, …, Gk with depth d − 1. We set
f = Π_{i=1}^t (1 − Σ_{j∈S_i} (1 − g_j)).
By the induction assumption, the degrees of the gj satisfy deg gj ≤ (s · log m)^(d−1), hence the degree of f is bounded by deg f ≤ t · max_j deg gj ≤ (s · log m)^d. Next we bound the error P[f ≠ F]. It consists of two terms:
P[f(x) ≠ F(x)] ≤ Σ_{j=1}^k P[gj(x) ≠ Gj(x)] + P[f(x) ≠ F(x) | gj(x) = Gj(x) for all j].
In other words, to make a mistake, either one of the inputs to the final AND gate has to be wrong, or the approximating function for the AND has to make a mistake. We will focus on the second term; the first term is bounded by the union bound. We fix a vector of specific values G1(x), …, Gk(x), and calculate the probability of an error over the possible choices of the random sets Si.
If all the Gj(x)’s are 1 then the value of F(x) = 1 is calculated correctly with probability 1.
Suppose that F(x) = 0 (and thus at least one of the Gj(x)’s is 0). Let 1 ≤ z ≤ k be the number of zeros among G1(x), …, Gk(x), and let α be such that 2^α ≤ z < 2^(α+1). Our formula will work correctly if one of the sets Si hits exactly one 0 among the z zeros of G1(x), …, Gk(x). We will consider only the sets Si above that are likely to hit exactly one zero. Specifically, let S be a random set as above with p = 2^(−α−1). The probability of S hitting exactly one zero is exactly
z · p · (1 − p)^(z−1) ≥ (1/2) · (1 − p)^z ≥ 1/(2e) > 0.18.
Hence the probability of the formula being wrong after s such sets is bounded by (1 − 0.18)^s = 0.82^s. Since this is true for any value of x, we can find a collection of sets Si such that the probability of error as measured by v is at most 0.82^s. By making the same probabilistic argument at every node and applying the union bound, we get that the condition “if the inputs are correct then the output is correct” is satisfied by all nodes except with probability < 0.82^s · m. Thus the error of the polynomial is < 0.82^s · m.
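The inductive step can be simulated directly. The following Python sketch (parameters chosen for illustration only) builds the random sets for a single AND gate with exact inputs (g_j = x_j) and estimates the error probability empirically.

```python
import random
random.seed(1)

k, s = 16, 15                 # k = 2^c inputs; s sets per scale
c = k.bit_length() - 1        # k = 2^c

def sample_sets():
    """Random subsets of {0, ..., k-1}: s per density p = 2^-1, ..., 2^-c,
    plus the full set, as in the proof."""
    sets = [list(range(k))]
    for a in range(1, c + 1):
        p = 2.0 ** (-a)
        for _ in range(s):
            S = [j for j in range(k) if random.random() < p]
            if S:              # empty sets are ignored
                sets.append(S)
    return sets

def f(x, sets):
    """f = prod_i (1 - sum_{j in S_i} (1 - x_j)); a factor vanishes exactly
    when S_i contains exactly one 0."""
    val = 1
    for S in sets:
        val *= 1 - sum(1 - x[j] for j in S)
    return val

AND = lambda x: int(all(x))

sets = sample_sets()
assert f([1] * k, sets) == 1   # the all-ones input is always computed correctly

trials, wrong = 2000, 0
for _ in range(trials):
    x = [random.choice([0, 1]) for _ in range(k)]
    if f(x, sample_sets()) != AND(x):
        wrong += 1
assert wrong < trials * 0.05   # the error probability decays like 0.82^s
```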
Finally, if we know the sets Si at every node, it is easy to check whether there is a mistake by checking that no set contains exactly one 0, thus yielding the depth-(d + 3) function εv. The blowup in size is at most O(mr) at each node, since at each node we take a disjunction, over all the possible pairs (Si, j ∈ Si), of the event that Gj is the only 0 in the set Si; over all m nodes this gives the size bound O(m^2 r).
5. K-Wise Independent Distributions and Low-Depth Circuits
Next we turn our attention to proving the main result, Theorem 1. The result will give another way in which AC0 circuits can be approximated using low-degree polynomials.
THEOREM 1. r(m, d, ε)-independence ε-fools depth-d AC0 circuits of size m, where
r(m, d, ε) = (log(m/ε))^O(d^2).
To prove Theorem 1 we will show:
LEMMA 5. Let s ≥ log m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution where
r ≥ 2 · ((s · log m)^d + 60^(d+3) · (log m)^((d+1)(d+3)) · s^(d(d+3)));
then
|Eμ[F] − E[F]| < ε(s, d),
where ε(s, d) = 0.82^s · (10m).
Theorem 1 follows from Lemma 5 by taking s = Θ(log(m/ε)).
As mentioned in the overview, one can prove that k-wise independence fools a function F by constructing an appropriate approximation of F with low degree polynomials. Bazzi2 has given the following equivalent characterization of fooling through polynomial approximations using linear programming duality:
LEMMA 6. Let F: {0, 1}n → {0, 1} be a boolean function, k ≥ 0 an integer, and ε > 0. Then the following are equivalent:2
- Any k-wise independent distribution ε-fools F.
- There exists a pair of “sandwiching polynomials” fl, fu: {0, 1}^n → ℝ such that:
- low degree: deg(fl), deg(fu) ≤ k;
- sandwiching: fl ≤ F ≤ fu on {0, 1}n;
- small error: E[F – fl], E[fu – F] ≤ ε, where the expectation is with respect to the uniform distribution on {0, 1}n.
The sandwiching polynomials are illustrated on Figure 4(a). Since part (1) in Lemma 6 is what we need to prove, we will only be interested in the “(2) ⇒ (1)” direction of the theorem, which is the “easy” direction, and which we can prove here. Suppose (2) holds, and let μ be any k-wise independent distribution. The polynomial fl is of degree ≤ k, which, as observed in Section 2, implies that Eμ[fl] = E[fl]. Thus,
Eμ[F] ≥ Eμ[fl] = E[fl] ≥ E[F] − ε.
The first inequality uses the sandwiching property, and the last inequality uses the small error property. Similarly,
Eμ[F] ≤ Eμ[fu] = E[fu] ≤ E[F] + ε,
implying that μ ε-fools F. Thus a problem about fooling AC0 circuits is actually another problem on approximating circuits with polynomials!
Our actual proof of Lemma 5 will not produce a pair of sandwiching polynomials, but will “almost” produce such a pair. By the “(1) ⇒ (2)” direction of Lemma 6 we know that Theorem 1 implies that such a pair must exist for each AC0 function.
The proof will combine the two types of polynomial approximations that we discussed in Sections 3 and 4. When we produced the almost-everywhere-correct polynomials in Section 4 we noted that when the polynomial f does disagree with F, the disagreement might be very large. It is still possible to give the following very crude bound on the maximum value ||f||_∞ that |f| may attain on {0, 1}^n:
CLAIM 7. In Lemma 4, for s ≥ log m,
||f||_∞ ≤ 2^(O(r · log m)), where r = (s · log m)^d is the degree of f.
The claim follows by an easy induction from the proof of Lemma 4. We omit the proof here.
A low-degree polynomial f0 is a one-sided approximation of a boolean function F (see Figure 4(b)) if:
- f0 is a good approximation: ||F − f0||_2^2 is small.
- f0’s error is one-sided: f0(x) = 0 whenever F(x) = 0.
If F had such an approximation, then the polynomial fl := 1 − (1 − f0)^2 would be a (lower) sandwiching polynomial for F. fl still has a low degree, and
E[F − fl] = E[F · (1 − f0)^2] ≤ ||F − f0||_2^2
is small. This process is illustrated in Figure 4(c).
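The squaring trick can be checked numerically. In the Python sketch below, f0 is a hypothetical one-sided approximation of a small boolean function F (exact everywhere except one point where F = 1, chosen purely for illustration); fl = 1 − (1 − f0)^2 indeed lower-sandwiches F, with E[F − fl] ≤ ||F − f0||_2^2.

```python
from itertools import product

n = 3
cube = list(product([0, 1], repeat=n))
F = lambda x: x[0] & (x[1] | x[2])  # a small boolean function

# hypothetical one-sided approximation: exact everywhere except one point
# with F = 1, where it overshoots to 1.5
f0 = lambda x: 1.5 if x == (1, 1, 1) else float(F(x))
assert all(f0(x) == 0 for x in cube if F(x) == 0)  # one-sidedness

fl = lambda x: 1 - (1 - f0(x)) ** 2  # the lower sandwiching polynomial

assert all(fl(x) <= F(x) for x in cube)            # fl <= F everywhere
err = sum(F(x) - fl(x) for x in cube) / len(cube)  # E[F - fl]
l2 = sum((F(x) - f0(x)) ** 2 for x in cube) / len(cube)
assert err <= l2 + 1e-12
```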
Thus, being able to produce one-sided approximations (which combine characteristics of both the Section 3 and the Section 4 approximations) would be enough. Unfortunately, we are unable to construct such polynomials. Instead, we show how to modify F just a little bit to obtain a boolean function F′. The change is slight enough with respect to both μ and the uniform measure that fooling F′ and fooling F are almost equivalent. We then show that the modified F′ does have a one-sided approximation f′. The properties of F′ and f′ are summarized in the following lemma:
LEMMA 8. Let F be computed by a circuit of depth d and size m. Let s1, s2 be two parameters with s1 ≥ log m. Let μ be any probability distribution on {0, 1}^n, and let U be the uniform distribution on {0, 1}^n. Set
v = (U + μ)/2.
Let εv be the function from Lemma 4 with s = s1. Set F′ = F ∨ εv. Then there is a polynomial f′ of degree rf′ ≤ (s1 · log m)^d + s2, such that
- F′ = F whenever εv = 0; in particular, P_U[F ≠ F′] and P_μ[F ≠ F′] are both at most 2 · 0.82^s1 · m;
- f′ = 0 whenever F′ = 0;
- ||F′ − f′||_2^2 ≤ 3 · 0.82^s1 · m, provided s2 is sufficiently large.
PROOF IDEA: The proof is illustrated on Figure 5. We start with a polynomial approximation f for F that works “almost everywhere”. By Lemma 4 we know that there is an AC0 function εv that “flags” the locations of all the errors. We set F′ = F ∨ εv, so that f = 0 whenever F′ = 0. The problem is that the error |F′ − f|^2 may be large in the area where εv = 1 (and thus f behaves poorly). If we could take f′ = f · (1 − εv), we would be done, as this would “kill” all the locations where εv is 1. Unfortunately, we cannot do this, since εv is not a low-degree polynomial. Here we use the fact that εv is itself an AC0 function, and thus can be approximated well by a polynomial via the Linial-Mansour-Nisan approximation (Lemma 2). A brief calculation indeed shows that f′ = f · (1 − ε̃v), for an appropriately chosen ε̃v, satisfies all the conditions.
PROOF. The first property follows from the definition of F′. The second one follows from Lemma 4 directly, since
P_v[F ≠ F′] ≤ P_v[εv(x) = 1] < 0.82^s1 · m.
Note also that since v = (U + μ)/2, both P_U[εv = 1] and P_μ[εv = 1] are at most 2 · P_v[εv = 1].
Let f be the approximating polynomial for F from that lemma, so that F = F′ = f whenever εv = 0, and thus f = 0 whenever F′ = 0. By Claim 7 we have
||f||_∞ ≤ 2^(O(r · log m)).
We let ε̃v be the low-degree approximation of εv of degree s2. By Linial et al.8 (Lemma 2 above), applied to the depth-(d + 3), size-O(m^2 r) circuit computing εv, we have
||εv − ε̃v||_2^2 ≤ O(m^2 r) · 2^(−Ω(s2^(1/(d+3)))).
Let
f′ = f · (1 − ε̃v).
Then f′ = 0 whenever F′ = 0. It remains to estimate ||F′ − f′||_2^2:
||F′ − f′||_2^2 ≤ 2 · ||F′ − f · (1 − εv)||_2^2 + 2 · ||f · (εv − ε̃v)||_2^2 ≤ 2 · P[εv = 1] + 2 · ||f||_∞^2 · ||εv − ε̃v||_2^2;
choosing s2 large enough makes the second term negligible compared to the first, which completes the proof.
We can now use Lemma 8 to give each AC0 function F a lower sandwiching polynomial, at least after a little “massaging”:
LEMMA 9. For every boolean circuit F of depth d and size m, any s ≥ log m, and any probability distribution μ on {0, 1}^n, there is a boolean function F′ and a polynomial f′l of degree less than
2 · ((s · log m)^d + 60^(d+3) · (log m)^((d+1)(d+3)) · s^(d(d+3)))
such that
- P_U[F ≠ F′] and P_μ[F ≠ F′] are both < ε(s, d)/3;
- f′l ≤ F′ on {0, 1}^n;
- E[F′ − f′l] < ε(s, d)/3,
where ε(s, d) = 0.82^s · (10m).
PROOF. Let F′ be the boolean function and let f′ be the polynomial from Lemma 8 with s1 = s and s2 ≈ 60^(d+3) · (log m)^((d+1)(d+3)) · s^(d(d+3)). The first two properties follow directly from the lemma. Set
f′l = 1 − (1 − f′)^2.
It is clear that f′l ≤ 1 and moreover f′l = 0 whenever F′ = 0, hence f′l ≤ F′. Finally, F′(x) − f′l(x) = 0 when F′(x) = 0, and is equal to
(1 − f′(x))^2 = (F′(x) − f′(x))^2
when F′(x) = 1, thus
E[F′ − f′l] ≤ ||F′ − f′||_2^2 < ε(s, d)/3
by Lemma 8. To finish the proof we note that the degree of f′l is bounded by
2 · deg(f′) ≤ 2 · ((s · log m)^d + s2).
Lemma 9 implies the following:
LEMMA 10. Let s ≥ log m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution where
r ≥ 2 · ((s · log m)^d + 60^(d+3) · (log m)^((d+1)(d+3)) · s^(d(d+3)));
then
Eμ[F] > E[F] − ε(s, d),
where ε(s, d) = 0.82^s · (10m).
PROOF. Let F′ be the boolean function and let f′l be the polynomial from Lemma 9. The degree of f′l is < r. We use the fact that since μ is r-independent, Eμ[f′l] = E[f′l] (k-wise independence fools degree-k polynomials, as discussed above):
Eμ[F] ≥ Eμ[F′] − Pμ[F ≠ F′] ≥ Eμ[f′l] − Pμ[F ≠ F′] = E[f′l] − Pμ[F ≠ F′] ≥ E[F′] − E[F′ − f′l] − Pμ[F ≠ F′] ≥ E[F] − P_U[F ≠ F′] − E[F′ − f′l] − Pμ[F ≠ F′] > E[F] − ε(s, d),
where each of the three error terms is < ε(s, d)/3 by Lemma 9.
The dual inequality to Lemma 10 follows immediately by applying the lemma to the negation F̄ = 1 − F of F. We have Eμ[F̄] > E[F̄] − ε(s, d), and thus
Eμ[F] < E[F] + ε(s, d).
Together, these two statements yield Lemma 5.
Figures
Figure 1. An illustration of the statement of Lemma 2. For convenience, the boolean cube is represented by the real line. The function F (in gray) is an AC0 boolean function. The function f̃ (in black) is a real-valued low-degree polynomial approximating F well on average.
Figure 2. An illustration of the statement of Lemma 3 (a), and Lemma 4 (b). Note that when the low degree polynomial f does disagree with F we have no good guarantee on the error. This means that f may be a good approximant almost everywhere but not on average.
Figure 3. An illustration of the inductive step where an approximating polynomial f is constructed from the approximating polynomials g1, g2,…, gk.
Figure 4. An illustration of sandwiching polynomials (a) and one-sided approximations (b)(c).