
Poly-Logarithmic Independence Fools Bounded-Depth Boolean Circuits

Posted Apr 1 2011

1. Introduction

The question of determining which (weak) forms of randomness “fool” (or seem totally random to) a given algorithm is a basic and fundamental question in the modern theory of computer science. In this work we report progress on this question by showing that any “k-wise independent” collection of random bits, for some k = (log n)^O(1), fools algorithms computable by bounded-depth circuits. In the process we also present known tight connections between bounded-depth circuits and low-degree multivariate polynomials, and we establish a new such connection to prove our main result.

In the rest of this section, we introduce the basic concepts in greater detail so as to present a precise version of our main result.

1.1 Bounded depth circuits

A boolean circuit C is a circuit that is comprised of boolean inputs, boolean outputs, and gates that perform operations on the intermediate values within the circuit. The circuit is not allowed to have loops. In other words, the gates of the circuit form a directed acyclic graph. A circuit C with n input bits and one output naturally gives rise to a boolean function F_C: {0, 1}ⁿ → {0, 1}. Depending on the type of the circuit, various restrictions may be placed on the size of the circuit, its shape, and the types of gates that it may use. In this paper, we will focus on circuits where unbounded fan-in AND and OR gates, as well as NOT gates are allowed.

Since any given circuit accepts a fixed number n of inputs, it can only compute a function over the set {0, 1}ⁿ of strings of length n. If we want to discuss computing a function F: {0, 1}* → {0, 1} that takes strings of arbitrary length using circuits, we need to consider families of circuits parameterized by input size. A family of circuits computes the function F if each circuit C_n computes the restriction of F to strings of length n.

The two most important measures of a circuit’s “power” are size and depth. Circuit size is the total number of gates used in constructing the circuit. For circuits with AND, OR, and NOT gates the depth is defined as the largest number of AND and OR gates a signal needs to traverse from any input to the output. The same measures can be applied to families of circuits. A family {C_n} is of polynomial size if the size of each C_n is bounded by n^c for some constant c. Circuit complexity studies the amount of resources (such as size, depth) required to compute various boolean functions.
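As a small illustration of the size and depth measures just defined, here is a minimal sketch. The tuple encoding of gates and the helper names are assumptions made for this example only; shared sub-circuits are not modeled, so the example is really a formula.

```python
from typing import Sequence, Tuple, Union

# Hypothetical encoding: an input is an int index; a gate is a tuple
# ("AND" | "OR" | "NOT", child, child, ...). AND/OR have unbounded fan-in.
Node = Union[int, Tuple]

def evaluate(node: Node, x: Sequence[int]) -> int:
    """Value of the circuit rooted at `node` on the 0/1 input x."""
    if isinstance(node, int):
        return x[node]
    op, *children = node
    vals = [evaluate(c, x) for c in children]
    if op == "NOT":
        return 1 - vals[0]
    return int(all(vals)) if op == "AND" else int(any(vals))

def size(node: Node) -> int:
    """Total number of gates (AND, OR and NOT)."""
    if isinstance(node, int):
        return 0
    return 1 + sum(size(c) for c in node[1:])

def depth(node: Node) -> int:
    """Largest number of AND/OR gates on a path from an input to the output."""
    if isinstance(node, int):
        return 0
    below = max(depth(c) for c in node[1:])
    return below + (0 if node[0] == "NOT" else 1)

# XOR of two bits as a depth-2 circuit: (x0 AND NOT x1) OR (NOT x0 AND x1)
XOR2 = ("OR", ("AND", 0, ("NOT", 1)), ("AND", ("NOT", 0), 1))
```

Here `size(XOR2)` counts five gates and `depth(XOR2)` is 2, since NOT gates do not contribute to depth under the definition above.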

In this context, studying circuits of bounded depth, i.e., circuits using at most a constant number d of layers, is of particular interest. The complexity class capturing functions computed by boolean circuits of bounded depth and polynomial size is denoted by AC⁰. Thus AC⁰ captures what can be computed by polynomial size circuits in constant time. The class AC⁰ has been studied extensively in the past three decades.

There are several reasons why AC⁰ circuits have been studied so extensively. Firstly, AC⁰ is essentially the only class of circuits for which strong unconditional lower bounds have been proved: it is known, for example, that any AC⁰ circuit computing the parity function PARITY(x₁,…,x_n) = Σx_i mod 2 must be of size exponential in n.⁵,⁶ Secondly, there is a very tight connection between computations performed by the polynomial hierarchy (PH) complexity class relative to an oracle and computations by AC⁰ circuits. Thus a better understanding of AC⁰ translates into a better understanding of the polynomial hierarchy. Finally, the class of AC⁰ functions, and its subclass of DNF formulas, have been the target of numerous efficient machine learning algorithms. It is actually because AC⁰ is a relatively weak class that functions in this class are amenable to efficient learning: it can be shown that learning functions from larger circuit complexity classes would allow one to break cryptographic primitives such as those based on integer factoring.

1.2 Pseudorandomness: fooling bounded depth circuits

Randomness is one of the most important computational resources. A randomized algorithm A may make use of a string of random bits r ∈ {0, 1}ⁿ as part of the computation. The randomness may be used, for example, for Monte Carlo simulations, or for generating random hashes. Denote by U the uniform distribution on boolean strings of length n. Then our randomized algorithm A is executed with r being sampled from U. In reality, truly random samples are virtually impossible to obtain, and we resort to pseudorandom distributions and functions that generate pseudorandom bits, such as the rand function in C. What makes a distribution μ over n-bit strings pseudorandom? It turns out that the answer to this question depends on the algorithm A. The distribution μ is pseudorandom for A if the behavior of A on samples from μ is indistinguishable from its behavior on truly uniformly random r. In particular, if A was likely to work correctly with truly uniform samples, it will be likely to work correctly with samples drawn from μ.

For simplicity, suppose that A outputs a single bit. For a distribution μ on length-n strings {0, 1}ⁿ, we denote by 0 ≤ E_μ[A] ≤ 1 the expected value of A on inputs drawn according to μ. When the distribution under consideration is the uniform distribution U on {0, 1}ⁿ, we suppress the subscript and let E[A] denote the expected value of A. A distribution μ is ε-pseudorandom, or ε-fools A, if the expected output when A is fed samples from μ is ε-close to the expected output when it is fed truly uniform samples:

$$|E_\mu[A] - E[A]| < \varepsilon.$$

Thus, when we use the rand function in our code, we implicitly hope that its output ε-fools our program.
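The definition of ε-fooling can be made concrete with a small exact computation. The sketch below is illustrative only: the toy distribution `mu` and the two test functions are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def expectation(A, dist):
    """Exact E[A(x)] for a distribution given as {bit-tuple: probability}."""
    return sum(p * A(x) for x, p in dist.items())

def uniform(n):
    """The uniform distribution U on {0,1}^n."""
    return {x: Fraction(1, 2 ** n) for x in product((0, 1), repeat=n)}

def advantage(A, mu, n):
    """|E_mu[A] - E[A]|; mu eps-fools A exactly when this is below eps."""
    return abs(expectation(A, mu) - expectation(A, uniform(n)))

# A toy (hypothetical) distribution on 3 bits: the last bit copies the first.
mu = {(a, b, a): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}
first_two = lambda x: x[0] & x[1]          # looks only at bits 0 and 1
ends_agree = lambda x: int(x[0] == x[2])   # compares bits 0 and 2
```

With these definitions `advantage(first_two, mu, 3)` is 0, while `advantage(ends_agree, mu, 3)` is 1/2: the same distribution fools one function perfectly and another not at all, which is why pseudorandomness is always relative to the algorithm.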

Similarly to fooling one algorithm, we can define fooling a class of functions. In particular, a distribution μ ε-fools AC⁰ functions if it ε-fools every function in the class.

The smaller (and weaker) the class of functions is, the easier it is to fool. Since AC⁰ is a relatively weak class of functions, there is hope of fooling them using a distribution μ of relatively low entropy. In the late 1980s Nisan¹⁰ demonstrated a family of such distributions. In the present paper we will give a very general class of distributions that fool AC⁰ circuits, showing that all k-wise independent distributions with k = log^O(1) n fool AC⁰ circuits.

1.3 k-wise independence and the main result

A distribution μ on {0,1}ⁿ is k-independent (or k-wise independent) if every restriction of μ to k coordinates is uniform on {0, 1}^k. Clearly, the uniform distribution on {0, 1}ⁿ is n-independent. A simple example of a distribution that is (n – 1)-independent but not uniform is

$$\mu_\oplus := (x_1, x_2, \ldots, x_{n-1},\; x_1 \oplus x_2 \oplus \cdots \oplus x_{n-1}),$$

where the bits x₁,…,x_{n−1} are selected uniformly at random and ⊕ is the XOR operator. Equivalently, μ_⊕ selects a uniformly random string in {0, 1}ⁿ subject to the condition that the number of 1’s selected is even.
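The claim that the even-parity example is (n – 1)-independent but not n-independent can be checked by brute force for small n. The following is a minimal sketch; the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product

def parity_distribution(n):
    """Support of the example: n-1 free bits followed by their XOR."""
    return [bits + (sum(bits) % 2,) for bits in product((0, 1), repeat=n - 1)]

def is_k_wise_independent(support, k):
    """True if every k coordinates of a uniform sample from `support`
    are uniform on {0,1}^k."""
    n = len(support[0])
    for coords in combinations(range(n), k):
        counts = Counter(tuple(x[i] for i in coords) for x in support)
        # Uniform means: all 2^k patterns occur, each equally often.
        if len(counts) != 2 ** k or len(set(counts.values())) != 1:
            return False
    return True
```

For n = 5, `is_k_wise_independent(parity_distribution(5), 4)` holds, while the same check with k = 5 fails because only the even-weight strings appear.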

k-Wise independent distributions can sometimes be used as pseudorandom generators. If μ is a k-wise independent distribution, it may require fewer than n bits of true randomness to sample from. For example, the even-parity distribution μ_⊕ of the example above only requires n – 1 truly random bits to sample from. Alon et al.,¹ building on ideas from Joffe,⁷ give a simple construction of a k-wise independent distribution which only requires O(k log n) truly random bits using univariate polynomials.
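To give a feel for how univariate polynomials yield k-wise independence, here is a sketch of the standard textbook form of the idea over a prime field F_p: pick a random polynomial of degree less than k and output its values at all field points. This is an assumption-laden illustration, not necessarily the exact construction of Alon et al.; in particular the outputs here are field elements rather than bits, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations, product

def poly_samples(p, k):
    """All outputs of the generator, one per coefficient vector in F_p^k.
    Output position a holds c_0 + c_1*a + ... + c_{k-1}*a^(k-1) mod p."""
    out = []
    for coeffs in product(range(p), repeat=k):
        out.append(tuple(sum(c * pow(a, i, p) for i, c in enumerate(coeffs)) % p
                         for a in range(p)))
    return out

def is_k_wise_uniform(samples, k, p):
    """True if every k output positions are uniform on F_p^k."""
    for coords in combinations(range(len(samples[0])), k):
        counts = Counter(tuple(s[i] for i in coords) for s in samples)
        if len(counts) != p ** k or len(set(counts.values())) != 1:
            return False
    return True
```

With p = 5 and k = 2 the seed consists of two field elements, yet the five outputs are pairwise independent; they are not 3-wise independent, since only 25 of the 125 possible triples can occur.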

Another way of looking at and constructing k-wise independent distributions is using coding theory. Simple linear codes give rise to a large family of k-wise independent distributions. For any binary code C with distance > k, a uniform distribution on the set

$$C^\perp = \{x : x \cdot y = 0 \bmod 2,\ \forall y \in C\}$$

gives rise to a k-wise independent distribution. The even-parity example μ_⊕ above corresponds to the n-times repetition code C = {00 … 0, 11 … 1} whose distance is n, again showing that μ_⊕ is (n – 1)-wise independent.
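The coding-theoretic view can also be checked on a small case. The sketch below uses the [7,4] Hamming code as an illustrative choice (it does not appear in the text): its distance is 3, so the uniform distribution on its dual should be 2-wise independent. All helper names are assumptions.

```python
from itertools import combinations, product

def dot(x, y):
    return sum(a & b for a, b in zip(x, y)) % 2

def dual(code, n):
    """All x in {0,1}^n with x . y = 0 (mod 2) for every word y in `code`."""
    return [x for x in product((0, 1), repeat=n)
            if all(dot(x, y) == 0 for y in code)]

def min_distance(code):
    """Minimum Hamming distance between two distinct codewords."""
    return min(sum(a != b for a, b in zip(x, y))
               for x, y in combinations(code, 2))

def k_wise_independent(support, k):
    """True if every k coordinates of a uniform sample are uniform on {0,1}^k."""
    n = len(support[0])
    for coords in combinations(range(n), k):
        seen = {}
        for x in support:
            key = tuple(x[i] for i in coords)
            seen[key] = seen.get(key, 0) + 1
        if len(seen) != 2 ** k or len(set(seen.values())) != 1:
            return False
    return True

# Rows of the parity-check matrix whose columns are 1..7 written in binary;
# the Hamming code is the set of words orthogonal to all three rows.
H = [tuple((col >> bit) & 1 for col in range(1, 8)) for bit in range(3)]
hamming = dual(H, 7)
```

The same `dual` helper applied to the repetition code of length 4 returns the even-weight strings, matching the example in the text.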

For which k's does k-wise independence fool AC⁰ circuits? It is not hard to see that the problem of distinguishing the even-parity distribution μ_⊕ of the example above from the uniform distribution is as hard as computing PARITY, and thus it is not surprising that μ_⊕ fools AC⁰ circuits. If a distribution μ is k-wise independent, it means that “locally” the distribution looks uniform, and a circuit distinguishing μ from the uniform distribution needs to be able to “process” more than k input bits together. Thus, intuitively, k-wise independence should be able to fool AC⁰ circuits even for fairly small values of k. In 1990, Linial and Nisan⁹ conjectured that k = (log m)^{d – 1} (i.e., poly-logarithmic in m) is sufficient to fool AC⁰ circuits of size m and depth d. The main object of this paper is to give a proof of the following theorem, which was first published in 2009³:

THEOREM 1. r(m, d, ε)-independence ε-fools depth-d AC⁰ circuits of size m, where

$$r(m, d, \varepsilon) = \left(\log \frac{m}{\varepsilon}\right)^{O(d^2)}.$$

Theorem 1 shows that poly-logarithmic independence suffices to fool all AC⁰ circuits, thus settling the Linial–Nisan conjecture. A gap remains in the exact dependence of the exponent of the log on d. Prior to Theorem 1, the conjecture had been proved by Bazzi² in 2007 for the special case of DNF formulas. Bazzi’s original proof was quite involved, and was later greatly simplified by Razborov.¹² No nontrivial bound on r(m, d, ε) was known for d > 2.

2. Proof Overview

The main technical ingredient in our work is presenting a new connection between low-degree polynomials and AC⁰ circuits. Connections between polynomials and circuits, especially AC⁰ circuits, have been explored in the past in various contexts, and remain perhaps the most powerful tool for analyzing these functions.

We begin by observing that k-wise independent distributions perfectly fool degree-k polynomials. Let μ be any k-wise independent distribution over {0, 1}ⁿ. As a first step, let m = x_{i₁} x_{i₂} ⋯ x_{i_k} be a degree-k monomial. m may depend on at most k different variables, on which μ (being k-wise independent) appears to be perfectly uniformly random. Thus

$$E_\mu[m] = E[m].$$

Now, if f is a degree-k polynomial over x₁,…, x_n, it can be written as a sum of monomials of degree at most k: f = Σ_{j=1}^{N} m_j. Each monomial is completely fooled by μ and thus

$$E_\mu[f] = \sum_{j=1}^{N} E_\mu[m_j] = \sum_{j=1}^{N} E[m_j] = E[f]. \tag{1}$$

In light of (1), a reasonable attack strategy to prove Theorem 1 would be as follows. For a function F computed by an AC⁰ circuit, find a polynomial f that approximates F such that:

1. E[F] ≈ E[f], i.e., f approximates F well on average;
2. E_μ[F] ≈ E_μ[f], i.e., f approximates F well on average, even when we restrict ourselves to inputs drawn according to μ.

Then use (1) to conclude that

$$E_\mu[F] \approx E_\mu[f] = E[f] \approx E[F],$$

i.e., that μ ε-fools F. Of course, it is not clear at all that such a polynomial must exist (and we do not know whether it does). However, a carefully crafted attack along these lines actually does go through. The proof combines two previous constructions of approximating polynomials for AC⁰ circuits.
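The starting observation of this section, that a k-wise independent distribution gives every polynomial of degree at most k exactly the same expectation as the uniform distribution, is easy to verify on a small case. The sketch below uses the even-parity distribution on 4 bits, which is 3-wise independent; the two polynomials are arbitrary illustrative choices.

```python
from fractions import Fraction
from itertools import product
from math import prod

def expect(f, support):
    """Exact expectation of f under the uniform distribution on `support`."""
    return sum(Fraction(f(x)) for x in support) / len(support)

def polynomial(terms):
    """`terms` maps a tuple of variable indices (a monomial) to its coefficient."""
    return lambda x: sum(c * prod(x[i] for i in mono) for mono, c in terms.items())

n = 4
cube = list(product((0, 1), repeat=n))
# 3-wise independent example: strings with an even number of ones.
mu = [x for x in cube if sum(x) % 2 == 0]

deg3 = polynomial({(): 2, (0,): -1, (1, 3): 5, (0, 2, 3): 7})  # degree 3
deg4 = polynomial({(0, 1, 2, 3): 1})                            # degree 4
```

The degree-3 polynomial has the same expectation under both distributions, while the degree-4 monomial does not (1/8 under `mu` versus 1/16 under the uniform distribution), so the degree bound in the observation cannot be relaxed.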

The first construction is due to Linial, Mansour, and Nisan⁸ and is discussed extensively in Section 3. It is an analytic construction that for each AC⁰ function F provides a low-degree polynomial f̃ that approximates it well on average. Such a polynomial would easily fulfill requirement 1 above. However, it would not be helpful at fulfilling requirement 2. Since the support of μ is quite small, being close on average does not rule out the possibility that F and f̃ deviate wildly from each other on the points of the support of the distribution μ. This type of uniform average-case approximation is inherently useless when we want F to be close to f̃ on samples drawn according to μ.

The second construction is a combinatorial one, and is due to Razborov¹¹ and Smolensky.¹³ It is discussed in Section 4. For an AC⁰ function F, and any distribution of inputs v, the Razborov–Smolensky construction gives a low-degree polynomial f such that F = f with high probability with respect to v. In other words,

$$P_v[f(x) \ne F(x)] < \varepsilon.$$

Note that here f is actually equal to F most of the time (and is not just close to it). Since v is allowed to be any distribution, we can take

$$v = \tfrac{1}{2}\left(\mu + U_{\{0,1\}^n}\right)$$

to be the hybrid between μ and the uniform distribution, thus guaranteeing that the probability that f(x) ≠ F(x) is small simultaneously with respect to both distributions. This seems to be a promising improvement over the Linial–Mansour–Nisan polynomials, one that may allow us to fulfill both requirements 1 and 2 above.

The caveat here is that on the few locations where f(x) ≠ F(x) the deviation |f(x) – F(x)| may be huge. In particular, while f is close to F (more precisely, it is perfectly equal) almost everywhere, it is not necessarily the case that they are close on average. Thus, for example, E[f] may differ wildly from E[F].

We see that both polynomial approximations get us “part-way” toward the goals, but do not quite work. To make the proof work we need to combine the two approaches as described in Section 5. In a nutshell, we first take a Razborov–Smolensky polynomial f that agrees with F almost everywhere. It turns out that an “error” function ε(x) that “flags” the problematic locations, i.e., ε(x) = 1 whenever f(x) ≠ F(x), can be computed by a (slightly bigger) AC⁰ circuit. Thus, if the function f*(x) = (1 – ε(x)) · f(x) were a polynomial, we would be in great shape: f*(x) = F(x) most of the time, while even when they disagree, the difference |f*(x) – F(x)| ≤ 1, and thus in fact E[f*] ≈ E[F] and E_μ[f*] ≈ E_μ[F].

Of course, f* is not a polynomial, since ε is not a polynomial. However, it turns out with a little bit of work that the polynomial f′ = (1 – ε̃) · f, where ε̃ is the Linial–Mansour–Nisan approximation of ε, actually allows us to prove the main theorem. Thus the proof comes out of combining the two approximation techniques in one polynomial.

3. Low-Depth Circuits and Polynomials—The Analytic Connection

Approximately a decade after the first AC⁰ lower bounds were published, Linial, Mansour, and Nisan proved the following remarkable lemma:

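LEMMA 2. If F: {0, 1}ⁿ → {0, 1} is a boolean function computable by a depth-d circuit of size m, then for every t there is a degree t polynomial f̃ with⁸

$$\|F - \tilde f\|_2^2 = \frac{1}{2^n} \sum_{x \in \{0,1\}^n} |F(x) - \tilde f(x)|^2 \le 2m \cdot 2^{-t^{1/d}/20}.$$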

The lemma states that any low-depth circuit of size m can be approximated very well by a polynomial of degree (log m)^O(d), a degree that is poly-logarithmic in the size of the circuit. The approximation error here is the average-square error: f̃ doesn’t have to agree with F on any point, but the average value of |F(x) – f̃(x)|² when taken over the Hamming cube is small. An illustration of the statement of the lemma can be found in Figure 1.

To understand the context of Lemma 2 we need to consider one of the most fundamental tools in the analysis of boolean functions: the Fourier transform, which we will briefly introduce here. An introductory survey on the Fourier transform over the boolean cube can be found in de Wolf.⁴ For the remainder of this section we will represent the boolean function F: {0, 1}ⁿ → {0,1} using a function G: {-1, +1}ⁿ → {-1, +1} with 0 corresponding to −1 and 1 to +1. Thus

$$G(x_1, \ldots, x_n) = 2F\left(\frac{x_1 + 1}{2}, \ldots, \frac{x_n + 1}{2}\right) - 1.$$

This will not affect any of the results, but will make all the calculations much cleaner. Note that a degree-t polynomial approximation of F corresponds to a degree-t polynomial approximation for G and vice-versa.

Each function G: {-1, +1}ⁿ → {-1, +1} can be viewed as a 2ⁿ-dimensional vector in ℝ^(2ⁿ) specified by its truth table. Consider the special parity functions χ_S: {-1, +1}ⁿ → {-1, +1}. For each set S ⊆ {1,…, n} of coordinates, let

$$\chi_S(x) = \prod_{i \in S} x_i.$$

In other words, χ_S is the parity of the x's that correspond to coordinates in the set S. There is a total of 2ⁿ different functions χ_S, one corresponding to each subset S of coordinates. The function χ_∅ is the constant function χ_∅(x) = 1, while the function χ_{{1,…, n}}(x) outputs the parity of all the coordinates in x.

Similarly to the function G, each function χ_S can also be viewed as a vector in ℝ^(2ⁿ) specified by the values it takes on all possible inputs. By abuse of notation we will denote these vectors by χ_S as well. Moreover, for each two sets S₁ ≠ S₂ the vectors χ_{S₁} and χ_{S₂} are orthogonal to each other. That is,

$$\sum_{x \in \{-1,+1\}^n} \chi_{S_1}(x)\, \chi_{S_2}(x) = 0.$$

Thus we have 2ⁿ orthogonal vectors {χ_S} in ℝ^(2ⁿ), and when properly normalized they yield an orthonormal basis of the space. In other words, each vector in ℝ^(2ⁿ) can be uniquely represented as a linear combination of the functions χ_S. In particular, the function G can be uniquely written as:

$$G = \sum_{S \subseteq \{1,\ldots,n\}} \hat G(S)\, \chi_S, \tag{2}$$

where the numerical coefficients Ĝ(S) are given by the (normalized) inner product

$$\hat G(S) = \langle G, \chi_S \rangle = \frac{1}{2^n} \sum_{x \in \{-1,+1\}^n} G(x)\, \chi_S(x).$$

The representation of G given by (2) is called the Fourier transform of G. The transform converts 2ⁿ numbers (the truth table of G) into 2ⁿ numbers (the coefficients Ĝ(S)), thus preserving all the information about G. Another way to view the representation (2) is as a canonical way of representing G as a multilinear polynomial with real-valued coefficients:

$$G(x) = \sum_{S \subseteq \{1,\ldots,n\}} \hat G(S) \prod_{i \in S} x_i. \tag{3}$$

Thus each function G can be uniquely represented as a degree-n multilinear polynomial (a simple fact that can be proved directly, without using the Fourier transform). More importantly, when we view the functions χ_S as vectors, the space H_t of polynomials of degree ≤ t, for any t, is spanned by the functions {χ_S: |S| ≤ t}. This means that to get the best low-degree approximation for G we should simply project it onto H_t, to obtain

$$\tilde G = \sum_{S:\, |S| \le t} \hat G(S)\, \chi_S. \tag{4}$$

Note that G̃ is no longer a boolean function, and may take arbitrary real values, even on inputs from {-1, +1}ⁿ. The l₂-error, which we need to bound to prove Lemma 2, is given by

$$\|G - \tilde G\|_2^2 = \sum_{S:\, |S| > t} \hat G(S)^2. \tag{5}$$

To bound (5) one needs to show that all but a very small fraction of weight in the Fourier representation of G is concentrated on low-degree coefficients. This is where the magic of Linial et al.⁸ happens, and this is where the fact that G is computed by an AC⁰ circuit is being used.
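The Fourier machinery of this section can be run by brute force on a small function. The sketch below computes all coefficients of the majority of three ±1 inputs (an illustrative choice, not from the paper), projects onto degree ≤ 1, and compares the directly computed average-square error with the Fourier weight above degree 1. Function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def chi(S, x):
    """Parity character: product of the coordinates of x indexed by S."""
    return prod(x[i] for i in S)

def fourier_coefficients(G, n):
    """Map each subset S (a tuple) to (1/2^n) * sum_x G(x) * chi_S(x)."""
    cube = list(product((-1, 1), repeat=n))
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: Fraction(sum(G(x) * chi(S, x) for x in cube), 2 ** n)
            for S in subsets}

def project(coeffs, t):
    """Keep only the coefficients on sets of size <= t (projection onto H_t)."""
    return {S: c for S, c in coeffs.items() if len(S) <= t}

def evaluate(coeffs, x):
    """Evaluate the multilinear polynomial sum_S coeffs[S] * chi_S(x)."""
    return sum(c * chi(S, x) for S, c in coeffs.items())

def l2_error(G, coeffs, n):
    """(1/2^n) * sum_x (G(x) - poly(x))^2, computed directly from the truth table."""
    cube = list(product((-1, 1), repeat=n))
    return sum((G(x) - evaluate(coeffs, x)) ** 2 for x in cube) / Fraction(2 ** n)

maj3 = lambda x: 1 if sum(x) > 0 else -1   # majority of three +/-1 inputs
coeffs = fourier_coefficients(maj3, 3)
```

For this function the three singleton coefficients are 1/2 and the coefficient on the full set is −1/2, so the degree-1 projection has error exactly 1/4, which is the squared weight of the single discarded coefficient.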

At a very high level, the proof uses random restrictions. A random restriction ρ assigns the majority of G's inputs a random value 0 or 1, while leaving some inputs unset. A tool called the Switching Lemma is then used to show that when a random restriction ρ is applied to an AC⁰ function G, the resulting boolean function G|_ρ is highly likely to be very “simple”, depending on only a few of the inputs. On the other hand, a random restriction applied to χ_S where |S| is large is likely to leave some of the inputs in S unset. Thus if G had a lot of weight on the coefficients Ĝ(S) with large |S|, G|_ρ would be likely to remain “complicated” even after the application of ρ.

Lemma 2 immediately implies a lower bound for an AC⁰ circuit computing PARITY, since the Fourier representation of PARITY has coefficient 1 on the set S = [n] = {1,…, n} and 0 on every other set. Thus PARITY cannot be approximated well by a polynomial even of degree t = n – 1.
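The step from the Fourier representation of PARITY to a size lower bound can be spelled out as follows. This is a sketch under an assumption: it takes the error bound of Lemma 2 in the form 2m · 2^{−t^{1/d}/20}, and the constants in the conclusion depend on that form. The factor 1/4 comes from passing from the ±1-valued G back to the 0/1-valued F = (G + 1)/2.

```latex
\|F_{\mathrm{PARITY}} - \tilde f\|_2^2
  \;=\; \tfrac14\,\|G_{\mathrm{PARITY}} - \tilde G\|_2^2
  \;\ge\; \tfrac14 \sum_{|S| > n-1} \hat G_{\mathrm{PARITY}}(S)^2
  \;=\; \tfrac14
  \qquad \text{for every } \tilde f \text{ of degree} \le n-1,
\\[4pt]
\tfrac14 \;\le\; 2m \cdot 2^{-(n-1)^{1/d}/20}
  \quad\Longrightarrow\quad
  m \;\ge\; \tfrac18 \cdot 2^{(n-1)^{1/d}/20}.
```

So, under the stated form of the bound, any depth-d circuit for PARITY has size exponential in a power of n.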

One interesting application of Lemma 2 that is outside of the scope of the present article is for learning AC⁰ functions. Suppose that G is an unknown AC⁰ function of size m, and we are given access to examples of the form (x, G(x)), where x is chosen uniformly from {-1, +1}ⁿ. Then we know that G has a good polynomial approximation of the form (4). Thus to (approximately) learn G, all we have to do is learn the coefficients Ĝ(S), where S is small. This can be done in time proportional to 2^|S|, immediately yielding an algorithm for learning AC⁰ circuits.
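The learning idea can be sketched in a few lines: estimate each low-degree coefficient by an empirical average over the examples, then predict with the sign of the resulting polynomial. This is only an outline of the approach under simplifying assumptions; the target function (majority of three inputs), the sample size, the fixed seed, and all names are illustrative choices.

```python
import random
from itertools import combinations
from math import prod

def estimate_coefficients(samples, n, t):
    """Empirical average of G(x) * chi_S(x) for every set S with |S| <= t."""
    est = {}
    for r in range(t + 1):
        for S in combinations(range(n), r):
            est[S] = sum(y * prod(x[i] for i in S) for x, y in samples) / len(samples)
    return est

def hypothesis(est):
    """Predict with the sign of the estimated low-degree polynomial."""
    def h(x):
        value = sum(c * prod(x[i] for i in S) for S, c in est.items())
        return 1 if value >= 0 else -1
    return h

rng = random.Random(0)                       # fixed seed, for reproducibility only
target = lambda x: 1 if sum(x) > 0 else -1   # stand-in for the unknown function G
samples = []
for _ in range(20000):
    x = tuple(rng.choice((-1, 1)) for _ in range(3))
    samples.append((x, target(x)))

est = estimate_coefficients(samples, n=3, t=1)
h = hypothesis(est)
```

In this toy case the degree-1 estimates are already enough for `h` to agree with the target on every input; for a general AC⁰ function one would only expect agreement on most inputs.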

4. Low-Depth Circuits and Polynomials—The Algebraic Connection

It turns out that average-error approximations are not the only useful way in which AC⁰ circuits can be approximated by low-degree polynomials. We have seen in the previous section that the representation of any boolean function by a multilinear polynomial is unique, and thus given an AC⁰ function F we cannot hope to obtain a low-degree polynomial f that agrees with F everywhere. We will see, however, that it is possible to construct a low-degree polynomial f that agrees with F almost everywhere. Specifically, the following statement holds:

LEMMA 3. Let v be any probability distribution on {0, 1}ⁿ. For a circuit of depth d and size m computing a function F, for any s, there is a degree r = (s · log m)^d polynomial f such that P_v[f(x) ≠ F(x)] < 0.82^s · m.

Note that Lemma 3 promises an approximating polynomial, again of degree (log m)^O(d), that has a low error against an arbitrary distribution v on inputs. The statement of the lemma is illustrated on Figure 2(a), where the probability according to v of the region of disagreement between f and F is small.

Lemma 3 (or a slight variation of it) was proved by Razborov¹¹ and Smolensky¹³ in the late 1980s. The tools for the proof of the lemma were developed by Valiant and Vazirani.¹⁴ Razborov and Smolensky used the lemma to give stronger lower bounds on bounded depth circuits. Let AC⁰[p] ⊃ AC⁰ be the class of functions computable using a bounded-depth circuit that in addition to the AND and OR gates is allowed to use the MOD_p gate: the gate outputs 1 if and only if the number of 1’s among its inputs is divisible by p. We already know that PARITY ∉ AC⁰, and thus AC⁰[2] ⊋ AC⁰. Razborov and Smolensky showed that the MOD_p function cannot be computed efficiently by an AC⁰[q] circuit where q ≠ p. In particular, this means that PARITY ∉ AC⁰[3].

The proof of Lemma 3 is combinatorial, and is much simpler than the proof of Lemma 2. In fact, we will be able to present its entire proof below. To obtain the results in Section 5 we will need a slight strengthening of the lemma.

LEMMA 4. Let v be any probability distribution on {0, 1}ⁿ. For a circuit of depth d and size m computing a function F, for any s, there is a degree r = (s · log m)^d polynomial f and a boolean function ε_v computable by a circuit of depth ≤ d + 3 and size O(m²r) such that

P_v[ε_v(x) = 1] < 0.82^s · m, and
whenever ε_v(x) = 0, f(x) = F(x).

Thus, not only does the polynomial from Lemma 3 exist, but there is a simple AC⁰ “error circuit” ε_v that given an input can tell us whether f will make an error on the input. The function ε_v is shown on Figure 2(b).

We will now prove Lemma 4. A curious property of the construction is that it gives a probabilistic algorithm for producing the low degree polynomial f that approximates F almost everywhere.

PROOF. (of Lemma 4) We construct the polynomial f by induction on the depth d of F, and show that with high probability f = F. The function ε_v follows from the construction. Note that we do not know anything about the measure v and thus cannot give an explicit construction for f. Instead, we will construct a distribution on polynomials f that succeeds with high probability on any given input. Thus the distribution is expected to have a low error with respect to v, which implies that there is a specific f that has a small error with respect to v.

We will show how to make a step when the output gate in F is an AND gate (see Figure 3). Since the whole construction is symmetric with respect to 0 and 1, the step also holds for an OR gate. Let

$$F = G_1 \wedge G_2 \wedge \cdots \wedge G_k,$$

where k < m. For convenience, let us assume that k = 2^l is a power of 2. We take a collection of

$$t := s \cdot \log m$$

random subsets of {1, 2,…, k} where each element is included with probability p independently of the others: at least s subsets for each of the p = 2⁻¹, 2⁻²,…, 2^{−l} = 1/k. Denote the sets by S₁,…, S_t; we ignore empty sets. In addition, we make sure to include {1,…, k} as one of the sets. Let g₁,…, g_k be the approximating polynomials for G₁,…, G_k that are guaranteed by the induction hypothesis applied to G₁,…, G_k with depth d – 1. We set

$$f = \prod_{i=1}^{t} \left( \sum_{j \in S_i} g_j - |S_i| + 1 \right).$$

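By the induction assumption, the degrees of the g_j are deg_g ≤ (s · log m)^{d−1}, hence the degree of f is bounded by deg_f ≤ t · deg_g ≤ (s · log m)^d. Next we bound the error P[f ≠ F]. It consists of two terms:

$$P[f \ne F] \le P[g_j \ne G_j \text{ for some } j] + P[f \ne F \mid g_j = G_j \text{ for all } j].$$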

In other words, to make a mistake, either one of the inputs to the final AND gate has to be wrong, or the approximating function for the AND has to make a mistake. We will focus on the second term. The first term is bounded by the union bound. We fix a vector of specific values G₁(x),…, G_k(x), and calculate the probability of an error over the possible choices of the random sets S_i.

If all the G_j(x)’s are 1 then the value of F(x) = 1 is calculated correctly with probability 1.

Suppose that F(x) = 0 (and thus at least one of the G_j(x)'s is 0). Let 1 ≤ z ≤ k be the number of zeros among G₁(x),…, G_k(x), and α be such that 2^α ≤ z < 2^{α+1}. Our formula will work correctly if one of the sets S_i hits exactly one 0 among the z zeros of G₁(x),…, G_k(x). We will consider only the sets S_i above that are likely to hit exactly one zero. Specifically, let S be a random set as above with p = 2^{−α−1}. The probability of S hitting exactly one zero is exactly

$$z \cdot p \cdot (1 - p)^{z-1} \ge \tfrac{1}{2} \cdot (1 - p)^{1/p - 1} > \frac{1}{2e} > 0.18.$$

Hence the probability of the formula being wrong after s such sets is bounded by 0.82^s. Since this is true for any value of x, we can find a collection of sets S_i such that the probability of error as measured by v is at most 0.82^s. By making the same probabilistic argument at every node and applying the union bound, we get that the condition “if the inputs are correct then the output is correct” is satisfied by all nodes except with probability < 0.82^s · m. Thus the error of the polynomial is < 0.82^s · m.
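The constant 0.82 rests on the claim that a random set with inclusion probability p = 2^{−α−1} hits exactly one of z zeros with probability above 0.18 whenever 2^α ≤ z < 2^{α+1}. The sketch below checks this exactly for a range of α; the function names and the chosen range are assumptions of this example.

```python
from fractions import Fraction

def hit_exactly_one(z, p):
    """P[a random set with inclusion probability p contains exactly one of z marked elements]."""
    return z * p * (1 - p) ** (z - 1)

def worst_case(max_alpha):
    """Smallest hitting probability over all z with 2^a <= z < 2^(a+1),
    for a = 0..max_alpha and p = 2^-(a+1)."""
    worst = Fraction(1)
    for alpha in range(max_alpha + 1):
        p = Fraction(1, 2 ** (alpha + 1))
        for z in range(2 ** alpha, 2 ** (alpha + 1)):
            worst = min(worst, hit_exactly_one(z, p))
    return worst
```

A value of `worst_case(8)` above 18/100 confirms the bound for every z up to 511, so each of the s independent sets fails with probability below 0.82.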

Finally, if we know the sets S_i at every node, it is easy to check whether there is a mistake by checking that no set contains exactly one 0, thus yielding the depth ≤ (d + 3) function ε_v. The blowup in size is at most O(mr) since at each node we take a disjunction over all the possible pairs (S_i, G_j ∈ S_i) of whether G_j is the only 0 in the set S_i.
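The inductive step for a single AND gate can be simulated directly. The sketch below assumes the product form described in the proof, a factor (Σ_{j∈S_i} g_j − |S_i| + 1) for each random set S_i, and treats the gate inputs as already correct 0/1 values. The helper names, the seed, and the parameters are illustrative assumptions.

```python
import random

def random_sets(k, s, rng):
    """The full set plus s random subsets of range(k) for each inclusion
    probability p = 1/2, 1/4, ..., 1/k (k a power of 2); empty sets are dropped."""
    sets = [list(range(k))]
    p = 0.5
    while p >= 1.0 / k:
        for _ in range(s):
            S = [j for j in range(k) if rng.random() < p]
            if S:
                sets.append(S)
        p /= 2
    return sets

def approx_and(g, sets):
    """Evaluate prod_i (sum_{j in S_i} g_j - |S_i| + 1) on 0/1 values g."""
    out = 1
    for S in sets:
        out *= sum(g[j] for j in S) - len(S) + 1
    return out

def failure_rate(g, s, trials, seed=0):
    """Fraction of random choices of the sets for which the product differs from AND(g)."""
    rng = random.Random(seed)
    target = int(all(g))
    bad = sum(approx_and(g, random_sets(len(g), s, rng)) != target
              for _ in range(trials))
    return bad / trials
```

A factor equals 1 when its set contains no zero and 0 when it contains exactly one, so the product is correct as soon as one set isolates a single zero. When no set does, the product can be large in absolute value, which is the caveat about Razborov–Smolensky polynomials raised in Section 2.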

5. K-Wise Independent Distributions and Low-Depth Circuits

Next we turn our attention to proving the main result, Theorem 1. The result will give another way in which AC⁰ circuits can be approximated using low-degree polynomials.

THEOREM 1. r(m, d, ε)-independence ε-fools depth-d AC⁰ circuits of size m, where

$$r(m, d, \varepsilon) = \left(\log \frac{m}{\varepsilon}\right)^{O(d^2)}.$$

To prove Theorem 1 we will show:

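LEMMA 5. Let s ≥ log m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution where

$$r \ge 3 \cdot 60^{d+3} \cdot (\log m)^{(d+1)(d+3)} \cdot s^{d(d+3)},$$

then

$$|E_\mu[F] - E[F]| < \varepsilon(s, d),$$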

where ε(s, d) = 0.82^s · (10m).

Theorem 1 follows from Lemma 5 by taking s = O(log(m/ε)), chosen large enough that ε(s, d) ≤ ε.
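To see how large the parameter s has to be, one can solve 0.82^s · (10m) ≤ ε for s. The helper below is a small sketch of that arithmetic; its name and the sample values are assumptions of this example.

```python
from math import ceil, log

def smallest_s(m, eps):
    """Smallest non-negative integer s with 0.82**s * (10*m) <= eps."""
    return max(0, ceil(log(10 * m / eps) / log(1 / 0.82)))
```

Since 1/log₂(1/0.82) is about 3.5, the required s is roughly 3.5 · log₂(10m/ε), which is logarithmic in m/ε and therefore keeps the independence parameter r poly-logarithmic.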

As mentioned in the overview, one can prove that k-wise independence fools a function F by constructing an appropriate approximation of F with low degree polynomials. Bazzi² has given the following equivalent characterization of fooling through polynomial approximations using linear programming duality:

LEMMA 6. Let F: {0, 1}ⁿ → {0, 1} be a boolean function, k ≥ 0 an integer, and ε > 0. Then the following are equivalent:²

(1) Any k-wise independent distribution ε-fools F.
(2) There exists a pair of “sandwiching polynomials” f_l, f_u: {0, 1}ⁿ → R such that:

low degree: deg(f_l), deg(f_u) ≤ k;
sandwiching: f_l ≤ F ≤ f_u on {0, 1}ⁿ;
small error: E[F – f_l], E[f_u – F] ≤ ε, where the expectation is with respect to the uniform distribution on {0, 1}ⁿ.

The sandwiching polynomials are illustrated in Figure 4(a). Since part (1) in Lemma 6 is what we need to prove, we will only be interested in the “(2) ⇒ (1)” direction of the lemma, which is the “easy” direction, and which we can prove here. Suppose (2) holds, and let μ be any k-wise independent distribution. The polynomial f_l is of degree ≤ k, which as observed in Section 2, implies that E_μ[f_l] = E[f_l]. Thus,

$$E_\mu[F] \ge E_\mu[f_l] = E[f_l] = E[F] - E[F - f_l] \ge E[F] - \varepsilon.$$

The first inequality uses the sandwiching property, and the last inequality uses the small error property. Similarly,

$$E_\mu[F] \le E_\mu[f_u] = E[f_u] = E[F] + E[f_u - F] \le E[F] + \varepsilon,$$

implying that μ ε-fools F. Thus a problem about fooling AC⁰ circuits is actually another problem on approximating circuits with polynomials!
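The easy direction of Lemma 6 can be seen at work on a tiny example. The sketch below takes F to be the AND of three bits and exhibits a pair of degree-2 sandwiching polynomials with error 1/8; the function, the polynomials, and the test distribution are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

cube = list(product((0, 1), repeat=3))
F   = lambda x: x[0] & x[1] & x[2]                 # AND of three bits
f_u = lambda x: x[0] * x[1]                        # degree 2, upper bound
f_l = lambda x: x[0] * x[1] + x[0] * x[2] - x[0]   # degree 2, lower bound

def mean(f, support):
    """Exact expectation under the uniform distribution on `support`."""
    return sum(Fraction(f(x)) for x in support) / len(support)

sandwiched = all(f_l(x) <= F(x) <= f_u(x) for x in cube)
eps = max(mean(lambda x: F(x) - f_l(x), cube),
          mean(lambda x: f_u(x) - F(x), cube))

# A 2-wise independent distribution: the third bit is the XOR of the first two.
mu = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]
gap = abs(mean(F, mu) - mean(F, cube))
```

Here `eps` is 1/8, and the gap between E_μ[F] and E[F] for this pairwise independent μ is also 1/8, so the bound obtained from the sandwiching pair is attained in this instance.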

Our actual proof of Lemma 5 will not produce a pair of sandwiching polynomials, but will “almost” produce such a pair. By the “(1) ⇒ (2)” direction of Lemma 6 we know that Theorem 1 implies that such a pair must exist for each AC⁰ function.

5.1 Proof of Lemma 5

The proof will combine the two types of polynomial approximations that we discussed in Sections 3 and 4. When we produced the almost-everywhere-correct polynomials in Section 4 we noted that when the polynomial f does disagree with F, the disagreement might be very large. It is still possible to give the following very crude bound on the maximum value ||f||_∞ that |f| may attain on {0,1}ⁿ:

CLAIM 7. In Lemma 4, for s ≥ log m,

$$\|f\|_\infty < (2m)^{\deg(f) - 2} = (2m)^{(s \log m)^d - 2}.$$

The claim follows by an easy induction from the proof of Lemma 4. We omit the proof here.

A low-degree polynomial f₀ is a one-sided approximation of a boolean F (see Figure 4(b)) if:

f₀ is a good approximation: ||F – f₀||²₂ is small.
f₀'s error is one-sided: f₀(x) = 0 whenever F(x) = 0.

If F had such an approximation, then the polynomial f_l := 1 – (1 – f₀)² would be a (lower) sandwiching polynomial for F. f_l still has a low degree, and

$$E[F - f_l] = E[(F - f_0)^2] = \|F - f_0\|_2^2$$

is small. This process is illustrated in Figure 4(c).
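The reason the squaring trick turns a one-sided approximation f₀ into a lower sandwiching polynomial can be checked pointwise, by splitting on the value of F(x):

```latex
F(x) = 0:\quad f_0(x) = 0 \;\Rightarrow\; f_l(x) = 1 - (1 - 0)^2 = 0 = F(x),
\\[4pt]
F(x) = 1:\quad F(x) - f_l(x) = \bigl(1 - f_0(x)\bigr)^2 = \bigl(F(x) - f_0(x)\bigr)^2 \;\ge\; 0 .
```

In both cases F(x) – f_l(x) = (F(x) – f₀(x))² ≥ 0, so f_l ≤ F everywhere and the average of F – f_l is exactly the average-square error of f₀. The degree of f_l is at most twice that of f₀.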

Thus, being able to produce one-sided approximations (that combine characteristics from both Sections 3 and 4 approximations) would be enough. Unfortunately, we are unable to construct such polynomials. Instead, we show how to modify F just a little bit to obtain a boolean function F′. The change is slight enough with respect to both μ and the uniform measure so that fooling F′ and fooling F is almost equivalent. We then show that the modified F′ does have a one-sided approximation f′. The properties of F′ and f′ are summarized in the following lemma:

LEMMA 8. Let F be computed by a circuit of depth d and size m. Let s₁, s₂ be two parameters with s₁ ≥ log m. Let μ be any probability distribution on {0, 1}ⁿ, and U_{{0,1}ⁿ} be the uniform distribution on {0,1}ⁿ. Set

$$v = \tfrac{1}{2}\left(\mu + U_{\{0,1\}^n}\right).$$

Let ε_v be the function from Lemma 4 with s = s₁. Set F′ = F ∨ ε_v. Then there is a polynomial f′ of degree r_f ≤ (s₁ · log m)^d + s₂, such that

F ≤ F′ on {0, 1}ⁿ;
P_μ[F ≠ F′] < 2 · 0.82^{s₁} · m;
||F′ – f′||²₂ < 0.82^{s₁} · (4m) + 2^{2.9 (s₁ log m)^d log m − s₂^{1/(d+3)}/20}, and
f′(x) = 0 whenever F′(x) = 0.

PROOF IDEA: The proof is illustrated in Figure 5. We start with a polynomial approximation f for F that works “almost everywhere”. By Lemma 4 we know that there is an AC⁰ function ε_v that “flags” the locations of all the errors. We fix F′ = F ∨ ε_v so that f = 0 whenever F′ = 0. The problem is that the error |F′ – f|² may be large in the area where ε_v = 1 (and thus f behaves poorly). If we could set f′ = f · (1 – ε_v), we would be done, as this would “kill” all the locations where ε_v is 1. Unfortunately, we cannot do this since ε_v is not a low degree polynomial. Here we use the fact that ε_v is itself an AC⁰ function, and thus can be approximated well by a polynomial ε̃ by the Linial–Mansour–Nisan approximation (Lemma 2). A brief calculation indeed shows that f′ = f · (1 – ε̃) for an appropriately chosen ε̃ satisfies all the conditions.

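PROOF. The first property follows from the definition of F′. The second one follows from Lemma 4 directly, since

$$P_\mu[F \ne F'] \le P_\mu[\varepsilon_v = 1] \le 2 \cdot P_v[\varepsilon_v = 1] < 2 \cdot 0.82^{s_1} m.$$

Note also that

$$P[\varepsilon_v = 1] \le 2 \cdot P_v[\varepsilon_v = 1] < 2 \cdot 0.82^{s_1} m.$$

Let f be the approximating polynomial for F from Lemma 4, so that F = F′ = f whenever ε_v = 0, and thus f = 0 whenever F′ = 0. By Claim 7 we have

$$\|f\|_\infty < (2m)^{(s_1 \log m)^d - 2}.$$

We let ε̃ be the low degree approximation of ε_v of degree s₂. By Linial et al.⁸ (Lemma 2 above), applied to the circuit for ε_v of depth ≤ d + 3 and size O(m²r), we have

$$\|\varepsilon_v - \tilde\varepsilon\|_2^2 \le O(m^2 r) \cdot 2^{-s_2^{1/(d+3)}/20}.$$

Let

$$f' := f \cdot (1 - \tilde\varepsilon).$$

Then f′ = 0 whenever F′ = 0. It remains to estimate ||F′ – f′||²₂:

$$\|F' - f'\|_2^2 \le 2\,\|F' - f \cdot (1 - \varepsilon_v)\|_2^2 + 2\,\|f \cdot (\varepsilon_v - \tilde\varepsilon)\|_2^2 \le 2\,P[\varepsilon_v = 1] + 2\,\|f\|_\infty^2 \cdot \|\varepsilon_v - \tilde\varepsilon\|_2^2 < 0.82^{s_1} \cdot (4m) + 2^{2.9 (s_1 \log m)^d \log m - s_2^{1/(d+3)}/20},$$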

which completes the proof.

We can now use Lemma 8 to give each AC⁰ function F a lower sandwiching polynomial, at least after a little “massaging”:

LEMMA 9. For every boolean circuit F of depth d and size m and any s ≥ log m, and for any probability distribution μ on {0, 1}ⁿ there is a boolean function F′ and a polynomial f′_l of degree less than

$$r_f = 3 \cdot 60^{d+3} \cdot (\log m)^{(d+1)(d+3)} \cdot s^{d(d+3)}$$

such that

P_μ[F ≠ F′] < ε(s, d)/2,
F ≤ F′ on {0, 1}ⁿ,
f′_l ≤ F′ on {0, 1}ⁿ, and
E[F′ − f′_l] < ε(s, d)/2,

where ε(s, d) = 0.82^s · (10m).

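PROOF. Let F′ be the boolean function and let f′ be the polynomial from Lemma 8 with s₁ = s and s₂ ≈ 60^{d+3} · (log m)^{(d+1)(d+3)} · s^{d(d+3)}. The first two properties follow directly from the lemma. Set

$$f'_l := 1 - (1 - f')^2.$$

It is clear that f′_l ≤ 1 and moreover f′_l = 0 whenever F′ = 0, hence f′_l ≤ F′. Finally, F′(x) – f′_l(x) = 0 when F′(x) = 0, and is equal to

$$(1 - f'(x))^2 = (F'(x) - f'(x))^2$$

when F′(x) = 1, thus

$$E[F' - f'_l] \le \|F' - f'\|_2^2 < \varepsilon(s, d)/2$$

by Lemma 8. To finish the proof we note that the degree of f′_l is bounded by

$$2 \cdot \deg(f') \le 2 \cdot \left((s \log m)^d + s_2\right) < 3 \cdot 60^{d+3} \cdot (\log m)^{(d+1)(d+3)} \cdot s^{d(d+3)}.$$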

Lemma 9 implies the following:

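LEMMA 10. Let s ≥ log m be any parameter. Let F be a boolean function computed by a circuit of depth d and size m. Let μ be an r-independent distribution where

$$r \ge 3 \cdot 60^{d+3} \cdot (\log m)^{(d+1)(d+3)} \cdot s^{d(d+3)},$$

then

$$E_\mu[F] > E[F] - \varepsilon(s, d),$$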

where ε(s, d) = 0.82^s · (10m).

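PROOF. Let F′ be the boolean function and let f′_l be the polynomial from Lemma 9. The degree of f′_l is < r. We use the fact that since μ is r-independent, E_μ[f′_l] = E[f′_l] (since k-wise independence fools degree-k polynomials, as discussed above):

$$E_\mu[F] \ge E_\mu[F'] - P_\mu[F \ne F'] > E_\mu[F'] - \frac{\varepsilon(s,d)}{2} \ge E_\mu[f'_l] - \frac{\varepsilon(s,d)}{2} = E[f'_l] - \frac{\varepsilon(s,d)}{2} = E[F'] - E[F' - f'_l] - \frac{\varepsilon(s,d)}{2} > E[F'] - \varepsilon(s,d) \ge E[F] - \varepsilon(s,d).$$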

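The dual inequality to Lemma 10 follows immediately by applying the lemma to the negation F̄ = 1 – F of F. We have E_μ[F̄] > E[F̄] – ε(s, d), and thus

$$E_\mu[F] = 1 - E_\mu[\bar F] < 1 - E[\bar F] + \varepsilon(s, d) = E[F] + \varepsilon(s, d).$$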

Together, these two statements yield Lemma 5.

Figures

Figure 1. An illustration of the statement of Lemma 2. For convenience, the boolean cube is represented by the real line. The function F (in gray) is an AC⁰ boolean function. The function f̃ (in black) is a real-valued low-degree polynomial approximating F well on average.

Figure 2. An illustration of the statement of Lemma 3 (a), and Lemma 4 (b). Note that when the low degree polynomial f does disagree with F we have no good guarantee on the error. This means that f may be a good approximant almost everywhere but not on average.

Figure 3. An illustration of the inductive step where an approximating polynomial f is constructed from the approximating polynomials g₁, g₂,…, g_k.

Figure 4. An illustration of sandwiching polynomials (a) and one-sided approximations (b)(c).

Figure 5. The proof of Lemma 8.


Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

DOI

10.1145/1924421.1924446

April 2011 Issue

Published: April 1, 2011

Vol. 54 No. 4

Pages: 108-115
