Complexity theory aims at understanding the "hardness" of certain tasks with respect to the number of "basic operations" required to perform it. In the case of arithmetic circuit complexity, the goal is to understand how hard it is to compute a formal polynomial in terms of the number of additions and multiplications required. Several earlier results have shown that it is possible to rearrange basic computational elements in surprising ways to give more efficient algorithms. The main result of this article is along a similar vein.
We present a simulation of any formal polynomial computed by an arithmetic circuit by a shallow circuit of not-much larger size. Roughly, depth corresponds to the time required in a massively parallel computation. This result shows that efficient computations can be speedup to run in depth three, while requiring surprisingly low size.
In addition to the possible usefulness of the shallow simulations, this theorem has implications in computational complexity lower bounds, since this implies that any small improvement in current lower bound approaches would lead to dramatic advances in lower bounds research.
The field of computational complexity broadly attempts to understand the limits of computation under certain resource bounds. The main goal of this field is to understand which problems have efficient solutions and which problems do not. The resources that are often studied are running time of the algorithm, the space used, the randomness used, number of I/O operations, etc.
In the context of time complexity, recall the Boolean class P, containing all decision problems that can be solved by an algorithm that takes polynomial-time in the size of the input. The class P is the class of all problems that we deem efficiently computable with respect to time, and we would like to know if certain algorithmic tasks are in P or not. Some examples are the traveling salesman problem (TSP) (given a weighted graph G and a parameter k, check if there is a tour to visit all vertices of the graph of total cost at most k) or satisfiability (SAT) (where given a Boolean formula, check if there is an assignment to the variables that satisfies the formula). This question is precisely the well-known "P versus NP" question and it is widely believed that there are no efficient algorithms for TSP or SAT. Such conjectures can be interpreted as saying that these computational tasks require many "basic elementary operations." For example, in Boolean circuit complexity, the goal is to understand the minimum number of AND, OR, and NOT gates required to compute a given Boolean function.
The classes of problems of interest in this paper are inherently algebraic such as integer multiplication, matrix multiplication, etc. For many of these problems, many century/millennia old algorithms have been superceded by faster modern algorithms. These faster algorithms show that it is possible to rearrange basic computational elements in surprising ways to give more efficient solutions. Some excellent examples of such algebraic algorithms are the classic Strassen's22 faster matrix multiplication algorithm, and Karatsuba's algorithm11 integer multiplication algorithm. The study of how and when this is possible is the field of arithmetic complexity and it can be seen as an algebraic analog of computational complexity theory.
In this field, which is the focus of this article, the main objects of study are formal polynomials over several variables. Specifically, we wish to understand "how hard" certain polynomials are to compute. The most natural way to compute a formal polynomial from the variables x1, . . ., xn is via a sequence of operations consisting of additions, subtractions, and multiplications. Such computations can be visualized via arithmetic circuits. As seen in Figure 1, we shall allow input wires to a (+) gate to be labeled by scalars to enable computation of arbitrary linear combinations of the inputs. In particular, subtractions may be simulated by labeling the wire with (−1).
We would like to say that "hard" formal polynomials require "many" such operations to compute them. In this regard, two relevant measures of hardness are the circuit size (the total number of arithmetic operations involved) and depth (the maximum length of a path from the input to the output). Our goal is to understand the optimal complexity, in terms of size and depth, of a given formal polynomial family. Note that a circuit could conceivably be implemented in hardware and the depth of the circuit would correspond to the latency or the time needed for the output to appear.
An example of a well-studied formal polynomial family is the determinant:
where Sd refers to the set of all permutations of d symbols. Using the well-known (high school) method for computing the determinant, it can be shown Detd can be computed by poly(d)-sized arithmetic circuits. A very closely related family of formal polynomials is the permanent:
Computing the permanent of a 0/1 matrix is not just NP-hard but can in fact be used to count the number of satisfying assignment for any Boolean formula, and count perfect matching in a bipartite graph, and in some other problems involving counting and estimation in statistics.
Although this family looks very similar to the determinant, it is widely believed that any arithmetic circuit computing Permd must be of size exponential in d. Showing that the permanent cannot be computed by polynomial-size arithmetic circuits is the biggest challenge in the field of arithmetic circuit complexity.
1.1. Importance of arithmetic circuits
To understand hardness of formal polynomial functions, we first need to define a suitable measure of hardness. Intuitively, a polynomial f = x1 . . . xn should be "easy" as this is just a monomial. On the other hand, a polynomial such as should also be "easy," despite having exponentially many monomials, as f' is not much different from f. A measure of hardness, that is, robust with respect to such minor transformations is provided by arithmetic circuits. This model is particularly useful if we would like to understand functions like the determinant and permanent, which are very algebraic in nature. Valiant24 introduced two classes of formal polynomials as algebraic analogs of the Boolean classes P and NP. Analogous to the class P that consists of efficiently computable Boolean functions, the class VP is defined as the class of formal polynomials f(x1, . . . xn) with deg(f) = poly(n) that can be computed by arithmetic circuits of poly(n) size.
Analogous to the class NP that consists of Boolean functions F(x1, . . ., xn) that can be expressed as
for some G ∈ P and m = poly(n), the class VNP consists of formal polynomials f(x1, . . . , xn) that can be expressed as
for some formal polynomial g ∈ VP and m = poly(n).
Valiant24 showed that Detd and Permd are in some sense complete for the classes VP and VNP, respectively. Thus, showing that permanent cannot be computed by polynomial-size arithmetic circuits would separate the algebraic analog of NP from the algebraic analog of P. Further, if NP ⊄ P/poly, then it can be shown that VP ≠ VNP over finite fields. Hence, proving VP ≠ VNP could be thought of as a stepping stone toward proving P ≠ NP.
Another connection to the Boolean world is via the problem of polynomial identity testing (PIT), wherein we are given an arithmetic circuit as input and are required to check if the formal polynomial computed by the circuit is zero or not. Although this algorithmic question has a very natural randomized algorithm, constructing a deterministic algorithm would be major progress.10 Further, as in the Boolean world, derandomizing PIT has very deep connections to proving arithmetic circuit lower bounds via hardness-randomness tradeoffs.
Thus understanding arithmetic circuits could be a first step toward understanding Boolean circuits. Arithmetic circuits also have a lot of structure which, sometimes, have no analogs in the Boolean world. The most important example of this is the possibility of depth reduction for arithmetic circuits.
1.2. The power of shallow circuits
Circuits with low depth correspond to computations which are highly parallel, where edges indicate dependencies in computation. Therefore it is natural to try to minimize the depth of a circuit while allowing the size to increase somewhat. In the case of Boolean circuits, we know that very shallow Boolean circuits are not powerful at all. It is known that constant depth Boolean circuits consisting of ∧ and ∨ gates cannot even compute the parity of n-bits unless they are of exponential size. This important result is now a part of any course or textbook on complexity theory, such as Ref.2 The analogous setting in arithmetic circuits to prove lower bounds for constant depth (even depth of three or four) have been challenging!
Valiant et al.25 showed that if a formal polynomial f of degree d can be computed by a circuit of size s then it can in fact be computed by a circuit of depth O(log d · log s) and size so(1). The contrapositive of this depth reduction is that if we can prove super-polynomial-size lower bounds for O(log2 n) depth circuits, then we can prove super-polynomial-size lower bounds for general circuits.
Pushing this line of investigation of size-depth tradeoffs further, recent work has considered reduction to circuits of even smaller depth (with gates of unbounded fan-in) at the cost of a super-polynomial increase in size. In this direction, the work of Agrawal and Vinay1 and a subsequent strengthening by Koiran14 and Tavenas23 showed that if an n-variate degree d polynomial f has circuits of size s = nO(1) then f can in fact be computed by depth four circuits of size . It is worth noting that the trivial computation as a sum of monomials requires size and a simple nonconstructive argument shows that most formal polynomials require nΩ(d) size. In this light, the increase in size by only is still very useful in the context of lower bounds.
This implies that one merely needs to prove a (good enough) lower bound for depth four circuits in order to prove lower bounds for arbitrary circuits. For example, if we could show that Permd requires depth four circuits of size 2Ω(d3/4) then this would immediately imply that Permd requires general arithmetic circuits of size 2Ω(d1/4) to compute it. Indeed, motivated by this, recent lower bounds for depth four circuits have come very close to the threshold required to separate VP and VNP.
The following is a summary of the known depth reduction results:
1.3. Lower bounds for shallow circuits
Depth three circuits. Progress in proving lower bounds for arithmetic circuits has been admittedly slow. This is generally considered to be one of the most challenging problems in computer science. The difficulty of the problem has led researchers to focus on natural subclasses of arithmetic circuits. Bounded depth circuits being one such natural subclass has received a lot of attention. The class of depth three arithmetic circuits, also denoted as ΣΠΣ circuits, have been intensely investigated as it is the shallowest nontrivial subclass. Such a circuit C, as shown in Figure 2, computes a formal polynomial of the form
where each ℓij(x) is a linear polynomial over the input variables. ΣΠΣ circuits arise naturally in the investigation of the complexity of polynomial multiplication and matrix multiplication. Moreover, the optimal formula/circuit for some well-known families of formal polynomials are in fact depth three circuits. In particular, the best known circuit for computing Permd is known as Ryser's formula19:
which is a depth three circuit of size O(d2 · 2d). Ryser's formula is another example of a nontrivial speedup of algebraic computation.
Quite a few lower bounds have been obtained for restrictions of depth three circuits. Nisan and Wigderson18 studied depth three circuits under the restriction of homogeneity. A homogeneous circuit is one where the degree of all intermediate computations are bounded by the degree of the output polynomial. Nisan and Wigderson18 showed that over any field
No entries found