Abstract
Deciding satisfiability of propositional formulas is a quintessential example of a hard problem, and yet the quest for fast algorithms for it has occupied computer scientists both in theory and in practice. In this article, we survey recent progress on designing algorithms with strong refutation guarantees for smoothed instances of the k-SAT problem. Smoothed instances are formed by slight random perturbations of arbitrary instances, and their study is a way to bridge the gap between worst-case and average-case models of problem instances. Our methods yield new algorithms for smoothed k-SAT instances with guarantees that match those for the significantly simpler and well-studied model of random formulas. Additionally, they have led to a novel and unexpected line of attack on some longstanding extremal combinatorial problems in graph theory and coding theory. As an example, we will discuss the resolution of a 2008 conjecture of Feige on the existence of short cycles in hypergraphs.
1. Introduction
The famous SAT problem asks us to determine if a given propositional formula is satisfiable. That is, can we set the formula’s variables to 0 (False) or 1 (True) so that the formula evaluates to 1 (True)? In this article, we will focus on the k-SAT problem, where the propositional formula is further restricted to be in k-CNF form, that is, a logical AND of a collection of k-clauses, each of which is a logical OR of at most k literals (variables or their logical negations). For example, (x1∨¬x2∨x3)∧(x2∨x4∨¬x5) is a 3-CNF formula in variables x1,x2,…,x5, where ∨, ∧, and ¬ denote the logical OR, AND, and NOT operations, respectively. Given a k-CNF formula, we are interested in either finding a satisfying truth assignment, if it exists, or a “refutation”—a short, easily-checkable proof that the formula is unsatisfiable. Despite its simplicity, k-SAT is phenomenally expressive and models a long list of important discrete optimization problems. A decades-long quest has thus focused on designing algorithms for k-SAT in both theory and practice.
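To make the convention concrete, here is a minimal sketch (ours, not the article's) of evaluating a k-CNF formula, using the common DIMACS-style encoding where the integer +i stands for x_i and −i stands for ¬x_i:

```python
# A minimal k-CNF evaluator (our sketch). Literal +i means x_i, -i means its
# negation; this DIMACS-style encoding is an assumption, not the article's.

def evaluate_cnf(clauses, assignment):
    """Return True iff every clause contains at least one satisfied literal.

    clauses: list of clauses, each a list of nonzero signed variable indices.
    assignment: dict mapping variable index -> True/False.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

# The article's example: (x1 ∨ ¬x2 ∨ x3) ∧ (x2 ∨ x4 ∨ ¬x5)
phi = [[1, -2, 3], [2, 4, -5]]
x = {1: True, 2: True, 3: False, 4: False, 5: True}
print(evaluate_cnf(phi, x))  # this assignment satisfies both clauses
```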
In this article, we will focus on finding refutations—“obviously verifiable” polynomial size (that is, short) contradictions that confirm unsatisfiability of a k-SAT formula. For any formula, we can simply tabulate each of the 2^n possible truth assignments x together with a clause violated by x. This is an obviously verifiable refutation but clearly too long—it’s exponential in size. On the other hand, if we get lucky and our formula happens to contain the two 1-clauses (x1)∧(¬x1), then it is manifestly unsatisfiable and the two 1-clauses serve as an easily verifiable and short certificate of unsatisfiability. Of course, it’s unrealistic for such clauses to magically occur in interesting inputs. But we can often infer additional clauses that must also be satisfied if the input formula is satisfiable, and hope that such an obviously verifiable short contradiction arises among the inferred clauses. Such clause learning (via mechanical resolution rules for deriving new clauses) forms an integral part of practical SAT solvers.
In his famous 1972 paper,18 Karp proved that ascertaining satisfiability or finding refutations for 3-SAT formulas is NP-hard. Thus, finding a fast algorithm for 3-SAT that succeeds (even approximately) on all possible inputs is likely hard. One might naturally expect finding refutations to get easier as the number of clauses increases (more clauses means more possibilities for contradictions to manifest), and so perhaps one might hope that denser instances get easier? No such luck! As it turns out, unless a widely believed, stronger variant of the P≠NP conjecture fails, there are no polynomial time algorithms for refuting k-SAT formulas unless they have essentially the maximum possible ≈n^k (out of ≈n^k possible) clauses. In fact, even substantially beating brute-force search and finding sub-exponential (for example, 2^√n) time algorithms is ruled out for formulas with ≈n^{k−1} clauses.14
Despite the grim picture presented by these hardness results, the extraordinary modeling power of k-SAT has motivated a concerted research effort for finding fast heuristics for k-SAT. On the practical side, new algorithmic ideas along with advances in software engineering have made modern SAT solvers13 a powerful and indispensable tool with applications to solving practical instances of optimization problems in planning, model checking, and verification of software systems. By encoding the task of finding counter-examples to mathematical conjectures into SAT formulas, SAT solvers have even helped resolve longstanding mathematical conjectures.6 On the theoretical side, algorithms research has focused on more careful modeling of input instances to escape worst-case hardness under minimal assumptions. Such beyond worst-case input models for hard discrete optimization problems such as k-SAT now form a vibrant area11 of research in algorithm design.
1.1 The Smoothed k-SAT Model
In 2007, Feige proposed9 his smoothed k-SAT model as a way to circumvent the all-pervasive hardness of k-SAT. He was inspired by a groundbreaking work of Spielman and Teng on smoothed analysis of the simplex algorithm. The simplex algorithm for linear programming, introduced by Dantzig in 1947, presented an uncomfortable disconnect between theoretical predictions and practical performance—here was a fast practical algorithm that also provably needed exponential time in the worst case! In 2001, Spielman and Teng convincingly resolved23 this tension and showed that the simplex method runs in polynomial time on smoothed inputs—inputs obtained by adding a small random perturbation to an arbitrary instance. Such a perturbation, of the sort one might reasonably expect practical instances to naturally possess, is enough to remove all hardness in even carefully crafted instances.
Feige’s model involves an analogous smoothing of a worst-case k-SAT formula: randomly perturb each literal (that is, change an unnegated variable to a negated one and vice-versa) in each clause with some small probability, say 0.01, independently. For example, given the two clauses (x1∨¬x2∨x3)∧(x2∨x4∨¬x5), we imagine tossing 6 independent coins with bias 0.01, one for each literal in each of the two clauses. The smoothed version of the first clause will have, for example, x1 negated if the first coin lands on heads, x2 unnegated if the second coin lands on heads, and so on.
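The perturbation is easy to simulate. Below is a hedged sketch of the smoothing operation, using our own representation (clauses as lists of signed integers, +i for x_i and −i for ¬x_i), not anything from the article:

```python
import random

# A sketch of Feige's smoothing: flip each literal's negation sign
# independently with probability p. Only negation patterns can change;
# the clause structure (which variables appear where) is untouched.

def smooth(clauses, p=0.01, rng=None):
    rng = rng or random.Random()
    return [[-lit if rng.random() < p else lit for lit in clause]
            for clause in clauses]

phi = [[1, -2, 3], [2, 4, -5]]   # (x1 ∨ ¬x2 ∨ x3) ∧ (x2 ∨ x4 ∨ ¬x5)
perturbed = smooth(phi, p=0.01, rng=random.Random(0))
# Each variable still appears in the same clause position after smoothing.
assert all(abs(a) == abs(b)
           for ca, cb in zip(phi, perturbed)
           for a, b in zip(ca, cb))
```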
If the input formula ϕ has ≥Cn clauses in n variables for some large enough constant C, then the resulting smoothed formula is unsatisfiable with high probability over the random perturbation. Feige thus asked if the task of finding refutations for smoothed k-SAT formulas gets significantly easier when compared to worst-case formulas. Equivalently, given our discussion above, do smoothed k-SAT formulas with ≪n^{k−1} clauses admit efficient refutation algorithms?
Prior works2,8,22 showed that the answer is indeed yes for the random k-SAT model—a fully smoothed model where both the negation patterns and the clause structure, that is, the variables appearing in the clauses (which are worst-case in smoothed k-SAT), are chosen independently at random. However, those algorithms strongly exploit the abundant randomness in the choice of variables appearing in the clauses.
In this article, we will survey recent developments1,16,17 on a new class of algorithms, based on the eigenvalues of certain specialized Kikuchi matrices (introduced earlier24 for statistical inference and to simplify algorithms for random k-SAT22 for even k), that yield optimal (modulo certain hardness conjectures) algorithms for smoothed k-SAT. As a result, these new algorithms succeed in refuting smoothed k-SAT formulas with m≳n^{k/2}·log n clauses, that is, ≪n^{k−1}, in polynomial time, and significantly beat brute-force search if m≳n^{1+∊}. In fact, our guarantees for smoothed k-SAT match the best-known (and conjectured optimal) results for the significantly simpler and restricted setting of random k-SAT formulas, provide quantitative bounds on the number of clauses that every truth assignment must violate, and extend far beyond k-SAT to handle all logical constraint satisfaction problems.
1.2 Spectral Methods for Combinatorics
While the theoretical advances in algorithms for k-SAT haven’t yet influenced practical SAT-solving, they already have some surprising applications to long-open problems in other areas of mathematics. These include resolving Feige’s conjecture on small even covers (cycles) in hypergraphs,16,17 making progress on the decades-long quest for optimal bounds for locally decodable4 and locally correctable19 error-correcting codes, and problems7 in additive number theory that generalize the famous theorem of Szemerédi.
The principle behind such applications is analogous to how SAT solvers helped resolve mathematical conjectures by encoding the search for a proof into a SAT formula. Our theoretical analog strongly exploits the newfound ability to tackle k-SAT formulas with a worst-case clause structure. In this article, we will discuss an application of this method to proving Feige’s conjecture on small cycles in hypergraphs. Surprisingly, proving this conjecture will let us go full circle to show even better refutations for smoothed k-SAT.
Short Cycles in Graphs.
Feige’s conjecture10 is a generalization of a basic result about short cycles in graphs. Recall that a graph (aka network) is simply a collection of pairs, called edges (modeling pairwise associations), on n nodes. A cycle in such a network is a sequence of nodes v1→v2→…→vℓ→v1 that starts and ends at the same node such that every consecutive pair has an edge between them. In his famous 1978 book,5 mathematician Béla Bollobás conjectured that every graph with n nodes where every node, on average, has d>2 edges (this quantity is called the average degree), must have a cycle of length at most ≈log_{d−1}(n). When d=2, the network can be a single giant cycle on all n vertices, which clearly has no cycle of length ≤n−1. For any d>2, however, the conjecture implies that we cannot even avoid a cycle of length O(log n)—an exponentially smaller bound than n—and thus signals a phase transition in the extremal length of the smallest cycle as the average degree d crosses 2.
Bollobás’s conjecture is an example of a result in extremal combinatorics. Such results uncover a truth that holds for all (hence “extremal”) mathematical objects. Here, it says that no matter how we might build a graph on n nodes, so long as it has average degree d>2, we cannot avoid introducing a short cycle. In their elegant 2002 paper,3 mathematicians Alon, Hoory and Linial confirmed this conjecture.
Short Cycles in Hypergraphs.
Feige’s conjecture asks a similar question about short cycles in hypergraphs. A k-uniform hypergraph is a collection of subsets of size k on n nodes, called hyperedges, that model associations between a larger number of nodes (instead of 2 in graphs). A 2-uniform hypergraph is simply a graph. In order to pose Feige’s question, we will identify a key property of cycles in graphs and use it to motivate a generalized notion of cycles in hypergraphs. Observe that every vertex appears in either two or zero edges in any cycle in a graph. In particular, a cycle is an even subgraph—a subset of edges on which every vertex appears an even number of times. Even subgraphs naturally generalize to hypergraphs. We will define a hypergraph cycle, or even cover, to be a collection C1,C2,…,Cℓ of hyperedges such that every node of the hypergraph is included in an even number of the Ci’s (that is, an even subhypergraph). (See Figure 1a and 1b.)
This definition may look odd (or perhaps a little too even?) at first. A cycle in a graph has the appealing combinatorial structure of a loop; hypergraph cycles seem to lack such a combinatorial meaning. Why did Feige (and why should we) care about them? Feige’s motivation for studying short cycles actually stemmed from finding refutations for k-SAT formulas (see the next section). Here, we outline a different source of motivation that comes from deep connections to the theory of error-correcting codes, because hypergraph cycles naturally relate to solving systems of linear equations.
To see why, let’s associate a variable to every node of a graph and think of each edge as specifying a linear equation modulo 2. Thus, the edge 1∼2 between nodes 1 and 2 corresponds to the equation x1+x2=b mod 2 where b∈{0,1}. A cycle in the graph then naturally corresponds to a subset of 2-sparse (that is, each equation has only two non-zero coefficients) equations that is linearly dependent—that is, if you want this subset of equations to be satisfied, then the right hand side b of at least one of the equations is already fixed (that is, cannot be chosen independently) once you fix the choice of the right hand sides for all the others. As a simple example, consider the graph on 3 nodes with edges 1∼2, 2∼3 and 3∼1. Suppose you knew that some x satisfies the two equations corresponding to the first two edges, x1+x2=0 mod 2 and x2+x3=1 mod 2. Then, adding the left hand sides of the two equations gives x1+2x2+x3=1 mod 2, which is equivalent to x1+x3=1 mod 2 since 2x2=0 mod 2 regardless of the value of x2. Thus, the right hand side of the third equation is determined/dependent and cannot be chosen independently of the first two equations. A cycle in a k-uniform hypergraph similarly corresponds to a linearly dependent subset of the k-sparse (that is, each equation has k non-zero coefficients) equations corresponding to the hyperedges.
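The triangle example can be checked mechanically. The following sketch (ours; NumPy assumed) verifies that the left-hand sides of the three edge equations are linearly dependent over GF(2):

```python
import numpy as np

# The article's triangle example over GF(2): edges 1~2, 2~3, 3~1 give three
# 2-sparse equations. A cycle corresponds to a set of rows summing to zero
# mod 2, i.e., a linear dependency among the left-hand sides.

rows = np.array([[1, 1, 0],    # x1 + x2
                 [0, 1, 1],    # x2 + x3
                 [1, 0, 1]])   # x3 + x1

# The three rows sum to the zero vector mod 2: a linear dependency.
assert (rows.sum(axis=0) % 2 == 0).all()
# Equivalently, the first two rows already determine the third mod 2.
assert ((rows[0] + rows[1]) % 2 == rows[2]).all()
```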
The length of the smallest cycle thus equals the size of the smallest linearly dependent subset of equations in a given system. Understanding the size of such a set turns out to have a whole gamut of applications, especially in designing error-correcting codes. Error correcting codes (or just codes, in short) are a systematic method of adding redundancy to a message so that, when the message is transmitted across a noisy channel and incurs errors, one can still decode it uniquely thanks to the redundancy. These systematic methods, or codes, naturally involve adding “parity checks”, that is, the right hand sides of an appropriately chosen set of linear equations evaluated at the message. In such codes, the length of the smallest linearly dependent subset of equations naturally corresponds to the distance—a crucial quantity that controls the number of errors that can be corrected. Understanding the smallest linear dependencies in k-sparse equations thus turns out to be equivalent to understanding the best possible distance (and thus, the largest possible rate of errors that can be tolerated) of an important class of codes called low density parity check codes, introduced by Gallager in the 1960s, with numerous theoretical and practical applications.
Feige’s Conjecture and the Resolution.
As in the case of graphs, we are interested in the extremal trade-off between the average degree (that is, the average number of hyperedges containing a node) and the length of the smallest hypergraph cycle in a k-uniform hypergraph. Given the connection to linear dependencies above and the basic fact that every collection of m=n+1 equations in n variables is linearly dependent, whenever the average degree d=mk/n>k, the hypergraph must have a (rather long) cycle of length at most n+1. Making an analogy to graphs, one might expect that if d≫k, the hypergraph must have an O(log n)-length cycle, but this turns out to be false! Mathematicians Assaf Naor and Jacques Verstraëte in 2006 showed21 that one needs (and this is enough!) average degree d≳n^{k/2−1} in order for every hypergraph to have an O(log n)-length cycle. For k=2, this matches a coarse version of the bound for graphs, but for k≥3 it suggests a new regime between d≈1 and d≈√n (for k=3) that has no analog for graphs. What happens when, for example, d≈n^{0.25}? Feige conjectured a precise behavior for this regime:
Every k-uniform hypergraph on n nodes with average degree ≈(n/ℓ)^{k/2−1} has an (ℓ·log n)-length cycle.
Feige’s conjecture (up to some log n factors in d) was motivated by the hypothesis (and is, in fact, equivalent to it) that there is no better construction of hypergraphs avoiding short cycles than simply choosing a random hypergraph. It is thus analogous to the famous 1947 theorem of Shannon (the birthplace of modern coding theory!) that random error correcting codes are “best” in a precise sense.
Despite being a foundational statement about hypergraphs, the conjecture remained largely open, with only some partial progress on a special case by Alon and Feige in 2009. Now, by invoking the new algorithms for solving smoothed k-SAT, we can essentially completely resolve it — up to one additional log n factor in the degree bound! This was established first by the authors with additional log n factors, which were later17 trimmed down to a single log n factor.
Every k-uniform hypergraph on n nodes with average degree ≈(n/ℓ)^{k/2−1}·log n has an (ℓ·log n)-length cycle.
As we will explain later in this article, the proof of this theorem makes a new connection between the success of our spectral approach for smoothed k-SAT and the existence of short cycles in hypergraphs. It forms the first (of a growing list of) applications of spectral refutations via Kikuchi matrices in combinatorics.
2. A New Spectral Approach
Our approach for finding refutations for smoothed k-SAT formulas relies on continuous, linear-algebraic algorithms based on spectral methods—methods that use eigenvalues and eigenvectors of matrices built from the input. This is in contrast to the largely discrete algorithmic toolkit (such as resolution refutations and their generalizations) in modern practical SAT solvers. In fact, it has been known for more than 20 years that even random k-SAT formulas with ≪n^{k−1} clauses do not admit efficient resolution refutations—a natural formalization of combinatorial refutations.
The spectral approach for refuting k-SAT formulas was conceived15 by Goerdt and Krivelevich back in 2001. A decade and a half of work led to spectral methods with conjectured-optimal guarantees for refuting random k-SAT in 2017. These spectral methods, however, are rather brittle and strongly rely on the randomness in the variables appearing in each clause. For smoothed k-SAT, such methods provably fail since the variables appearing in the clauses are completely arbitrary. In this article, we will present a new class of spectral methods, based on Kikuchi matrices, which, when combined with combinatorial pre-processing, provide robust methods for refutation that significantly simplify the results for random k-SAT and extend to smoothed k-SAT with no loss in performance.
From k-SAT to Degree k Polynomials.
To bring in spectral methods, we will make a simple but conceptually important translation between refuting a k-SAT formula and finding certificates of upper bounds on the maximum value of a degree k polynomial. For this purpose, it will be more convenient to view truth assignments as +1 (True) and −1 (False). Given a k-SAT formula ϕ with m clauses, let Φ:{−1,1}^n→N map truth assignments x∈{−1,1}^n to the number of clauses satisfied by x. Then, Φ(x) is clearly the sum of m functions Φ_C, one for each clause C in ϕ, where Φ_C(x)=1 if and only if x satisfies clause C. Since Φ_C depends only on k of the {±1}-valued variables, it is a degree k polynomial. For example, for k=3 and C=(x1∨x2∨x3):
Φ_C(x) = 7/8 + (1/8)·(x1+x2+x3−x1x2−x2x3−x3x1+x1x2x3).  (1)

To refute the k-SAT formula ϕ, we will find an easily-checkable certificate of the fact that Φ(x)=∑_C Φ_C(x)<m for all x. In fact, we will certify a stronger bound of Φ(x)≤0.99m, that is, that every x must violate not just 1 but in fact 1% of the clauses.
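Identity (1) is easy to verify by brute force over the 8 possible ±1 assignments. A quick sketch (ours) in Python:

```python
from itertools import product

# Checking identity (1): with +1 = True, the polynomial equals 1 exactly when
# the clause (x1 ∨ x2 ∨ x3) is satisfied, i.e., unless all three are -1.
for x1, x2, x3 in product([-1, 1], repeat=3):
    poly = 7/8 + (x1 + x2 + x3 - x1*x2 - x2*x3 - x3*x1 + x1*x2*x3) / 8
    satisfied = not (x1 == x2 == x3 == -1)
    assert poly == (1 if satisfied else 0)
```

All the arithmetic happens in exact eighths, so the floating-point comparison is exact here.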
Observe that Φ_C is a sum of 8 (in general, 2^k) monomials, each of degree ≤3 (≤k, more generally). A standard idea, going back to the early 2000s, is to certify bounds on 8 different polynomials, each obtained by taking one of the 8 terms corresponding to each C. We can then obtain a bound on Φ(x) by adding up all the 8 quantities. For each such polynomial obtained from a smoothed 3-SAT formula, with a little more work that we omit here, we can also assume that the coefficient of each monomial is an independent, uniform {±1} value. We thus focus on strong refutation of semirandom homogeneous polynomials Ψ(x) of the form:
Ψ(x) = ∑_{C∈H} b_C·∏_{i∈C} x_i,  (2)

where H, the instance hypergraph, is simply the collection of m different sets C⊆[n] of size k (corresponding to the original clauses), and the b_C∈{±1} are chosen uniformly and independently for each C∈H. Here, strong refutation involves certifying that Ψ(x)≤∊m for a sufficiently tiny ∊>0.
2.1 From Quadratic Polynomials to Matrices
Let us show how spectral methods show up by starting with the simplest setting of k=2. Then, each C∈H is a set of size 2, or simply a pair {i,j}⊆[n], and Ψ(x) from (2) is a degree 2 polynomial in x. The idea is to view such a degree 2 polynomial as a quadratic form.
For an n×n matrix A, its quadratic form on a vector v equals v⊤Av = 〈v,Av〉 = ∑_{i,j≤n} v_i·v_j·A(i,j). This expression is a homogeneous quadratic polynomial in v. Indeed, every quadratic polynomial is a quadratic form of an associated matrix and vice-versa. For our Ψ, let the n×n matrix A be defined by:
A(i,j) = b_{i,j} if C={i,j}∈H, and 0 otherwise.

Then, for any x∈{−1,1}^n, x⊤Ax = ∑_{i,j} x_i·x_j·A(i,j) = 2Ψ(x), since each pair {i,j}∈H contributes twice, once via A(i,j) and once via A(j,i).
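The correspondence between Ψ and the quadratic form of A can be checked on a toy instance. The sketch below (our own small example; NumPy assumed) builds A for a few signed pairs and verifies x⊤Ax = 2Ψ(x):

```python
import numpy as np

# A toy k=2 instance: H is a set of pairs with random signs b, A is the
# associated symmetric matrix, and the quadratic form of any ±1 vector
# recovers 2·Ψ(x) (each pair contributes via both A[i,j] and A[j,i]).
rng = np.random.default_rng(0)
n = 6
H = [(0, 1), (1, 2), (2, 5), (0, 3), (3, 4)]
b = {C: rng.choice([-1, 1]) for C in H}

A = np.zeros((n, n))
for (i, j), sign in b.items():
    A[i, j] = A[j, i] = sign

x = rng.choice([-1, 1], size=n)
psi = sum(sign * x[i] * x[j] for (i, j), sign in b.items())
assert x @ A @ x == 2 * psi
```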
A basic result in linear algebra allows upper-bounding any quadratic form of A as:
v⊤Av ≤ ∥v∥₂²·∥A∥₂,

where ∥v∥₂ = √(∑_i v_i²) is the length of the vector v and ∥A∥₂ is the “spectral norm”, or the largest singular value, of A. Since x has {±1}-coordinates and thus length √n, we obtain that Ψ(x)≤n·∥A∥₂.
This bound on Ψ(x) is easily verifiable. Given Ψ (obtained easily from the k-SAT formula ϕ), we form the matrix A and use a linear time algorithm (called power iteration) to obtain good estimates on the spectral norm ∥A∥2.
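As a sketch of this step (ours, not the article's implementation), power iteration for a symmetric matrix takes only a few lines; for a non-symmetric matrix one would iterate with A⊤A instead:

```python
import numpy as np

# A minimal power-iteration sketch for estimating the spectral norm of a
# symmetric matrix: repeatedly apply A and renormalize; the iterate aligns
# with the eigenvector of largest |eigenvalue|, so ∥A v∥ tends to ∥A∥₂.
def spectral_norm_estimate(A, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(A.shape[0])
    for _ in range(iters):
        w = A @ v
        v = w / np.linalg.norm(w)
    return np.linalg.norm(A @ v)

# Toy symmetric ±1 matrix with eigenvalues {1, 1, -2}, so ∥A∥₂ = 2.
A = np.array([[0., 1., 1.], [1., 0., -1.], [1., -1., 0.]])
est = spectral_norm_estimate(A)
exact = max(abs(np.linalg.eigvalsh(A)))
assert abs(est - exact) < 1e-6
```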
To certify Ψ(x)≤∊m, we need to check that ∥A∥₂≤∊m/n. A is an example of a random matrix since its entries b_{i,j} are uniformly and independently distributed in {±1}. There is a well-developed theory for understanding the typical value of ∥A∥₂ for such random matrices that allows us to conclude that ∥A∥₂ ≲ √(Δ_max·log n), where Δ_max is the maximum number of non-zero entries in any row of the matrix A. If the pairs C={i,j} are “equidistributed”, that is, any variable i participates in roughly the same number of pairs, then we would expect Δ_max ≈ Δ_avg ≈ m/n, where Δ_avg is the average number of non-zero entries in a row of A. Thus, ∥A∥₂ ≲ √((m/n)·log n), which is ≤∊m/n if m≳n·log n.
What if the set of pairs H is not regular and some i is “over-represented” in the pairs of H? While we omit the formal details, it turns out that one can use an elegant reweighting trick (discovered in17) on the matrix A that effectively allows us to assume regularity whenever Δ_avg≫1.
2.2 Generalizing to Quartic Polynomials
The case of odd k turns out to be technically challenging, so let us skip k=3 and generalize the above approach to Ψ(x) of degree k=4 (that is, the case of 4-SAT). Now, Ψ(x) is not quadratic in x. We will instead view it as a quadratic form in (n choose 2) variables, each corresponding to a quadratic monomial x_i·x_j in the original assignment x. Let us write x∘2 for the vector in R^(n choose 2) indexed by pairs {i1,i2}⊆[n] with the entry at {i1,i2} given by x_{i1}·x_{i2}. Define the (n choose 2)×(n choose 2) matrix A:
A({i1,i2},{j1,j2}) = b_{i1,i2,j1,j2} if C={i1,i2,j1,j2}∈H, and 0 otherwise.

Then, as before, we can observe that:
(x∘2)⊤A(x∘2) = ∑_{{i,j},{k,ℓ}} x_i·x_j·x_k·x_ℓ·A({i,j},{k,ℓ}) = 6Ψ(x).

The factor 6 comes from the fact that there are (4 choose 2)=6 different ways to break a set C of size 4 into a pair of pairs, each of which arises as a term in the quadratic form above.
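The factor of 6 can be verified numerically. The sketch below (our own toy 4-uniform instance; NumPy assumed) builds the pair-indexed matrix and checks (x∘2)⊤A(x∘2) = 6Ψ(x):

```python
import numpy as np
from itertools import combinations

# Toy 4-uniform hypergraph with random ±1 signs; each size-4 set C splits
# into an (ordered) pair of pairs in C(4,2) = 6 ways, each filling one entry.
rng = np.random.default_rng(1)
n = 7
H = [(0, 1, 2, 3), (1, 2, 4, 5), (0, 3, 5, 6)]
b = {C: rng.choice([-1, 1]) for C in H}

pairs = list(combinations(range(n), 2))
idx = {p: t for t, p in enumerate(pairs)}
A = np.zeros((len(pairs), len(pairs)))
for C, sign in b.items():
    for S in combinations(C, 2):                  # 6 choices of the row pair S
        T = tuple(sorted(set(C) - set(S)))        # the column pair is C \ S
        A[idx[S], idx[T]] = sign

x = rng.choice([-1, 1], size=n)
x2 = np.array([x[i] * x[j] for (i, j) in pairs])  # the vector x∘2
psi = sum(sign * np.prod(x[list(C)]) for C, sign in b.items())
assert x2 @ A @ x2 == 6 * psi
```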
We can now write
Ψ(x) ≤ ∥x∘2∥₂²·∥A∥₂ = (n choose 2)·∥A∥₂.

And a similar appeal to results in random matrix theory tells us that, with high probability, ∥A∥₂ ≲ √(Δ_max·log(n choose 2)) ≲ √(Δ_max·log n), where Δ_max is the maximum number of non-zero entries in any row of A. Equivalently, Δ_max is the maximum number of 4-clauses that a pair {i,j} participates in. When all pairs behave roughly similarly, we will have Δ_max ≈ Δ_avg ≲ m/(n choose 2), in which case
Ψ(x) ≤ (n choose 2)·√((m/(n choose 2))·log n) ≤ ∊m

with high probability if m ≳ n²·log n.
Early proofs of such facts used different tools from random matrix theory and worked for random 4-SAT by utilizing that H is a random collection of 4-sets in that case. Our approach here explicitly reveals that only an equi-distribution (that is, Δmax≈Δavg) property of H is required for the success of this approach. This allows us, via a similar reweighting trick (that succeeds if Δavg≫1) discussed above, to obtain a result that works for arbitrary (worst-case) hypergraphs.
Let’s finish this part by noting our quantitative bounds. For k=2, our refutation succeeded when m≳n·log n. For k=4, we instead needed m≳n²·log n. Indeed, for arbitrary even k, a similar argument yields a bound of m≳n^{k/2}·log n—a significant improvement over the Ω(n^k) clauses required for refuting an unsatisfiable k-SAT formula in the worst-case, showing us the power of the spectral approach.
2.3 Beyond Basic Spectral Refutations
A smoothed k-SAT formula is unsatisfiable with high probability whenever it has m≳n clauses. But our spectral refutations above require m≳n²·log n—a bound higher by a factor ≈n (and n^{k/2−1} for arbitrary even k). This is because our approach fails whenever the average number of entries in a row of A, that is, Δ_avg=m/(n choose 2), is ≪1. The question of whether there are non-trivial refutations for k-SAT formulas when m≪n^{k/2} (now called the spectral threshold) remained open for more than a decade and a half. In the meantime, researchers found evidence, in the form of restricted lower bounds,20 that there may be no polynomial time refutation algorithm for m≪n^{k/2}. This, nevertheless, left open the possibility of significantly beating brute-force search below this threshold. This possibility was realized for random k-SAT in 2017. We will now discuss a significantly simpler (described essentially in full below!) spectral approach that succeeds even for smoothed k-SAT.
We will continue to work with k=4. As before, we will write Ψ(x) as a quadratic form, but instead of the natural matrices we discussed above, we will use Kikuchi matrices, which we introduce next. First, though, a piece of notation: for sets S,T⊆[n], we let S⊕T=(S∪T)∖(S∩T) denote the symmetric difference of S and T.
For any r∈N, the level-r Kikuchi matrix for Ψ is an (n choose r)×(n choose r) matrix with rows and columns indexed by sets S,T⊆[n] of size r and entries given by:

A(S,T) = b_C if S⊕T=C∈H, and 0 otherwise.
Observe that for r=2, the above Kikuchi matrices specialize to the (n choose 2)×(n choose 2) matrix we saw in the previous subsection. Let’s see why Ψ(x) is a quadratic form of A.
For any assignment x∈{±1}^n, denote by x∘r the (n choose r)-dimensional vector with coordinates indexed by sets S⊆[n] of size r and S-th entry given by x_S=∏_{i∈S} x_i. Then, for D=(4 choose 2)·(n−4 choose r−2), we have:

(x∘r)⊤A(x∘r) = ∑_{S,T} x_S·x_T·A(S,T) = ∑_{C∈H} b_C ∑_{S,T: S⊕T=C} x_C = D·Ψ(x).

Here, we used that since S⊕T=C, for any x∈{±1}^n, x_S·x_T = x_{S⊕T}·(x_{S∩T})² = x_C, as x_i²=1 for every i. The last equality holds because the number of pairs (S,T) of r-size sets with S⊕T=C is exactly D for any set C of size 4. Observe how our notational trick of switching to {±1}-valued truth assignments paid off here.
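The counting behind D can again be checked on a toy instance. The following sketch (ours; NumPy assumed) builds the level-r Kikuchi matrix for k=4 and verifies the identity (x∘r)⊤A(x∘r) = D·Ψ(x):

```python
import numpy as np
from itertools import combinations
from math import comb

# Level-r Kikuchi matrix for a toy 4-uniform hypergraph: rows/columns are
# r-subsets S, T of [n], A[S,T] = b_C when the symmetric difference S⊕T
# is a hyperedge C, and the quadratic form of x∘r equals D·Ψ(x) with
# D = C(4,2)·C(n-4, r-2) pairs (S,T) realizing each C.
rng = np.random.default_rng(2)
n, r = 7, 3
H = [(0, 1, 2, 3), (1, 2, 4, 5)]
b = {C: rng.choice([-1, 1]) for C in H}

subsets = [frozenset(S) for S in combinations(range(n), r)]
idx = {S: t for t, S in enumerate(subsets)}
A = np.zeros((len(subsets), len(subsets)))
for S in subsets:
    for T in subsets:
        C = tuple(sorted(S ^ T))          # symmetric difference S⊕T
        if C in b:
            A[idx[S], idx[T]] = b[C]

x = rng.choice([-1, 1], size=n)
xr = np.array([np.prod(x[list(S)]) for S in subsets])   # the vector x∘r
psi = sum(sign * np.prod(x[list(C)]) for C, sign in b.items())
D = comb(4, 2) * comb(n - 4, r - 2)
assert xr @ A @ xr == D * psi
```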
Given this simple observation, we can now again construct the spectral upper bound Ψ(x) ≤ (1/D)·∥x∘r∥₂²·∥A∥₂ = ((n choose r)/D)·∥A∥₂. Furthermore, it turns out that powerful tools of random matrix theory still allow us to conclude, as before, that
∥A∥₂ ≲ √(Δ_max·log(n choose r)) ≈ √(Δ_max·r·log n).  (3)

The Kikuchi Superpower: Density Increment.
Why might this upper bound be better? The meat is in the density increment. As r grows, the number of rows of A grows. But the number of non-zero entries of A grows even faster, giving us a net increase in Δ_avg. Indeed, let m=n²·log n/ℓ for some parameter ℓ∈N. Then, since each C∈H contributes D non-zero entries, Δ_avg = mD/(n choose r) ≈ (n²·log n/ℓ)·(r²/n²) ≈ r²·log n/ℓ. In particular, even when ℓ≫log n (and thus we have m=n²·log n/ℓ≪n² clauses), choosing r large enough still allows us to obtain Δ_avg≫1.
Surprisingly, the rest of the proof idea is more or less the same as before! Let us assume, as we did at first in both the previous subsections, that all rows of A have roughly an equal number of non-zero entries. Such a condition holds if H is a random collection of sets of size 4. Then, Δ_max ≲ Δ_avg ≈ r²·log n/ℓ. Plugging this back into (3) gives
Ψ(x) ≤ ((n choose r)/D)·∥A∥₂ ≲ (m/Δ_avg)·√(Δ_avg·r·log n) ≤ ∊m

(using (n choose r)/D = m/Δ_avg) if Δ_avg ≳ √(Δ_avg·r·log n)/∊, that is, if Δ_avg ≳ r·log n/∊². Since Δ_avg = r²·log n/ℓ, this condition holds if r≥ℓ/∊².
Furthermore, as in the previous two subsections, we can use a variant of our reweighting trick to generalize this argument to arbitrary H without any further assumptions. To verify this bound algorithmically (that is, to “check” our refutation), we need to construct the matrix A and compute its spectral norm. This requires a running time proportional to the dimension of the matrix, which scales as ≈n^r. So, all in all, we obtain a roughly n^{ℓ/∊²} time algorithm to certify that Ψ(x)≤∊m whenever m≥n²/ℓ for any ℓ∈N. When m≈n^{1+δ} for some small δ>0, that is, even slightly superlinear, the runtime of our algorithm is strictly sub-exponential (specifically ≈2^{n^{1−δ}}) and thus asymptotically beats brute-force search.
Handling odd k: We described our approach so far for even k. The case of odd k turns out to be a little more involved. This has been true for spectral algorithms ever since the earliest spectral algorithms for the problem. The polynomial time case (for example, when m≳n^{1.5}·log n for 3-SAT, analogous to m≳n²·log n for 4-SAT) was first resolved in a work of Abascal, Guruswami, and Kothari in 2021. The full trade-off required introducing the correct generalizations of the Kikuchi matrices that we have described above. The analysis of the spectral norms of such matrices requires more effort and some additional combinatorial ideas.
We will not formalize these ideas here but note the following eventual result that we derive as a consequence:
For any k∈N and ℓ∈N, given a semi-random homogeneous degree k polynomial Ψ(x) with m ≥ n·(n/ℓ)^{k/2−1}·log n non-zero coefficients, there is a 2^{O(ℓ·log n/∊²)} time spectral algorithm that certifies Ψ(x)≤∊m. Consequently, for any k, we can refute smoothed k-SAT formulas with m clauses in time 2^{O(ℓ·log n/∊²)} as well.
3. Proving Feige’s Conjecture
Let us now see how our spectral algorithms for smoothed k-SAT provide a resolution of Feige’s conjecture. Our approach can be thought of as a theoretical analog of encoding the search for a proof into (un)satisfiability of a SAT formula and then running a practical SAT solver. Given a hypergraph H, we will build a polynomial Ψ_sat that is satisfiable (that is, attains its maximum possible value) by construction. We will then show that if H has no short cycle, our spectral methods certify that Ψ_sat is in fact unsatisfiable—a contradiction that completes our proof. Of course, instead of using the computer to find such a refutation, we will “find” it (that is, argue its existence) analytically by appealing to our spectral algorithms.
Let us now describe our argument in more detail. We are given an arbitrary k-uniform hypergraph H on n nodes with average degree d≈(n/ℓ)^{k/2−1}·log n. Our goal is to show that H must have an (ℓ·log n)-length cycle. Starting from H, we will define a homogeneous degree k polynomial:
Ψ_sat = ∑_{C∈H} x_C, where x_C = ∏_{i∈C} x_i.

Observe that Ψ_sat is clearly satisfiable. Indeed, if x_i=1 for every i, then Ψ_sat(x)=|H|, the maximum possible value.
Our key claim below will argue, using the analysis of our spectral algorithm from above, that if H has no short cycle then Ψsat must in fact be unsatisfiable:
If H has no cycle of length ≈(ℓ·log n)/∊², then max_{x∈{±1}^n} Ψ_sat(x) ≤ ∊|H|.
We thus immediately hit a contradiction unless H has a cycle of length ≈ℓ·log n.
Let us now discuss why this claim must hold for even k. For r∈N, we let A_sat be the Kikuchi matrix for the polynomial Ψ_sat that we built in the previous section:
A_sat(S,T) = 1 if S⊕T∈H, and 0 otherwise.

Then, we have: Ψ_sat(x) ≤ (n choose r)·∥A_sat∥₂.
We will now argue a rather odd-looking fact. Consider the polynomial Ψ_b defined below for arbitrary b_C∈{±1}:
Ψ_b = ∑_{C∈H} b_C·x_C.

We also let Ψ_random be the special case when the b_C’s are chosen uniformly at random and independently. Notice that Ψ_random has the same form as the polynomial Ψ we analyzed in the previous section. Let A_b be the Kikuchi matrix built from Ψ_b:
A_b(S,T) = b_C if S ⊕ T = C ∈ H, and 0 otherwise.

We will argue that if H has no cycle of length log (n choose r) ≈ ℓ·log n, then ∥A_b∥_2 ≈ ∥A_sat∥_2, no matter what the values of the b_C's are. Now, from the previous section, we know that (n choose r)·∥A_b∥_2 ≤ ∊·|H| for random b; thus, for every x:
Ψ_sat(x) ≤ (n choose r)·∥A_sat∥_2 ≈ (n choose r)·∥A_b∥_2 ≤ ∊·|H|.

This claim can appear strange. How can it be that ∥A_b∥_2 does not depend on the b_C's at all? In a sense, our proof reveals how short cycles in H are "necessary" for ∥A_b∥_2 to be as small as it is in the previous section!
Trace Moments and Spectral Norms.
To relate ∥A_sat∥_2 and ∥A_random∥_2, and to bring in the cycles in H, we will utilize a classical connection between ∥B∥_2 and the so-called trace moments of a matrix B that, in turn, are related to a certain combinatorial count of walks on B.
For any N×N symmetric matrix B, let ∥B∥_2 = σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_N ≥ 0 be its N singular values in descending order. The trace tr(B) is simply the sum of the diagonal entries of B. For any even 2t ∈ ℕ, a classical observation in linear algebra says that the trace of the 2t-th power of B equals the sum of the 2t-th powers of its singular values:
tr(B^{2t}) = σ_1^{2t} + σ_2^{2t} + ⋯ + σ_N^{2t}.

This is helpful because we can now write:
∥B∥_2^{2t} = σ_1^{2t} ≤ tr(B^{2t}) = ∑_{i=1}^{N} σ_i^{2t} ≤ N·σ_1^{2t} = N·∥B∥_2^{2t}.

That is, the 2t-th power of ∥B∥_2 equals tr(B^{2t}) up to a factor of N. Taking 2t = log N, taking 1/2t-th powers on both sides, and noting that N^{1/2t} = N^{1/log N} is an absolute constant (2, for base-2 logarithms) gives:

∥B∥_2 ≤ tr(B^{2t})^{1/2t} ≤ N^{1/log N}·∥B∥_2 = O(∥B∥_2).

Thus, for 2t ≈ log N, tr(B^{2t})^{1/2t} is a faithful approximation to ∥B∥_2.
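This chain of inequalities is easy to sanity-check numerically; the sketch below (not from the article) uses an arbitrary random symmetric matrix, with the size 64 and the power 2t = 10 chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
Nmat = 64
B = rng.standard_normal((Nmat, Nmat))
B = (B + B.T) / 2            # a random symmetric test matrix

t2 = 10                      # the even power 2t
norm2 = np.linalg.norm(B, 2)
trace_moment = np.trace(np.linalg.matrix_power(B, t2))

# ||B||_2^{2t} <= tr(B^{2t}) <= N * ||B||_2^{2t}
assert norm2 ** t2 <= trace_moment + 1e-6
assert trace_moment <= Nmat * norm2 ** t2 + 1e-6

# Hence tr(B^{2t})^{1/2t} approximates ||B||_2 within a factor N^{1/2t}.
approx = trace_moment ** (1 / t2)
assert norm2 <= approx <= Nmat ** (1 / t2) * norm2 + 1e-9
print("||B||_2 =", norm2, "trace-moment estimate =", approx)
```
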
Relating ∥A_sat∥_2 and ∥A_random∥_2 via Trace Moments.
Using the above connection, we will now focus on arguing that tr(A_sat^{2t}) = tr(A_random^{2t}) for 2t = log N = log (n choose r) ≈ r·log n. This gives us ∥A_sat∥_2 ≈ ∥A_random∥_2.
We now recall another classical observation from basic linear algebra that relates the trace moments of a matrix to a certain combinatorial count of "walks" on it. For A (standing for A_sat or A_b), we have:
tr(A^{2t}) = ∑_{S_1,S_2,…,S_{2t}} A(S_1,S_2)·A(S_2,S_3)⋯A(S_{2t},S_1).   (4)

That is, tr(A^{2t}) is the sum, over all sequences of 2t row indices (that is, subsets of [n] of size r), of the product of the entries of A on consecutive elements of the sequence.
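The identity (4) can be checked by brute force on any small symmetric matrix; in the sketch below (an illustrative choice, not a Kikuchi matrix), A is the adjacency matrix of a triangle, whose eigenvalues 2, −1, −1 give tr(A^4) = 16 + 1 + 1 = 18:

```python
import itertools
import numpy as np

A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])
t2 = 4  # the power 2t

# Brute-force walk sum: over all length-2t index sequences, multiply the
# entries on consecutive elements, closing the walk back at the start.
walk_sum = 0.0
Nn = A.shape[0]
for seq in itertools.product(range(Nn), repeat=t2):
    prod = 1.0
    for i in range(t2):
        prod *= A[seq[i], seq[(i + 1) % t2]]
    walk_sum += prod

assert abs(walk_sum - np.trace(np.linalg.matrix_power(A, t2))) < 1e-9
print(walk_sum)  # 18.0
```
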
Every entry of A is either 0 or ±1, and for each non-zero entry A(S_i, S_{i+1}), we have S_i ⊕ S_{i+1} = C for some C ∈ H. Thus, any 2t-sequence (S_1, S_2, …, S_{2t}) that contributes a non-zero value to the sum above (and thus contributes exactly ∏_{i=1}^{2t} b_{C_i} to tr(A_b^{2t})) must correspond to a 2t-tuple (C_1, C_2, …, C_{2t}) of hyperedges from H, one for each entry A(S_i, S_{i+1}).
More is true about such a tuple (C_1, C_2, …, C_{2t}), as we demonstrate in the crucial observation below. Let 1_{C_i} ∈ {0,1}^n be the 0-1 indicator of the set C_i ⊆ [n]. That is, 1_{C_i}(j) = 1 if and only if j ∈ C_i.
Any (C_1, C_2, …, C_{2t}) corresponding to a non-zero term in (4) satisfies ∑_{i=1}^{2t} 1_{C_i} = 0 mod 2.
This crucial observation is actually quite simple to prove. We know that 1_{S_i} + 1_{S_{i+1}} = 1_{C_i} mod 2 for every i, since S_i ⊕ S_{i+1} = C_i. If we add up all the left-hand sides, we get a sum over the 1_{S_i}'s in which every 1_{S_i} appears exactly twice (since S_i occurs in exactly two entries, A(S_{i−1}, S_i) and A(S_i, S_{i+1})). Thus the left-hand side, and therefore also the right-hand side, must add up to the 0 vector modulo 2.
Next, notice that the j-th entry satisfies ∑_{i=1}^{2t} 1_{C_i}(j) = 0 mod 2 if and only if j occurs in an even number of the C_i's. Thus, the above observation says that the (multi)set {C_1, C_2, …, C_{2t}} is a cycle, or an even cover. This appears exciting, since we now have a direct relationship between tr(A_sat^{2t}) and cycles in H!
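This observation can be verified exhaustively on a small example. The sketch below (the hypergraph is invented for illustration) enumerates all closed walks of length 4 in a small Kikuchi matrix and checks that the hyperedges they traverse XOR to the empty set, that is, cover every vertex an even number of times:

```python
import itertools

n, r = 5, 2
# A small 4-uniform hypergraph, invented for illustration.
H = [frozenset(s) for s in [(0, 1, 2, 3), (1, 2, 3, 4), (0, 1, 3, 4)]]
subsets = [frozenset(s) for s in itertools.combinations(range(n), r)]

def edge(S, T):
    """Hyperedge labeling the Kikuchi entry (S,T), or None if the entry is 0."""
    d = S ^ T
    return d if d in H else None

t2 = 4  # walk length 2t
checked = 0
for walk in itertools.product(subsets, repeat=t2):
    Cs = [edge(walk[i], walk[(i + 1) % t2]) for i in range(t2)]
    if any(C is None for C in Cs):
        continue  # a zero term in the trace sum
    # The symmetric difference of all the C_i's must be empty:
    # every vertex appears in an even number of the C_i's.
    acc = frozenset()
    for C in Cs:
        acc = acc ^ C
    assert acc == frozenset()
    checked += 1
print(checked, "non-zero walks, all with even vertex coverage")
```
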
There is a crucial snag, though: the same C could recur multiple times in (C_1, C_2, …, C_{2t}). Indeed, if C_i = C for every i, or, more generally, if the C_i's appear in pairs, then every element j ∈ [n] occurs in an even number of the C_i's for the trivial reason that the C_i's themselves occur in pairs. Let us call such 2t-tuples trivial cycles; the C_i's occur in pairs, so they do not correspond to genuine cycles in H.
Now for our endgame. For every trivial cycle, since the C_i's appear in pairs, the product ∏_{i=1}^{2t} b_{C_i} contains an even number of copies of each b_C. Since b_C² = 1, this product must equal 1 regardless of the C_i's! Thus, if H has no cycle of length 2t, every non-zero term is a trivial cycle, and so contributes exactly 1, no matter what the b_C's are. In particular, tr(A_sat^{2t}) (where all b_C's equal 1) must equal tr(A_b^{2t}) regardless of the b_C's.
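The endgame can be seen concretely in a minimal sketch (hypergraph invented for illustration). Below, the hyperedge indicator vectors are linearly independent over GF(2), so H has no even cover at all; consequently tr(A_b^4) comes out exactly the same for every choice of signs b:

```python
import itertools
import numpy as np

n, r = 8, 2
# Hyperedge indicators are linearly independent over GF(2): no subset XORs
# to the empty set, so this hypergraph has no even cover ("cycle") at all.
H = [frozenset(s) for s in [(0, 1, 2, 3), (2, 3, 4, 5), (4, 5, 6, 7)]]
subsets = [frozenset(s) for s in itertools.combinations(range(n), r)]
idx = {S: i for i, S in enumerate(subsets)}

def kikuchi(signs):
    """Kikuchi matrix A_b for a given assignment of signs b_C."""
    A = np.zeros((len(subsets),) * 2)
    for S, T in itertools.combinations(subsets, 2):
        C = S ^ T
        if C in H:
            A[idx[S], idx[T]] = A[idx[T], idx[S]] = signs[C]
    return A

rng = np.random.default_rng(1)
A_sat = kikuchi({C: 1.0 for C in H})
t_sat = np.trace(np.linalg.matrix_power(A_sat, 4))
for _ in range(10):
    signs = {C: float(rng.choice([-1.0, 1.0])) for C in H}
    A_b = kikuchi(signs)
    # All non-zero walk terms are trivial cycles, so the trace is sign-free.
    assert abs(np.trace(np.linalg.matrix_power(A_b, 4)) - t_sat) < 1e-6
print("tr(A_b^4) =", t_sat, "for every choice of signs")
```
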
This finishes our argument, but it is perhaps helpful to summarize it: we related ∥A∥_2 (for both A = A_sat and A = A_b) to tr(A^{2t}). We then related tr(A^{2t}) to a sum over 2t-tuples (S_1, S_2, …, S_{2t}). The non-zero terms in this sum correspond to ∏_{i=1}^{2t} b_{C_i} for tuples (C_1, C_2, …, C_{2t}) with C_i ∈ H such that ∑_i 1_{C_i} = 0 mod 2; this step crucially uses the structure of the Kikuchi matrices. If H has no cycle of length 2t = log (n choose r), then in every non-zero term the C_i's must occur in pairs, in which case ∏_{i=1}^{2t} b_{C_i} = 1 and is thus independent of what the b_{C_i}'s themselves are.
4. Even Smaller Refutations
In the final act of this article, we come full circle to show how the purely combinatorial Feige's conjecture yields a surprising corollary for refutations of k-SAT. We discussed a spectral algorithm that finds refutations for smoothed k-SAT whenever the number of clauses is m ≳ n^{k/2}·log n. Improving on this spectral threshold, even in the substantially specialized setting of random 3-SAT, has been open (with accumulating evidence that it may be impossible) ever since the 2004 work8 that obtained the first such result.
In a surprising twist from 2006, Feige, Kim, and Ofek proved12 that random 3-SAT formulas with m ≳ n^{1.4} clauses (significantly short of the spectral threshold of ≈ n^{1.5}) admit short, polynomial-size refutations with high probability. That is, there exists a polynomial-size certificate, based on a clever combination of spectral and combinatorial ideas, which, if given, can easily convince us of the formula's unsatisfiability. But despite nearly two decades of effort, we do not know of polynomial-time algorithms that find such a certificate. The FKO result forces us to grapple with the possibility that there may be a marked difference between the existence of short certificates for NP-hard problems and efficient algorithms to find them. No such gap is known (or expected!) for worst-case k-SAT, making this a truly average-case phenomenon. And no such gap is known for any other discrete optimization problem, even in the average case. Indeed, ever since its discovery, FKO has been a one-of-a-kind result with an aura of mystery around it.
Are the mysterious FKO certificates a quirk of the random 3-SAT model? Or should we expect analogs in more general instances? We will now sketch how our results from the previous two sections yield a surprising corollary:
With high probability, smoothed 3-SAT formulas with m ≳ n^{1.4}·log n clauses admit an easily-checkable, polynomial-size refutation. More generally, a similar result holds for smoothed k-SAT formulas with n^{k/2−δ_k} clauses, where δ_k > 0 depends only on k.
That is, the FKO result extends, essentially without quantitative change, to smoothed 3-SAT formulas. The existence of short cycles in hypergraphs plays a major role in obtaining this corollary; indeed, this was a principal motivation for Feige's conjecture back in 2008. At the time, since Conjecture 1 was not known, FKO's proof used a sophisticated application of the second moment method from probability theory with rather complicated calculations. Given Theorem 2, our new certificate and its analysis will be simple.
The idea for constructing such refutations is quite simple. Our spectral refutation worked by splitting the polynomial Φ into 8 different polynomials of degree ≤ 3 and then refuting each polynomial via our spectral algorithm. As before, we will do the splitting and use the spectral algorithm for all terms of degree ≤ 2, that is, for all except the homogeneous degree-3 polynomial corresponding to the last term in (1), for which we will use a "combinatorial" method. Importantly, for the degree ≤ 2 terms, our polynomial-time spectral algorithm from Section 2.1 only needs m ≳ (n/∊²)·log n (instead of ≈ n^{1.5}) to certify a bound of ∊.
The basic observation behind the combinatorial method is rather simple. Let H be the 3-uniform hypergraph of monomials appearing in Ψ. Let {C_1, C_2, …, C_t} be a cycle in H, and let Ψ_cycle = ∑_{i=1}^{t} b_{C_i}·x_{C_i} be the "fragment" of Ψ that only keeps the monomials corresponding to the cycle. Then, if ∏_{i=1}^{t} b_{C_i} = −1 (which happens with probability 1/2 over the choice of the b_{C_i}'s), we claim that Ψ_cycle(x) ≤ t − 1 for every x. That is, every x must in fact fall at least 1 short of the maximum value of t on such a fragment. Suppose not, and say for some x, Ψ_cycle(x) = t. Then x_{C_i} = b_{C_i} for every 1 ≤ i ≤ t. Thus, ∏_{i=1}^{t} x_{C_i} = ∏_{i=1}^{t} b_{C_i} = −1. But on the other hand, since {C_1, C_2, …, C_t} is a cycle, every j occurs in an even number of the C_i's, and thus ∏_{i=1}^{t} x_{C_i} = ∏_{j=1}^{n} x_j^{even} = 1. This is a contradiction!
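This fragment argument is easy to verify by brute force. In the sketch below, the length-4 even cover and the signs are invented for illustration; since the signs multiply to −1, no assignment achieves the maximum value t:

```python
import itertools

# A length-4 even cover: every vertex appears in an even number of the C_i's.
cycle = [frozenset(s) for s in [(0, 1, 2), (1, 2, 3), (3, 4, 5), (0, 4, 5)]]
acc = frozenset()
for C in cycle:
    acc = acc ^ C
assert acc == frozenset()  # confirm it really is an even cover

t = len(cycle)
n = 6
b = [1, 1, 1, -1]          # signs with product -1: a "violated" cycle
assert b[0] * b[1] * b[2] * b[3] == -1

def psi_cycle(x):
    """The fragment Psi_cycle(x) = sum_i b_{C_i} * x_{C_i}."""
    total = 0
    for bc, C in zip(b, cycle):
        term = bc
        for j in C:
            term *= x[j]
        total += term
    return total

# Exhaustively check that every x falls short of the maximum value t.
best = max(psi_cycle(list(x)) for x in itertools.product([-1, 1], repeat=n))
assert best <= t - 1
print("max over all x:", best)
```

In fact, a parity argument gives the slightly stronger bound t − 2 here, since the number of −1 terms must be odd; the article only needs the loss of 1.
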
Here is how this basic observation helps. Suppose H has m ≈ n^{1.5}·log n/√ℓ hyperedges. Then we know from Theorem 2 that H contains a cycle of length ≈ ℓ·log n. We can remove the hyperedges in this cycle and repeatedly find cycles of length ≈ ℓ·log n in the residual hypergraph. This process gives us a "cycle partition" of 99% of the hyperedges of H into cycles of length ≈ ℓ·log n. From our argument above, for about half of such cycles, the product of the corresponding b_{C_i}'s will turn out to be −1. Let us call such cycles violated. Thus, we can write:
Ψ(x) = ∑_{violated cycles in partition} Ψ_cycle(x) + Ψ_remaining(x).

For each violated cycle, every x must "lose" at least 1 relative to the maximum possible value of the polynomial. So, at any x, Ψ must fall short of its maximum value by at least the number of violated cycles in the partition. Thus, Ψ(x) ≤ m − O(m/(ℓ·log n)) = (1 − O(1/(ℓ·log n)))·m. This is a significantly weaker bound than that of our spectral algorithm (∊m), but it is still non-trivial. It is also efficiently verifiable given a list of the violated cycles (of size at most O(n^{1.5}), so polynomial size).
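Finding the cycles that seed such a partition reduces to linear algebra: an even cover is exactly a GF(2) linear dependency among the hyperedge indicator vectors. A minimal sketch (the helper name find_even_cover and the example hypergraph are invented; a real implementation would look for short dependencies, as Theorem 2 guarantees they exist):

```python
def find_even_cover(hyperedges):
    """Return indices of a subset of hyperedges whose symmetric difference
    is empty (an even cover / cycle), or None if none exists."""
    # Gaussian elimination over GF(2) on bitmask rows, tracking which
    # input rows combine into each reduced row.
    basis = {}  # pivot bit -> (reduced vector, set of contributing indices)
    for i, C in enumerate(hyperedges):
        vec = 0
        for j in C:
            vec |= 1 << j
        combo = {i}
        while vec:
            pivot = vec.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (vec, combo)
                break
            bvec, bcombo = basis[pivot]
            vec ^= bvec
            combo ^= bcombo  # symmetric difference of index sets
        else:
            return sorted(combo)  # vec reduced to 0: dependency found
    return None

H = [(0, 1, 2), (1, 2, 3), (3, 4, 5), (0, 4, 5), (0, 1, 5)]
cover = find_even_cover(H)
assert cover is not None
# Verify: every vertex is covered an even number of times.
acc = 0
for i in cover:
    for j in H[i]:
        acc ^= 0  # placeholder keeps loop explicit below
v_acc = 0
for i in cover:
    v = 0
    for j in H[i]:
        v |= 1 << j
    v_acc ^= v
assert v_acc == 0
print(cover)  # [0, 1, 2, 3]
```
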
The corollary follows by combining this combinatorial certificate with the spectral bound on the degree ≤ 2 parts of Φ. The precise parameters are obtained by optimizing ℓ above, but we omit the details here.
5. Conclusion
In this article, we surveyed a new class of spectral algorithms based on Kikuchi matrices to find refutations, that is, easily verifiable proofs of unsatisfiability, for smoothed k-SAT formulas. The guarantees we obtained were as strong as the best-known (and conjectured optimal) ones for the substantially simpler random k-SAT formulas and substantially surpass the best-possible (assuming standard hardness conjectures) running times for worst-case k-SAT formulas at every clause density. The approach generalizes to yield similar results for all logical constraint satisfaction problems. We also saw the resolution of the 2008 conjecture of Feige on short cycles in hypergraphs as an example application. And as a consequence, we saw how to extend the one-of-a-kind Feige-Kim-Ofek result from random k-SAT formulas to all smoothed k-SAT formulas. Taken together, the results show that, per the current state-of-the-art, smoothed k-SAT is no harder than the substantially simpler random k-SAT formulas for both refutation algorithms and existence of short certificates.
The Kikuchi matrix method, the method of proving results by finding spectral refutations of a related propositional formula, coming out of this line of work, appears to be a promising new line of attack on problems in combinatorics and coding theory. It is a pleasing theoretical analog of the powerful approach of resolving mathematical problems via practical SAT solvers: a decidedly "computer science" approach to solving problems in mathematics. A few more applications, including progress on some decades-old problems in the theory of local error-correcting codes,4,19 have already appeared, and we anticipate more such results in the near future. (See Figure 2a and 2b.)