Graph sparsification is the approximation of an arbitrary graph by a sparse graph.

We explain what it means for one graph to be a spectral approximation of another and review the development of algorithms for spectral sparsification. In addition to being an interesting concept, spectral sparsification has been an important tool in the design of nearly linear-time algorithms for solving systems of linear equations in symmetric, diagonally dominant matrices. The fast solution of these linear systems has already led to breakthrough results in combinatorial optimization, including a faster algorithm for finding approximate maximum flows and minimum cuts in an undirected network.

### 1. Introduction

A sparse graph is one whose number of edges is reasonably viewed as being proportional to its number of vertices. In contrast, the complete graph on *n* vertices, with about *n*^{2}/2 edges, is the paradigmatic dense graph. Sparse graphs are often easier to handle than dense ones. Most graph algorithms run faster, sometimes by orders of magnitude, when there are fewer edges, and the graph itself can be stored more compactly. By approximating a dense graph of interest by a suitable sparse one, one can save time and space.

We will work with weighted graphs, where the weights might represent capacities, conductance, similarity, or just coefficients in a linear system. In a sparse graph, all of the edges can be important for a graph’s structure. In a tree, for example, each edge provides the only path between its endpoints. Not so in a dense graph, where some edges will serve similar functions. A collection of low-weight edges connecting two clusters of vertices in a graph might be approximable by a single high-weight edge connecting vertices representative of those clusters. Sparsification can be viewed as a procedure for finding a set of representative edges and weighting them appropriately.

What exactly do we mean by sparse? We would certainly consider a graph sparse if its average degree were less than 10, and we would probably consider a graph sparse if it had one billion vertices and average degree one hundred. We formalize the notion of sparsity in the usual analysis-of-algorithms way by considering infinite families of graphs, and proclaiming sparse those whose average degrees are bounded by some constant, or perhaps by a polynomial in the logarithm of their number of vertices.

One may at first think that sparsification is unnecessary, as common wisdom holds that all large real-world graphs are sparse. While this may be true of natural graphs such as social networks, it is not true of the graphs that arise inside algorithms. Many algorithms involve the construction of graphs that are dense, even when solving problems on graphs that are sparse. Moreover, the common wisdom may be an artifact of the difficulty of storing and manipulating a large dense graph. Improvements in sparsification may one day ameliorate these difficulties.

The use of sparse graphs to approximate dense ones is not unique to algorithm design. In a parallel computer, for instance, information needs to be able to flow from any processor to any other. Hardwiring all those pairwise connections would be physically difficult, so a sparse graph simulating the connectivity properties of the complete graph is needed. The hypercube graph plays this role in the CM5, built by Thinking Machines.^{25} Intuitively, the hypercube has no “bottlenecks.” Formally, the (weighted) hypercube is a good spectral sparsifier for the complete graph defined on its nodes. We have shown that *every* graph has a very sparse spectral approximation, with constant average degree.

### 2. Notions of Similarity

A few conventions: we specify a weighted graph by a 3-tuple, *G* = (*V*, *E*, *w*), with vertex set *V* = {1, …, *n*}, edge set *E* ⊆ {(*u*, *v*) | *u*, *v* ∈ *V*}, and weights *w*_{(u,v)} > 0 for each (*u*, *v*) ∈ *E*. All graphs will be undirected and weighted, unless otherwise stated. We sometimes express a graph simply by *G* = (*V*, *w*), as *E* can be defined implicitly by setting *w*_{(u,v)} = 0 for all (*u*, *v*) ∉ *E*. We will always write *n* for the number of vertices in a graph and *m* for the number of edges. When measuring the similarity between two graphs, we will always assume that they have the same set of vertices.

The notion of cut similarity of graphs was first considered by Benczúr and Karger^{8} as part of an effort to develop fast algorithms for the minimum cut and maximum flow problems. In these problems, one is interested in the sum of the weights of edges that are cut when one divides the vertices of a graph into two pieces. Two weighted graphs on the same vertex set are cut-similar if the sum of the weights of the edges cut is approximately the same in each such division.

To write this symbolically, we first observe that a division of the vertices into two parts can be specified by identifying
the subset of vertices in one part. For a weighted graph *G* = (*V*, *w*) and a subset of vertices *S* ⊂ *V*, we define

We say that *G* = (*V*, *w*) and
= (*V*,
) are *σ-cut similar* if

for all *S* ⊂ *V*. Surprisingly, every graph is cut-similar to a graph with average degree *O*(log *n*), and that graph can be computed in polylogarithmic time.

THEOREM 1 (BENCZÚR-KARGER). *For all* ε > 0, *every G* = (*V*, *E*, *w*) *has a* (1 + ε)-*cut similar graph*
= (*V*,
,
) *such that*
⊆ *E and* |
| = *O*(*n* log *n*/ε^{2}). *Moreover
can be computed in O*(*m* log^{3} *n* + *m* log *n*/ε^{2}) *time*.

The sizes of cuts in a graph tell us a lot about its structure—for starters, the weighted degrees of vertices are given by cuts of size |*S*| = 1. Most ways of defining a cluster of vertices in a graph involve comparing the number of edges in the cut defined by the set of vertices to the number of edges internal to that set.

Motivated by problems in numerical linear algebra and spectral graph theory, Spielman and Teng^{34} introduced a notion of *spectral similarity* for two graphs. We will first describe it as a generalization of cut similarity.

Given a weighted graph *G* = (*V*, *w*), we define the Laplacian quadratic form of *G* to be the function *Q*_{G} from
^{V} to
given by

If *S* is a set of vertices and *x* is the characteristic vector of *S* (1 inside *S* and 0 outside), then it is easy to see that

We say two graphs *G* = (*V*, *w*) and
= (*V*,
) are *σ-spectrally similar* if

Thus, cut similarity can be viewed as the special case of spectral similarity in which we only consider vectors *x* that take values in {0, 1}.

It is possible to construct graphs that have very similar cuts, but which are highly dissimilar from the spectral perspective; for instance, the *n*-vertex path is 2-cut similar but only *n*-spectrally similar to the *n*-vertex cycle. Although spectral similarity is strictly stronger than cut similarity, it is easier to check if two graphs are spectrally similar. In particular, one can estimate the spectral similarity of two graphs to precision ε in time polynomial in *n* and log(1/ε), but it is NP-hard to approximately compute the cut-similarity of two graphs.

Graphs that are spectrally similar share many algebraic properties. For example, the effective resistance distances between all pairs of vertices are similar in spectrally similar graphs. The effective resistance distance is defined by viewing each edge in a graph as a resistor: an edge of weight *w* becomes a resistor of resistance 1/*w*. The entire graph is then viewed as a resistive circuit, and the effective resistance between two vertices is just the electrical resistance in the network between them. It equals the potential difference induced between the vertices when a unit current is injected at one and extracted at the other. In terms of the Laplacian quadratic form, the effective resistance between vertices *u* and *v* may be written as

an identity which follows from the well-known energy minimizing property of electrical flows.

The Laplacian quadratic form provides a natural way to solve regression problems on graphs. In these problems, one is told the values of *x* on some subset of the nodes *S* and is asked to infer the values on the remaining nodes. One approach to solving these problems is to view the known values as voltages that have been fixed, and the values at the other nodes as the induced voltages. That is, one seeks the vector *x* that minimizes *Q*_{G}(*x*) while agreeing with the known values.^{36} One can show that if two graphs are spectrally similar, then the solutions to all such regression problems on the graphs will be similar as well.

The problems of regression and computing effective resistances are special cases of the problem that motivated the definition of spectral similarity: the solution of linear equations in Laplacian matrices. The Laplacian quadratic form can be written as

where *L*_{G} is the Laplacian matrix of *G*. The Laplacian matrix of a graph *G* = (*V*, *w*) is defined by

The problem of solving systems of linear equations in Laplacian matrices arises in many areas of computational science and optimization. In fact, the spectral similarity measure is *identical* to the concept of relative condition number in numerical linear algebra. If two graphs are spectrally similar, then through the technique of preconditioning one can use solutions to linear equations in the Laplacian matrix of one graph to solve systems of linear equations in the Laplacian of the other.

**2.3. Spectral similarity of matrices**

For two symmetric matrices *A* and *B* in
^{n×n}, we write *A*
*B* to indicate that

We say *A* and *B* are *σ-spectrally similar* if

We have named this relation spectral similarity because it implies that the two matrices have similar eigenvalues. The Courant-Fisher Theorem tells us that

Thus, if λ_{1}, …, λ_{n} are the eigenvalues of *A* and
_{1},…,
_{n} are the eigenvalues of *B*, then for all *i*, λ*i*/σ ≤
_{i} ≤ σ · λ*i*.

Using this notation, we can now write inequality (1) as

That is, two graphs are *σ*-spectrally similar if their Laplacian matrices are. We remark that the Laplacian matrix of a graph is (i) *symmetric*, that is, *L*_{G} = *L*^{T}_{G}, (ii) *positive semi-definite*, that is, all eigenvalues of *L*_{G} are non-negative, and (iii) *(weakly) diagonally dominant*, that is, for all *i*, *L*_{G}(*i*, *i*)≥ ∑_{j≠i}|*L*_{G}(*i*, *j*)|. From consideration of the Laplacian quadratic form, it is easy to verify that if *G* is connected, then the null space of *L*_{G} is just the span of the all 1’s vector. Thus all connected graphs have the same Laplacian null space and exactly one zero eigenvalue.

It is worth mentioning an interesting alternative to cut- and spectral-similarity. If one assigns a length to every edge in a graph, then these lengths induce a shortest-path distance between every pair of vertices. We say that two different graphs on the same vertex set are *σ-distance similar* if the distance between each pair of vertices in one graph is within a multiplicative factor of *σ* of the distance between the corresponding pair of vertices in the other graph. Formally, if *G* and
are the graphs and if dist_{G}(*u*, *v*) is the distance between vertices *u* and *v* in *G*, then *G* and
are *σ-distance similar* if for all *u*, *v* ∈ *V*,

When
is a subgraph of *G*, the inequality dist_{G}(*u*, *v*) ≤ dist_{
}(*u*, *v*) is automatically satisfied. Peleg and Ullman^{29} defined a *t*-spanner of a graph *G* to be a subgraph such that for all *u*, *v* ∈ *V*,

They were interested in finding sparse *t*-spanners. It has been shown^{4} that every weighed graph has a (2*t* + 1)-spanner with *O*(*n*^{1 + 1/t}) edges. The most extreme form of a sparse spanner is the *low stretch spanning tree*, which has only *n* 1 edges, but which only approximately preserves distances on average,^{1} up to polylogarithmic distortion.

### 3. Finding Sparse Substitutes

A (*σ*, *d*)-*spectral sparsifier* of a graph *G* is a graph
satisfying

Since spectral similarity implies that the total edge weight of a graph is preserved, the spectral sparsifier can only have fewer edges than *G* if those edges have larger weights.

In this section, we begin by discussing sparsifiers of the complete graph, which have been known for some time. We then describe a sequence of ever-stronger existence theorems, culminating in the statement that any graph *G* has a (1 + ε, 4/ε^{2})-spectral sparsifier for every ε ∈ (0, 1).

**3.1. Cliques have constant-degree sparsifers**

To warm up, let us first examine the quality of the hypercube as a spectral sparsifier of the complete graph. Assume for convenience that *n* is a power of two. Let *G* be the complete graph on *n* vertices. All the non-zero eigenvalues of *L*_{G} equal *n*, so for every unit vector *x* orthogonal to the all-1s vector,

The non-zero eigenvalues of the Laplacian of the hypercube with *n* vertices in which every edge has weight 1 are (2, …, 2 log *n*). Let *H* be this hypercube, but with edge weights *n*/(2
). The non-zero eigenvalues of the Laplacian of *H* are then

which implies that for every unit vector *x* orthogonal to the all-1s vector,

Thus, *H* is a (
, log *n*/2)-spectral sparsifier of the *n*-clique.

In fact, the complete graph has much better spectral sparsifiers. Consider the Ramanujan graphs,^{26, 27} which are *d*-regular graphs all of whose non-zero Laplacian eigenvalues lie between *d* 2
and *d* + 2
. If we let
be a Ramanujan graph with edge weight *n/d*, then for every unit vector *x* orthogonal to the all-1s vector,

Thus,
is a ((1 2
/*d*)^{−1}, *d*/2)-spectral sparsifier of the complete graph *G*.

**3.2. Sampling and decomposition: every graph has a good sparsifier**

Ramanujan graphs are members of the family of *expander graphs*. These are sparse graphs in which every subset of vertices has a significant fraction of its edges connecting to vertices outside the subset (see below for details). As with the hypercube and Ramanujan graphs, we can show that every expander can be rescaled to be a good spectral sparsifier of the complete graph.

It is well known that random sparse graphs are usually good expanders (see Bollobás^{9} and Friedman^{17}). Therefore, one can obtain a good spectral sparsifier of the complete graph by random sampling. Spielman and Teng^{34} took this view one step further to show that graph sampling can be used to obtain a good spectral sparsifier for every graph. Their construction was strongly motivated by the work of Benczúr and Karger^{8} and Achlioptas and McSherry.^{2}

The sampling procedure involves assigning a probability *p*_{u,v} to each edge (*u*, *v*) ∈ *G*, and then selecting edge (*u*, *v*) to be in the graph
with probability *p*_{u,v}. When edge (*u*, *v*) is chosen to be in the graph, we multiply its weight by 1/*p*_{u,v}.

This procedure guarantees that

The key step in this approach is to determine the sampling probability *p*_{u,v} for each edge; there is a tension between choosing small *p*_{u,v} to generate a sparser
and choosing larger *p*_{u,v} to more accurately approximate *G*. Spielman and Teng recognized that some edges are more essential than others, and used a graph decomposition process to implicitly identify these edges and set the sampling probabilities accordingly.

**Conductance and graph decomposition**. For an unweighted graph *G* = (*V*, *E*) and disjoint subsets *S*, *T* ⊂ *V*, we let *E*(*S*, *T*) denote the set of edges in *E* connecting one vertex of *S* with one vertex of *T*. We define Vol (*S*) = ∑_{i∈S}*d*_{i} and observe that Vol (*V*) = 2|*E*|. We define the *conductance* of a set *S* of vertices to be

and we define

The *normalized Laplacian* matrix of a graph (see [14]), is defined to be

where *D* is the diagonal matrix whose *u*-th entry is *d*_{u}. The discrete version^{3, 31} of Cheeger’s^{11} inequality (Theorem 2) relates the second eigenvalue of the normalized Laplacian to the conductance of a graph.

THEOREM 2 (DISCRETE CHEEGER INEQUALITY).

We define a *decomposition* of *G* to be a partition of *V* into sets (*A*_{1}, …, *A*_{k}), for some *k*. The *boundary* of a decomposition (*A*_{1}, …, *A*_{k}) is then the set of edges between different vertex sets in the partition:

We say that the decomposition is a φ*-decomposition* if
for all *i*, where *G*(*A*_{i}) denotes the subgraph induced on *A*_{i}. It is a λ-*spectral decomposition* if the smallest non-zero normalized Laplacian eigenvalue of *G*(*A*_{i}) is at least λ, for all *i*. By Cheeger’s inequality, every φ-decomposition is a (φ^{2}/2)-spectral decomposition.

The following two theorems (see Spielman and Teng^{34}) together imply that for all ε, every graph has a((1 + ε), *O*(ε^{−2} log^{7} *n*))-spectral sparsifier.

THEOREM 3 (SPECTRAL DECOMPOSITION). *Every G has an* Ω (log^{−2}*n*)-*spectral decomposition with boundary size at most* |*E*|/2.

THEOREM 4 (SAMPLING WORKS FOR EXPANDERS). *Suppose* ε ∈ (0, 1/2) *and G* = (*V*, *E*) *is an unweighted graph with smallest non-zero normalized Laplacian eigenvalue at least* λ. *Let*
= (*V*,
,
) *be a graph obtained by sampling the edges of G with probabilities*

*where*

*and setting weights*
_{(e)} = 1/*p*_{e} *for e* ∈
. *Then, with probability at least* 1/2,
*is a* (1 + ε)-*spectral approximation of G, and the average degree of
is O*((log *n*)^{2} (ελ)^{−2})

To construct a spectral sparsifier of an arbitrary unweighted graph, we first apply Theorem 3 to find a Ω(1/log^{2}*n*)-spectral decomposition of the graph in which the boundary has at most half the edges. We then sparsify each of the components by random sampling, and we sparsify the graph formed by the boundary edges recursively. Adding the sparsifiers obtained yields a sparsifier for the original graph, as desired.

**3.3. Sampling by effective resistance**

By using effective resistances to define the edge sampling probabilities *p*_{e}, Spielman and Srivastava^{32} proved that every graph has a ((1 + ε), *O*(log *n*/ε^{2}))-spectral sparsifier. These spectral sparsifiers have a similar number of edges to the cut sparsifiers described in Theorem 1, and many fewer edges than those produced by Spielman and Teng^{34}. We define *R*_{e}, the effective resistance of an edge *e*, to be the effective resistance between its endpoints. It is well-known that *R*_{e} is proportional to the commute time between the end-vertices of *e*,^{10} and is equal to the probability that *e* appears in a random spanning tree of *G*. Spielman and Srivastava proved that sampling with edge probability *p*_{e} proportional to *w*_{e}*R*_{e} is the “right” distribution for creating spectral sparsifiers.

THEOREM 5 (SAMPLING BY EFFECTIVE RESISTANCE). *For any weighted graph G* = (*V*, *E*, *w*) *and* 0 < ε ≤ 1, *let
be the graph obtained by the following random process:*

Set

q= 8nlogn/ε^{2}. Choose a random edge ofGwith probabilityp_{e}proportional tow_{e}R_{e}, and addeto the edge set of with weightw_{e}/qp_{e}. Takeqsamples independently with replacement, summing weights if an edge is chosen more than once.

*Then with probability at least* 1/2,
*is a* (1 + ε)-*spectral approximation of G*.

The proof of Theorem 5 is matrix-analytic. We begin by observing that the Laplacian matrix of *G* can be expressed as a sum of outer products of vectors:

where *e*_{u} denotes the elementary unit vector in direction *u*. Since the edges in
are a subset of the edges in *G*, its Laplacian may also be expressed as a (differently weighted) sum of the same outer products:

Suppose the non-zero eigenvalue-and-eigenvector pairs of *L*_{G} are (Λ_{1},*u*_{1}),…,(Λ_{n−1},*u*_{n−1}). Then we can write

Let *L*_{G}^{+} be the Moore-Penrose Pseudoinverse of *L*_{G}, that is,

Then
is the projection matrix onto the span of {*u*_{i}}.

The key to the analysis of Theorem 5, and the improved construction of Section 3.4, is the observation that

where *L*_{G}^{+/2} is the square root of *L*_{G}^{+}. We will show that the sampling procedure described is likely to satisfy this latter condition. To this end, define random variables {*s*_{e}}_{e∈E} to capture the outcome of the sampling procedure:

Then

and **E**[*s*_{(u,v)}] = 1 for all (*u*, *v*) ∈ *E*, whence **E**[*L*_{
}] = *L*_{G}. We now write:

where the *Y*_{i} are i.i.d. random vectors sampled from the distribution

Notice that **E**[*YY*^{T}] = Π so the expectation of each term is correct. To analyze the sum, we apply the following concentration theorem for sums of independent rank one matrices due to Rudelson.^{30}

THEOREM 6 (RUDELSON). *Let p be a probability distribution over* Ω ⊂
^{d} *such that sup*_{y∈Ω}||*y*||_{2}≤*M* *and* ||**E**||[*yy*^{T}]|| ≤ 1. *Let y*_{1}, …, *y*_{q} *be independent samples drawn from p with replacement. Then*,

To optimize the application of Rudelson’s theorem, we choose the {*p*_{(u,v)}} so that all possible values of *Y* have the same norm, that is,

for some fixed γ, for all (*u*, *v*) ∈ *E*. An elementary calculation reveals that the value of γ that causes the sum of the probabilities to be one is
. Thus if we sample according to probabilities

then we can take *M* =
in Theorem 6. This tells us that *q* = *O*(*n* log *n*/ε^{2}) samples are sufficient to obtain a (1 + ε)-spectral approximation with high probability, from which Theorem 5 follows.

As stated earlier, the probabilities we have chosen for sampling edges have a natural meaning:

where

is the effective resistance between the vertices *u* and *v*.

**3.4. Twice-Ramanujan sparsifiers**

In a nutshell, Spielman and Srivastava first reduced the sparsification of *G* = (*V*, *E*, *w*) to the following algebraic problem: compute scalars {*s*_{u,v}≤0|(*u, v*)∈*E*} such that
={*e*|*s*_{u, v} > 0} has small cardinality, and

They then applied sampling, based on effective resistances, to generate the {*s*_{u,v}} and
.

Batson et al.^{7} gave a deterministic polynomial-time algorithm for computing {*s*_{e}} and
, and obtained the following theorem, which is essentially the best possible result for spectral sparsification.

THEOREM 7 (BATSON-SPIELMAN-SRIVASTAVA). *For every d* > 1, *every undirected, weighted n-node graph G* = (*V*, *E*, *w*) *has a*

*In particular, G has a* ((1 + 2ε), 4/ε^{2})-*spectral sparsifier, for every* 0 < ε < 1.

At the heart of their construction is the following purely linear algebraic theorem, which may be shown to imply Theorem 7 by an argument similar to that in Section 3.3.

THEOREM 8. *Suppose d* > 1 *and v*_{1}, …, *v*_{m} ∈
^{n} *satisfy*
, *where I*_{n} *is the n* × *n identity matrix. Then, there exist scalars s*_{i} ≥ 0 *with* |{*i*: *s*_{i} ≠ 0}| ≤ *dn such that*

The proof for Theorem 8 builds the sum ∑_{i}*s*_{i}*v*_{i}*v*_{i}^{T} iteratively, by adding one vector at a time. For
= *dn*, it chooses a sequence of vectors *π*(1), …, *π*(
) and weights *s*_{π(1)},…,*s*_{π(
)}, which in turn defines a sequence of matrices 0 = *A*_{0}, …, *A*_{
}, where
. Observe that
and we always have *A*_{t}
*A*_{t−1}. The goal is to control the eigenvalues of *A*_{t} at each step and guarantee that they grow with *t* in a *steady* manner, so that the final matrix *A*_{
} = *A*_{dn} has all eigenvalues within a constant factor
of each other and is hence a good approximation to the identity.

Batson, Spielman, and Srivastava use two “barrier” potential functions to guide their choice of *π*(*i*) and *s*_{π(i)} and ensure steady progress. Specifically, for *u*, *l* ∈
, and *A* a symmetric matrix with eigenvalues λ_{1}, …, λ_{n}, they define

When *l* · *I*_{n}
*A*
*u* · *I*_{n}, small values of these potentials indicate that the eigenvalues of *A* do not cluster near *u* or *l*. This turns out to be a sufficient induction hypothesis to sustain the following iterative process:

(1) Begin by setting the lower barrier *l*, to −*n* and the upper barrier, *u* to *n*. It can be checked that both potentials are bounded by 1. (2) At each step, increase the upper barrier *u* by a fixed constant *δ*_{u} and the lower barrier *l* by another fixed constant *δ*_{l} < *δ*_{u}. It can then be shown that as long as the potentials remain bounded, there must exist at every time *t* a choice of a vector *v*_{π(i)} and a weight *s*_{π(i)} so that the addition of
to *A*_{t−1} and the increments *l* → *l* + *δ*_{l} and *u* → *u* + *δ*_{u} *do not increase either potential* and keep all the eigenvalues λ_{i}(*A*_{t}) between the barriers. Iterating the above process ensures steady growth of all the eigenvalues and yields Theorem 8.

In a recent work, de Carli Silva et al.^{15} extended spectral sparsification to the sums of positive semidefinite matrices that have arbitrary rank. They proved the following theorem.

THEOREM 9. *Let B*_{1}, …, *B*_{m} *be symmetric (or Hermitian) positive semidefinite n* × *n matrices and*
. *Then for any* ε ∈ (0, 1), *there exist nonnegative s*_{1}, …, *s*_{m}, *at most O*(*n*/ε^{2}) *of which are nonzero, such that*

*Moreover*, {*s*_{1}, …, *s*_{m}} *can be computed in O*(*mn*^{3}/ε^{2}) *time*.

Another extension is subgraph spectral sparsification,^{22} in which one is given a union of two graphs *G* and *W* and an integer κ, and asked to find a κ-edge graph *W*_{κ} such that *G* + *W*_{κ} is a good spectral sparsifier of *G* + *W*. When combined with the best-known construction of low-stretch spanning trees,^{1} this provides nearly optimal *ultra-sparsifiers*.

THEOREM 10 (KOLLA, MAKARYCHEV, SABERI, TENG). *For each positive integer* κ, *every n-vertex graph has an O(
)-spectral approximation with at most* (*n* 1 + κ)-*edges*.

Ultra-sparsifiers have so few edges that they have a large number of vertices of degree 1 or 2. They are a key component of the algorithms for solving linear equations described in Section 4.1.

### 4. Algorithmic Applications

**4.1. Numerical algorithms: Laplacian systems**

One of the most fundamental computational problems is that of solving a system of linear equations. One is given an *n* × *n* matrix *A* and an *n*-dimensional vector *b*, and is asked to find a vector *x* such that *Ax* = *b*. It is well known that every linear system can be solved by the classic Gaussian elimination method in polynomial time. However, Gaussian elimination usually takes super-linear or even super-quadratic time in the number of non-zero entries of *A*, making its use impractical for large matrices.

In many real-world applications, the matrix *A* is sparse and one only requires an approximate solution to the linear system. For example, given a precision parameter ε, we may be asked to produce an
such that || *A*
− *b* ||_{2} ≤ *∈* ||*b* ||_{2}. For sparse positive semi-definite linear systems the fastest general purpose algorithm is the Conjugate Gradient (CG). It essentially solves *Ax* = *b* by multiplying a sequence of vectors by *A*. As the multiplication of a vector by *A* takes time proportional to the number of non-zero entries in *A*, CG can run quickly when *A* is sparse The number of matrix-vector products performed by CG depends on the *condition number* κ(*A*) of *A*, the ratio of its largest eigenvalue to its smallest eigenvalue. It is well-known in numerical analysis^{19} that it is sufficient to compute *O*(
) matrix-vector products to find a solution of accuracy ε.

Preconditioning is the strategy of finding a relatively easily invertible matrix *B* which is *σ*-spectrally similar to *A*, and solving the related system *B*^{−1}*Ax* = *B*^{−1}*b*. In each iteration, the preconditioned CG algorithm solves a linear system in *B* and performs a matrix-vector product in *A*, and only *O*(
) iterations are required. If it is easy to solve systems of linear equations in *B*, then the cost of each iteration is small and this algorithm will run quickly.

In 1990, Vaidya brought graph theory into the picture. Using preconditioners consisting of a maximum spanning tree of a graph plus a small number of carefully chosen edges, Vaidya obtained an *O*(*m*^{1.75} log(1/ε))-time algorithm for solving linear systems in Laplacian matrices with *m* non-zero entries. The exponent was still too large to be practical, but the idea was powerful. Spielman and Teng^{33} were able to enhance Vaidya’s approach with spectral sparsifiers and low-stretch spanning trees to obtain the first nearly linear time algorithm for solving Laplacian linear systems.

THEOREM 11 (SPIELMAN-TENG). *Linear systems in a graph Laplacian L*_{G} *can be solved to precision* ε *in time O*(*m* log^{O(1)} *n* log(1/ε)).

In spite of its strong asymptotic behavior, the large exponent on the log factor makes this algorithm slow in practice. The Spielman-Srivastava sparsification algorithm offered no improvement—effective resistances do give the ideal probabilities with which to sample edges for a sparsifier, but computing them requires solving yet more Laplacian linear systems.

Koutis et al.^{24} removed this dependency problem by using low-stretch trees to compute less aggressive sampling probabilities which are strictly greater than those suggested by effective resistances. This can be done more quickly, and along with some other elegant ideas and fast data structures, is sufficient to yield a Laplacian linear system solver which runs in time

**4.2. Fast sparsification algorithms**

The algorithms from Section 3 certified the existence of good sparsifiers, but run quite slowly in practice. Those techniques have been significantly refined, and now there are three major ways to produce sparsifiers quickly.

First, the bottleneck in sampling via effective resistances is approximating the effective resistances themselves. The Laplacian system solver of Koutis, Miller, and Peng described above can be used to calculate those resistances, which can then be used to sample the graph. The best analysis is given by Koutis et al.^{23} who give an *O*(*m* log *n* log log *n* log (1/ε)) time algorithm for generating ((1 + ε), *O*(log^{3} *n*/ε^{2}))-spectral sparsifiers.

Second, the decomposition-and-sampling algorithm of Spielman and Teng^{34} can be sped up by improving the local clusterings used to create a decomposition. In the local clustering problem, one is given a vertex and cluster size as input, and one tries to find a cluster of low conductance near that vertex of size at most the target size, in time proportional to the target cluster size. Faster algorithms for local clustering have been developed by Andersen et al.^{5} and by Andersen and Peres.^{6}

Third, unions of random spanning trees of a graph *G* can make good cut-sparsifiers: the union of two random spanning trees is (log *n*)-cut similar to *G*,^{20} while the union of *O*(log^{2} *n*/ε^{2}) random spanning trees, reweighed proportionally to effective resistance, is (1 + ε)-cut similar to *G*.^{18} Although it remains to be seen if the union of a small number of random spanning trees can produce a spectral sparsifier, Kapralov and Panigrahy showed that one can build a (1 + ε)-spectral sparsifier of a graph from the union of spanners of *O*(log^{4} *n*/ε^{4}) random subgraphs of *G*.^{21}

Fast algorithms for spectral sparsification and Laplacian systems provide a set of powerful tools for network analysis. In particular, they lead to nearly-linear time algorithms for the following basic graph problems:

APPROXIMATE FIEDLER VECTORS^{33}: *Input*: *G* = (*V*, *E*, *w*) and ε > 0. *Output*: an ε-approximation of the second smallest eigenvalue, λ_{2}(*L*_{G}), (also known as the Fiedler value) of *L*_{G}, along with a vector *v* orthogonal to the all 1s vector such that *v*^{T}*L*_{G}*v* ≤ (1 + ∈)λ_{2}(*L*_{G}).

ELECTRICAL FLOWS^{13,32,33}: *Input*: *G* = (*V*, *E*, *w*) where weights are resistances, *s*, *t* ∈ *V*, and ε > 0. *Output*: an ε-approximation of the electrical flows over all edges when 1 unit of flow is injected into *s* and extracted from *t*.

EFFECTIVE RESISTANCE APPROXIMATION^{32}: *Input*: *G* = (*V*, *E*, *w*) where weights are resistances and ε > 0. *Output*: a data structure for computing an ε-approximation of the effective resistance of any pair of vertices in *G*. Data Structure: *O*(*n* log *n*/ε^{2}) space, and *O*(log/ε^{2}*n*) query time, and *O*(*m* log^{2} *n* log log^{2} *n*/ε^{2}) preprocessing time.

LEARNING FROM LABELED DATA ON A GRAPH^{35}: *Input* a strongly connected (aperiodic) directed graph *G* = (*V*, *E*) and a labeling function *y*, where *y* assigns a label from a label set *Y* = {1, 1} to each vertex of a subset *S* ⊂ *V* and 0 to vertices in *V* *S*, and a parameter *μ. Output* the function *f*: *V* →
that minimizes

where

and *π* is the stationary distribution of the random walk on the graph with transition probability function *p*.

COVER TIME OF RANDOM WALK^{16}: *Input*: *G* = (*V*, *E*), *Output*: a constant approximation of the cover time for random walks.

The algorithm for ELECTRICAL FLOWS led to a breakthrough for the following fundamental combinatorial optimization problem, for which the best previously known algorithm ran in time Õ(*mn*^{1/2}/ε), Õ(*mn*^{2/3}/ε^{−1}) and Õ(*m*^{3/2}/ε^{−1}).

MAXIMUM FLOWS AND MINIMUM CUTS^{13}: *Input: G* = (*V*, *E*, *w*) where *w* are capacities, *s*, *t* ∈ *V*, and ε > 0, *Output*: an ε-approximation of *s-t* maximum flow and minimum cut. *Algorithm*: *Õ*(*mn*^{1/3} · poly (1/ε)) time.

Spectral graph sparsification also played a role in understanding other network phenomena. For example, Chierichetti et al.^{12} discovered a connection between rumor spreading in a network and the spectral sparsification procedure of Spielman and Teng,^{34} and applied this connection to bound the speed of rumor spreading that arises in social networks.

### 5. Open Question

The most important open question about spectral sparsification is whether one can design a nearly linear time algorithm that computes (*σ*, *d*)-spectral sparsifiers for any constants *σ* and *d*. The algorithms based on Theorem 7 are polynomial time, but slow. All of the nearly-linear time algorithms of which we are aware produce sparsifiers with *d* logarithmic in *n*.

### 6. Conclusion

Spectral sparsification has proved to be a remarkably useful tool in algorithm design, linear algebra, combinatorial optimization, machine learning, and network analysis. Theorem 8 has already been applied many times within pure mathematics (see, e.g., Naor^{28}). We hope this body of work will encourage more exchange of ideas between numerical analysis, pure mathematics and theoretical computer science, and inspire and enable the development of faster algorithms and novel analyses of network phenomena.

## Join the Discussion (0)

## Become a Member or Sign In to Post a Comment