
Smoothed Analysis: An Attempt to Explain the Behavior of Algorithms in Practice

This Gödel Prize-winning work traces the steps toward modeling real data.

“My experiences also strongly confirmed my previous opinion that the best theory is inspired by practice and the best practice is inspired by theory.”
            Donald E. Knuth, “Theory and Practice,” Theoretical Computer Science, 1991.

Algorithms are high-level descriptions of how computational tasks are performed. Engineers and experimentalists design and implement algorithms, and generally consider them a success if they work in practice. However, an algorithm that works well in one practical domain might perform poorly in another. Theorists also design and analyze algorithms, with the goal of providing provable guarantees about their performance. The traditional goal of theoretical computer science is to prove that an algorithm performs well in the worst case: if one can prove that an algorithm performs well in the worst case, then one can be confident that it will work well in every domain. However, there are many algorithms that work well in practice that do not work well in the worst case. Smoothed analysis provides a theoretical framework for explaining why some of these algorithms do work well in practice.

The performance of an algorithm is usually measured by its running time, expressed as a function of the input size of the problem it solves. The performance profiles of algorithms across the landscape of input instances can differ greatly and can be quite irregular. Some algorithms run in time linear in the input size on all instances, some take quadratic or higher order polynomial time, while some may take an exponential amount of time on some instances.

Traditionally, the complexity of an algorithm is measured by its worst-case performance. If a single input instance triggers an exponential runtime, the algorithm is called an exponential-time algorithm. A polynomial-time algorithm is one that takes polynomial time on all instances. While polynomial-time algorithms are usually viewed as being efficient, we clearly prefer those whose runtime is a polynomial of low degree, especially those that run in nearly linear time.

It would be wonderful if every algorithm that ran quickly in practice was a polynomial-time algorithm. As this is not always the case, the worst-case framework is often the source of discrepancy between the theoretical evaluation of an algorithm and its practical performance.

It is commonly believed that practical inputs are usually more favorable than worst-case instances. For example, it is known that the special case of the Knapsack problem in which one must determine whether a set of n numbers can be divided into two groups of equal sum does not have a polynomial-time algorithm, unless NP is equal to P. Shortly before he passed away, Tim Russert of NBC's "Meet the Press" commented that the 2008 election could end in a tie between the Democratic and the Republican candidates. In other words, he solved a 51-item Knapsack problem^a by hand within a reasonable amount of time, and most likely without using the pseudo-polynomial-time dynamic-programming algorithm for Knapsack!
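The pseudo-polynomial dynamic program mentioned above is short enough to sketch. The following Python fragment is a minimal illustration, not code from the article; the function name and the representation of the table as a set of reachable subset sums are our own choices.

    def can_partition(values):
        # Decide whether the numbers can be split into two groups of equal sum.
        # Runs in time O(n * sum(values)): polynomial in the magnitude of the
        # input rather than in its bit length, hence "pseudo-polynomial."
        total = sum(values)
        if total % 2 != 0:
            return False
        target = total // 2
        reachable = {0}                      # subset sums achievable so far
        for v in values:
            reachable |= {s + v for s in reachable if s + v <= target}
        return target in reachable

    # A toy instance; the 51 electoral-vote counts would be passed in the same way.
    print(can_partition([3, 1, 1, 2, 2, 1]))   # True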

In our field, the simplex algorithm is the classic example of an algorithm that is known to perform well in practice but has poor worst-case complexity. The simplex algorithm solves a linear program, for example, of the form,

max c^T x subject to Ax ≤ b,    (1)

where A is an m × n matrix, b is an m-place vector, and c is an n-place vector. In the worst case, the simplex algorithm takes exponential time.25 Developing rigorous mathematical theories that explain the observed performance of practical algorithms and heuristics has become an increasingly important task in theoretical computer science. However, modeling observed data and practical problem instances is a challenging task, as insightfully pointed out in the 1999 "Challenges for Theory of Computing" Report for an NSF-Sponsored Workshop on Research in Theoretical Computer Science:^b

While theoretical work on models of computation and methods for analyzing algorithms has had enormous payoff, we are not done. In many situations, simple algorithms do well. Take for example the Simplex algorithm for linear programming, or the success of simulated annealing on certain supposedly intractable problems. We don't understand why! It is apparent that worst-case analysis does not provide useful insights on the performance of algorithms and heuristics and our models of computation need to be further developed and refined. Theoreticians are investing increasingly in careful experimental work leading to identification of important new questions in the algorithms area. Developing means for predicting the performance of algorithms and heuristics on real data and on real computers is a grand challenge in algorithms.

Needless to say, there are a multitude of algorithms beyond simplex and simulated annealing whose performance in practice is not well explained by worst-case analysis. We hope that theoretical explanations will be found for the success in practice of many of these algorithms, and that these theories will catalyze better algorithm design.


The Behavior of Algorithms

When A is an algorithm for solving problem P, we let T_A[x] denote the running time of algorithm A on an input instance x. If the input domain Ω has only one input instance x, then we can use the instance-based measures T_{A_1}[x] and T_{A_2}[x] to decide which of the two algorithms A_1 and A_2 more efficiently solves P. If Ω has two instances x and y, then the instance-based measure of an algorithm A defines a two-dimensional vector (T_A[x], T_A[y]). It could be the case that T_{A_1}[x] < T_{A_2}[x] but T_{A_1}[y] > T_{A_2}[y]. Then, strictly speaking, these two algorithms are not comparable. Usually, the input domain is much more complex, both in theory and in practice. The instance-based complexity measure T_A[·] defines an |Ω|-dimensional vector when Ω is finite. In general, it can be viewed as a function from Ω to ℝ_+, but it is unwieldy. To compare two algorithms, we require a more concise complexity measure.

An input domain Ω is usually viewed as the union of a family of subdomains {Ω_1, …, Ω_n, …}, where Ω_n represents all instances in Ω of size n. For example, in sorting, Ω_n is the set of all tuples of n elements; in graph algorithms, Ω_n is the set of all graphs with n vertices; and in computational geometry, we often have Ω_n = ℝ^n. In order to succinctly express the performance of an algorithm A, for each Ω_n one defines a scalar T_A(n) that summarizes the instance-based complexity measure T_A[·] of A over Ω_n. One often further simplifies this expression by using big-O or big-Θ notation to express T_A(n) asymptotically.

Traditional Analyses. It is understandable that different approaches to summarizing the performance of an algorithm over Ωn can lead to very different evaluations of that algorithm. In Theoretical Computer Science, the most commonly used measures are the worst-case measure and the average-case measures.

The worst-case measure is defined as

WC_A(n) = max_{x ∈ Ω_n} T_A[x].

The average-case measures have more parameters. In each average-case measure, one first determines a distribution of inputs and then measures the expected performance of an algorithm assuming inputs are drawn from this distribution. Supposing S provides a distribution over each Ω_n, the average-case measure according to S is

Ave^S_A(n) = E_{x ∈_S Ω_n} [ T_A[x] ],

where we use x ∈_S Ω_n to indicate that x is randomly chosen from Ω_n according to distribution S.

Critique of Traditional Analyses. Low worst-case complexity is the gold standard for an algorithm: it provides an absolute guarantee on the performance of the algorithm, no matter which input it is given. Algorithms with good worst-case performance have been developed for a great number of problems.

However, there are many problems that need to be solved in practice for which we do not know algorithms with good worst-case performance. Instead, scientists and engineers typically use heuristic algorithms to solve these problems. Many of these algorithms work well in practice, in spite of having a poor, sometimes exponential, worst-case running time. Practitioners justify the use of these heuristics by observing that worst-case instances are usually not “typical” and rarely occur in practice. The worst-case analysis can be too pessimistic. This theory-practice gap is not limited to heuristics with exponential complexity. Many polynomial time algorithms, such as interior-point methods for linear programming and the conjugate gradient algorithm for solving linear equations, are often much faster than their worst-case bounds would suggest. In addition, heuristics are often used to speed up the practical performance of implementations that are based on algorithms with polynomial worst-case complexity. These heuristics might in fact worsen the worst-case performance, or make the worst-case complexity difficult to analyze.

Average-case analysis was introduced to overcome this difficulty. In average-case analysis, one measures the expected running time of an algorithm on some distribution of inputs. While one would ideally choose the distribution of inputs that occurs in practice, this is difficult as it is rare that one can determine or cleanly express these distributions, and the distributions can vary greatly between one application and another. Instead, average-case analyses have employed distributions with concise mathematical descriptions, such as Gaussian random vectors, uniform {0, 1} vectors, and Erdös–Rényi random graphs.

The drawback of using such distributions is that the inputs actually encountered in practice may bear very little resemblance to the inputs that are likely to be generated by such distributions. For example, one can see what a random image looks like by disconnecting most TV sets from their antennas, at which point they display "static." These random images do not resemble actual television shows. More abstractly, Erdös–Rényi random graph models are often used in average-case analyses of graph algorithms. The Erdös–Rényi distribution G(n, p) produces a random graph by including every possible edge in the graph independently with probability p. While the average degree of a graph chosen from G(n, 6/(n − 1)) is approximately six, such a graph will be very different from the graph of a triangulation of a point set in two dimensions, which will also have average degree approximately six.

In fact, random objects such as random graphs and random matrices have special properties with exponentially high probability, and these special properties might dominate the average-case analysis. Edelman14 writes of random matrices:

What is a mistake is to psychologically link a random matrix with the intuitive notion of a “typical” matrix or the vague concept of “any old matrix.”
In contrast, we argue that “random matrices” are very special matrices.

Smoothed Analysis: A Step Toward Modeling Real Data. Because of the intrinsic difficulty in defining practical distributions, we consider an alternative approach to modeling real data. The basic idea is to identify typical properties of practical data, define an input model that captures these properties, and then rigorously analyze the performance of algorithms assuming their inputs have these properties.

Smoothed analysis is a step in this direction. It is motivated by the observation that practical data is often subject to some small degree of random noise. For example,

  • In industrial optimization and economic prediction, the input parameters could be obtained by physical measurements, and measurements usually carry some uncertainty of low magnitude.
  • In the social sciences, data often comes from surveys in which subjects provide integer scores in a small range (say between 1 and 5) and select their scores with some arbitrariness.
  • Even in applications where inputs are discrete, there might be randomness in the formation of inputs. For instance, the network structure of the Internet may very well be governed by some “blueprints” of the government and industrial giants, but it is still “perturbed” by the involvements of smaller Internet service providers.

In these examples, the inputs usually are neither completely random nor completely arbitrary. At a high level, each input is generated from a two-stage model: In the first stage, an instance is generated and in the second stage, the instance from the first stage is slightly perturbed. The perturbed instance is the input to the algorithm.

In smoothed analysis, we assume that an input to an algorithm is subject to a slight random perturbation. The smoothed measure of an algorithm on an input instance is its expected performance over the perturbations of that instance. We define the smoothed complexity of an algorithm to be the maximum smoothed measure over input instances.

For concreteness, consider the case Ω_n = ℝ^n, which is a common input domain in computational geometry, scientific computing, and optimization. For these continuous inputs and applications, the family of Gaussian distributions provides a natural model of noise or perturbation.

Recall that a univariate Gaussian distribution with mean 0 and standard deviation σ has density

ρ(x) = (1 / (√(2π) σ)) e^{−x²/(2σ²)}.

The standard deviation measures the magnitude of the perturbation. A Gaussian random vector of variance σ² centered at the origin in ℝ^n is a vector in which each entry is an independent Gaussian random variable of standard deviation σ and mean 0. For a vector x̄ ∈ ℝ^n, a σ-Gaussian perturbation of x̄ is a random vector x = x̄ + g, where g is a Gaussian random vector of variance σ². The standard deviation of the perturbation we apply should be related to the norm of the vector it perturbs. For the purposes of this article, we relate the two by restricting the unperturbed inputs to lie in [−1, 1]^n. Other reasonable approaches are taken elsewhere.

DEFINITION 1 (SMOOTHED COMPLEXITY). Suppose A is an algorithm with input domain Ω_n = ℝ^n. Then the smoothed complexity of A with σ-Gaussian perturbations is given by

Smoothed^σ_A(n) = max_{x̄ ∈ [−1,1]^n} E_g [ T_A(x̄ + g) ],

where g is a Gaussian random vector of variance σ².

In this definition, the "original" input x̄ is perturbed to obtain the input x̄ + g, which is then fed to the algorithm. For each original input, this measures the expected running time of algorithm A on random perturbations of that input. The maximum out front tells us to measure the smoothed complexity by the expectation under the worst possible original input.

The smoothed complexity of an algorithm measures the performance of the algorithm both in terms of the input size n and in terms of the magnitude σ of the perturbation. By varying σ between zero and infinity, one can use smoothed analysis to interpolate between worst-case and average-case analysis. When σ = 0, one recovers the ordinary worst-case analysis. As σ grows large, the random perturbation g dominates the original x̄, and one obtains an average-case analysis. We are most interested in the situation in which σ is small relative to ||x̄||, in which case x̄ + g may be interpreted as a slight perturbation of x̄. The dependence on the magnitude σ is essential, and much of the work in smoothed analysis demonstrates that noise often makes a problem easier to solve.
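Definition 1 also suggests a simple computational experiment. The Python sketch below estimates the inner expectation E_g[T_A(x̄ + g)] for a single original input x̄ by Monte Carlo sampling; the toy cost function, the number of trials, and all names are our own illustrative choices, and a true smoothed bound would also require taking the maximum over every x̄ in [−1, 1]^n, which this sketch does not attempt.

    import numpy as np

    def sigma_gaussian_perturbation(x_bar, sigma, rng):
        # x = x_bar + g, where g is a Gaussian random vector of variance sigma^2.
        return x_bar + sigma * rng.standard_normal(x_bar.shape)

    def estimate_smoothed_measure(T, x_bar, sigma, trials=1000, seed=0):
        # Monte Carlo estimate of E_g[ T(x_bar + g) ] for one original input x_bar.
        rng = np.random.default_rng(seed)
        return np.mean([T(sigma_gaussian_perturbation(x_bar, sigma, rng))
                        for _ in range(trials)])

    def toy_cost(x):
        # A stand-in for a running-time measure T_A[x]: the number of iterations
        # a trivial fixed-point loop takes on input x.
        y, iters = x.copy(), 0
        while np.linalg.norm(y) > 1e-6 and iters < 10000:
            y = 0.5 * y
            iters += 1
        return iters

    x_bar = np.ones(10)    # one "adversarial" original input in [-1, 1]^n
    print(estimate_smoothed_measure(toy_cost, x_bar, sigma=0.1))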

DEFINITION 2. A has polynomial smoothed complexity if there exist positive constants n_0, σ_0, c, k_1, and k_2 such that for all n ≥ n_0 and 0 ≤ σ ≤ σ_0,

Smoothed^σ_A(n) ≤ c · n^{k_1} · σ^{−k_2}.    (2)

From Markov's inequality, we know that if an algorithm A has smoothed complexity T(n, σ), then for every x̄ ∈ [−1, 1]^n and every δ ∈ (0, 1],

Pr_g [ T_A(x̄ + g) ≥ T(n, σ)/δ ] ≤ δ.    (3)

Thus, if A has polynomial smoothed complexity, then for any x̄, with probability at least 1 − δ, A can solve a random perturbation of x̄ in time polynomial in n, 1/σ, and 1/δ. However, the probabilistic upper bound given in (3) does not necessarily imply that the smoothed complexity of A is O(T(n, σ)). Blum and Dunagan7 and subsequently Beier and Vöcking6 introduced a relaxation of polynomial smoothed complexity.
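The Markov step behind inequality (3) is the following one-line calculation, using that the expectation in Definition 1 is at most T(n, σ) for every x̄:

Pr_g [ T_A(x̄ + g) ≥ T(n, σ)/δ ]  ≤  E_g[ T_A(x̄ + g) ] / ( T(n, σ)/δ )  ≤  T(n, σ) / ( T(n, σ)/δ )  =  δ.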

DEFINITION 3. A has probably polynomial smoothed complexity if there exist constants n_0, σ_0, c, and α such that for all n ≥ n_0 and 0 ≤ σ ≤ σ_0,

max_{x̄ ∈ [−1,1]^n} E_g [ (T_A(x̄ + g))^α ] ≤ c · n · σ^{−1}.    (4)

They show that some algorithms have probably polynomial smoothed complexity, in spite of the fact that their smoothed complexity according to Definition 1 is unbounded.


Examples of Smoothed Analysis

In this section, we give a few examples of smoothed analysis. We organize them in five categories: mathematical programming, machine learning, numerical analysis, discrete mathematics, and combinatorial optimization. For each example, we will give the definition of the problem, state the worst-case complexity, explain the perturbation model, and state the smoothed complexity under the perturbation model.

Mathematical Programming. The typical problem in mathematical programming is the optimization of an objective function subject to a set of constraints. Because of its importance to economics, management science, industry, and military planning, many optimization algorithms and heuristics have been developed, implemented and applied to practical problems. Thus, this field provides a great collection of algorithms for smoothed analysis.

Linear Programming. Linear programming is the most fundamental optimization problem. A typical linear program is given in Equation 1. The most commonly used linear programming algorithms are the simplex algorithm12 and the interior-point algorithms.

The simplex algorithm, first developed by Dantzig in 1951,12 is a family of iterative algorithms. Most of them are two-phase algorithms: Phase I determines whether a given linear program is infeasible, unbounded in the objective direction, or feasible with a bounded solution, in which case a vertex v_0 of the feasible region is also computed. Phase II is iterative: in the i-th iteration, the algorithm finds a neighboring vertex v_i of v_{i−1} with better objective value, or terminates by returning v_{i−1} when no such neighboring vertex exists. The simplex algorithms differ in their pivot rules, which determine which vertex v_i to choose when there are multiple choices. Several pivot rules have been proposed; however, almost all existing pivot rules are known to have exponential worst-case complexity.25

Spielman and Teng36 considered the smoothed complexity of the simplex algorithm with the shadow-vertex pivot rule, developed by Gass and Saaty.18 They used Gaussian perturbations to model noise in the input data and proved that the smoothed complexity of this algorithm is polynomial. Vershynin38 improved their result to obtain a smoothed complexity of

O( max( n^5 log^2 m, n^9 log^4 n, n^3 σ^{−4} ) ).

See Blum and Dunagan, and Dunagan et al.7,13 for smoothed analyses of other linear programming algorithms such as the interior-point algorithms and the perceptron algorithm.

Quasi-Concave Minimization. Another fundamental optimization problem is quasi-concave minimization. Recall that a function f: ℝ^n → ℝ is quasi-concave if all of its upper level sets L_γ = {x | f(x) ≥ γ} are convex. In quasi-concave minimization, one is asked to find the minimum of a quasi-concave function subject to a set of linear constraints. Even when restricted to concave quadratic functions over the hypercube, concave minimization is NP-hard.

In applications such as stochastic and multiobjective optimization, one often deals with data from low-dimensional subspaces. In other words, one needs to solve a quasi-concave minimization problem with a low-rank quasi-concave function.23 Recall that a function f: ℝ^n → ℝ has rank k if it can be written in the form

f(x) = g(a_1^T x, a_2^T x, …, a_k^T x)

for a function g: ℝ^k → ℝ and linearly independent vectors a_1, a_2, …, a_k.

Kelner and Nikolova23 proved that, under some mild assumptions on the feasible convex region, if k is a constant then the smoothed complexity of quasi-concave minimization is polynomial when f is perturbed by noise. Key to their analysis is a smoothed bound on the size of the k-dimensional shadow of the high-dimensional polytope that defines the feasible convex region. Their result is a nontrivial extension of the analysis of two-dimensional shadows of Kelner and Spielman, and Spielman and Teng.24,36

Machine Learning. Machine learning provides many natural problems for smoothed analysis. The field has many heuristics that work in practice, but not in the worst case, and the data defining most machine learning problems is inherently noisy.

K-means. One of the fundamental problems in machine learning is that of k-means clustering: the partitioning of a set of d-dimensional vectors Q = {q_1, …, q_n} into k clusters {Q_1, …, Q_k} so that the intracluster variance

Σ_{i=1}^{k} Σ_{q ∈ Q_i} ||q − c_i||²

is minimized, where c_i = (1/|Q_i|) Σ_{q ∈ Q_i} q is the centroid of Q_i.

One of the most widely used clustering algorithms is Lloyd's algorithm (IEEE Transactions on Information Theory, 1982). It first chooses an arbitrary set of k centers and then uses the Voronoi diagram of these centers to partition Q into k clusters. It then repeats the following process until it stabilizes: use the centroids of the current clusters as the new centers, and then repartition Q accordingly.
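For concreteness, here is a minimal NumPy sketch of the iteration just described; the random choice of k input points as initial centers, the iteration cap, and the convergence test are simplifications of our own, not details from Lloyd's paper.

    import numpy as np

    def lloyd(Q, k, iters=100, seed=0):
        # Q: n x d array of points. Returns the final centers and cluster labels.
        rng = np.random.default_rng(seed)
        centers = Q[rng.choice(len(Q), size=k, replace=False)]   # arbitrary initial centers
        for _ in range(iters):
            # Assign each point to its nearest center (the Voronoi partition of Q).
            dists = np.linalg.norm(Q[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Recompute each center as the centroid of its cluster
            # (keeping the old center if a cluster happens to be empty).
            new_centers = np.array([
                Q[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
                for i in range(k)
            ])
            if np.allclose(new_centers, centers):   # the process has stabilized
                break
            centers = new_centers
        return centers, labels

    Q = np.random.default_rng(1).random((200, 2))
    centers, labels = lloyd(Q, k=3)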

Two important questions about Lloyd's algorithm are: how many iterations does it take to converge, and how close to optimal is the solution it finds? Smoothed analysis has been used to address the first question. Arthur and Vassilvitskii proved that in the worst case, Lloyd's algorithm requires 2^{Ω(√n)} iterations to converge,3 but that it has smoothed complexity polynomial in n^k and σ^{−1}.4 Manthey and Röglin28 recently reduced this bound to a polynomial in n^{√k} and σ^{−1}.

Perceptrons, Margins, and Support Vector Machines. Blum and Dunagan's analysis of the perceptron algorithm7 for linear programming implicitly contains results of interest in machine learning. The ordinary perceptron algorithm solves a fundamental problem: given a collection of points x_1, …, x_n ∈ ℝ^d and labels b_1, …, b_n ∈ {±1}, find a hyperplane separating the positively labeled examples from the negatively labeled ones, or determine that no such plane exists. Under a smoothed model in which the points x_1, …, x_n are subject to a σ-Gaussian perturbation, Blum and Dunagan show that the perceptron algorithm has probably polynomial smoothed complexity, with exponent α = 1. Their proof follows from a demonstration that if the positive points can be separated from the negative points, then they can probably be separated by a large margin. That is, there probably exists a plane separating the points for which no point is too close to that separating plane.

It is known that the perceptron algorithm converges quickly in this case. Moreover, this margin is exactly what is maximized by support vector machines.
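As a point of reference, the following Python sketch implements the ordinary perceptron update for this separation problem; the appended constant coordinate (to allow an affine separator), the iteration cap, and the sample data are our own illustrative choices.

    import numpy as np

    def perceptron(X, b, max_iters=10000):
        # X: n x d array of points, b: array of +/-1 labels.
        # Returns a weight vector w with sign(Xh @ w) == b, or None if none is found.
        Xh = np.hstack([X, np.ones((len(X), 1))])    # append 1 to allow an affine separator
        w = np.zeros(Xh.shape[1])
        for _ in range(max_iters):
            margins = b * (Xh @ w)
            wrong = np.where(margins <= 0)[0]
            if len(wrong) == 0:
                return w                             # every point is correctly classified
            i = wrong[0]
            w += b[i] * Xh[i]                        # the classic perceptron update
        return None                                  # no separator found within the cap

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 2))
    b = np.where(X[:, 0] + 2 * X[:, 1] > 0, 1, -1)   # linearly separable labels
    print(perceptron(X, b))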

PAC Learning. Probably approximately correct learning (PAC learning) is a framework in machine learning, introduced by Valiant, in which a learner is provided with a polynomial number of labeled examples from a given distribution and must produce a classifier that is usually correct with reasonably high probability. In standard PAC learning, the distribution from which the examples are drawn is fixed. Recently, Kalai and Teng21 applied smoothed analysis to this problem by perturbing the input distribution. They prove that polynomial-size decision trees are PAC-learnable in polynomial time under perturbed product distributions. In contrast, under the uniform distribution, even super-constant-size decision trees are not known to be PAC-learnable in polynomial time.

Numerical Analysis. One of the foci of numerical analysis is the determination of how much precision is required by numerical methods. For example, consider the most fundamental problem in computational science—that of solving systems of linear equations. Because of the round-off errors in computation, it is crucial to know how many bits of precision a linear solver should maintain so that its solution is meaningful.

For example, Wilkinson (JACM, 1961) demonstrated a family of linear systems^c in n variables with {0, −1, 1} coefficients for which Gaussian elimination with partial pivoting—the most popular variant in practice—requires n bits of precision.

Precision Requirements of Gaussian Elimination. In practice one almost always obtains accurate answers using much less precision. High-precision solvers are rarely used or needed. For example, Matlab uses 64 bits.

Building on the smoothed analysis of condition numbers (discussed below), Sankar et al.33,34 proved that it is sufficient to use O(log2(n/σ)) bits of precision to run Gaussian elimination with partial pivoting when the matrices of the linear systems are subject to σ-Gaussian perturbations.

The Condition Number. The smoothed analysis of the condition number of a matrix is a key step toward understanding the numerical precision required in practice. For a square matrix A, the condition number κ(A) is given by κ(A) = ||A||_2 ||A^{−1}||_2, where ||A||_2 = max_x ||Ax||_2/||x||_2. The condition number of A measures how much the solution to a system Ax = b changes as one makes slight changes to A and b: if one solves the linear system using fewer than log(κ(A)) bits of precision, then one is likely to obtain a result far from a solution.
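A quick NumPy check of this definition follows; it is for illustration only (computing the explicit inverse is wasteful, and in practice one would call numpy.linalg.cond or use an SVD).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 50))

    # kappa(A) = ||A||_2 * ||A^{-1}||_2, using the spectral (2-)norm.
    kappa = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
    print(kappa, np.linalg.cond(A, 2))        # the two values agree

    # Rule of thumb: roughly log10(kappa) decimal digits are lost when solving A x = b.
    print("digits lost ~", np.log10(kappa))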

The quantity 1/||A^{−1}||_2 = min_x ||Ax||_2/||x||_2 is known as the smallest singular value of A. Sankar et al.34 proved the following statement: For any square matrix Ā in ℝ^{n×n} satisfying ||Ā||_2 ≤ √n, and for any x > 1,

Pr [ ||A^{−1}||_2 ≥ x ] ≤ 2.35 √n / (x σ),

where A is a σ-Gaussian perturbation of Ā. Wschebor40 improved this bound to show that for σ ≤ 1 and ||Ā||_2 ≤ 1,

ueq09.gif

See Bürgisser et al. and Dunagan et al.10,13 for smoothed analysis of the condition numbers of other problems.

Discrete Mathematics. For problems in discrete mathematics, it is more natural to use Boolean perturbations. Let x̄ = (x̄_1, …, x̄_n) ∈ {0, 1}^n or {−1, 1}^n. The σ-Boolean perturbation of x̄ is a random string x = (x_1, …, x_n) ∈ {0, 1}^n or {−1, 1}^n, where x_i = x̄_i with probability 1 − σ and x_i ≠ x̄_i with probability σ. That is, each bit is flipped independently with probability σ.
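A σ-Boolean perturbation is essentially one line of code. The sketch below handles the {−1, 1} case; the function name and the choice of NumPy are our own.

    import numpy as np

    def boolean_perturbation(x_bar, sigma, rng):
        # Flip each +/-1 entry of x_bar independently with probability sigma.
        flips = rng.random(len(x_bar)) < sigma
        return np.where(flips, -x_bar, x_bar)

    rng = np.random.default_rng(0)
    x_bar = np.ones(20, dtype=int)     # an arbitrary "original" string in {-1, 1}^n
    print(boolean_perturbation(x_bar, 0.1, rng))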

Believing that σ-perturbations of Boolean matrices should behave like Gaussian perturbations of real matrices, Spielman and Teng35 made the following conjecture:

Let A be an n by n matrix of independently and uniformly chosen ±1 entries. Then

Pr [ ||A^{−1}||_2 ≥ x ] ≤ √n / x + α^n

for some constant α < 1.

Statements close to this conjecture and its generalizations were recently proved by Vu and Tao39 and Rudelson and Vershynin.32

The σ-Boolean perturbation of a graph can be viewed as a smoothed extension of the classic Erdös–Rényi random graph model. The σ-perturbation of a graph Ḡ, which we denote by G_Ḡ(n, σ), is a distribution of random graphs in which every edge of Ḡ is removed with probability σ and every nonedge is included with probability σ. Clearly, for p ∈ [0, 1], G(n, p) = G_∅(n, p), i.e., the p-Boolean perturbation of the empty graph. One can define a smoothed extension of other random graph models. For example, for any m and Ḡ = (V, E), Bohman et al.9 define G(Ḡ, m) to be the distribution of the random graphs (V, E ∪ T), where T is a set of m edges chosen uniformly at random from the complement of E, i.e., chosen from Ē = {(i, j) ∉ E}.

A popular subject of study in the traditional Erdös–Rényi model is the phenomenon of phase transition: for many properties, such as being connected or being Hamiltonian, there is a critical p below which a graph is unlikely to have the property and above which it probably does. Related phase transitions have also been found in the smoothed Erdös–Rényi models G_Ḡ(n, σ).17,26

Smoothed analysis based on Boolean perturbations can be applied to other discrete problems. For example, Feige16 used the following smoothed model for 3CNF formulas. First, an adversary picks an arbitrary formula with n variables and m clauses. Then, the formula is perturbed at random by flipping the polarity of each occurrence of each variable independently with probability σ. Feige gave a randomized polynomial-time refutation algorithm for this problem.

Combinatorial Optimization. Beier and Vöcking6 and Röglin and Vöcking31 considered the smoothed complexity of integer linear programming. They studied programs of the form

max c^T x subject to Ax ≤ b, x ∈ D^n,    (5)

where A is an m × n real matrix, b ∈ ℝ^m, and D ⊂ ℤ.

Recall that ZPP denotes the class of decision problems solvable by a randomized algorithm that always returns the correct answer, and whose expected running time (on every input) is polynomial. Beier, Röglin, and Vöcking6,31 proved the following statement: For any constant c, let Π be a class of integer linear programs of form (5) with |D| = O(n^c). Then Π has an algorithm of probably polynomial smoothed complexity if and only if Π_u ∈ ZPP, where Π_u is the "unary" representation of Π. Consequently, the 0/1-knapsack problem, the constrained shortest path problem, the constrained minimum spanning tree problem, and the constrained minimum weighted matching problem have probably polynomial smoothed complexity.

Smoothed analysis has been applied to several other optimization problems such as local search and TSP,15 scheduling,5 motion planning,11 superstring approximation,27 multiobjective optimization,31 string alignment,2 and multidimensional packing.22


Discussion

A Theme in Smoothed Analyses. One idea is present in many of these smoothed analyses: perturbed inputs are rarely ill-conditioned. Informally speaking, the condition number of a problem measures how much its answer can be made to change by slight modifications of its input. Many numerical algorithms are known to run faster when their inputs are well conditioned. Some smoothed analyses, such as that of Blum and Dunagan,7 exploit this connection explicitly. Others rely on similar ideas.

For many problems, the condition number of an input is approximately the inverse of the distance of that input to the set of degenerate or ill-posed problems. As these degenerate sets typically have measure zero, it is intuitively reasonable that a perturbed instance should be unlikely to be too close to a degenerate one.

Other Performance Measures. Although we normally evaluate the performance of an algorithm by its running time, other performance parameters are often important. These performance parameters include the amount of space required, the number of bits of precision required to achieve a given output accuracy, the number of cache misses, the error probability of a decision algorithm, the number of random bits needed in a randomized algorithm, the number of calls to a particular subroutine, and the number of examples needed in a learning algorithm. The quality of an approximation algorithm could be its approximation ratio; the quality of an online algorithm could be its competitive ratio; and the parameter of a game could be its price of anarchy or the rate of convergence of its best-response dynamics. We anticipate future results on the smoothed analysis of these performance measures.

Precursors to Smoothed Complexity. Several previous probabilistic models have also combined features of worst-case and average-case analyses.

Haimovich19 and Adler1 considered the following probabilistic analysis: Given a linear program L = (A, b, c) of form (1), they defined the expected complexity of L to be the expected complexity of the simplex algorithm when the inequality sign of each constraint is uniformly flipped. They proved that the expected complexity of the worst possible L is polynomial.

Blum and Spencer8 studied the design of polynomial-time algorithms for the semi-random model, which combines the features of the semi-random source with the random graph model that has a "planted solution." This model can be illustrated with the k-Coloring Problem: An adversary plants a solution by partitioning the set V of n vertices into k subsets V_1, …, V_k. Let

F = { (u, v) : u ∈ V_i, v ∈ V_j, i ≠ j }

be the set of potential inter-subset edges. A graph is then constructed by the following semi-random process that perturbs the decisions of the adversary: In a sequential order, the adversary decides whether to include each edge of F in the graph, and then a random process reverses the decision with probability σ. Note that every graph generated by this semi-random process has the planted coloring c(v) = i for all v ∈ V_i, as both the adversary and the random process preserve this solution by only considering edges from F.
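This generative process can be sketched in a few lines of Python. Since the adversary's strategy is arbitrary, the sketch stands in a simple coin flip for it, and all parameter names are our own.

    import numpy as np

    def semi_random_k_colorable(n, k, sigma, adversary_p=0.5, seed=0):
        # Plant a k-coloring, let a stand-in "adversary" decide on each inter-class
        # edge, then reverse each decision independently with probability sigma.
        rng = np.random.default_rng(seed)
        color = rng.integers(k, size=n)                  # the planted partition V_1, ..., V_k
        edges = set()
        for u in range(n):
            for v in range(u + 1, n):
                if color[u] == color[v]:
                    continue                             # only edges in F are considered
                decision = rng.random() < adversary_p    # adversary's choice (a coin here)
                if rng.random() < sigma:                 # the random process reverses it
                    decision = not decision
                if decision:
                    edges.add((u, v))
        return color, edges

    color, edges = semi_random_k_colorable(n=30, k=3, sigma=0.1)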

As with the smoothed model, one can work with the semi-random model by varying σ between 0 and 1 to interpolate between worst-case and average-case complexity for k-coloring.

Algorithm Design and Analysis for Special Families of Inputs. Probabilistic approaches are not the only means of characterizing practical inputs. Much work has been devoted to designing and analyzing algorithms for inputs that satisfy certain deterministic but practical conditions. We mention a few examples that excite us.

In parallel scientific computing, one may often assume that the input graph is a well-shaped finite-element mesh. In VLSI layout, one often considers only graphs that are planar or nearly planar. In geometric modeling, one may assume that there is an upper bound on the ratio among the distances between points. In Web analysis, one may assume that the input graph satisfies some power-law degree distribution or some small-world property. When analyzing hash functions, one may assume that the data being hashed has some non-negligible entropy.30

Limits of Smoothed Analysis. The goal of smoothed analysis is to explain why some algorithms have much better performance in practice than predicted by the traditional worst-case analysis. However, for many problems, there may be better explanations.

For example, the worst-case complexity and the smoothed complexity of the problem of computing a market equilibrium are essentially the same.20 So far, no polynomial-time pricing algorithm is known for general markets. On the other hand, pricing seems to be a practically solvable problem; as Kamal Jain put it, "If a Turing machine can't compute then an economic system can't compute either."

A key step to understanding the behaviors of algorithms in practice is the construction of analyzable models that are able to capture some essential aspects of practical input instances. For practical inputs, there may often be multiple parameters that govern the process of their formation.

One way to strengthen the smoothed analysis framework is to improve the model of the formation of input instances. If the input instances to an algorithm A come from the output of another algorithm B, then algorithm B, together with a model of B's input instances, provides a description of A's inputs. For example, in finite-element calculations, the inputs to the linear solver A are stiffness matrices that are produced by a meshing algorithm B. The meshing algorithm B, which could be a randomized algorithm, generates a stiffness matrix from a geometric domain Ω and a partial differential equation F. So the distribution of the stiffness matrices input to algorithm A is determined by the distribution of the geometric domains Ω, the set of partial differential equations F, and the randomness in algorithm B. If, for example, Ω̄ is the design of an advanced rocket from a set 𝒟 of "blueprints," F is drawn from a set ℱ of PDEs describing physical parameters such as pressure, speed, and temperature, and Ω is generated by a perturbation model 𝒩 of the blueprints, then one may further measure the performance of A by the smoothed value

max_{Ω̄ ∈ 𝒟, F ∈ ℱ}  E_{Ω ←_𝒩 Ω̄}  E_{X ← B(Ω, F)} [ T_A(X) ],

where Ω ←_𝒩 Ω̄ indicates that Ω is obtained from a perturbation of Ω̄, and X ← B(Ω, F) indicates that X is the output of the randomized algorithm B on inputs Ω and F.

A simpler way to strengthen smoothed analysis is to restrict the family of perturbations considered. For example, one could employ zero-preserving perturbations, which only apply to nonzero entries. Or, one could use relative perturbations, which perturb every real number individually by a small multiplicative factor.

Algorithm Design Based on Perturbations and Smoothed Analysis. Finally, we hope insights gained from smoothed analysis will lead to new ideas in algorithm design. On a theoretical front, Kelner and Spielman24 exploited ideas from the smoothed analysis of the simplex method to design a (weakly) polynomial-time simplex method that functions by systematically perturbing its input program. On a more practical level, we suggest that it might be possible to solve some problems more efficiently by perturbing their inputs. For example, some algorithms in computational geometry implement variable-precision arithmetic to correctly handle exceptions that arise from geometric degeneracy.29 However, degeneracies and near-degeneracies occur with exceedingly small probability under perturbations of inputs. To prevent perturbations from changing answers, one could employ quad-precision arithmetic, placing the perturbations into the least-significant half of the digits.

Our smoothed analysis of Gaussian elimination suggests a more stable solver for linear systems: When given a linear system Ax = b, we first use the standard Gaussian elimination with partial pivoting algorithm to solve Ax = b. Suppose x* is the solution computed. If ||b − Ax*|| is small enough, then we simply return x*. Otherwise, we determine a parameter ε and solve instead the perturbed linear system (A + εG)y = b, where G is a Gaussian matrix with mean 0 and variance 1. It follows from standard analysis that if ε is sufficiently small relative to ||A||/κ(A), then the solution to the perturbed linear system is a good approximation to the original one. One could use practical experience or binary search to set ε.

The new algorithm has the property that its success depends only on the machine precision and the condition number of A, while the original algorithm may fail due to large growth factors. For example, the following is a fragment of Matlab code that first solves a linear system whose matrix is the 70 × 70 matrix Wilkinson designed to trip up partial pivoting, using the Matlab linear solver. We then perturb the system, and apply the Matlab solver again.

[Matlab code fragment omitted: the built-in solver is applied first to the original 70 × 70 system and then to a slightly perturbed copy, and the residuals of the two computed solutions are compared.]

Note that while the Matlab linear solver fails to find a good solution to the linear system, our new perturbation-based algorithm finds a good solution. While there are standard algorithms for solving linear equations that do not have the poor worst-case performance of partial pivoting, they are rarely used as they are less efficient.
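For readers without Matlab, the experiment can be reproduced with the NumPy sketch below. It is a translation of the idea rather than the article's original code: the construction of the growth-factor matrix, the perturbation size ε = 10⁻⁹, and the use of numpy.linalg.solve (which, like Matlab's backslash on square systems, performs LU factorization with partial pivoting) are our own choices.

    import numpy as np

    def growth_factor_matrix(n):
        # Our reconstruction of the kind of matrix referred to above: 1 on the
        # diagonal, -1 below it, and a final column of ones, which drives the
        # growth factor of partial pivoting to 2^(n-1).
        A = np.eye(n) - np.tril(np.ones((n, n)), -1)
        A[:, -1] = 1.0
        return A

    rng = np.random.default_rng(0)
    n = 70
    A = growth_factor_matrix(n)
    b = rng.standard_normal(n)

    # Plain solve: partial pivoting suffers enormous element growth here,
    # so the computed solution can be very inaccurate in 64-bit arithmetic.
    x = np.linalg.solve(A, b)
    print("residual without perturbation:", np.linalg.norm(A @ x - b))

    # Perturbation-based solve: (A + eps*G) y = b for a tiny Gaussian perturbation.
    eps = 1e-9                      # illustrative choice, far below ||A|| / kappa(A)
    G = rng.standard_normal((n, n))
    y = np.linalg.solve(A + eps * G, b)
    print("residual with perturbation:   ", np.linalg.norm(A @ y - b))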

For more examples of algorithm design inspired by smoothed analysis and perturbation theory, see Teng.37


Acknowledgments

We would like to thank Alan Edelman for suggesting the name “Smoothed Analysis” and thank Heiko Röglin and Don Knuth for helpful comments on this writing.

Due to Communications space constraints, we have had to restrict our bibliography to 40 references. We apologize to those whose work we have been forced not to reference.


    1. Adler, I. The expected number of pivots needed to solve parametric linear programs and the efficiency of the self-dual simplex method. Technical report, University of California at Berkeley (May 1983).

    2. Andoni, A. and Krauthgamer, R. The smoothed complexity of edit distance. In Proceedings of ICALP, volume 5125 of Lecture Notes in Computer Science (Springer, 2008) 357–369.

    3. Arthur, D. and Vassilvitskii, S. How slow is the k-means method? In SOCG '06, the 22nd Annual ACM Symposium on Computational Geometry (2006) 144–153.

    4. Arthur, D. and Vassilvitskii, S. Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. In FOCS '06, the 47th Annual IEEE Symposium on Foundations of Computer Science (2006) 153–164.

    5. Becchetti, L., Leonardi, S., Marchetti-Spaccamela, A., Schäfer, G., and Vredeveld, T. Average-case and smoothed competitive analysis of the multilevel feedback algorithm. Math. Oper. Res. 31, 1 (2006), 85–108.

    6. Beier, R. and Vöcking, B. Typical properties of winners and losers in discrete optimization. In STOC '04: the 36th Annual ACM Symposium on Theory of Computing (2004), 343–352.

    7. Blum, A. and Dunagan, J. Smoothed analysis of the perceptron algorithm for linear programming. In SODA '02 (2002), 905–914.

    8. Blum, A. and Spencer, J. Coloring random and semi-random k-colorable graphs. J. Algorithms 19, 2 (1995), 204–234.

    9. Bohman, T., Frieze, A. and Martin, R. How many random edges make a dense graph hamiltonian? Random Struct. Algorithms 22, 1 (2003), 33–42.

    10. Bürgisser, P., Cucker, F., and Lotz, M. Smoothed analysis of complex conic condition numbers. J. de Mathématiques Pures et Appliquées 86, 4 (2006), 293–309.

    11. Damerow, V., Meyer auf der Heide, F., Räcke, H., Scheideler, C., and Sohler, C. Smoothed motion complexity. In Proceedings of the 11th Annual European Symposium on Algorithms (2003), 161–171.

    12. Dantzig, G.B. Maximization of linear function of variables subject to linear inequalities. Activity Analysis of Production and Allocation. T.C. Koopmans, Ed. 1951, 339–347.

    13. Dunagan, J., Spielman, D.A., and Teng, S.-H. Smoothed analysis of condition numbers and complexity implications for linear programming. Mathematical Programming, Series A, 2009. To appear. Preliminary version available at http://arxiv.org/abs/cs/0302011v2.

    14. Edelman, A. Eigenvalue roulette and random test matrices. Linear Algebra for Large Scale and Real-Time Applications. Marc S. Moonen, Gene H. Golub, and Bart L. R. De Moor, eds. NATO ASI Series, 1992, 365–368.

    15. Englert, M., Röglin, H., and Vöcking, B. Worst case and probabilistic analysis of the 2-opt algorithm for the TSP: extended abstract. In SODA '07: The 18th Annual ACM-SIAM Symposium on Discrete Algorithms (2007), 1295–1304.

    16. Feige, U. Refuting smoothed 3CNF formulas. In The 48th Annual IEEE Symposium on Foundations of Computer Science (2007), 407–417.

    17. Flaxman, A. and Frieze, A.M. The diameter of randomly perturbed digraphs and some applications. In APPROX-RANDOM (2004), 345–356.

    18. Gass, S. and Saaty, T. The computational algorithm for the parametric objective function. Naval Res. Logist. Quart. 2, (1955), 39–45.

    19. Haimovich, M. The simplex algorithm is very good!: On the expected number of pivot steps and related properties of random linear programs. Technical report, Columbia University (April 1983).

    20. Huang, L.-S. and Teng, S.-H. On the approximation and smoothed complexity of Leontief market equilibria. In Frontiers of Algorithms Workshop (2007), 96–107.

    21. Kalai, A. and Teng, S.-H. Decision trees are PAC-learnable from most product distributions: A smoothed analysis. ArXiv e-prints (December 2008).

    22. Karger, D. and Onak, K. Polynomial approximation schemes for smoothed and random instances of multidimensional packing problems. In SODA '07: the 18th Annual ACM-SIAM Symposium on Discrete Algorithms (2007), 1207–1216.

    23. Kelner, J.A. and Nikolova, E. On the hardness and smoothed complexity of quasi-concave minimization. In The 48th Annual IEEE Symposium on Foundations of Computer Science (2007), 472–482.

    24. Kelner, J.A. and Spielman, D.A. A randomized polynomial-time simplex algorithm for linear programming. In The 38th Annual ACM Symposium on Theory of Computing (2006), 51–60.

    25. Klee, V. and Minty, G.J. How good is the simplex algorithm? Inequalities – III. O. Shisha, ed. Academic Press, 1972, 159–175.

    26. Krivelevich, M., Sudakov, B. and Tetali, P. On smoothed analysis in dense graphs and formulas. Random Struct. Algorithms 29 (2005), 180–193.

    27. Ma, B. Why greed works for shortest common superstring problem. In CPM '08: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching (2008), 244–254.

    28. Manthey, B. and Röglin, H. Improved smoothed analysis of the k-means method. In SODA '09 (2009).

    29. Mehlhorn, K. and Näher, S. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, New York, 1999.

    30. Mitzenmacher, M. and Vadhan, S. Why simple hash functions work: exploiting the entropy in a data stream. In SODA '08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms (2008), 746–755.

    31. Röglin, H. and Vöcking, B. Smoothed analysis of integer programming. Proceedings of the 11th International Conference on Integer Programming and Combinatorial Optimization. M. Junger and V. Kaibel, eds. Volume 3509 of Lecture Notes in Computer Science, Springer, 2005, 276–290.

    32. Rudelson, M. and Vershynin, R. The Littlewood–Offord problem and invertibility of random matrices. Adv. Math. 218 (June 2008), 600–633.

    33. Sankar, A. Smoothed analysis of Gaussian elimination. Ph.D. Thesis, MIT, 2004.

    34. Sankar, A., Spielman, D.A., and Teng, S.-H. Smoothed analysis of the condition numbers and growth factors of matrices. SIAM J. Matrix Anal. Appl. 28, 2 (2006), 446–476.

    35. Spielman, D.A. and Teng, S.-H. Smoothed analysis of algorithms. In Proceedings of the International Congress of Mathematicians (2002), 597–606.

    36. Spielman, D.A. and Teng, S.-H. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. J. ACM 51, 3 (2004), 385–463.

    37. Teng, S.-H. Algorithm design and analysis with perturbations. In Fourth International Congress of Chinese Mathematicians (2007).

    38. Vershynin, R. Beyond Hirsch conjecture: Walks on random polytopes and smoothed complexity of the simplex method. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (2006), 133–142.

    39. Vu, V.H. and Tao, T. The condition number of a randomly perturbed matrix. In STOC '07: the 39th Annual ACM Symposium on Theory of Computing (2007), 248–255.

    40. Wschebor, M. Smoothed analysis of κ(A). J. Complexity 20, 1 (February 2004), 97–107.

    a. In presidential elections in the United States, each of the 50 states and the District of Columbia is allocated a number of electors. All but the states of Maine and Nebraska use a winner-take-all system, with the candidate winning the majority of votes in each state being awarded all of that state's electors. The winner of the election is the candidate who is awarded the most electors.

    b. Available at http://sigact.acm.org/

    c. See the matrix constructed in the code at the end of the Discussion section for an example.

    DOI: http://doi.acm.org/10.1145/1562764.1562785
