Home/Magazine Archive/October 2009 (Vol. 52, No. 10)/Smoothed Analysis: An Attempt to Explain the Behavior.../Full Text

Review articles
# Smoothed Analysis: An Attempt to Explain the Behavior of Algorithms in Practice

"My experiences also strongly confirmed my previous opinion that the best theory is inspired by practice and the best practice is inspired by theory."

Donald E. Knuth, "Theory and Practice,"Theoretical Computer Science, 1991.

Algorithms are high-level descriptions of how computational tasks are performed. Engineers and experimentalists design and implement algorithms, and generally consider them a success if they work in practice. However, an algorithm that works well in one practical domain might perform poorly in another. Theorists also design and analyze algorithms, with the goal of providing provable guarantees about their performance. The traditional goal of theoretical computer science is to prove that an algorithm performs well in the worst case: if one can prove that an algorithm performs well in the worst case, then one can be confident that it will work well in every domain. However, there are many algorithms that work well in practice that do not work well in the worst case. Smoothed analysis provides a theoretical framework for explaining why some of these algorithms do work well in practice.

The performance of an algorithm is usually measured by its running time, expressed as a function of the input size of the problem it solves. The performance profiles of algorithms across the landscape of input instances can differ greatly and can be quite irregular. Some algorithms run in time linear in the input size on all instances, some take quadratic or higher order polynomial time, while some may take an exponential amount of time on some instances.

Traditionally, the complexity of an algorithm is measured by its *worst-case* performance. If a single input instance triggers an exponential runtime, the algorithm is called an exponential-time algorithm. A polynomial-time algorithm is one that takes polynomial time on all instances. While polynomial-time algorithms are usually viewed as being efficient, we clearly prefer those whose runtime is a polynomial of low degree, especially those that run in nearly linear time.

It would be wonderful if every algorithm that ran quickly in practice was a polynomial-time algorithm. As this is not always the case, the worst-case framework is often the source of discrepancy between the theoretical evaluation of an algorithm and its practical performance.

It is commonly believed that practical inputs are usually more favorable than worst-case instances. For example, it is known that the special case of the `Knapsack`

problem in which one must determine whether a set of *n* numbers can be divided into two groups of equal sum does not have a polynomial-time algorithm, unless **NP** is equal to **P**. Shortly before he passed away, Tim Russert of NBC's "Meet the Press," commented that the 2008 election could end in a tie between the Democratic and the Republican candidates. In other words, he solved a 51-item `Knapsack`

problem^{a} by hand within a reasonable amount of time, and most likely without using the pseudo-polynomial-time dynamic-programming algorithm for `Knapsack`

!

In our field, the simplex algorithm is the classic example of an algorithm that is known to perform well in practice but has poor worst-case complexity. The simplex algorithm solves a linear program, for example, of the form,

where **A** is an *m* × *n* matrix, **b** is an *m*-place vector, and **c** is an *n*-place vector. In the worst case, the simplex algorithm takes exponential time.^{25} Developing rigorous mathematical theories that explain the observed performance of practical algorithms and heuristics has become an increasingly important task in *Theoretical Computer Science.* However, modeling observed data and practical problem instances is a challenging task as insightfully pointed out in the 1999 "Challenges for Theory of Computing" Report for an NSF-Sponsored Workshop on Research in Theoretical Computer Science.^{b}

While theoretical work on models of computation and methods for analyzing algorithms has had enormous payoff, we are not done. In many situations, simple algorithms do well. Take for example the Simplex algorithm for linear programming, or the success of simulated annealing of certain supposedly intractable problems. We don't understand why! It is apparent that worst-case analysis does not provide useful insights on the performance of algorithms and heuristics and our models of computation need to be further developed and refined. Theoreticians are investing increasingly in careful experimental work leading to identification of important new questions in algorithms area. Developing means for predicting the performance of algorithms and heuristics on real data and on real computers is a grand challenge in algorithms.

Needless to say, there are a multitude of algorithms beyond simplex and simulated annealing whose performance in practice is not well explained by worst-case analysis. We hope that theoretical explanations will be found for the success in practice of many of these algorithms, and that these theories will catalyze better algorithm design.

When *A* is an algorithm for solving problem *P* we let *T*_{A}[**x**] denote the running time of algorithm *A* on an input instance **x**. If the input domain has only one input instance **x**, then we can use the instance-based measure and to decide which of the two algorithms *A*_{1} and *A*_{2} more efficiently solves *P*. If has two instances **x** and **y**, then the instance-based measure of an algorithm *A* defines a two-dimensional vector (*T*_{A}[**x**], *T*_{A}[**y**]). It could be the case that < but > . Then, strictly speaking, these two algorithms are not comparable. Usually, the input domain is much more complex, both in theory and in practice. The *instance-based complexity measure* *T*_{A}[·] defines an || dimensional vector when is finite. In general, it can be viewed as a function from to but it is unwieldy. To compare two algorithms, we require a more concise complexity measure.

An input domain is usually viewed as the union of a family of subdomains {_{1}, ..., _{n}, ...}, where _{n} represents all instances in of size *n*. For example, in sorting, _{n} is the set of all tuples of *n* elements; in graph algorithms, _{n} is the set of all graphs with *n* vertices; and in computational geometry, we often have . In order to succinctly express the performance of an algorithm *A*, for each _{n} one defines a scalar *T*_{A}(*n*) that summarizes the instance-based complexity measure, *T*_{A}[·], of *A* over _{n}. One often further simplifies this expression by using big-O or big- notation to express *T*_{A}(*n*) asymptotically.

** Traditional Analyses.** It is understandable that different approaches to summarizing the performance of an algorithm over

The *worst-case measure* is defined as

The *average-case measures* have more parameters. In each average-case measure, one first determines a distribution of inputs and then measures the expected performance of an algorithm assuming inputs are drawn from this distribution. Supposing *S* provides a distribution over each _{n}, the average-case measure according to *S* is

where we use **x** _{s}_{n} to indicate that **x** is randomly chosen from _{n} according to distribution *S*.

*Critique of Traditional Analyses.* Low worst-case complexity is the gold standard for an algorithm. When low, the worst-case complexity provides an absolute guarantee on the performance of an algorithm no matter which input it is given. Algorithms with good worst-case performance have been developed for a great number of problems.

However, there are many problems that need to be solved in practice for which we do not know algorithms with good worst-case performance. Instead, scientists and engineers typically use heuristic algorithms to solve these problems. Many of these algorithms work well in practice, in spite of having a poor, sometimes exponential, worst-case running time. Practitioners justify the use of these heuristics by observing that worst-case instances are usually not "typical" and rarely occur in practice. The worst-case analysis can be too pessimistic. This theory-practice gap is not limited to heuristics with exponential complexity. Many polynomial time algorithms, such as interior-point methods for linear programming and the conjugate gradient algorithm for solving linear equations, are often much faster than their worst-case bounds would suggest. In addition, heuristics are often used to speed up the practical performance of implementations that are based on algorithms with polynomial worst-case complexity. These heuristics might in fact worsen the worst-case performance, or make the worst-case complexity difficult to analyze.

Average-case analysis was introduced to overcome this difficulty. In average-case analysis, one measures the expected running time of an algorithm on some distribution of inputs. While one would ideally choose the distribution of inputs that occurs in practice, this is difficult as it is rare that one can determine or cleanly express these distributions, and the distributions can vary greatly between one application and another. Instead, average-case analyses have employed distributions with concise mathematical descriptions, such as Gaussian random vectors, uniform {0, 1} vectors, and ErdösRényi random graphs.

The drawback of using such distributions is that the inputs actually encountered in practice may bear very little resemblance to the inputs that are likely to be generated by such distributions. For example, one can see what a random image looks like by disconnecting most TV sets from their antennas, at which point they display "static." These random images do not resemble actual television shows. More abstractly, ErdösRényi random graph models are often used in average-case analyses of graph algorithms. The ErdösRényi distribution *G(n, p)*, produces a random graph by including every possible edge in the graph independently with probability *p*. While the average degree of a graph chosen from *G*(*n*, 6/(*n* 1)) is approximately six such a graph will be very different from the graph of a triangulation of a point set in two dimensions, which will also have average degree approximately six.

In fact, random objects such as random graphs and random matrices have special properties with exponentially high probability, and these special properties might dominate the average-case analysis. Edelman^{14} writes of random matrices:

What is a mistake is to psychologically link a random matrix with the intuitive notion of a "typical" matrix or the vague concept of "any old matrix."

In contrast, we argue that "random matrices" areveryspecial matrices.

** Smoothed Analysis: A Step Toward Modeling Real Data.** Because of the intrinsic difficulty in defining practical distributions, we consider an alternative approach to modeling real data. The basic idea is to identify typical properties of practical data, define an input model that captures these properties, and then rigorously analyze the performance of algorithms assuming their inputs have these properties.

Smoothed analysis is a step in this direction. It is motivated by the observation that practical data is often subject to some small degree of random noise. For example,

- In industrial optimization and economic prediction, the input parameters could be obtained by physical measurements, and measurements usually have some of low magnitude uncertainty.
- In the social sciences, data often comes from surveys in which subjects provide integer scores in a small range (say between 1 and 5) and select their scores with some arbitrariness.
- Even in applications where inputs are discrete, there might be randomness in the formation of inputs. For instance, the network structure of the Internet may very well be governed by some "blueprints" of the government and industrial giants, but it is still "perturbed" by the involvements of smaller Internet service providers.

In these examples, the inputs usually are neither completely random nor completely arbitrary. At a high level, each input is generated from a two-stage model: In the first stage, an instance is generated and in the second stage, the instance from the first stage is slightly perturbed. The perturbed instance is the input to the algorithm.

In smoothed analysis, we assume that an input to an algorithm is subject to a slight random perturbation. The *smoothed measure* of an algorithm on an input instance is its expected performance over the perturbations of that instance. We define the *smoothed complexity* of an algorithm to be the maximum smoothed measure over input instances.

For concreteness, consider the case , which is a common input domain in computational geometry, scientific computing, and optimization. For these continuous inputs and applications, the family of Gaussian distributions provides a natural model of noise or perturbation.

Recall that a univariate Gaussian distribution with mean 0 and standard deviation has density

The standard deviation measures the magnitude of the perturbation. A *Gaussian random vector* of variance ^{2} centered at the origin in is a vector in which each entry is an independent Gaussian random variable of standard deviation and mean 0. For a vector , a -Gaussian perturbation of is a random vector **x** = + **g**, where **g** is a Gaussian random vector of variance ^{2}. The standard deviation of the perturbation we apply should be related to the norm of the vector it perturbs. For the purposes of this article, we relate the two by restricting the unperturbed inputs to lie in [1, 1]^{n}. Other reasonable approaches are taken elsewhere.

DEFINITION 1. (SMOOTHED COMPLEXITY). *Suppose A is an algorithm with* . *Then, the smoothed complexity of A with -Gaussian perturbations is given by*

*where g* is a Gaussian random vector of variance

In this definition, the "original" input is perturbed to obtain the input + **g**, which is then fed to the algorithm. For each original input, this measures the expected running time of algorithm *A* on random perturbations of that input. The maximum out front tells us to measure the smoothed analysis by the expectation under the worst possible original input.

The smoothed complexity of an algorithm measures the performance of the algorithm both in terms of the input size *n* and in terms of the magnitude of the perturbation. By varying between zero and infinity, one can use smoothed analysis to interpolate between worst-case and average-case analysis. When = 0, one recovers the ordinary worst-case analysis. As grows large, the random perturbation **g** dominates the original , and one obtains an average-case analysis. We are most interested in the situation in which is small relative to ||||, in which case + **g** may be interpreted as a slight perturbation of . The dependence on the magnitude is essential and much of the work in smoothed analysis demonstrates that noise often makes a problem easier to solve.

DEFINITION 2. *A has* **polynomial smoothed complexity** *if there exist positive constants n*_{0}, _{0}, *c, k*_{1}, *and k*_{2} *such that for all n* *n*_{0} *and* 0 _{0},

From Markov's inequality, we know that if an algorithm *A* has smoothed complexity *T*(*n*, ), then

Thus, if *A* has polynomial smoothed complexity, then for any , with probability at least (1 ), *A* can solve a random perturbation of in time polynomial in *n*, 1/, and 1/. However, the probabilistic upper bound given in (3) does not necessarily imply that the smoothed complexity of *A* is *O*(*T*(*n*,)). Blum and Dunagan^{7} and subsequently Beier and Vöcking^{6} introduced a relaxation of polynomial smoothed complexity.

DEFINITION 3. *A has* **probably polynomial smoothed complexity** *if there exist constants n*_{0}, _{0}, *c, and* , *such that for all n* *n*_{0} *and* 0 _{0},

They show that some algorithms have probably polynomial smoothed complexity, in spite of the fact that their smoothed complexity according to Definition 1 is unbounded.

In this section, we give a few examples of smoothed analysis. We organize them in five categories: mathematical programming, machine learning, numerical analysis, discrete mathematics, and combinatorial optimization. For each example, we will give the definition of the problem, state the worst-case complexity, explain the perturbation model, and state the smoothed complexity under the perturbation model.

** Mathematical Programming.** The typical problem in mathematical programming is the optimization of an objective function subject to a set of constraints. Because of its importance to economics, management science, industry, and military planning, many optimization algorithms and heuristics have been developed, implemented and applied to practical problems. Thus, this field provides a great collection of algorithms for smoothed analysis.

*Linear Programming.* Linear programming is the most fundamental optimization problem. A typical linear program is given in Equation 1. The most commonly used linear programming algorithms are the simplex algorithm^{12} and the interior-point algorithms.

The simplex algorithm, first developed by Dantzig in 1951,^{12} is a family of iterative algorithms. Most of them are two-phase algorithms: Phase I determines whether a given linear program is infeasible, unbounded in the objective direction, or feasible with a bounded solution, in which case, a vertex **v**_{0} of the feasible region is also computed. Phase II is iterative: in the *i*th iteration, the algorithm finds a neighboring vertex **v**_{i} of **v**_{i}_{1} with better objective value or terminates by returning **v**_{i}_{1} when no such neighboring vertex exists. The simplex algorithms differ in their pivot rules, which determine which vertex **v**_{i} to choose when there are multiple choices. Several pivot rules have been proposed; however, almost all existing pivot rules are known to have exponential worst-case complexity.^{25}

Spielman and Teng^{36} considered the smoothed complexity of the simplex algorithm with the shadow-vertex pivot rule, developed by Gass and Saaty.^{18} They used Gaussian perturbations to model noise in the input data and proved that the smoothed complexity of this algorithm is polynomial. Vershynin^{38} improved their result to obtain a smoothed complexity of

See Blum and Dunagan, and Dunagan et al.^{7,13} for smoothed analyses of other linear programming algorithms such as the interior-point algorithms and the perceptron algorithm.

*Quasi-Concave Minimization.* Another fundamental optimization problem is quasi-concave minimization. Recall that a function *f*: is quasi-concave if all of its upper level sets *L*_{} = {*x*| *f* (*x*) } are convex. In quasi-concave minimization, one is asked to find the minimum of a quasi-concave function subject to a set of linear constraints. Even when restricted to concave quadratic functions over the hypercube, concave minimization is NP-hard.

In applications such as stochastic and multiobjective optimization, one often deals with data from low-dimensional subspaces. In other words, one needs to solve a quasi-concave minimization problem with a low-rank quasi-concave function.^{23} Recall that a function *f* : has rank *k* if it can be written in the form

for a function *g*: and linearly independent vectors *a*_{1}, *a*_{2}, ..., *a*_{k}.

Kelner and Nikolova^{23} proved that, under some mild assumptions on the feasible convex region, if *k* is a constant then the smoothed complexity of quasi-concave minimization is polynomial when *f* is perturbed by noise. Key to their analysis is a smoothed bound on the size of the *k*-dimensional shadow of the high-dimensional polytope that defines the feasible convex region. Their result is a nontrivial extension of the analysis of two-dimensional shadows of Kelner and Spielman, and Spielman and Teng.^{24,36}

** Machine Learning.** Machine learning provides many natural problems for smoothed analysis. The field has many heuristics that work in practice, but not in the worst case, and the data defining most machine learning problems is inherently noisy.

*K-means.* One of the fundamental problems in machine learning is that of *k*-means clustering: the partitioning of a set of *d*-dimensional vectors *Q* = {*q*_{1}, ..., *q*_{n}} into *k* clusters {*Q*_{1}, ..., *Q*_{k}} so that the intracluster variance

is minimized, where is the centroid of *Q*_{i}.

One of the most widely used clustering algorithms is Lloyd's algorithm (*IEEE Transaction on Information Theory*, 1982). It first chooses an arbitrary set of *k* centers and then uses the Voronoi diagram of these centers to partition *Q* into *k* clusters. It then repeats the following process until it stabilizes: use the centroids of the current clusters as the new centers, and then repartition *Q* accordingly.

Two important questions about Lloyd's algorithm are how many iterations its takes to converge, and how close to optimal is the solution it finds? Smoothed analysis has been used to address the first question. Arthur and Vassilvitskii proved that in the worst case, Lloyd's algorithm requires 2^{(n)} iterations to converge,^{3} but that it has smoothed complexity polynomial in *n*^{k} and ^{1}.^{4}Manthey and Röglin^{28} recently reduced this bound to polynomial in *n*^{k} and ^{1}.

*Perceptrons, Margins, and Support Vector Machines.* Blum and Dunagan's analysis of the perceptron algorithm^{7} for linear programming implicitly contains results of interest in machine learning. The ordinary perceptron algorithm solves a fundamental problem: given a collection of points *x*_{1}, ..., *x*_{n} and labels *b*_{1}, ..., *b*_{n} {±1}^{n}, find a hyperplane separating the positively labeled examples from the negatively labeled ones, or determine that no such plane exists. Under a smoothed model in which the points *x*_{1}, ..., *x*_{n} are subject to a -Gaussian perturbation, Blum and Dunagan show that the perceptron algorithm has probably polynomial smoothed complexity, with exponent = 1. Their proof follows from a demonstration that if the positive points can be separated from the negative points, then they can probably be separated by a large margin. That is, there probably exists a plane separating the points for which no point is too close to that separating plane.

It is known that the perceptron algorithm converges quickly in this case. Moreover, this margin is exactly what is maximized by support vector machines.

*PAC Learning.* Probably approximately correct learning (PAC learning) is a framework in machine learning introduced by Valiant in which a learner is provided with a polynomial number of labeled examples from a given distribution, and must produce a classifier that is usually correct with reasonably high probability. In standard PAC learning, the distribution from which the examples are drawn is fixed. Recently, Kalai and Teng^{21} applied smoothed analysis to this problem by perturbing the input distribution. They prove that polynomial sized decision trees are PAC-learnable in polynomial time under perturbed product distributions. In constrast, under the uniform distribution even super-constant size decision trees are not known to be PAC-learnable in polynomial time.

** Numerical Analysis.** One of the foci of numerical analysis is the determination of how much precision is required by numerical methods. For example, consider the most fundamental problem in computational sciencethat of solving systems of linear equations. Because of the round-off errors in computation, it is crucial to know how many bits of precision a linear solver should maintain so that its solution is meaningful.

For example, Wilkinson (*JACM* 1961) demonstrated a family of linear systems^{c} of *n* variables and {0, 1, 1} coefficients for which Gaussian elimination with partial pivotingthe most popular variant in practicerequires *n*-bits of precision.

*Precision Requirements of Gaussian Elimination.* In practice one almost always obtains accurate answers using much less precision. High-precision solvers are rarely used or needed. For example, Matlab uses 64 bits.

Building on the smoothed analysis of condition numbers (discussed below), Sankar et al.^{33,34} proved that it is sufficient to use *O*(log^{2}(*n/*)) bits of precision to run Gaussian elimination with partial pivoting when the matrices of the linear systems are subject to -Gaussian perturbations.

*The Condition Number.* The smoothed analysis of the condition number of a matrix is a key step toward understanding numerical precision required in practice. For a square matrix **A**, its condition number (**A**) is given by (**A**) = ||**A**||_{2}||**A**^{1}||_{2} where ||**A**||_{2} = max_{x} ||**Ax**||_{2}/||**x**||_{2}. The condition number of **A** measures how much the solution to a system **Ax** = **b** changes as one makes slight changes to **A** and **b:** If one solves the linear system using fewer than log((**A**)) bits of precision, then one is likely to obtain a result far from a solution.

The quantity 1/||**A**^{1}||_{2} = min_{x}||**Ax**||2/||**x**||_{2} is known as the smallest singular value of **A**. Sankar et al.^{34} proved the following statement: For any square matrix in satisfying ||||_{2} n, and for any *x* > 1,

where **A** is a -Gaussian perturbation of **A**. Wschebor^{40} improved this bound to show that for 1 and ||||_{2} 1,

See Bürgisser et al. and Dunagan et al.^{10,13} for smoothed analysis of the condition numbers of other problems.

** Discrete Mathematics.** For problems in discrete mathematics, it is more natural to use Boolean perturbations: Let = (

Believing that -perturbations of Boolean matrices should behave like Gaussian perturbations of real matrices, Spielman and Teng^{35} made the following conjecture:

Let **A** be an *n* by *n* matrix of independently and uniformly chosen ±1 entries. Then

for some constant .

Statements close to this conjecture and its generalizations were recently proved by Vu and Tao^{39} and Rudelson and Vershynin.^{32}

The -Boolean perturbation of a graph can be viewed as a smoothed extension of the classic ErdösRényi random graph model. The -perturbation of a graph , which we denote by *G*_{}(*n*, ), is a distribution of random graphs in which every edge is removed with probability and every nonedge is included with probability . Clearly for *p* [0, 1], *G*(*n*, *p*) = *G*_{}(*n*, *p*), i.e., the *p*-Boolean perturbation of the empty graph. One can define a smoothed extension of other random graph models. For example, for any *m* and *G* = (*V, E*), Bohman et al.^{9} define *G*(, *m*) to be the distribution of the random graphs (*V, E* *T*) where *T* is a set of *m* edges chosen uniformly at random from the complement of *E*, i.e., chosen from = {(*i, j*) *E*}.

A popular subject of study in the traditional ErdösRényi model is the phenomenon of phase transition: for many properties such as being connected or being Hamiltonian, there is a critical *p* below which a graph is unlikely to have each property and above which it probably does have the property. Related phase transitions have also been found in the smoothed ErdösRényi models *G*_{}(*n*, ).^{17, 26}

Smoothed analysis based on Boolean perturbations can be applied to other discrete problems. For example, Feige^{16} used the following smoothed model for 3CNF formulas. First, an adversary picks an arbitrary formula with *n* variables and *m* clauses. Then, the formula is perturbed at random by flipping the polarity of each occurrence of each variable independently with probability . Feige gave a randomized polynomial-time refutation algorithm for this problem.

** Combinatorial Optimization.** Beier and Vöcking

where **A** is an *m* × *n* real matrix, **b** , and .

Recall that `ZPP`

denotes the class of decision problems solvable by a randomized algorithm that always returns the correct answer, and whose expected running time (on every input) is polynomial. Beier, Röglin, and Vöcking^{6,31} proved the following statement: For any constant *c*, let be a class of integer linear programs of form 5 with |*D*| = *O*(*n*^{c}). Then, has an algorithm of probably polynomial smoothed complexity if and only if _{u} `ZPP`

, where _{u} is the "unary" representation of . Consequently, the 0/1-knapsack problem, the constrained shortest path problem, the constrained minimum spanning tree problem, and the constrained minimum weighted matching problem have probably polynomial smoothed complexity.

Smoothed analysis has been applied to several other optimization problems such as local search and TSP,^{15} scheduling,^{5} motion planning,^{11} superstring approximation,^{27} multiobjective optimization,^{31} string alignment,^{2} and multidimensional packing.^{22}

** A Theme in Smoothed Analyses.** One idea is present in many of these smoothed analyses: perturbed inputs are rarely ill-conditioned. Informally speaking, the condition number of a problem measures how much its answer can be made to change by slight modifications of its input. Many numerical algorithms are known to run faster when their inputs are well conditioned. Some smoothed analyses, such as that of Blum and Dunagan,

For many problems, the condition number of an input is approximately the inverse of the distance of that input to the set of degenerate or ill-posed problems. As these degenerate sets typically have measure zero, it is intuitively reasonable that a perturbed instance should be unlikely to be too close to a degenerate one.

** Other Performance Measures.** Although we normally evaluate the performance of an algorithm by its running time, other performance parameters are often important. These performance parameters include the amount of space required, the number of bits of precision required to achieve a given output accuracy, the number of cache misses, the error probability of a decision algorithm, the number of random bits needed in a randomized algorithm, the number of calls to a particular subroutine, and the number of examples needed in a learning algorithm. The quality of an approximation algorithm could be its approximation ratio; the quality of an online algorithm could be its competitive ratio; and the parameter of a game could be its price of anarchy or the rate of convergence of its best-response dynamics. We anticipate future results on the smoothed analysis of these performance measures.

** Precursors to Smoothed Complexity.** Several previous probabilistic models have also combined features of worst-case and average-case analyses.

Haimovich^{19} and Adler^{1} considered the following probabilistic analysis: Given a linear program = (**A, b, c**) of form (1), they defined the *expected complexity* of to be the expected complexity of the simplex algorithm when the inequality sign of each constraint is uniformly flipped. They proved that the expected complexity of the worst possible is polynomial.

Blum and Spencer^{8} studied the design of polynomial-time algorithms for the semi-random model, which combines the features of the semi-random source with the random graph model that has a "planted solution." This model can be illustrated with the *k*-`Coloring Problem`

: An adversary plants a solution by partitioning the set *V* of *n* vertices into *k* subsets *V*_{1},...,*V*_{k}. Let

be the set of potential inter-subset edges. A graph is then constructed by the following semi-random process that perturbs the decisions of the adversary: In a sequential order, the adversary decides whether to include each edge of *F* in the graph, and then a random process reverses the decision with probability . Note that every graph generated by this semi-random process has the planted coloring: *c*(*v*) = *i* for all *v* *V*_{i}, as both the adversary and the random process preserve this solution by only considering edges from *F*.

As with the smoothed model, one can work with the semi-random model by varying between 0 and 1 to interpolate between worst-case and average-case complexity for *k*-coloring.

** Algorithm Design and Analysis for Special Families of Inputs.** Probabilistic approaches are not the only means of characterizing practical inputs. Much work has been spent on designing and analyzing inputs that satisfy certain deterministic but practical input conditions. We mention a few examples that excite us.

In parallel scientific computing, one may often assume that the input graph is a well-shaped finite element mesh. In VLSI layout, one often only considers graphs that are planar or nearly planar. In geometric modeling, one may assume that there is an upper bound on the ratio among the distances between points. In Web analysis, one may assume that the input graph satisfies some powerlaw degree distribution or some small-world properties. When analyzing hash functions, one may assume that the data being hashed has some non-negligible entropy.^{30}

** Limits of Smoothed Analysis.** The goal of smoothed analysis is to explain why some algorithms have much better performance in practice than predicted by the traditional worst-case analysis. However, for many problems, there may be better explanations.

For example, the worst-case complexity and the smoothed complexity of the problem of computing a market equilibrium are essentially the same.^{20} So far, no polynomial-time pricing algorithm is known for general markets. On the other hand, pricing seems to be a practically solvable problem, as Kamal Jain put it "If a Turing machine can't compute then an economic system can't compute either."

A key step to understanding the behaviors of algorithms in practice is the construction of analyzable models that are able to capture some essential aspects of practical input instances. For practical inputs, there may often be multiple parameters that govern the process of their formation.

One way to strengthen the smoothed analysis framework is to improve the model of the formation of input instances. For example, if the input instances to an algorithm *A* come from the output of another algorithm *B*, then algorithm *B*, together with a model of *B*'s input instances, provide a description of *A*'s inputs. For example, in finite-element calculations, the inputs to the linear solver *A* are stiffness matrices that are produced by a meshing algorithm *B*. The meshing algorithm *B*, which could be a randomized algorithm, generates a stiffness matrix from a geometric domain and a partial differential equation *F*. So, the distribution of the stiffness matrices input to algorithm *A* is determined by the distribution of the geometric domains and the set *F* of partial differential equations, and the randomness in algorithm *B*. If, for example, is the design of an advanced rocket from a set of "blueprints" and *F* is from a set of PDEs describing physical parameters such as pressure, speed, and temperature, and is generated by a perturbation model of the blueprints, then one may further measure the performance of *A* by the smoothed value:

where indicates that is obtained from a perturbation of and *X* *B*(, *F*) indicates that *X* is the output of the randomized algorithm *B*.

A simpler way to strengthen smoothed analysis is to restrict the family of perturbations considered. For example, one could employ *zero-preserving perturbations*, which only apply to nonzero entries. Or, one could use *relative perturbations*, which perturb every real number individually by a small multiplicative factor.

** Algorithm Design Based on Perturbations and Smoothed Analysis.** Finally, we hope insights gained from smoothed analysis will lead to new ideas in algorithm design. On a theoretical front, Kelner and Spielman

Our smoothed analysis of Gaussian elimination suggests a more stable solver for linear systems: When given a linear system **Ax** = **b**, we first use the standard Gaussian elimination with partial pivoting algorithm to solve **Ax** = **b**. Suppose **x*** is the solution computed. If ||**b** **Ax***|| is small enough, then we simply return **x***. Otherwise, we can determine a parameter and generate a new linear system (**A** + **G**)**y** = **b**, where **G** is a Gaussian matrix with mean **0** and variance 1. Instead of solving **Ax** = **b**, we solve a perturbed linear system (**A** + **G**)**y** = **b**. It follows from standard analysis that if is sufficiently smaller than (**A**), then the solution to the perturbed linear system is a good approximation to the original one. One could use practical experience or binary search to set .

The new algorithm has the property that its success depends only on the machine precision and the condition number of **A**, while the original algorithm may fail due to large growth factors. For example, the following is a fragment of Matlab code that first solves a linear system whose matrix is the 70 × 70 matrix Wilkinson designed to trip up partial pivoting, using the Matlab linear solver. We then perturb the system, and apply the Matlab solver again.

Note that while the Matlab linear solver fails to find a good solution to the linear system, our new perturbation-based algorithm finds a good solution. While there are standard algorithms for solving linear equations that do not have the poor worst-case performance of partial pivoting, they are rarely used as they are less efficient.

For more examples of algorithm design inspired by smoothed analysis and perturbation theory, see Teng.^{37}

We would like to thank Alan Edelman for suggesting the name "Smoothed Analysis" and thank Heiko Röglin and Don Knuth for helpful comments on this writing.

Due to *Communications* space constrainsts, we have had to restrict our bibliography to 40 references. We apologize to those whose work we have been forced not to reference.

1. Adler, I. The expected number of pivots needed to solve parametric linear programs and the efficiency of the self-dual simplex method. Technical report, University of California at Berkeley (May 1983).

2. Andoni, A. and Krauthgamer, R. The smoothed complexity of edit distance. In *Proceedings of ICALP*, volume 5125 of *Lecture Notes in Computer Science* (Springer, 2008) 357369.

3. Arthur, D. and Vassilvitskii, S. How slow is the k-means method? In *SOCG '06, the 22nd Annual ACM Symposium on Computational Geometry* (2006) 144153.

4. Arthur, D. and Vassilvitskii, S. Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. In *FOCS '06, the 47th Annual IEEE Symposium on Foundations of Computer Science* (2006) 153164.

5. Becchetti, L., Leonardi, S., Marchetti-Spaccamela, A., Schfer, G., and Vredeveld, T. Average-case and smoothed competitive analysis of the multilevel feedback algorithm. *Math. Oper. Res. 31*, 1 (2006), 85108.

6. Beier, R. and Vöcking, B. Typical properties of winners and losers in discrete optimization. In *STOC '04: the 36th Annual ACM Symposium on Theory of Computing* (2004), 343352.

7. Blum, A. and Dunagan, J. Smoothed analysis of the perceptron algorithm for linear programming. In *SODA '02* (2002), 905914.

8. Blum, A. and Spencer, J. Coloring random and semi-random k-colorable graphs. *J. Algorithms 19*, 2 (1995), 204234.

9. Bohman, T., Frieze, A. and Martin, R. How many random edges make a dense graph hamiltonian? *Random Struct. Algorithms 22*, 1 (2003), 3342.

10. Bürgisser, P., Cucker, F., and Lotz, M. Smoothed analysis of complex conic condition numbers. *J. de Mathématiques Pures Appliqués 86* 4 (2006), 293309.

11. Damerow, V., Meyer auf der Heide, F., Räcke, H., Scheideler, C., and Sohler, C. Smoothed motion complexity. In *Proceedings of the 11th Annual European Symposium on Algorithms* (2003), 161171.

12. Dantzig, G.B. Maximization of linear function of variables subject to linear inequalities. *Activity Analysis of Production and Allocation.* T.C. Koopmans, Ed. 1951, 339347.

13. Dunagan, J., Spielman, D.A., and Teng, S.-H. Smoothed analysis of condition numbers and complexity implications for linear programming. *Mathematical Programming, Series A*, 2009. To appear. Preliminary version available at http://arxiv.org/abs/cs/0302011v2.

14. Edelman, A. Eigenvalue roulette and random test matrices. *Linear Algebra for Large Scale and Real-Time Applications.* Marc S. Moonen, Gene H. Golub, and Bart L. R. De Moor, eds. NATO ASI Series, 1992, 365368.

15. Englert, M., Röglin, H., and Vöcking, B. Worst case and probabilistic analysis of the 2-opt algorithm for the TSP: extended abstract. In *SODA '07: The 18th Annual ACM-SIAM Symposium on Discrete Algorithms* (2007), 12951304.

16. Feige, U. Refuting smoothed 3CNF formulas. In *The 48th Annual IEEE Symposium on Foundations of Computer Science* (2007), 407417.

17. Flaxman, A. and Frieze, A.M. The diameter of randomly perturbed digraphs and some applications. In *APPROX-RANDOM* (2004), 345356.

18. Gass, S. and Saaty, T. The computational algorithm for the parametric objective function. *Naval Res. Logist. Quart. 2*, (1955), 3945.

19. Haimovich, M. The simplex algorithm is very good!: On the expected number of pivot steps and related properties of random linear programs. Technical report, Columbia University (April 1983).

20. Huang, L.-S. and Teng, S.-H. On the approximation and smoothed complexity of Leontief market equilibria. In *Frontiers of Algorithms Workshop* (2007), 96107.

21. Kalai, A. and Teng, S.-H. Decision trees are PAC - learnable from most product distributions: a smoothed analysis. *ArXiv e-prints* (December 2008).

22. Karger, D. and Onak, K. Polynomial approximation schemes for smoothed and random instances of multidimensional packing problems. In *SODA '07: the 18th Annual ACM-SIAM Symposium on Discrete Algorithms* (2007), 12071216.

23. Kelner, J.A. and Nikolova, E. On the hardness and smoothed complexity of quasi-concave minimization. In *The 48th Annual IEEE Symposium on Foundations of Computer Science* (2007), 472482.

24. Kelner, J.A. and Spielman, D.A. A randomized polynomial-time simplex algorithm for linear programming. In *The 38th Annual ACM Symposium on Theory of Computing* (2006), 5160.

25. Klee, V. and Minty, G.J. How good is the simplex algorithm? *Inequalities III.* O. Shisha, ed. Academic Press, 1972, 159175.

26. Krivelevich, M., Sudakov, B. and Tetali, P. On smoothed analysis in dense graphs and formulas. *Random Struct. Algorithms 29* (2005), 180193.

27. Ma, B. Why greed works for shortest common superstring problem. In *CPM '08: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching* (2008), 244254.

28. Manthey, B. and Röglin, H. Improved smoothed analysis of the k-means method. In *SODA '09* (2009).

29. Mehlhorn, K. and Näher, S. *The LEDA Platform of Combinatorial and Geometric Computing.* Cambridge University Press, New York, 1999.

30. Mitzenmacher, M. and Vadhan, S. Why simple hash functions work: exploiting the entropy in a data stream. In *SODA '08: Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms* (2008), 746755.

31. Röglin, H. and Vöcking, B. Smoothed analysis of integer programming. *Proceedings of the 11th International Conference on Integer Programming and Combinatorial Optimization.* M. Junger and V. Kaibel, eds. Volume 3509 of Lecture Notes in Computer Science, Springer, 2005, 276290.

32. Rudelson, M. and Vershynin, R. The littlewood-offord problem and invertibility of random matrices. *Adv. Math. 218*, (June 2008), 600633.

33. Sankar, A. Smoothed analysis of Gaussian elimination. Ph.D. Thesis, MIT, 2004.

34. Sankar, A., Spielman, D.A., and Teng, S.-H. Smoothed analysis of the condition numbers and growth factors of matrices. *SIAM J. Matrix Anal. Appl. 28*, 2 (2006), 446476.

35. Spielman, D.A. and Teng, S.-H. Smoothed analysis of algorithms. In *Proceedings of the International Congress of Mathematicians* (2002), 597606.

36. Spielman, D.A. and Teng, S.-H. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. *J. ACM 51*, 3 (20040, 385463.

37. Teng, S.-H. Algorithm design and analysis with perburbations. In *Fourth International Congress of Chinese Mathematicans* (2007).

38. Vershynin, R. Beyond Hirsch conjecture: Walks on random polytopes and smoothed complexity of the simplex method. In *Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science* (2006), 133142.

39. Vu, V.H. and Tao, T. The condition number of a randomly perturbed matrix. In *STOC '07: the 39th Annual ACM Symposium on Theory of Computing* (2007), 248255.

40. Wschebor, M. Smoothed analysis of (**a**). *J. Complexity 20*, 1 (February 2004), 97107.

a. In presidential elections in the United States, each of the 50 states and the District of Columbia is allocated a number of electors. All but the states of Maine and Nebraska use a winner-take-all system, with the candidate winning the majority votes in each state being awarded all of that states electors. The winner of the election is the candidate who is awarded the most electors.

b. Available at http://sigact.acm.org/

c. See the second line of the Matlab code at the end of the Discussion section for an example.

DOI: http://doi.acm.org/10.1145/1562764.1562785

**©2009 ACM 0001-0782/09/1000 $10.00**

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

The Digital Library is published by the Association for Computing Machinery. Copyright © 2009 ACM, Inc.

No entries found