
Technical Perspective: Solve for x

"Solving Sparse Linear Systems Faster than Matrix Multiplication," by Richard Peng and Santosh S. Vempala, presents the first algorithm that approximately solves sparse systems over the reals with large condition number in time less than O(nω).


Ax = b. Solve for x. The problem of simultaneously solving linear equations in many variables is one of the oldest in mathematics. The first recognizable algorithm for the problem appears in the 179 C.E. book Nine Chapters on the Mathematical Art. MATLAB considers it a sufficiently important operation that it makes solving for x as easy as typing A\b. It is the computational bottleneck in applications in scientific computing, optimization, statistics, and cryptography. Fast algorithms have been designed for solving the systems that arise in special applications. But its general complexity remains a mystery.
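To make this concrete, here is a minimal NumPy sketch (my example, not code from the article) of the two routes to x: a factor-and-solve call, the analogue of MATLAB's backslash, and the explicit inverse it is usually preferred over.

```python
import numpy as np

# Minimal sketch: np.linalg.solve plays the role of MATLAB's A\b,
# factoring A (LU with partial pivoting) rather than forming A^{-1}.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

x = np.linalg.solve(A, b)         # factor-and-solve: the preferred route
x_via_inv = np.linalg.inv(A) @ b  # explicit inverse: works, but less stable

print(np.allclose(A @ x, b))        # True
print(np.allclose(x, x_via_inv))    # True here; diverges as A worsens
```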

The conceptually simplest way to solve for x is to compute the inverse of A, and then multiply this by b. Even this seemingly simple approach yields surprises. The natural, elimination-based algorithm for computing the inverse of an n-by-n matrix requires around n^3 operations. Strassen's algorithm, which improves this by computing the inverse in O(n^(log₂ 7)) ≈ O(n^2.81) operations, is one of the most striking examples of algorithm design. Strassen used similar ideas to show how to multiply two n-by-n matrices in O(n^(log₂ 7)) time, and these inspired Bunch and Hopcroft to show that the asymptotic complexities of matrix multiplication and inversion are the same. The smallest number ω such that one can multiply two n-by-n matrices in time O(n^ω) is called the matrix multiplication exponent. It is now known that ω < 2.373, and many conjecture that ω approaches its natural lower bound of 2.
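A sketch of Strassen's recursion, assuming the dimension is a power of two (padding handles the general case): seven recursive products in place of the naive eight give T(n) = 7T(n/2) + O(n^2) = O(n^(log₂ 7)).

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Multiply square matrices whose size is a power of two using
    Strassen's seven-product recursion; fall back to ordinary
    multiplication below the cutoff."""
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    m = n // 2
    A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
    B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
    # Seven recursive products instead of the naive eight.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:m, :m] = M1 + M4 - M5 + M7
    C[:m, m:] = M3 + M5
    C[m:, :m] = M2 + M4
    C[m:, m:] = M1 - M2 + M3 + M6
    return C

A = np.random.default_rng(1).standard_normal((128, 128))
B = np.random.default_rng(2).standard_normal((128, 128))
print(np.allclose(strassen(A, B), A @ B))  # True
```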

Once we know the inverse of A, we can quickly solve any system of equations in A. If we just want to solve one system, can we do it faster than computing the inverse? Iterative solvers can, performing only O(n^2) arithmetic operations if the system is sparse. We say that the system is sparse if, on average, each variable appears in a constant number of equations; that is, if the matrix A has O(n) nonzero entries.
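In concrete terms (a SciPy sketch of my own, not code from the paper): storing A in a compressed sparse format makes each matrix-vector product cost O(nnz) = O(n), and a product with A is the primitive that iterative solvers repeat.

```python
import numpy as np
import scipy.sparse as sp

# A sparse system: about five nonzeros per row, so A has O(n) entries
# and each matrix-vector product costs O(n) rather than O(n^2).
rng = np.random.default_rng(0)
n = 100_000
A = sp.random(n, n, density=5 / n, format="csr", random_state=0) \
    + sp.identity(n, format="csr")

x = rng.standard_normal(n)
b = A @ x                 # one O(n) pass over the stored nonzeros
print(A.nnz)              # roughly 6n nonzero entries
```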

The bit-complexity of these iterative algorithms depends on the types of numbers that appear in A and b. When the entries are integers modulo a prime, Wiedemann's algorithm can be used to solve for x while multiplying n vectors by A and performing a small amount of additional computation. When A is sparse, each multiplication by A takes time O(n), and the algorithm takes time O(n^2).
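The following is a compact sketch of Wiedemann's approach over GF(p) (my own simplified illustration): project the Krylov sequence b, Ab, A²b, … to scalars with a random vector u, recover its minimal polynomial with Berlekamp–Massey, and read A⁻¹b off that polynomial. A production implementation would retry with fresh randomness when the projection is unlucky.

```python
import numpy as np

def berlekamp_massey(s, p):
    """Shortest linear recurrence for sequence s over GF(p): returns
    C = [1, c1, ..., cL] with s[i] + c1*s[i-1] + ... + cL*s[i-L] = 0 (mod p)."""
    C, B = [1], [1]
    L, m, b = 0, 1, 1
    for i in range(len(s)):
        d = s[i] % p                          # discrepancy
        for j in range(1, L + 1):
            d = (d + C[j] * s[i - j]) % p
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = d * pow(b, p - 2, p) % p       # d / b  (mod p)
        C = C + [0] * (len(B) + m - len(C))
        for j in range(len(B)):               # C(x) -= coef * x^m * B(x)
            C[j + m] = (C[j + m] - coef * B[j]) % p
        if 2 * L <= i:
            L, B, b, m = i + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1]

def wiedemann_solve(A, b, p, seed=0):
    """Solve A x = b (mod p) using about 3n multiplications by A plus a
    small amount of extra work. Assumes A is nonsingular mod p; a real
    implementation retries if the random projection u fails."""
    n = len(b)
    u = np.random.default_rng(seed).integers(0, p, size=n)
    s, v = [], b % p
    for _ in range(2 * n):                    # scalars s_i = u . A^i b
        s.append(int(u @ v) % p)
        v = (A @ v) % p
    c = berlekamp_massey(s, p)
    L = len(c) - 1
    if c[L] % p == 0:
        raise ValueError("zero constant term: singular A or unlucky u")
    # m(z) = z^L + c1 z^(L-1) + ... + cL annihilates b, hence
    # x = -(1/cL) * (A^(L-1) b + c1 A^(L-2) b + ... + c_(L-1) b).
    x, v = np.zeros(n, dtype=np.int64), b % p
    for j in range(L):
        x = (x + c[L - 1 - j] * v) % p
        v = (A @ v) % p
    return (-pow(c[L], p - 2, p) * x) % p

p = 1009
rng = np.random.default_rng(1)
A = rng.integers(0, p, size=(30, 30))
x_true = rng.integers(0, p, size=30)
print(np.array_equal(wiedemann_solve(A, (A @ x_true) % p, p), x_true))  # True
```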

The bit-complexity for real number inputs is more complicated. The Lanczos and conjugate gradient methods require at most n multiplications of vectors by A. But multiplication of real numbers can increase their bit-length so much that the precision required for these algorithms can become impractically large. In fact, the number of bits required to write the exact solution x can be much larger than the number required to write A or b.

Our outlook improves if we ask for an x̃ that is close to the solution, instead of the exact solution. Iterative algorithms produce closer solutions the longer they run. The number of iterations they need is often upper bounded by a nice function of κ, the condition number of A. The condition number bounds how much x can change when small changes are made to A and b. It also bounds how close Ax̃ is to b in terms of how close x̃ is to the exact solution. The conjugate gradient can be shown to produce ϵ-approximate solutions after O(√κ log(1/ϵ)) multiplications of vectors by A. This is fast if A is sparse and the condition number is not too big. But it does not provide a general solution.
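A small SciPy illustration (my construction, not an example from the paper): on a sparse, well-conditioned, positive-definite matrix, conjugate gradient converges in a number of iterations governed by κ, essentially independent of n.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

# Shifted 1-D Laplacian: its eigenvalues lie in (4, 8), so kappa < 2
# no matter how large n is, and CG needs only a handful of iterations.
n = 200_000
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr") \
    + 4 * sp.identity(n, format="csr")
b = np.ones(n)

iters = []
x, info = cg(A, b, callback=lambda xk: iters.append(1))
print(info == 0, len(iters))           # converged, in a few iterations
print(np.linalg.norm(A @ x - b))       # small residual
```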

Modulo a prime, there is no good notion of an approximate solution. This made the problem over the reals seem simpler until a 2006 breakthrough of Eberly, Giesbrecht, Giorgi, Storjohann, and Villard, who leverage iterative methods in an algorithm that inverts sparse matrices modulo a prime in time approximately O(n^2.27).

Their work inspired the following paper by Peng and Vempala, who present the first algorithm that approximately solves sparse systems over the reals with large condition number in time less than O(n^ω). Their algorithm takes time approximately O(n^2.33 log^2(κ/ϵ)). They use random perturbations to circumvent precision issues, and better analysis of these perturbations could improve their running time. They have pushed through a barrier that has stood for decades. How far will their momentum carry us? Will we find faster algorithms for inverting matrices with real entries? Does ω = 2? How quickly can we solve for x?
