Better Algorithms through Faster Math

bursts of light from circuitry, illustration

Developing faster algorithms is an important but elusive goal for data scientists. The ability to accelerate complex computing tasks and reduce latency has far-reaching ramifications in areas such as natural language processing, video streaming, autonomous robotics, gaming, and extended reality.

Yet for all the hype surrounding computer algorithms and the increasingly sophisticated ways they operate, a basic fact stands out: these algorithms are typically built atop matrix multiplication, a basic type of linear algebra. The underlying mathematical framework has not changed a great deal since the inception of computing—and finding more efficient formulas has proved elusive.

It is an issue attracting growing attention—particularly as machine learning (ML), deep learning (DL), artificial intelligence (AI), and machine automation advance into the mainstream. Milliseconds can make an enormous difference in terms of how well an algorithm operates, the performance it produces, and how many computing cycles it consumes.

So, not surprisingly, the search for new algorithms that could reduce multiplication is at the center of data science. Although researchers have explored this issue for many years and experimented with different techniques, in October 2022 a team from Alphabet Inc.’s U.K.-based subsidiary DeepMind reported it had found a way to uncover better algorithms through its AlphaTensor reinforcement learning framework.

Using ML, DeepMind found a way to discover novel fast matrix multiplication algorithms, says Alhussein Fawzi, staff research scientist at DeepMind and lead author of an academic paper that appeared in Nature.^a While the results are more theoretical than practical at this point, the discovery represents a breakthrough. “This approach could eliminate many bottlenecks,” states Tammy Kolda, an independent consultant for MathSci.AI and a distinguished visiting professor at Northwestern University.

The research community has taken notice. While many questions remain—particularly surrounding the black-box nature of the research and how much computational power and financial resources used DeepMind—it is clear faster algorithms would have a profound impact on wide-ranging areas of AI, including climate change models, speech recognition systems, autonomous vehicles, neural networks, and graphics.

“This extends explosive progress achieved recently in reinforcement learning,” says Victor Y. Pan, Distinguished Professor in the departments of Mathematics and Computer Science at City University of New York (CUNY) and a pioneer in the field of fast feasible matrix multiplication algorithms.

Making Algorithms Add Up

Although it is tempting to view algorithms through the filter of modern data science and computing, mathematicians have used matrix multiplication methods for hundreds of years to address technical challenges in fields as diverse as finance, engineering, chemistry, and physics. Matrix multiplication, while one of the most basic operations in algebra, is also among the most powerful and utilitarian.

Today, algorithms handle a mind-bending array of tasks on various types of computing devices. They recognize speech commands, provide live navigation capabilities, generate computer graphics, deliver ads, and run elaborate simulations, including Digital Twins. In fact, matrix multiplication is so important that computer hardware is specifically optimized to use it. Until 1969, the math consisted of establishing a side-by-side matrix and then multiplying the rows of one with the columns of the other. At that point, German mathematician Volker Strassen found a shortcut that trimmed the number of steps in a two-row and two-column model from 8 to 7.^b

Since then, a point of intrigue for data scientists has been whether more efficient algorithms are possible—particularly because most dense matrix multiplication algorithms use non-Strassen methods. The question is rooted in a basic fact: Every step in a multiplication table results in slower algorithm performance and increased compute cycles. Yet, research in the space has slogged forward. “Over the years, we have periodically seen small and incremental performance improvements—and many of these gains have been strictly theoretical,” says Grey Ballard, an associate professor of computing science at Wake Forest University.

Indeed, the DeepMind group rocked the data science world with its announcement. Building on Alpha-Zero—a Google framework that previously was used to master chess, Go, and other games through an earlier iteration known as AlphaGo—their new ML approach (called AlphaTensor) found there are more efficient ways to multiply grids. AlphaTensor cycles through matrices and generates code that can be used to multiply any matrix. What’s more, researchers can train it to search for efficient algorithms tailored to specific hardware.

The results were impressive. For example, AlphaTensor discovered an algorithm for multiplying four-by-four matrices in 47 steps, down from the 49 steps that are used in the standard Strassen algorithm. DeepMind also found it could optimize algorithms to run up to 20% faster on specific hardware, such as Nvidia V100 GPUs and the Google TPU v2 platform. “This isn’t the same as embedding these algorithms on real world systems, but it was a significant proof of concept,” Fawzi points out.

Into the Matrix

At the outset, AlphaTensor had zero knowledge of the specific problem DeepMind researchers set out to solve—specifically, finding shortcuts for matrix multiplication. Relying on the same general reinforcement learning methods that researchers had previously used to teach AlphaGo chess and other games, the DeepMind team constructed a single-player game and directed AlphaTensor to find highly efficient ways to win.

However, the approach differed from past projects in an important way. It eschewed techniques used to teach older AI models games through alpha-beta pruning techniques and constant human oversight and heuristics. AlphaTensor teaches itself by cycling through scenarios using a reward function. As it succeeds, it updates its policies and values. The net effect was an ability to discover new algorithms.

“An important contribution of AlphaTensor is in the formulation process itself: How can we express a problem that, at first sight, is far removed from the usual settings in machine learning as a reinforcement learning problem,” Fawzi explains. These include factors that are not part of a typical game, including “a huge action space, a highly symmetric problem, and a lack of a large dataset based on expert human demonstrations,” he notes.

Along the way, DeepMind encountered a steep challenge: the sheer size of the search space. AlphaTensor had to sort through the possibility of more algorithms than the entire number of atoms in the universe. In fact, at one point, the system had to sift through upward of 10³³ possible steps within the game. In order to achieve meaningful results, “Machine learning had to inform the search procedure to find the interesting and useful algorithms, which would not be possible to discover with uninformed search,” Fawzi explains.

DeepMind researchers using AlphaTensor found there are more efficient ways to multiply grids.

Researchers used several heuristic techniques to tackle the space problem, Fawzi explains. First, they turned to a sampled tree search, which relied on modified versions of AlphaZero, the underlying model for AlphaTensor. This allowed the group to sample actions that were more likely to produce meaningful outcomes. They also used synthetic data to show the agent techniques it could imitate, and they directed AlphaTensor to solve all matrix sizes and fields simultaneously. “This made it possible to accelerate knowledge transfer within the system,” Fawzi says.

The researchers also reformulated algorithm discovery in numerous equivalent ways, through a technique known as change of basis. “Once the agent finds a good solution in any of the ‘basis,’ this solution can transfer to all other instances of the problem,” Fawzi says. Although AlphaTensor still has a way to go to learn algebraic geometry, representation theory, and other complex mathematical tasks, it already has demonstrated an ability to find the proverbial “needle in a haystack.”

AI by the Numbers

DeepMind’s work has pushed matrix multiplication into new territory—and other DL agents using similar techniques may advance the field further. Ballard, who has conducted research in the same space, says the DeepMind breakthrough slides the algorithmic improvement dial to a more practical level suited for realistic applications. “When you are looking at a neural network with many layers and a large amount of data movement, the bottlenecks start to add up. Any speed increase can result in significant gains in the end result,” he says.

Kolka likens the DeepMind breakthrough to finding a faster way to navigate between two cities without an existing map.

Kolda, who also has studied Tensor Decompensation and other methods to improve algorithm performance—including during a stint at Sandia National Laboratories—likens the DeepMind breakthrough to finding a faster way to navigate between two cities without an existing map. A ML model must explore every road, highway, and byway until it finds the shortest or best route. “The focus isn’t so much on how you got there, it’s that you have found a way to optimize the path.”

Yet the DeepMind methods also raise important questions about the autonomy of AI and the black box approach. For one thing, AI algorithms require less and less engineering and input from humans. “It’s a messy way to get to an algorithm that you can’t actually verify,” Kolda says. For another, “We don’t know how much computational power went into solving this problem and whether other entities have the deep pockets to make this type of research possible—or feasible on any realistic scale,” she adds.

The latter is not an abstract issue, or an obstacle that is easily addressed. In order to achieve significant gains in matrix multiplication, Fawzi admits, researchers would need to study much larger tensors, which would make the search space even larger. Essentially, the needle stays the same, but the haystack grows larger. This challenge may necessitate the use of a hybrid method that combines novel ML approaches that do not incorporate any problem-specific knowledge with more traditional methods built atop mathematical techniques. “This may help the model scale better,” he says.

In fact, Fawzi believes many computational problems can be framed as tensor decompositions. In practical terms, this means it is possible to reuse and recycle many of the techniques the DeepMind team developed with AlphaTensor. Over time, this approach could lead to significant improvements in existing algorithms, but also produce new types of algorithms for improving things such as energy consumption and numerical stability. Says Fawzi, “We believe that the solutions we built for this project will be applicable—or at least provide intuition—for other problems that extend beyond matrix multiplication.”

Further Reading

Fawzi, A. et al.
Discovering Faster Matrix Multiplication Algorithms with Reinforcement Learning. Nature, 610, pages 47–53, October 22, 2022. https://www.nature.com/articles/s41586-022-05172-4

Discovering novel algorithms with AlphaTensor. DeepMind, October 5, 2022. https://www.deepmind.com/blog/discovering-novel-algorithms-with-alphatensor

Karstadt, E. and Schwartz, O.
Matrix Multiplication, a Little Faster. J. Communications Volume 67, Article 1. January 2020, Pages 1–31. https://doi.org/10.1145/3364504

Ballard, G., Demmel, J., Holtz, O., and Schwartz, O.
Communication Costs of Strassen’s Matrix Multiplication. Communications, Volume 57, Issue 2, February 2014, pp 107–114. https://dl.acm.org/doi/10.1145/2556647.2556660

Benson, A.R. and Ballard, G.
A framework for practical parallel fast matrix multiplication. ACM SIGPLAN Notices, Volume 50, Issue 8, August 2015, pp 42–53. https://dl.acm.org/doi/10.1145/2858788.2688513

Pan, V.Y.
Strassen’s Algorithm Is Not Optimal: Trilinear Technique of Aggregating for Fast Matrix Multiplication, Proc. of the 19th Annual IEEE Symposium on Foundations of Computer Science (FOCS’78), 166–176, IEEE Computer Society Press, Long Beach, California, 1978 (Journal version in “New Fast Algorithms for Matrix Operations”, SlAM J. on Computing, 9, 2, 321–342 (1980)).

Footnotes

a. "Discovering faster matrix multiplication algorithms with reinforcement learning"; https://go.nature.com/3KDEWf4

b. "Gaussian elimination is not optimal," https://link.springer.com/article/10.1007/BF02165411