There is Plenty of Room at The Top (of Supercomputing)

Jack Dongarra during his lecture at the Heidelberg Laureate Forum.

Supercomputers are the Olympic champions of scientific computing. Through numerical simulations, they enrich our understanding of the world, be it stars light-years away in the universe, the Earth's weather and climate, or the functioning of the human body.

For over four decades, Jack Dongarra has been a driving force in the field of high-performance computing. Earlier this year, Dongarra was awarded the 2021 ACM A.M. Turing Award for "his pioneering contributions to numerical algorithms and libraries that enabled high performance computational software to keep pace with exponential hardware improvements for over four decades."

Writer Bennie Mols met with Dongarra during the 9th Heidelberg Laureate Forum in Germany in September to talk about the present and future of high-performance computing. Dongarra, now 72, has been a University Distinguished Professor at the University of Tennessee (U.S.) and a Distinguished Research Staff Member at the U.S. Department of Energy's Oak Ridge National Laboratory since 1989.

Over the decades, what has been your driving force in your scientific research?

My background is in mathematics, especially in numerical linear algebra; all of my work stems from that. For many problems in computational science, such as in physics and chemistry, you need to solve systems of linear equations, so having software that can do that is important. You have to make sure the software runs in parity with the machine architecture, so you can actually get the high performance that the machine is capable of.

What are the most important requirements for software that runs on a supercomputer?

We want the software to be accurate. We want the scientific community to use and understand the software, and maybe even contribute to improvements. We want the software to perform well and to be portable across different machines. We want the code to be readable and reliable, and finally, we want the software to enhance the productivity of the person who is using it.

Developing software that meets all these requirements is a non-trivial process. We are talking about millions of lines of code, and roughly every 10 years, we see some major change in the machine's architecture. That causes a refactoring of the algorithms that we have, and the software that embodies those algorithms. The software follows the hardware, and there is still plenty of room at the top of supercomputing to get to better-performing machines.

What is a current development in high-performance computing that excites you?

High-performance supercomputers are built on commodity parts, let's say the high-end chips that you and I can also buy, just many more of them. And typically we use some accelerators, in the form of GPUs, on top. We have boards of multiple chips, we put them in a rack, and many of these racks together form a supercomputer. We use commodity parts because it is cheaper, but if you would specially design the chips for doing scientific computations, you would get supercomputers that perform much better, and that is an exciting idea.

Actually, this is exactly what companies like Amazon, Facebook, Google, Microsoft, Tencent, Baidu, and Alibaba are doing; they are making their own chips. They can do this because they have enormous funding. Universities are always limited in funding, and therefore they unfortunately have to make do with commodity stuff. This is related to one of my other worries: how do we keep talent in the scientific areas, rather than see them go to work for big companies that pay much better?

What are other important developments for the future of high-performance computing?

There are a number of important things. It is clear that machine learning is already having an important impact on scientific computing, and this impact will only grow. I see machine learning as a tool that helps to solve the problems that computational scientists want to solve.

This goes together with another important development. Traditionally, our hardware uses 64-bit floating point operations, so we represent numbers in 64 bits. But you can speed up the computations if you use fewer bits, say 32, 16, or even 8 bits. By speeding up your computation, you lose precision. Yet it looks like AI calculations can often make do with fewer bits, 16 or even 8. It is an area of investigation to find out where this plays out well and where it does not.
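The precision trade-off described here can be illustrated with a minimal Python sketch. It assumes only the standard `struct` module, which can round a value to the IEEE 754 64-, 32-, and 16-bit formats; the helper names `round_to` and `relative_error` are made up for this illustration, and 8-bit formats are left out because `struct` does not provide one.

```python
import struct

def round_to(fmt, x):
    """Round x to the format named by a struct code:
    'd' (64-bit), 'f' (32-bit), or 'e' (16-bit)."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

def relative_error(fmt, x):
    """Relative error introduced by storing x in the given format.
    The 64-bit value is the reference, so 'd' gives zero by construction."""
    return abs(round_to(fmt, x) - x) / abs(x)

if __name__ == "__main__":
    value = 1.0 / 3.0  # not exactly representable in binary floating point
    for bits, fmt in ((64, "d"), (32, "f"), (16, "e")):
        print(f"{bits:2d} bits: stored {round_to(fmt, value)!r}, "
              f"relative error {relative_error(fmt, value):.1e}")
```

Each halving of the bit width costs several decimal digits of accuracy, which is the price paid for the faster arithmetic.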

Another area of investigation is about how you can start with a low-precision computation, get an approximation, and then later use higher-precision computation to refine the outcome.
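The idea of starting in low precision and refining in high precision is commonly known as mixed-precision iterative refinement. What follows is a minimal sketch under stated assumptions, not the method used in any particular library: it is pure Python, it emulates 32-bit arithmetic by rounding through `struct`, and the names `f32`, `solve_low`, `residual`, and `refine` are hypothetical. A production code would factor the matrix once and reuse the factors; this sketch simply re-solves at each step to stay short.

```python
import struct

def f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def solve_low(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to 32 bits (the 'fast' solver)."""
    n = len(b)
    # Augmented matrix [A | b], stored in rounded form.
    M = [[f32(v) for v in row] + [f32(rhs)] for row, rhs in zip(A, b)]
    for k in range(n):
        # Swap in the row with the largest pivot candidate.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(factor * M[k][j]))
    # Back substitution, also in rounded arithmetic.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x

def residual(A, x, b):
    """r = b - A x, computed in full 64-bit precision."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]

def refine(A, b, steps=5):
    """Low-precision solve followed by high-precision correction steps."""
    x = solve_low(A, b)
    for _ in range(steps):
        r = residual(A, x, b)          # accurate measure of what is still wrong
        d = solve_low(A, r)            # cheap, approximate correction
        x = [xi + di for xi, di in zip(x, d)]
    return x

if __name__ == "__main__":
    A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 5.0]]
    x_true = [1.0 / 3.0, -2.0 / 7.0, 0.1]
    b = [sum(a * x for a, x in zip(row, x_true)) for row in A]
    for name, x in (("low only", solve_low(A, b)), ("refined", refine(A, b))):
        err = max(abs(xi - ti) for xi, ti in zip(x, x_true))
        print(f"{name:9s} max error = {err:.1e}")
```

For a well-conditioned matrix like the one in the example, a few correction steps bring the answer close to 64-bit accuracy even though every solve ran in emulated 32-bit arithmetic. For ill-conditioned problems the refinement can converge slowly or not at all.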

What about the huge power consumption of supercomputers?

The best-performing supercomputers nowadays consume 20 or 30 megawatts to achieve an exaflop speed. An exaflop is 10¹⁸ floating point operations per second, so a billion times a billion. If each person on Earth would do one calculation per second, it would take more than four years to do what an exascale computer does in one second. Probably in 20 years, we might want to get to the zettaflop scale, that is 10²¹ flops. However, the power consumption could be the limiting factor. You would need a 100- or 200-megawatt machine, and maybe that's too much.
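The arithmetic behind these figures can be checked with a short Python sketch. The world population of 8 billion and the 365.25-day year are assumptions made here, not numbers from the interview, and the function names are invented for illustration.

```python
EXAFLOP = 10**18     # floating point operations per second
ZETTAFLOP = 10**21

WORLD_POPULATION = 8 * 10**9           # assumption: roughly 8 billion people
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average calendar year

def years_for_humanity(operations, people=WORLD_POPULATION):
    """Years needed if every person performs one operation per second."""
    return operations / people / SECONDS_PER_YEAR

def flops_per_watt(flops, megawatts):
    """Energy efficiency of a machine drawing the given power."""
    return flops / (megawatts * 1e6)

if __name__ == "__main__":
    print(f"Humanity vs. one exaflop-second: {years_for_humanity(EXAFLOP):.2f} years")
    today = flops_per_watt(EXAFLOP, 20)
    future = flops_per_watt(ZETTAFLOP, 200)
    print(f"Exaflop at 20 MW:     {today:.1e} flops per watt")
    print(f"Zettaflop at 200 MW:  {future:.1e} flops per watt")
    print(f"Required efficiency gain: {future / today:.0f}x")
```

Under these assumptions the human calculation takes about four years, and a zettaflop machine limited to 200 megawatts would need to be about a hundred times more energy-efficient than an exaflop machine drawing 20 megawatts. The exact figures shift with the population and power budget assumed.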

How do you see the role of quantum computing in the future of high-performance computing?

I see quantum computing as something which will help for a limited set of problems, but it's not going to solve things like three-dimensional partial differential equations, for which we use supercomputers a lot, like in climate modeling.

We will have a large box of different types of computational tools. We will have processors and accelerators, we will have tools that help with machine learning, we may well have devices which do neuromorphic computing in the way the brain does it, we will have optical computers, and in addition, we will have quantum computers for a certain niche of problems.

Bennie Mols is a science and technology writer based in Amsterdam, the Netherlands.