Architecture and Hardware BLOG@CACM

Unleashing the Power of Deep Learning

The Massachusetts Institute of Technology's Vivienne Sze on how to take greatest advantage of deep learning systems.

Posted March 16, 2023 by MIT Associate Professor Vivienne Sze

https://bit.ly/3NwNqYl

While it is clear that deep learning (a core technology used in AI-enabled applications) can deliver tangible results, the technology is still constrained in several ways. Many of the limitations in accuracy and capability can be addressed in the coming years as programmers and designers refine their algorithms and pile on more training data.

There is one constraint, though, that seems fundamental to the nature of deep learning, to the extent that many developers almost accept it as the price of doing business: the sheer computation power required to run it.


Tied to the Cloud

Deep learning involves running hundreds of millions or even billions of compute operations (for example, multiplies and additions). The GPT-3 model—the foundation for the wildly successful ChatGPT tool—is reported to have used 175 billion parameters, requiring more than 10²³ compute operations to train (which translates to millions of dollars), and the finished product requires clusters of powerful, and expensive, processors to run effectively.
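
For a sense of where a number like 10²³ comes from, a common rule of thumb puts training compute at roughly 6 × parameters × training tokens. The sketch below applies that heuristic with an assumed token count of about 300 billion; the figures are illustrative assumptions, not official numbers.

```python
# Back-of-the-envelope estimate of GPT-3 training compute.
# Assumes the common heuristic: training FLOPs ~= 6 * parameters * tokens.
# The token count is an assumption for illustration, not a reported figure.
params = 175e9   # reported GPT-3 parameter count
tokens = 300e9   # assumed number of training tokens
flops = 6 * params * tokens
print(f"Estimated training compute: {flops:.2e} operations")  # ~3.15e+23
```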

While this hunger for processing power is not going to surprise anybody familiar with the technology, the limitations it imposes extend far beyond the need to buy more processors. It also makes it extremely difficult to run deep learning on a portable device—the kind of thing that people are likely to have in their home, bag, or pocket.




This need for processing does not mean we cannot access AI tools on our mobile devices, of course. You can easily open a Web browser on your iPad and use it to produce all the AI-generated art, email, and stories you could ever want, but the resource-heavy computation is handled in the cloud rather than locally. This cloud-based setup means any device, tool, or application looking to take advantage of deep learning must be tied to the Internet. Unfortunately, it is not always feasible to have a reliable connection. Depending on where you are in the world, you may have limited bandwidth. Even if high-bandwidth 5G connectivity is an option, you may still face the kind of reliability issues familiar to anyone who has tried to use AI-enabled speech recognition to switch songs on a cross-country road trip.

There are, of course, plenty of applications where a permanent connection to the cloud is not too much of an ask. For example, the AI used for automation at a factory is unlikely to be installed somewhere without a decent Internet connection. However, there are a huge number of use cases where we could benefit from local AI (that is, performing the computation locally, where the data is collected, rather than in the cloud). For those working with sensitive data—for instance, health or financial data—sending the data to the cloud may also pose security or privacy concerns. For those working on applications that require a fast reaction time—for instance, autonomous navigation or augmented/virtual reality—the time it takes to send the data to the cloud and wait for a response may impose too much latency. To achieve local AI, we need to change how we think about designing both our processing hardware and our deep learning software.
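
To make the latency point concrete, here is a rough comparison of a cloud round trip versus local inference for a single camera frame. Every number below (frame size, uplink speed, round-trip time, inference times) is an illustrative assumption, not a measurement.

```python
# Rough latency comparison: ship a camera frame to the cloud vs. run locally.
# All figures below are illustrative assumptions.
frame_bits  = 1280 * 720 * 3 * 8   # one uncompressed 720p RGB frame
uplink_bps  = 20e6                 # assumed 20 Mbit/s uplink
rtt_s       = 0.05                 # assumed 50 ms network round-trip time
cloud_infer = 0.010                # assumed 10 ms inference on a server GPU
local_infer = 0.040                # assumed 40 ms inference on a mobile accelerator

cloud_total = frame_bits / uplink_bps + rtt_s + cloud_infer
print(f"cloud path: ~{cloud_total * 1000:.0f} ms per frame")  # ~1166 ms
print(f"local path: ~{local_infer * 1000:.0f} ms per frame")  # ~40 ms
print("AR/VR budget at 60 frames/s: ~17 ms per frame")
```

Even with heavy compression, the cloud path still pays the network round trip, which on its own can exceed the per-frame budget of an augmented reality application.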


The Limits of Thought

Before we make the mistake of assuming the solution is simply to throw more processing hardware at the problem, it is important to remember that a lack of raw processing speed is not the only factor that makes portable deep learning challenging. Even if we were somehow able to miniaturize one of the powerful racks used in cloud computing setups, we would still have to contend with two associated issues: heat and energy consumption.

These are both perennial challenges for any computing setup, but they are amplified for smaller, portable devices. A self-driving car can use more than 1,000 watts to compute and analyze the data from its cameras and other sensors, for example. In contrast, a typical smartphone processor must run at approximately one watt. The more power the processor consumes, the more heat it produces. Processors in the cloud can generate so much heat that they require liquid cooling, which is not feasible for a small device trying to handle deep learning applications in your pocket or backpack; often you cannot even use a fan, given the size and weight constraints of these portable devices.




Even if we could get sufficient power into the processor, we still must answer the question of energy storage. What’s the point in having an AI assistant embedded in your smartphone if you can only run it for 10 minutes before having to find an outlet to recharge your battery?
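
A quick calculation shows why power draw maps so directly to battery life. The capacity and power numbers below are assumptions chosen only to illustrate the scale of the problem.

```python
# Battery life at different processor power draws (illustrative assumptions).
battery_wh = 15.0   # assumed smartphone battery capacity, in watt-hours
for label, watts in [("~1 W phone-class processor", 1.0),
                     ("~10 W unoptimized deep learning workload", 10.0)]:
    hours = battery_wh / watts
    print(f"{label}: ~{hours:.1f} hours of continuous use")
```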


Untethering AI

If developers and designers want to embrace all the benefits of untethered AI—whether that means a semi-intelligent delivery drone or a camera capable of real-time image processing—we need to eliminate these challenges, or at least find a way to mitigate their impact.

The answer comes in the form of improved efficiency at every possible level, from the code we use to specify the deep learning operations through to the hardware we run it on. It is possible to write algorithms that take our existing deep learning concepts but run them more efficiently, both in terms of computation and power draw. For example, we can trim parts of a trained network that were valuable during the learning and training process but serve little purpose in the finished product.
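
One widely used example of this kind of trimming is pruning: after training, weights that contribute little to the output are removed so the deployed model does less work. The sketch below shows simple magnitude-based pruning; it illustrates the general idea rather than any specific method used by the author's group.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights, keeping a (1 - sparsity) fraction."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) > threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))              # toy weight matrix
w_pruned = magnitude_prune(w, sparsity=0.8)  # drop the smallest 80% of weights
print(f"weights kept: {np.count_nonzero(w_pruned) / w.size:.0%}")  # ~20%
```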

On the hardware side of things, there are plenty of ways to make AI and deep learning technology run more efficiently. Most of the processors we currently use in our computers to run our software are general-purpose—they are pretty good at handling any application you throw at them, but do not excel at any individual task. Yet it is possible to design hardware specially tailored to the demands of deep learning. Google released a chip specialized for the matrix operations in deep learning more than half a decade ago (https://bit.ly/3Hw5Nca), while my own team at MIT has developed one built specifically for deep learning tasks (https://bit.ly/3NAm1F0) that can operate on less than one-third of a watt and is 10 times more energy-efficient than a more general-purpose mobile processor.

We find the greatest opportunities for improvement when we begin to think about hardware and software at the same time. When these two aspects work in tandem, we can apply techniques to eliminate unnecessary computations and reduce the distance that information needs to travel across the physical chip during computations; both approaches can speed up the chip and reduce its power and energy consumption.
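
As a toy software illustration of both ideas, the blocked matrix multiply below keeps a small tile of data "close" while it is being reused and skips tiles that are entirely zero. Real accelerators implement these optimizations in hardware dataflows, so treat this purely as a conceptual sketch.

```python
import numpy as np

def tiled_matmul_skip_zeros(a: np.ndarray, b: np.ndarray, tile: int = 64) -> np.ndarray:
    """Blocked matrix multiply that reuses each tile of `a` across many outputs
    (less data movement) and skips all-zero tiles of `a` (less computation)."""
    n, k = a.shape
    _, m = b.shape
    out = np.zeros((n, m))
    for i in range(0, n, tile):
        for p in range(0, k, tile):
            a_tile = a[i:i + tile, p:p + tile]
            if not a_tile.any():              # unnecessary work: skip zero tiles
                continue
            for j in range(0, m, tile):
                out[i:i + tile, j:j + tile] += a_tile @ b[p:p + tile, j:j + tile]
    return out
```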

If carefully applied, these advancements can untether the power of deep learning, and allow businesses to truly innovate with intelligent devices.


A Wide-Ranging Benefit

My work focuses on overcoming the hurdles that keep AI technology tethered to the cloud, but the benefits of making deep learning more efficient do not stop there.

Datacenters, for example, are already infamous for their heavy power draw. As AI usage becomes more and more mainstream, it is not hard to predict that a significant chunk of this power will be devoted to running deep learning algorithms. If we can develop the technology to run these systems more efficiently, the energy savings could be massive, reducing computing's carbon footprint and helping us move toward a more sustainable future.

These energy savings upstream can also mean cost savings downstream, making it easier for more users—with even more modest budgets—to access the incredible power of deep learning. With the age of AI descending upon us, this can only be a good thing.

 
