Why Businesses Must Untether Deep Learning

MIT Associate Professor Vivienne Sze

While it's clear that deep learning (a core technology used in AI-enabled applications) can deliver tangible results, the technology's applications are still being constrained in several different directions. Many of the limitations in terms of accuracy and ability can be addressed in the coming years as programmers and designers refine their algorithms and pile on more training data. 

There is one constraint, though, that seems fundamental to the nature of deep learning, to the extent that many developers almost accept it as the price of doing business: the sheer computation power required to run it.

Tied to the Cloud

Deep learning involves running hundreds of millions or even billions of compute operations (e.g., multiplies and additions). The GPT-3 model, the foundation for the wildly successful ChatGPT tool, is reported to have used 175 billion parameters and to have required over 10^23 compute operations to train (which translates to millions of dollars). Even the finished product requires clusters of powerful, expensive processors to run effectively.
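To get a feel for the scale of training compute described above, here is a back-of-the-envelope sketch in Python. It relies on two assumptions that do not come from this article: a common rule of thumb of roughly six operations per parameter per training token, and a training set of about 300 billion tokens, as publicly reported for GPT-3. Treat the result as an order-of-magnitude estimate only.

```python
def estimate_training_ops(num_parameters: float, num_tokens: float) -> float:
    """Rough training cost in compute operations.

    Assumes about 6 operations per parameter per training token
    (a widely used rule of thumb, not a figure from the article).
    """
    return 6.0 * num_parameters * num_tokens


parameters = 175e9   # parameter count cited in the article
tokens = 300e9       # assumption: publicly reported token count

ops = estimate_training_ops(parameters, tokens)
print(f"Estimated training operations: {ops:.2e}")  # 3.15e+23
```

Under these assumptions the estimate lands a little above 10^23, which is consistent with the order of magnitude quoted in the text.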

While this hunger for processing power isn't going to surprise anybody familiar with the technology, the limitations it imposes extend far beyond the need to buy more processors. It also makes it extremely difficult to run deep learning on a portable device—the kind of thing that people are likely to have in their home, bag, or pocket.

This need for processing doesn't mean that we can't access AI tools on our mobile devices, of course. You can easily open a Web browser on your iPad and use it to produce all the AI-generated art, emails, and stories you could ever want, but the resource-heavy computation is handled in the cloud rather than locally. This cloud-based setup means that any device, tool, or application looking to take advantage of deep learning must be tied to the Internet. Unfortunately, a reliable connection is not always available. Depending on where you are in the world, you may have limited bandwidth. Even if high-bandwidth 5G connectivity is an option, you may still face the kind of reliability issues familiar to anyone who has tried to use AI-enabled speech recognition to switch songs on a cross-country road trip.

There are, of course, plenty of applications where a permanent connection to the cloud isn't too much of an ask. For example, the AI used for automation at a factory is unlikely to be installed somewhere without a decent Internet connection. However, there are a huge number of use cases where we could benefit from local AI (i.e., performing the computation locally, where data is collected, rather than in the cloud). For those working with sensitive data (for instance, health or financial data), sending the data to the cloud may also pose security or privacy concerns. For those working on applications that require a fast reaction time (for instance, autonomous navigation or augmented/virtual reality), the time it takes to send the data to the cloud and wait for a response may impose too much latency. To achieve local AI, we need to change how we think about designing both our processing hardware and our deep learning software.

The Limits of Thought

Before we make the mistake of assuming the solution is to increase the amount of processing hardware to throw at the problem, it's important to remember that a lack of raw processing speed isn't the only factor that makes portable deep learning challenging. Even if we were somehow able to miniaturize one of the powerful racks used in cloud computing set-ups, we would still have to contend with two associated issues: heat and energy consumption.

These are both perennial challenges for any computing set-up, but they are amplified for smaller, portable devices. A self-driving car can use over a thousand watts to compute and analyze the data from its cameras and other sensors, for example. In contrast, a typical smartphone processor needs to run at around one watt. The more power the processor consumes, the more heat it produces. Processors in the cloud can generate so much heat that they require liquid cooling. That is not an option for a small device trying to handle deep learning applications in your pocket or your backpack; often you can't even use a fan, due to the size and weight constraints of these portable devices.

Even if we could get sufficient power into the processor, we still need to answer the question of energy storage. What's the point in having an AI assistant embedded in your smartphone if you can only run it for 10 minutes before having to find an outlet to recharge your battery?
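The battery question comes down to simple arithmetic: runtime is stored energy divided by power draw. The Python sketch below works through it with hypothetical numbers. The 15 watt-hour battery and the 90 watt load are illustrative assumptions, not measurements; only the roughly one watt smartphone figure comes from the text above.

```python
def runtime_minutes(battery_watt_hours: float, load_watts: float) -> float:
    """Minutes of operation for a given battery capacity and power draw."""
    if load_watts <= 0:
        raise ValueError("load_watts must be positive")
    return 60.0 * battery_watt_hours / load_watts


battery_wh = 15.0       # hypothetical phone-sized battery
phone_chip_w = 1.0      # about one watt, as cited in the article
heavy_load_w = 90.0     # hypothetical power-hungry AI workload

phone_runtime = runtime_minutes(battery_wh, phone_chip_w)
heavy_runtime = runtime_minutes(battery_wh, heavy_load_w)

print(f"At {phone_chip_w} W: {phone_runtime:.0f} minutes")
print(f"At {heavy_load_w} W: {heavy_runtime:.0f} minutes")
```

With these made-up figures, the same battery that lasts 15 hours at one watt is drained in about 10 minutes at 90 watts, which is the scenario the paragraph above warns about.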

Untethering AI

If developers and designers want to embrace all the benefits of untethered AI—whether that means a semi-intelligent delivery drone or a camera capable of real-time image processing—we need to eliminate these challenges. Or, at least, find a way to mitigate their impact.

The answer comes in the form of improved efficiency at every possible level of the equation, from the code we use to specify the deep learning operations through to the hardware we run it on. It's possible to write algorithms that take our existing deep learning concepts but run them more efficiently, both in terms of computation and power draw. For example, we're able to trim areas of code that were valuable during the AI learning and training process, but serve little use in a finished product. 
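One common way to do this kind of trimming is magnitude pruning: setting the smallest weights of a trained network to zero on the assumption that they contribute little to the output. The article does not name a specific method, so the Python sketch below is only an illustration of that general idea. The function name and the sample weights are hypothetical.

```python
def prune_by_magnitude(weights: list[float], fraction: float) -> list[float]:
    """Return a copy of weights with the smallest-magnitude fraction zeroed."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    num_to_prune = int(len(weights) * fraction)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical weights from one small layer.
layer_weights = [0.8, -0.01, 0.3, 0.002, -0.6, 0.05]
pruned_weights = prune_by_magnitude(layer_weights, fraction=0.5)
print(pruned_weights)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In practice a pruned model is usually checked, and often fine-tuned, to confirm that accuracy holds up. The savings only materialize if the software or hardware running the model can actually skip the zeroed weights.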

On the hardware side of things, there are plenty of ways to make AI and deep learning technology run more efficiently. Most of the processor hardware we currently use in our computers to run our software is generalist: it is pretty good at handling any application you throw at it, but doesn't excel at any individual task. Yet it's possible to design hardware specially tailored to the demands of deep learning. Google released a chip specialized for the matrix operations in deep learning more than half a decade ago, while my own team at MIT has developed one built specifically for deep learning tasks that can operate at less than a third of a watt and is 10 times more energy efficient than a more general-purpose mobile processor.

We find the greatest opportunities for improvement when we begin to think about hardware and software at the same time. When these two aspects work in tandem, we can apply techniques to eliminate unnecessary computations and reduce the distance that information needs to travel across the physical chip during computations; both approaches can speed up the chip and reduce its power and energy consumption.
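As a small illustration of eliminating unnecessary computations, the Python sketch below computes a dot product while skipping any term where either operand is zero, since that term cannot change the result. This is a software analogy under simplified assumptions, not a description of any particular chip. The function name and sample values are hypothetical.

```python
def dot_skip_zeros(weights: list[float], activations: list[float]) -> tuple[float, int]:
    """Dot product that skips zero operands.

    Returns the result and the number of multiplications actually performed.
    """
    if len(weights) != len(activations):
        raise ValueError("inputs must have the same length")
    total = 0.0
    multiplies = 0
    for w, a in zip(weights, activations):
        if w == 0.0 or a == 0.0:
            continue  # this term contributes nothing, so do no work
        total += w * a
        multiplies += 1
    return total, multiplies


# Hypothetical sparse weights and activations.
sparse_weights = [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
sparse_activations = [1.0, 2.0, 0.0, 4.0, 0.5, 3.0]

result, work_done = dot_skip_zeros(sparse_weights, sparse_activations)
print(f"result={result:.3f}, multiplications={work_done} of {len(sparse_weights)}")
```

Here only two of the six multiplications are performed. Specialized hardware can exploit the same observation by neither fetching nor multiplying the zero operands, which reduces both computation and data movement.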

If carefully applied, these advancements can untether the power of deep learning, and allow businesses to truly innovate with intelligent devices.

A Wide-Ranging Benefit

The focus of my work is on being able to overcome the hurdles that keep AI technology tethered to the cloud, but the benefits of making deep learning more efficient don't stop there.

Datacenters, for example, are already infamous for their heavy power draw. As AI usage becomes more and more mainstream, it's not hard to predict that a significant chunk of this power will be devoted to running deep learning algorithms. If we can develop the technology to run these systems more efficiently, the energy savings have the potential to be massive, reducing the carbon footprint of datacenters and helping us move towards a more sustainable future.

These energy savings upstream can also mean cost savings downstream, making it easier for more users, including those with modest budgets, to access the incredible power of deep learning. With the age of AI descending upon us, this can only be a good thing.


Vivienne Sze is Associate Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT), co-author of the book Efficient Processing of Deep Neural Networks, and lead instructor of the MIT Professional Education course Designing Efficient Deep Learning Systems.
