
Edge AI Devices Eye Lifetime Learning

Helping artificial intelligence to adapt and learn, now and later.


Today, artificial intelligence (AI) devices at the network’s edge—such as the billions of Internet of Things (IoT) devices IDC predicts will generate $1 trillion by 2026—should have lifelong learning built in, according to a collaborative project by experts at the Air Force Research Laboratory, Argonne National Laboratory, Rochester Institute of Technology, Sandia National Laboratories, and the University of Texas at San Antonio, among others.

Lifelong learning (sometimes called continuous learning) means that an AI-based device (containing knowledge in a pretrained neural network) can acquire new knowledge autonomously in real time as a standalone device in the field.

Very few software or hardware designers would dispute the fact that edge devices require low-power chips (accelerators), especially for battery-powered operation. However, the collaborators on this project also add a more stringent condition—namely, that to outperform today’s cloud-based devices, the next generation of edge devices will also require accelerators with the built-in ability to assimilate new knowledge when confronted with novelty in the field over their entire lifetime.

“The ability to evolve strategies based on new experiences is critical to the success of human endeavors. Consequently, AI-devices that assist humans should embody autonomous and continuous learning abilities too,” said novel accelerator expert Vijaykrishnan Narayanan, associate dean for Innovation and Robert Noll Chair Professor at Pennsylvania State University (who was not involved in this project). “The efficiency and effectiveness of such continuous learning, especially in resource-constrained edge devices, requires a synergy between the underlying hardware and algorithmic techniques as articulated in this project.”

The project members confirmed the feasibility of lifetime learning in network edge devices by enumerating and characterizing the necessary design principles, then illustrating them with a review of 22 prototypes now under development worldwide—from academic labs to startups like BrainChip to industrial leaders like Intel.

None of these prototypes includes all the necessary features, according to the lead researcher of the Design Principles for Lifelong Learning AI Accelerators project, Dhireesha Kudithipudi of the University of Texas at San Antonio, director of its Matrix AI Consortium and its Neuromorphic Artificial Intelligence Laboratory. Nevertheless, lifelong learning remains an ongoing goal for AI, especially for edge AI devices. Unfortunately, today even critical edge devices such as electric vehicles (EVs) use the most primitive start-from-scratch method to confront novelty. For instance, if during its lifetime an EV is involved in an auto accident, the AI’s neural network is retrained from scratch offline with the error-causing conditions added to the original training data set, followed by a version update release.

This project’s goal, however, is to lay the foundations for AI devices that can acquire new knowledge in real time when confronted with novelty.

“Designing lifelong learning in edge devices is very much a Grand Challenge in the field of AI. No single AI model currently exists that incorporates all the features of lifelong learning as observed in biological brains. Future AI accelerators will benefit by incorporating ‘biologically inspired’ lifelong learning features at various levels of abstraction,” said Kudithipudi. “Since we absolutely have to design such machines and deploy them on the edge, then we should start designing the appropriate optimizations in the algorithms, architectures and technologies now.”

Heterogeneous and adaptive memory architectures (here, gold neurons/synapses added for lifetime learning in the field) that support data access with variable latency, high bandwidth, flexible data storage, and minimal multi-tenancy (the time lapse between two sequential tasks) will play an important role in lifetime learning accelerators. (Credit: University of Texas)

The first requirement for lifelong learning in low-power standalone devices, according to Kudithipudi, is new algorithms that can be updated in the field when novelty is encountered. Today’s algorithms are trained on supercomputers at 64- or 32-bit precision, then deployed as encapsulated inference engines at as little as 8- or even 4-bit resolution. Those deployed inference engines may work fine on their original training data set, but they lack the resolution to learn new examples.
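The resolution problem can be seen in a few lines of arithmetic. The sketch below uses hypothetical weight values and a plain symmetric quantization scheme (neither is taken from the project) to show why an 8-bit inference engine cannot absorb the small updates that learning requires:

```python
import numpy as np

# Hypothetical 32-bit weights from training, quantized to 8-bit for edge inference.
weights_fp32 = np.array([0.0213, -0.0042, 0.0178], dtype=np.float32)

scale = np.abs(weights_fp32).max() / 127               # one 8-bit quantization step
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Inference still works: the quantized weights closely approximate the originals.
print(weights_int8 * scale)

# Learning in place fails: a typical small gradient update is smaller than one
# quantization step, so it rounds away to nothing at 8-bit resolution.
nudged = weights_fp32 + 1e-5
print(np.round(nudged / scale).astype(np.int8))        # identical to weights_int8
```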

The second requirement for lifelong learning architectures, according to Kudithipudi, is architectural reconfigurability when learning new examples on the fly in real time, without forgetting lessons already learned.

The third, and most ambitious, engineering effort needed today, says Kudithipudi, is to perfect the new memory technologies with which to build those reconfigurable architectures and execute those lifetime-learning algorithms—such as ultra-low-power three-terminal analog memtransistors and two-terminal analog memristors, as well as digital non-volatile memories such as phase-change memory and spin-transfer torque (STT) memory.

Progress To Date

Researchers have been studying lifelong learning for years under the guise of mimicking the continuous, real-time operation of the brain, according to Angel Yanguas-Gil, principal materials scientist at Argonne National Laboratory, who collaborated on the project as well as on the earlier Defense Advanced Research Projects Agency (DARPA) Grand Challenge on Lifelong Learning Machines.

“Now, finally, the time has come to stop focusing exclusively on designing new algorithms, architectures, and technologies, and start thinking about how to deploy them in the field. Now is the time to incorporate milliwatt hardware technologies into continual learning architectures that execute lifelong learning algorithms,” said Yanguas-Gil.

Many algorithms, architectural structures, and low-power persistent memory technologies, according to Yanguas-Gil, have already been confirmed as candidates to be co-designed when implementing lifelong learning devices. For instance, to avoid catastrophic forgetting (that is, the loss of existing knowledge when adding new knowledge in the field), important examples from the original training data set could be held in buffer memories. On-chip relearning from a novel example could then rehearse those buffered examples alongside the new one, preventing important knowledge in the original data set from being catastrophically forgotten.
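A minimal sketch of that rehearsal idea, in Python, is shown below. The buffer handling and the helper names (remember, update_on_novelty, train_step) are illustrative assumptions, not the project's code:

```python
import random

BUFFER_SIZE = 256
replay_buffer = []                     # (input, label) pairs kept from the original training set

def remember(example):
    """Keep the buffer bounded, as a memory-constrained edge device must."""
    if len(replay_buffer) < BUFFER_SIZE:
        replay_buffer.append(example)
    else:
        replay_buffer[random.randrange(BUFFER_SIZE)] = example

def update_on_novelty(model, novel_example, train_step, rehearse=8):
    """Learn from the novel example plus a few rehearsed originals in one pass."""
    batch = [novel_example] + random.sample(replay_buffer, min(rehearse, len(replay_buffer)))
    random.shuffle(batch)
    for x, y in batch:
        train_step(model, x, y)        # model and train_step are placeholders here
```

Mixing a handful of buffered originals into every on-device update keeps the old knowledge exercised, so a single new example cannot overwrite it wholesale.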

Alternatively, adding new neurons, synapses, or even entire new layers to a reconfigurable neural architecture could likewise permit the acquisition of new knowledge while mitigating catastrophic forgetting by leaving the already learned knowledge in place.
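One way to picture this, assuming a PyTorch-style model with illustrative layer sizes (not the project's design), is to freeze every existing parameter and attach fresh neurons that alone are trained on the new task:

```python
import torch.nn as nn

class ExpandableNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(64, 10)])      # head for the original task

    def add_head(self, num_classes):
        """Freeze everything already learned, then attach new neurons to train."""
        for p in self.parameters():
            p.requires_grad = False                           # old knowledge stays in place
        new_head = nn.Linear(64, num_classes)                 # only these weights will train
        self.heads.append(new_head)
        return new_head

    def forward(self, x, task=0):
        return self.heads[task](self.backbone(x))

net = ExpandableNet()
new_head = net.add_head(num_classes=5)    # optimize only new_head.parameters() on the new data
```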

Another method cited by the project collaborators is integrating “extra” network resources—neurons and synapses—into the original neural network, but with randomized values reserved for learning new knowledge without affecting existing knowledge.

“We need to prioritize co-design of algorithms, architectures, and technologies that implement lifelong learning on edge-devices now, just as they are exploding onto commercial and industrial markets,” said Kudithipudi. “Our thinking should not be limited by the edge devices or the models that exist in the current market. We know that biological systems perform lifelong learning using tiny amounts of power—after all, the trillions of synapses in the human brain operate on only 20 watts. We know it is possible and now is the time to co-design solutions based on examples from biology—these solutions will emerge from the domain of neuromorphic computing.”

The example of ultra-low-power lifetime learning in the human brain does not mean engineers should copy exactly how the brain does it (even today’s artificial neural networks are based on only a very small part of what real brains do—and scientists don’t understand the rest, much less have foolproof algorithms for engineers to start using), according to collaborator James “Brad” Aimone, a Distinguished Member of the Technical Staff at Sandia National Laboratories and leader of the U.S. Department of Energy’s co-design project COINFLIPS (CO-designed Improved Neural Foundations Leveraging Inherent Physics Stochasticity).

“Our understanding of how exactly the brain learns continuously is very much preliminary,” said Aimone. “Mimicking the brain today is more about its efficiency than computability, since anything can be programmed traditionally, but not with the efficiency of the brain. We don’t currently understand the brain’s precise lifetime learning algorithms and thus can’t mimic [them] with AI. That’s why we put together this project. Taking inspiration from the brain is inevitable—even though we don’t fully understand how the brain does it—nevertheless we still should be co-designing what we do know today about lifelong learning into our AI devices. Lifelong learning should no longer be an afterthought for any AI.”

Outlook

In a nutshell, the project concluded by proposing the co-design of lifelong learning algorithms, architectures, and multi-level 3D memory technologies in tomorrow’s accelerators, using heterogeneous execution units that together consume less than one milliwatt of power while performing a billion operations per second.

“Recent progress in heterogeneous integration through chiplets and monolithic 3D integration of non-volatile memory technologies provides hope that continuous learning devices will soon become ubiquitous,” said Narayanan.

According to the project collaborators’ conclusions, new algorithms need to be co-designed into devices that perform both real-time inference and real-time learning. Architectures need to be co-designed to be reconfigurable, switching instantly between coarse and fine granularities as needed. And finally, memory technologies need to be co-designed to optimize both inference and learning over multiple spatial and temporal scales, with various ranges of latency, tenancy, energy consumption, and endurance.

R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.
