News
Artificial Intelligence and Machine Learning

Lifelong Learning at the Edge

Figure: The diverse SKILL dataset included 102 different tasks (colors), shown here by task difficulty (y-axis), the number of classes per task type (x-axis), and the number of total images per task type (circle size).

Connectionist deep neural networks (DNNs) spend long periods learning vast databases, then distill that knowledge into fixed inference-only engines suitable for deployment at the network's edge. If new tasks need to be added to the inference engine, they traditionally have to be added to the initial central database, the learning process repeated, and the result redeployed. Why? Because leaving learning "on"—so-called lifelong learning (LL)—in an already-trained deep neural network typically overwrites portions of the existing inference engine, which has the unwanted side effect of "forgetting" examples in the original database, according to Robert French of the University of Liège (Belgium) in his seminal study Catastrophic forgetting in connectionist networks.

Now researchers at the University of Southern California (USC), Intel Labs, and China's Shenzhen Institute of Advanced Technology say they have developed an open-source architecture called Shared Knowledge Lifelong Learning (SKILL) that allows Lightweight Lifelong Learning (LLL) agents to acquire new knowledge at the edge of the network, then share it with any number of other edge agents without the risk of catastrophic forgetting.

"Learning at the edge is very important. Today, some estimates put 25-to-30% of Internet data coming from edge sensors, making learning at the edge a necessity," said Vijaykrishnan Narayanan, associate dean for Innovation and Robert Noll Chair Professor at Pennsylvania State University (Penn State), who was not involved in the research. "These researchers have done foundational work that can open the doors to learning at the edge by sharing knowledge in ways that have not been tried before. Two contributions stand out. Firstly, they have distributed lifelong learning that can learn incrementally on multiple tasks, without having to start over again to avoid forgetting. The second, perhaps even a longer-lasting contribution, is that the SKILL dataset involves 102 different types of learning by the same distributed agents—such as learning how shadows move, while learning different insect types. The dataset is much more challenging than just object classification as is noted by the open source reviews."

To be sure, many other researchers have tackled the problem of lifelong learning without catastrophic forgetting for centralized systems, including attempts to emulate how the human brain does it using structural plasticity, memory replay, curriculum and transfer learning, intrinsic motivation, and multi-sensory integration, as explained in Continual Lifelong Learning with Neural Networks: A Review, by German Parisi at the University of Hamburg (Germany) with colleagues at the Rochester Institute of Technology (New York) and Heriot-Watt University (Edinburgh, U.K.). However, none of those brain-like approaches attempted to solve today's problem of consolidating fully distributed learning among small (often battery-powered) agents at the edge of the network.

Other groups are exploring that same learning-at-the-edge goal, but all the competing methods use a centralized smart database to consolidate and distribute the autonomous independent learning taking place among multiple agents, according to the USC, Intel, and Shenzhen Institute researchers. As a result, they incur the burden of high-bandwidth communication with, and dependence on, a centralized architecture, and as a consequence cannot achieve anything near SKILL's linear speedup as new individual agents are added.

The key to the unique success of SKILL, according to these researchers, is an architecture that combines a fixed common core recognition engine built into every agent at manufacture with task-specific shareable modules that each agent creates when it learns a new task, then shares with the agents to which it is connected by the network. Newly acquired tasks are appended, as it were, to each agent's fixed core by adding new classification neurons.

Said USC professor of computer science Laurent Itti, "For the first time, this research shows how a fully distributed society of individual agents can benefit from each other's learning yet retain their individuality with near-perfect parallelization while learning to master a diverse set of skills."

Neural networks work by using their middle layers to extract features (color, shape, etc.) from the objects presented to the input layer. The final layer consists of the classification category neurons, each of which is activated by a unique set of extracted feature neurons to which it is attached by weighted synapse connections. SKILL works by leaving all the input-, middle-, and final-layer neurons and tuned synapses in place, thus avoiding forgetting. Instead, it learns new categories by adding new classification neurons for each new task. The new neurons are attached, with weighted synapses, to the existing feature-layer neurons of the permanent central core.
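How this looks in practice can be illustrated with a brief PyTorch sketch, assuming a frozen core that maps each image to the 2,048-element feature vector the researchers describe; the Agent and TaskHead names and the training loop are illustrative stand-ins, not the team's actual code:

```python
# Minimal sketch of the append-a-head idea: the core is frozen, and each new task
# only adds and trains a small set of classification neurons on top of its features.
import torch
import torch.nn as nn

FEATURE_DIM = 2048  # size of the core's feature vector, per the article


class TaskHead(nn.Module):
    """New classification neurons for one task, attached to the fixed feature layer."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(FEATURE_DIM, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.classifier(features)


class Agent:
    """An edge agent: one frozen common core plus the per-task heads it learns or receives."""
    def __init__(self, core: nn.Module):
        self.core = core.eval()
        for p in self.core.parameters():
            p.requires_grad = False        # the core is never updated, so nothing is forgotten
        self.heads: dict[str, TaskHead] = {}

    def learn_task(self, name: str, num_classes: int, loader, epochs: int = 1) -> TaskHead:
        head = TaskHead(num_classes)
        opt = torch.optim.Adam(head.parameters(), lr=1e-3)  # only the new neurons train
        for _ in range(epochs):
            for images, labels in loader:
                with torch.no_grad():
                    feats = self.core(images)                # frozen feature extraction
                loss = nn.functional.cross_entropy(head(feats), labels)
                opt.zero_grad()
                loss.backward()
                opt.step()
        self.heads[name] = head
        return head
```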

Thus, any agent (with the identical central core) can broadcast any newly appended classification categories to the other individual edge agents by attaching the new classification neurons to their common feature recognition neurons in exactly the same manner as the original edge learner agent. The entire network of agents acquires lifelong learning capabilities without the risk of forgetting the categories recognized by the original core recognition engine, or for that matter, any of the newly appended categories shared among agents.
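Because the cores are identical, sharing a task amounts to shipping the new head's weights. A hedged sketch, continuing the illustrative Agent and TaskHead classes above:

```python
# Sharing a learned task between agents reduces to copying a head's weights and biases,
# since every agent extracts identical features with its common core. Illustrative only.
def export_task(agent: Agent, name: str) -> dict:
    """Serialize one task's classification neurons for broadcast to other agents."""
    return {k: v.cpu() for k, v in agent.heads[name].state_dict().items()}


def import_task(agent: Agent, name: str, num_classes: int, payload: dict) -> None:
    """Attach a received head to this agent's own frozen core; nothing else changes."""
    head = TaskHead(num_classes)
    head.load_state_dict(payload)
    agent.heads[name] = head


# Hypothetical usage: agent_a learns "insects" locally, then broadcasts it; agent_b can
# classify insects immediately, with no retraining and no risk to its existing knowledge.
# payload = export_task(agent_a, "insects")
# import_task(agent_b, "insects", num_classes=50, payload=payload)
```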

As a proof of concept of the SKILL architecture, a Google open-source deep neural network was trained on the standard ImageNet dataset of 1,000 object classes extracted from ImageNet's 1,281,167 training images (plus 50,000 validation images and 100,000 test images). This DNN was then burned into a ROM as the fixed core for each edge agent.
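The article does not say which network served as the core beyond it being a Google open-source DNN pretrained on ImageNet. A rough sketch of the "burned into ROM" step, using torchvision's ResNet-50 (which also yields a 2,048-element feature vector) purely as a stand-in:

```python
# Hedged sketch of building the fixed core: a pretrained ImageNet backbone with its
# original 1,000-class output layer removed and every parameter frozen. ResNet-50 is
# a stand-in; the article only says the core was a Google open-source DNN.
import torch.nn as nn
from torchvision import models


def build_frozen_core() -> nn.Module:
    backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    backbone.fc = nn.Identity()       # expose the 2,048-element feature vector, not class scores
    backbone.eval()
    for p in backbone.parameters():   # "burned into ROM": no parameter ever changes
        p.requires_grad = False
    return backbone
```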

The edge agents then learned over 100 new tasks from a composite dataset of 2,041,225 examples, placing them into 5,033 categories connected to the original 22,000 neurons and 2.9 million synapses in the common core DNN. The middle feature-extraction layer output a 2,048-element feature vector to which the new knowledge tasks were appended via synapses connected to new output neurons added for each new task. SKILL successfully learned the more than 100 new pattern-recognition tasks and transferred that knowledge to multiple agents. Itti claimed the current architecture could scale up to learning 500 new classification tasks.
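A back-of-the-envelope calculation from those figures suggests why sharing per-task heads rather than whole models keeps agent-to-agent traffic light; the per-task average below is derived from the article's totals, not reported in the study:

```python
# Rough payload estimate for one shared task, derived from the numbers above.
feature_dim = 2048      # core feature vector size
total_classes = 5033    # new categories across all tasks
num_tasks = 102         # tasks in the SKILL dataset

avg_classes_per_task = total_classes / num_tasks        # ~49 classes per task
weights_per_task = feature_dim * avg_classes_per_task   # ~101,000 synapses per head
payload_megabytes = weights_per_task * 4 / 1e6          # ~0.4 MB at 32-bit precision

print(f"~{avg_classes_per_task:.0f} classes/task, "
      f"~{weights_per_task / 1e3:.0f}K weights, ~{payload_megabytes:.1f} MB per shared task")
```

That estimate excludes biases and any task-mapper updates, but it is still far smaller than retransmitting or retraining a full model.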

The biggest hindrance to scaling beyond 500 new tasks with the current architecture is the need for a task mapper to differentiate new knowledge from original core knowledge. The researchers hope to remedy this limitation in future architectures. Also, recognition among highly differentiated image categories (say, between flowers and heart X-rays) currently requires an algorithm to fine-tune the bias inputs to neurons when switching to a new task category.
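A hedged sketch of the role such a task mapper plays at inference time, continuing the illustrative classes above: a small classifier guesses which task an input belongs to, then routes the features to that task's head. The researchers' actual mapper and bias-tuning algorithm are not described in this article:

```python
# Illustrative-only task mapper: predicts which task an input belongs to from the shared
# features, then routes to that task's classification neurons.
import torch
import torch.nn as nn


class TaskMapper(nn.Module):
    def __init__(self, num_tasks: int, feature_dim: int = 2048):
        super().__init__()
        self.mapper = nn.Linear(feature_dim, num_tasks)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.mapper(features).argmax(dim=-1)   # index of the most likely task


def classify(agent: Agent, mapper: TaskMapper, task_names: list, image: torch.Tensor):
    with torch.no_grad():
        feats = agent.core(image)
        task = task_names[mapper(feats)[0].item()]             # first pick the task...
        return task, agent.heads[task](feats).argmax(dim=-1)   # ...then the category within it
```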

Itti claims the SKILL prototype of multiple independent learning agents outperforms its current competition, achieves near-linear speedup in learning per added agent, and holds promise for future upgrades. The work was supported by the U.S. Defense Advanced Research Projects Agency (DARPA), Semiconductor Research Corp., and the U.S. Army Research Office.


R. Colin Johnson is a Kyoto Prize Fellow who has worked as a technology journalist for two decades.
