Artificial Intelligence and Machine Learning

Forget the Catastrophic Forgetting

Researchers are working to counteract catastrophic forgetting by AI systems when they are trained for new tasks.


Although artificial intelligence (AI) is steadily becoming more sophisticated, some weaknesses of the technology have yet to be resolved. Neural network-based systems, for example, are prone to a phenomenon called catastrophic forgetting, whereby they forget tasks they have previously learned when they are trained on new ones.

“(It) is a significant problem for all machine learning models and systems,” says Kartik Talamadupula, director of AI research at, a company that creates purpose-built AI models for communication data, and applied AI officer of the ACM Special Interest Group on Artificial Intelligence (ACM SIGAI).

Catastrophic forgetting can be problematic when an AI system needs to continually adapt to new environments. With automated driving, for example, a self-driving vehicle that first learned to navigate a dense urban neighborhood may become less able to do so after it is later trained to drive on a freeway, says Ness Shroff, a chaired professor in the Department of Electrical and Computer Engineering at The Ohio State University in Columbus, Ohio. Alternatively, if the self-driving vehicle has learned certain routes that are more efficient to use at rush hour, it may not remember them after learning routes that are quicker at other times of day.

The phenomenon is also of interest because it illustrates a key difference between the way humans and AI learn different tasks. Continual learning is a training paradigm that involves presenting an AI model with a continuous stream of new information, resembling the life-long learning capabilities of humans, yet models trained this way still suffer from catastrophic forgetting. “Humans are actually unlikely to forget previous skills (they) have learned when they learn new ones,” says Yingbin Liang, a professor at The Ohio State University and Shroff’s colleague.

That is why researchers are trying to better understand what causes catastrophic forgetting, and to come up with new solutions. Earlier work has yielded some insight into why previous knowledge is forgotten. A model has a limited capacity in terms of how much information it can retain, for example. Increasing that capacity can help avoid forgetting, but it also makes the model more complex to train. Another strategy is to include previously seen examples when a model is trained on a new task, but this requires even more memory. “All of these approaches have their drawbacks,” says Liang.
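The strategy of mixing previously seen examples into training on a new task is often called replay or rehearsal. The sketch below is only an illustration of that idea, not the method of any team mentioned here: the class name ReplayBuffer, its capacity parameter, and the use of reservoir sampling to keep memory bounded are all assumptions made for the example.

```python
import random


class ReplayBuffer:
    """Hypothetical fixed-size memory of past examples (reservoir sampling)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity          # memory budget: the cost the article mentions
        self.items = []                   # stored examples from earlier tasks
        self.seen = 0                     # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        """Offer one example; every example seen so far is kept with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def mixed_batch(self, new_examples, n_replay):
        """Return the new-task examples followed by up to n_replay stored ones."""
        k = min(n_replay, len(self.items))
        return list(new_examples) + self.rng.sample(self.items, k)
```

A training loop would call add() while learning the first task and then train on mixed_batch() while learning the second, so the model keeps seeing some old data. The capacity setting is the trade-off: a larger buffer protects more old knowledge but costs more memory.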

In recent work, Liang, Shroff and their colleagues examined catastrophic forgetting in continual learning from a theoretical perspective to try to understand the mechanisms responsible. Their study analyzed a simple linear machine learning model and quantified how much knowledge from old tasks was forgotten when it learned new tasks. The study also looked at the model’s ability to generalize, meaning how it performed when presented with previously unseen tasks that differed from those on which it was trained. The researchers were interested in homing in on how past knowledge could help with new tasks, and similarly how skills learned later on might reinforce those acquired previously. “In that case, you’re not forgetting the previous skills, but (instead you) have a strengthening of skills,” adds Liang.
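To make the idea of quantifying forgetting in a linear model concrete, here is a toy sketch. It is not the study's model or analysis: the function names, the noise-free synthetic data, the plain gradient-descent training, and the definition of forgetting as the rise in task A's error after training on task B are all assumptions chosen for illustration.

```python
import random


def predict(w, x):
    """Linear model: inner product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))


def make_task(w_true, n, rng):
    """Synthetic, noise-free regression task defined by a weight vector."""
    xs = [[rng.gauss(0.0, 1.0) for _ in w_true] for _ in range(n)]
    ys = [predict(w_true, x) for x in xs]
    return xs, ys


def mse(w, xs, ys):
    """Mean squared error of weights w on one task's data."""
    return sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def train(w, xs, ys, lr=0.05, epochs=500):
    """Full-batch gradient descent, starting from the weights passed in."""
    w = list(w)
    n = len(ys)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def measure_forgetting(w_a, w_b, n=50, seed=0):
    """Train on task A, then on task B; report task A's error before and after."""
    rng = random.Random(seed)
    task_a = make_task(w_a, n, rng)
    task_b = make_task(w_b, n, rng)
    w = train([0.0] * len(w_a), *task_a)
    loss_a_before = mse(w, *task_a)
    w = train(w, *task_b)                 # sequential training, no access to task A
    loss_a_after = mse(w, *task_a)
    return loss_a_before, loss_a_after, loss_a_after - loss_a_before
```

Calling measure_forgetting with two different weight vectors gives a positive forgetting value, while passing the same vector twice gives a value near zero. This small, fully determined setting does not reproduce the study's findings about task similarity; it only shows what measuring forgetting can look like in code.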

The team found their model was able to generalize better when trained on new tasks that were similar to those it had been trained on previously. However, the opposite was seen with forgetting: in certain cases, the model retained previous knowledge less well when tasks were similar.

Liang and her colleagues were surprised: intuitively, it seems the model should remember better, since the new task might reinforce the previous one. “We have some explanation for that,” says Liang. “When (the model) sees a slightly new task, it can dramatically shift towards the new task and forget more about the old ones.”

The Ohio State researchers found that the order in which tasks were learned had an impact on how much their model forgot. They demonstrated that training a model with tasks that were different from each other first, and presenting it with similar tasks later on, helped reduce forgetting. “We have done experiments to verify that even in practical models, the same effect holds,” says Shroff, “so this could be a rule of thumb in the application of continual learning: how to order tasks in order to make use of the capacity that you have in the best way.”

Another team has pioneered a training approach that can help mitigate catastrophic forgetting in continual learning by mimicking the way the human brain reinforces new information. The method, developed by Concetto Spampinato, an associate professor of information processing systems at the University of Catania in Sicily, Italy, and his team, is called wake-sleep consolidated learning (WSCL). It incorporates a wake phase and a sleep phase, the latter of which also simulates dreaming. “In our lab, we try to (apply) cognitive theories to computational models,” says Spampinato.

During the wake phase of WSCL, a deep neural network is exposed to new data, which are stored in short-term memory. The network is then subjected to a sleep phase composed of two parts. In the first, information collected during the wake phase is replayed and stored in long-term memory. In the second, the dreaming phase, the network is exposed to dream-like samples to prepare it for new experiences. “It is a problem that humans also have, we don’t know what to expect in the future,” says Spampinato. “And one strategy that biology enforces in our learning strategy is dreaming.”
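The order of the phases described above can be laid out as a schematic. This is only a sketch of the control flow as the article describes it, not the WSCL implementation: the class WakeSleepLearner, the train_step callback and its phase labels are hypothetical, and the real method's networks, losses and memory mechanisms are not represented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WakeSleepLearner:
    """Schematic of a wake phase followed by a two-part sleep phase."""

    # Hypothetical callback that updates the network: train_step(batch, phase_name)
    train_step: Callable[[list, str], None]
    short_term: List = field(default_factory=list)
    long_term: List = field(default_factory=list)

    def wake(self, new_samples):
        """Wake: learn from new data and keep it in short-term memory."""
        for sample in new_samples:
            self.train_step([sample], "wake")
            self.short_term.append(sample)

    def sleep(self, dream_samples):
        """Sleep: replay and consolidate, then train on dream-like samples."""
        # Part 1: replay what was collected while awake, then move it
        # into long-term memory and empty short-term memory.
        self.train_step(list(self.short_term), "replay")
        self.long_term.extend(self.short_term)
        self.short_term.clear()
        # Part 2: dreaming, meant to prepare the network for future tasks.
        self.train_step(list(dream_samples), "dream")
```

In use, each new task would trigger one wake() call followed by one sleep() call. Where the dream samples come from is left open here; the article notes the team drew them from external datasets in the study.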

In their recent pre-print, Spampinato and his colleagues examined how a deep neural network performed on visual classification tasks, such as distinguishing between images of different types of animals or objects, when WSCL was combined with continual learning. Among other things, they measured how much the network forgot, its average accuracy, and whether prior knowledge facilitated the learning of new tasks. The team found that incorporating WSCL reduced forgetting and slightly increased the model’s accuracy compared with continual learning used on its own. The network’s ability to transfer knowledge to new tasks also improved with WSCL.
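Measures like these are commonly computed from a table of accuracies recorded after each task is learned. The sketch below uses definitions that are common in the continual learning literature; whether the pre-print uses exactly these definitions is an assumption, the function names are hypothetical, and the numbers in example_acc and example_baseline are made up for illustration and are not results from any study.

```python
def average_accuracy(acc):
    """Mean accuracy over all tasks after the final task has been learned.

    acc[i][j] is the accuracy on task j after training on tasks 0..i.
    """
    final = acc[-1]
    return sum(final) / len(final)


def average_forgetting(acc):
    """Mean drop from each earlier task's best accuracy to its final accuracy.

    Requires at least two tasks.
    """
    last = len(acc) - 1
    drops = []
    for j in range(last):
        # Best accuracy on task j at any point from learning it until
        # just before the final task.
        best_earlier = max(acc[i][j] for i in range(j, last))
        drops.append(best_earlier - acc[last][j])
    return sum(drops) / len(drops)


def forward_transfer(acc, baseline):
    """Mean gain on each task just before it is trained, relative to an
    untrained baseline (baseline[j] = accuracy of a fresh model on task j)."""
    gains = [acc[j - 1][j] - baseline[j] for j in range(1, len(acc))]
    return sum(gains) / len(gains)


# Made-up accuracies for three tasks, for illustration only.
example_acc = [
    [0.90, 0.20, 0.10],
    [0.70, 0.85, 0.30],
    [0.60, 0.75, 0.80],
]
example_baseline = [0.10, 0.10, 0.10]
```

Lower forgetting and higher average accuracy are better, and a positive forward transfer value means earlier training helped on tasks the model had not yet been trained on.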

“This means that the dreaming phase basically prepares the network in terms of features that can be reused in the future,” says Spampinato.

The University of Catania team plans to follow up on its work by improving how it simulates dreams. Instead of using imagery from external datasets as the team did in the study, they would like to generate dreams using a model’s existing knowledge in order to create scenarios that are more relevant to tasks likely to be encountered in the future.

Spampinato thinks that taking inspiration from the human brain is the way forward for improving AI systems, even though we still don’t understand many of the brain’s neural processes. “Our capability not to forget and to generalize is much better than any approach at the moment,” he says. “So that’s the benchmark we are aiming for.”

Sandrine Ceurstemont is a freelance science writer based in London, U.K.

Communications of the ACM (CACM) is now a fully Open Access publication.
