Like humans, machines need to learn continually from non-stationary information streams. While this comes naturally to humans, it is challenging for AI systems based on neural networks.
One inherent problem in artificial neural networks is catastrophic forgetting: when a network is trained on new data, it tends to abruptly lose what it learned from earlier data. Deep learning researchers are working extensively to solve this problem in their pursuit of AI agents that can continually learn like humans.
Research in continual learning, and AI in general, has drawn inspiration from human intelligence and computational neuroscience. Researchers aim to bridge the gap between the cognitive sciences and modern deep learning, and a key objective on the path toward Artificial General Intelligence (AGI) is to reproduce the brain's ability to learn continually in artificial neural networks.
So, let’s take a look.
Continual Learning refers to the ability to learn from non-stationary information streams incrementally.
“Non-stationary” means that the data distribution keeps changing over time.
“Incremental” learning refers to preserving previous knowledge while continuously learning new information.
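To make the two terms concrete, here is a toy sketch of what goes wrong when a model is trained naively on a changing stream. It assumes a hypothetical one-parameter model `y = w * x` and two made-up data distributions, so it is only an illustration of forgetting, not a realistic setup.

```python
def sgd_fit(w, data, lr=0.1, epochs=50):
    """Fit y = w * x with plain SGD on squared error and return the new weight."""
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # derivative of (w*x - y)^2 with respect to w
            w -= lr * grad
    return w


def mse(w, data):
    """Mean squared error of the model y = w * x on a dataset."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


xs = [0.5, 1.0, 1.5]
task_a = [(x, 2.0 * x) for x in xs]   # first distribution (hypothetical): y = 2x
task_b = [(x, -1.0 * x) for x in xs]  # shifted distribution (hypothetical): y = -x

w = sgd_fit(0.0, task_a)
error_a_before = mse(w, task_a)       # close to zero right after learning task A

w = sgd_fit(w, task_b)                # keep training, but only on the new data
error_a_after = mse(w, task_a)        # much larger: knowledge of task A was overwritten
```

Because nothing in the second training phase refers back to the first distribution, the single weight simply moves to fit the new data. Incremental learning aims to avoid exactly this loss.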
For example, an AI image classifier for self-driving vehicles is first trained on a data distribution of cars. The model is then continuously exposed to images of vehicles with different form factors, models and types. While the model can learn to classify vehicles of different sizes with high accuracy, it must also correctly classify other objects visible in an open road environment, including pedestrians, trees, road signs, traffic lights and roadblocks.
At inference time, when the AI model must classify the objects in its field of view, it should retain all of its previously learned knowledge.
In order to achieve this goal, continual learning requires the following key characteristics:
Continual learning AI systems can adapt to learn new data distributions without requiring significant (re)training on new datasets.
In a real-world setting, information about the surroundings can change rapidly. Artificial neural networks can suffer from loss of plasticity: they gradually lose the ability to change their predictions based on new data. (The term is borrowed from neural plasticity in the human brain, which refers to the capacity of the nervous system to modify its structure and functionality.)
Continual learning systems are expected to adapt quickly with minimal loss of plasticity, that is, without losing their ability to learn from new information.
Continual learning can take advantage of similarity in task and context between related learning tasks. When training a neural network on one task also improves its performance on another, related task, this is called positive transfer.
Humans behave similarly: an athlete who has excelled at one format of a sport can also perform well and compete in other related sports.
Another desirable property of continual learning models is the ability to perform well without being told the task identity or when the task switches during training.
For example, a model trained to classify cars should be able to recognize that an airplane belongs to a different data distribution, despite similarities such as wheels and windows.
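One simple baseline for spotting inputs from a different distribution is to flag any prediction whose top softmax probability falls below a threshold. The sketch below assumes hypothetical class labels, raw scores (logits) and a threshold of 0.8. Softmax confidence can be overconfident on unfamiliar inputs, so this is a starting point rather than a reliable detector.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_flag(logits, labels, threshold=0.8):
    """Return the predicted label, or "unknown" when the model is not confident enough."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown"  # possibly a sample from a distribution the model has not seen
    return labels[best]


labels = ["sedan", "truck", "bus"]                              # hypothetical known classes
confident = classify_or_flag([5.0, 0.0, 0.0], labels)           # one score clearly dominates
uncertain = classify_or_flag([1.0, 0.9, 1.1], labels)           # scores nearly tied
```

Here `confident` resolves to a known class, while `uncertain` is flagged because no class stands out.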
AI models train on large datasets. These datasets contain noise: unwanted signal errors in an image, sound or video stream that are not part of the data sample itself. Noise is common in sensor data, where fluctuations in the surrounding environment or in the device itself distort the information picked up from the source.
Continual learning models should be able to learn the true data distribution without the noise components added to it.
While a sufficiently large AI model trained on large datasets can learn to generalize well across multiple data distributions, this is not necessarily the most sustainable, cost-effective or resource-efficient method. Continual learning models should be compact and resource-efficient in terms of:
Storage
Computing
Energy requirements
So how do you train a continual learning model?
One of the most common approaches is replay-based continual learning. In this approach, the model is periodically re-exposed to data from previous distributions to avoid catastrophic forgetting.
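A minimal sketch of the replay idea, assuming a small fixed-size memory filled by reservoir sampling (one possible choice, not something this article prescribes). The class name `ReplayBuffer` and its methods are hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past samples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0                      # total number of samples offered so far
        self.rng = random.Random(seed)

    def add(self, sample):
        """Keep each sample seen so far with equal probability capacity / seen."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = sample   # overwrite a random stored sample

    def mixed_batch(self, new_batch, replay_size):
        """Return the new data plus up to replay_size stored samples from the past."""
        k = min(replay_size, len(self.samples))
        return list(new_batch) + self.rng.sample(self.samples, k)


buffer = ReplayBuffer(capacity=4)
for i in range(10):
    buffer.add(("cars", i))                # earlier distribution

new_data = [("pedestrians", 0), ("pedestrians", 1)]
batch = buffer.mixed_batch(new_data, replay_size=2)
```

Training on `batch` instead of `new_data` alone means each update still sees a few examples from the earlier distribution. The memory size and the replay ratio are tuning choices that trade storage against forgetting.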
Another popular approach is parameter regularization. This method imposes constraints on the model parameters, for example by penalizing large changes to parameters that matter for earlier tasks, to encourage the model to learn simpler, more generalizable representations of the data distribution.
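One common way to impose such a constraint is a quadratic penalty that discourages parameters from drifting away from the values they had after earlier training. The sketch below is in the spirit of elastic weight consolidation (EWC), which this article does not name. The function name and the `importance` weights are hypothetical; in EWC those weights are estimated from the Fisher information.

```python
def regularized_loss(task_loss, params, anchor_params, importance, strength=1.0):
    """Add a penalty for moving parameters away from their earlier values.

    params        -- current parameter values
    anchor_params -- values saved after training on earlier data
    importance    -- per-parameter weights (assumed given); a higher weight
                     means the parameter mattered more for earlier data
    """
    penalty = sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, importance)
    )
    return task_loss + 0.5 * strength * penalty


# The first parameter is important but unchanged; the second moved but is unimportant.
loss = regularized_loss(
    task_loss=1.0,
    params=[1.0, 2.0],
    anchor_params=[1.0, 0.0],
    importance=[5.0, 0.1],
    strength=2.0,
)
```

With these made-up numbers the penalty adds 0.4 to the task loss. Moving the first parameter by the same amount would have cost far more, which is how the constraint protects earlier knowledge.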
Recent advances involve adding context to the model architecture itself: different parts of the neural network are tuned to perform well on different tasks and data distributions. The end-to-end network may, for instance, comprise a set of smaller expert models, each specializing in a distinct task.
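As a rough sketch of this idea, the hypothetical `ExpertRouter` below keeps one expert per task and dispatches each input to the matching expert. Plain functions stand in for trained sub-networks, and the task identifier is assumed to be supplied by the caller.

```python
class ExpertRouter:
    """Dispatch inputs to a task-specific expert (stand-in for a sub-network)."""

    def __init__(self):
        self.experts = {}

    def register(self, task_id, expert):
        """Add an expert for a new task without touching the existing ones."""
        self.experts[task_id] = expert

    def predict(self, task_id, x):
        """Route the input to the expert registered for task_id."""
        if task_id not in self.experts:
            raise KeyError(f"no expert registered for task {task_id!r}")
        return self.experts[task_id](x)


router = ExpertRouter()
# Hypothetical experts: simple rules in place of trained models.
router.register("vehicles", lambda length_m: "truck" if length_m > 6 else "car")
router.register("signals", lambda color: "stop" if color == "red" else "go")

vehicle = router.predict("vehicles", 8)
signal = router.predict("signals", "red")
```

Because each task has its own expert, adding a new task leaves the earlier experts unchanged. The cost is that `predict` needs to be told which task an input belongs to.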
An obvious assumption here is that we have sufficient knowledge of the task itself. In real-world scenarios, that is not always the case.
For example, a self-driving car is likely to encounter objects that it has never observed before. Forcing such objects into one of the known classes may only contribute to catastrophic forgetting.
This posting does not necessarily represent Splunk's position, strategies or opinion.