Maintaining Plasticity in Deep Continual Learning

Published 23 Jun 2023 in cs.LG | (2306.13812v3)

Abstract: Modern deep-learning systems are specialized to problem settings in which training occurs once and then never again, as opposed to continual-learning settings in which training occurs continually. If deep-learning systems are applied in a continual learning setting, then it is well known that they may fail to remember earlier examples. More fundamental, but less well known, is that they may also lose their ability to learn on new examples, a phenomenon called loss of plasticity. We provide direct demonstrations of loss of plasticity using the MNIST and ImageNet datasets repurposed for continual learning as sequences of tasks. In ImageNet, binary classification performance dropped from 89% accuracy on an early task down to 77%, about the level of a linear network, on the 2000th task. Loss of plasticity occurred with a wide range of deep network architectures, optimizers, activation functions, batch normalization, dropout, but was substantially eased by L2-regularization, particularly when combined with weight perturbation. Further, we introduce a new algorithm -- continual backpropagation -- which slightly modifies conventional backpropagation to reinitialize a small fraction of less-used units after each example and appears to maintain plasticity indefinitely.

Summary

  • The paper identifies a significant loss of plasticity, showing deep networks drop from 89% to 77% accuracy over 2000 tasks despite common techniques.
  • It introduces continual backpropagation with intermittent reinitialization to counteract performance degradation in evolving data environments.
  • The study challenges standard practices like dropout and Adam, advocating for revised strategies to preserve learning adaptability in continual learning systems.

Loss of Plasticity in Deep Continual Learning

The paper under discussion provides a rigorous examination into the phenomenon of "loss of plasticity" in deep learning systems when exposed to continual learning settings, distinguishing this issue from the more commonly discussed problem of catastrophic forgetting. The authors meticulously define loss of plasticity as a reduction in a deep learning system's ability to learn from new data over time, a detrimental characteristic when these systems are deployed in an environment where data distributions evolve continually.
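To make the distinction concrete, loss of plasticity can be read off a task sequence as the gap between early-task and late-task learning performance. The following is a minimal sketch of that measurement; the function name and the windowed definition are illustrative conveniences, not the paper's own metric.

```python
import numpy as np

def plasticity_drop(per_task_accuracy, window=50):
    """Estimate loss of plasticity as the drop between a learner's mean
    accuracy on its first `window` tasks and its last `window` tasks.
    A positive return value indicates the system is learning new tasks
    less well over time, the signature of plasticity loss."""
    acc = np.asarray(per_task_accuracy, dtype=float)
    early = acc[:window].mean()   # how well early tasks were learned
    late = acc[-window:].mean()   # how well recent tasks were learned
    return early - late
```

On the paper's continual ImageNet results, this quantity would be roughly 0.89 - 0.77 = 0.12 over the 2000-task sequence.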

Key Findings

Utilizing variations of the MNIST and ImageNet datasets reconfigured to suit a continual learning paradigm, the authors present compelling evidence that multiple widely used deep learning architectures suffer notable losses in plasticity. When subjected to a sequence of 2000 binary classification tasks, a deep architecture initially demonstrating 89% accuracy on early tasks regresses to a performance level comparable to linear networks, registering only 77% on later tasks. This decline occurred despite employing diverse architectures, optimizers, activation functions, batch normalization, and dropout. However, the introduction of L2-regularization paired with weight perturbation significantly ameliorated this loss of plasticity.

The authors introduce a novel approach, continual backpropagation, which modifies the conventional backpropagation algorithm. This technique includes the intermittent reinitialization of under-utilized units, allowing the system to maintain plasticity throughout the learning process. Their experimental results suggest that this new methodology successfully mitigates the persistent degradation of plasticity in deep networks.
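The core mechanism of continual backpropagation can be sketched as a selective-reinitialization step applied alongside ordinary gradient updates. The version below uses a simplified utility measure (mean absolute activation times outgoing-weight norm); the paper's actual algorithm maintains running-average utilities and a maturity threshold, which are omitted here for brevity.

```python
import numpy as np

def reinit_low_utility_units(W_in, W_out, activations, rho=0.01, rng=None):
    """Selective reinitialization in the spirit of continual backpropagation.
    Scores each hidden unit by a simplified utility (mean |activation| times
    the norm of its outgoing weights) and reinitializes the fraction `rho`
    of units with the lowest utility. A sketch, not the paper's exact rule."""
    rng = np.random.default_rng(0) if rng is None else rng
    n_hidden = W_in.shape[1]
    utility = np.mean(np.abs(activations), axis=0) * np.linalg.norm(W_out, axis=1)
    k = max(1, int(rho * n_hidden))
    worst = np.argsort(utility)[:k]
    # fresh incoming weights restore the unit's ability to learn new features...
    W_in[:, worst] = 0.1 * rng.standard_normal((W_in.shape[0], k))
    # ...while zeroed outgoing weights avoid disturbing the network's current output
    W_out[worst, :] = 0.0
    return W_in, W_out, worst
```

Because only a small fraction of units is reset after each example, the network retains most of what it has learned while continually regaining the random, diverse initial weights that make learning possible in the first place.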

Implications

The implications of these findings are profound, particularly as deep learning systems migrate from static datasets toward environments requiring adaptation and learning from continuous streams of data. Traditionally, deep learning strategies have centered on stability and memory retention (preventing catastrophic forgetting), but this paper stresses the importance of preserving plasticity as a concurrent goal.

By showing that widely adopted methods like dropout and Adam optimization can exacerbate plasticity loss, the study challenges the adequacy of current practices in scenarios beyond traditional training settings. Continual backpropagation, while still requiring refinement and testing in more diverse contexts, may provide a foundational shift in how neural networks are designed for continual learning environments.

Future Directions

Further exploration into the scalability of these approaches in large-scale, real-world domains such as autonomous driving, robotic control, and adaptive systems in natural language processing is warranted. Additionally, distinct research avenues might include revisiting the interplay between plasticity retention mechanisms and memory retention strategies, potentially uncovering synergies that could alleviate both catastrophic forgetting and loss of plasticity.

The growing importance of these systems necessitates the development of more robust theoretical foundations to distinguish between different sources of non-stationarity and their impact on network performance. Furthermore, a theoretical formalization of loss of plasticity and its metrics could augment the effectiveness of existing models and accelerate the development of novel architectures engineered to autonomously adapt to changing conditions.

In summary, this paper contributes a foundational understanding that challenges existing paradigms, urging the community to re-evaluate the design principles of deep learning systems for continual learning and pushing the boundary of what is currently understood about neural plasticity in artificial systems.
