- The paper’s main contribution is the introduction of Weight Space Consolidation, which rebalances model plasticity and stability using rank-based parameter reset and weight averaging.
- It achieves competitive accuracy on benchmarks such as CIFAR-100, leveraging abundant exemplar memory to cut GPU time relative to more complex continual learning methods.
- The approach redefines continual learning by shifting focus from minimizing memory to enhancing plasticity, paving the way for computationally efficient retraining strategies.
"Forget Forgetting: Continual Learning in a World of Abundant Memory" - An Authoritative Summary
Introduction
In the domain of Continual Learning (CL), researchers have conventionally emphasized minimizing exemplar memory to reduce catastrophic forgetting when models are exposed to novel data. This paper, titled "Forget Forgetting: Continual Learning in a World of Abundant Memory," challenges that paradigm with a more pragmatic premise: in modern systems, GPU time, not memory, is the dominant bottleneck. The study therefore investigates a regime in which memory is abundant enough that forgetting can be mitigated by replay, yet retraining from scratch remains too costly. In this regime, the challenge shifts from preserving stability to restoring plasticity, the model's capacity to learn new tasks. To address this trade-off, the paper proposes Weight Space Consolidation, a technique that enhances plasticity without compromising stability, positioning it as a scalable alternative to costly retraining.
Stability vs. Plasticity Trade-off
The paper dives into the stability-plasticity dilemma, which describes the tension between retaining learned knowledge (stability) and adapting to new information (plasticity). With abundant memory, stable replay strategies become feasible, reducing the risk of catastrophic forgetting. Yet, the challenge transitions to enhancing plasticity, as models often struggle to learn new tasks due to a bias towards previously learned information.
The authors characterize a regime in which sufficient memory shifts the central problem from avoiding catastrophic forgetting to retaining plasticity without compromising stability. Because abundant replay already curbs forgetting, simple replay-based methods can match or surpass state-of-the-art approaches on standard benchmarks at a fraction of the GPU cost.
Weight Space Consolidation
Central to this research is the proposed Weight Space Consolidation method. It involves two critical operations:
- Rank-based Parameter Reset: Dormant parameters, those contributing little to learning the current task, are periodically reset to restore plasticity. Parameter importance is estimated from accumulated gradient-based signals, and the lowest-ranked parameters are selectively reset, enhancing adaptability while preserving stability.
- Weight Averaging: A running average of the model weights is maintained over training, encouraging convergence toward flatter, more robust optima. Averaging reduces the risk of overfitting to the most recent task's data, preserving stability as the model continues to learn.
Both operations are computationally lightweight and resemble post hoc model merging, but they are applied actively during training. The result is a method that preserves learned knowledge while restoring the model's ability to adapt to new tasks, achieving competitive accuracy at significantly reduced computational cost.
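To make the two operations concrete, here is a minimal, self-contained sketch in plain Python. The importance scores, the reset fraction, and the cumulative averaging scheme are illustrative assumptions for exposition; the paper's exact reset criterion and averaging schedule may differ.

```python
def rank_based_reset(params, importance, init_params, reset_fraction=0.2):
    """Reset the least-important fraction of parameters to their initial values.

    `importance` is assumed to hold accumulated gradient-based scores,
    one per parameter; the bottom-ranked entries are treated as dormant.
    """
    n_reset = int(len(params) * reset_fraction)
    # Rank parameter indices by accumulated importance, ascending.
    order = sorted(range(len(params)), key=lambda i: importance[i])
    new_params = list(params)
    for i in order[:n_reset]:
        new_params[i] = init_params[i]  # restore plasticity for dormant weights
    return new_params


def update_weight_average(avg_params, params, step):
    """Cumulative running average of weights after `step` updates (step >= 1)."""
    return [a + (p - a) / (step + 1) for a, p in zip(avg_params, params)]


# Toy usage: five scalar "parameters" with hypothetical importance scores.
params = [1.0, 2.0, 3.0, 4.0, 5.0]
importance = [0.1, 0.9, 0.05, 0.8, 0.7]
init_params = [0.0] * 5

# Reset the two least-important parameters (indices 2 and 0).
reset = rank_based_reset(params, importance, init_params, reset_fraction=0.4)

# Fold the current weights into a running average.
avg = update_weight_average([0.0, 0.0], [2.0, 4.0], step=1)
```

In a training loop, the reset would run periodically (e.g. at task boundaries) and the averaged weights, rather than the raw ones, would be used for evaluation, in the spirit of stochastic weight averaging.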
Empirical Validation
The paper validates the approach across several benchmarks, including class-incremental learning on datasets such as CIFAR-100 and continual instruction tuning with large language models (LLMs). Weight Space Consolidation consistently improves over standard replay baselines and remains competitive with more complex state-of-the-art methods, while substantially reducing computational overhead. This makes it attractive in real-world settings where compute, rather than memory, is the binding constraint.
Implications and Future Directions
The implications of this research are profound for the field of CL, advocating for a shift from optimizing under unrealistic memory limitations towards designing computationally efficient algorithms suitable for current and future AI deployments. By re-evaluating the cost structures of CL practices, this paper paves the way for techniques that are both effective and efficient, particularly in environments where GPU resources are the primary limitation rather than memory.
In conclusion, "Forget Forgetting: Continual Learning in a World of Abundant Memory" challenges existing assumptions in CL, offering a pragmatically oriented method that harmonizes stability and plasticity under new practical constraints. Future research may further explore the integration of such techniques with adaptive learning rates and dynamic architectures, continuing to refine the balance between model stability and adaptability in an ever-evolving data landscape.