
Co$^2$L: Contrastive Continual Learning

Published 28 Jun 2021 in cs.LG and cs.CV (arXiv:2106.14413v1)

Abstract: Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks than joint-training methods relying on task-specific supervision. In this paper, we found that the similar holds in the continual learning context: contrastively learned representations are more robust against the catastrophic forgetting than jointly trained representations. Based on this novel observation, we propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations. More specifically, the proposed scheme (1) learns representations using the contrastive learning objective, and (2) preserves learned representations using a self-supervised distillation step. We conduct extensive experimental validations under popular benchmark image classification datasets, where our method sets the new state-of-the-art performance.

Citations (263)

Summary

  • The paper introduces a novel rehearsal-based framework using asymmetric supervised contrastive loss and self-distillation to mitigate catastrophic forgetting.
  • The method achieves state-of-the-art performance, with a notable 22.40% accuracy improvement on the Seq-CIFAR-10 benchmark.
  • The study outlines future research directions for optimizing hyperparameters and extending contrastive techniques to unsupervised and semi-supervised scenarios.

Overview of Co$^2$L: Contrastive Continual Learning

The paper "Co$^2$L: Contrastive Continual Learning" presents a novel approach to addressing catastrophic forgetting in continual learning by leveraging self-supervised contrastive learning techniques. The motivation arises from recent findings that representations learned through self-supervised methods, such as contrastive learning, exhibit superior transferability and robustness compared to those trained jointly with task-specific supervision. Such robustness is particularly valuable in continual learning, where the central challenge is maintaining previously acquired knowledge while learning new tasks.

The authors introduce a rehearsal-based continual learning algorithm, Co$^2$L (Contrastive Continual Learning), which focuses on learning transferable representations and preserving them through a dedicated self-supervised distillation process. The key contributions of the Co$^2$L framework include:

  1. Asymmetric Supervised Contrastive (Asym SupCon) Loss: This loss adapts supervised contrastive learning to the continual setup by restricting anchors to current-task samples; buffered past samples and the remaining current-task samples then form the contrasting set, making effective use of the limited negatives available in the memory buffer.
  2. Instance-wise Relation Distillation (IRD): Co$^2$L introduces a self-distillation technique to preserve the learned representations. IRD minimizes the instance-level similarity drift between current and past models, ensuring learned features remain stable as new data is introduced.
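The two losses above can be sketched in PyTorch as follows. This is a minimal illustration under assumed details (temperature values, masking conventions, batch layout), not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def asym_supcon_loss(feats, labels, is_current, tau=0.1):
    """Supervised contrastive loss where only current-task samples act as
    anchors; buffered past-task samples appear only as positives/negatives.
    feats: (N, D) L2-normalized embeddings."""
    n = feats.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=feats.device)
    logits = (feats @ feats.T / tau).masked_fill(eye, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~eye        # same-class pairs
    per_anchor = (log_prob.masked_fill(eye, 0.0) * pos.float()).sum(1)
    per_anchor = per_anchor / pos.float().sum(1).clamp(min=1)
    return -per_anchor[is_current].mean()                    # current-task anchors only

def ird_loss(feats_new, feats_old, tau=0.5):
    """Instance-wise relation distillation: pull the current model's
    instance-similarity distribution toward the frozen past model's."""
    eye = torch.eye(feats_new.size(0), dtype=torch.bool, device=feats_new.device)
    def log_rel(f):
        s = (f @ f.T / tau).masked_fill(eye, float("-inf"))  # exclude self-similarity
        return F.log_softmax(s, dim=1)
    p_old = log_rel(feats_old).exp().detach()                # fixed teacher relations
    log_p_new = log_rel(feats_new).masked_fill(eye, 0.0)     # zero diag avoids 0 * -inf
    return -(p_old * log_p_new).sum(1).mean()                # cross-entropy over relations
```

In training, the two terms would be combined into a single objective, e.g. `asym_supcon_loss(...) + lam * ird_loss(...)`, where the weighting `lam` is a hyperparameter (the name is illustrative here).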

Extensive experimentation across popular image classification benchmarks demonstrates the efficacy of Co$^2$L. The proposed method consistently achieves state-of-the-art performance across diverse experimental setups, significantly reducing the effects of catastrophic forgetting compared to baseline methods. Notably, Co$^2$L yields a 22.40% improvement in accuracy on the Seq-CIFAR-10 benchmark with the use of IRD and buffered samples. Such results highlight the importance of designing continual learning algorithms that not only shield representations from forgetting but also optimize for transferability to future tasks.

Implications and Future Directions

The introduction of contrastive methodologies into continual learning, as proposed by this paper, opens a new avenue for exploring self-supervised approaches in varied machine learning contexts. By focusing on transferable knowledge rather than task-specific features, Co$^2$L brings the learning process closer to the ideal of human-like learning. The study emphasizes the role of representation quality in mitigating forgetting and suggests that decoupling representation learning from classifier training can be fruitful.
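One concrete form of this decoupling is linear-probe style evaluation: freeze the learned encoder and fit only a linear classifier on top of its features. The sketch below is illustrative (the toy encoder, dimensions, and training loop are assumptions, not the paper's exact protocol):

```python
import torch
import torch.nn as nn

def linear_probe(encoder, feat_dim, num_classes, xs, ys, epochs=20, lr=0.1):
    """Fit only a linear head on frozen features: representation quality
    is measured separately from classifier training."""
    encoder.eval()
    with torch.no_grad():
        feats = encoder(xs)                 # representations stay fixed
    head = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(head.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(head(feats), ys)
        loss.backward()                     # gradients flow into the head only
        opt.step()
    return head

# toy usage: a random "encoder" and random data
enc = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
xs, ys = torch.randn(64, 32), torch.randint(0, 3, (64,))
head = linear_probe(enc, 16, 3, xs, ys)
```

Because the encoder is never updated, any change in probe accuracy across tasks reflects the representation itself, which is exactly the quantity a continual-representation method needs to track.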

While Co$^2$L showcases remarkable improvements, several aspects warrant further exploration. Among these are investigations into optimizing IRD hyperparameters for various domains, examining potential computational overheads associated with the distillation process, and experimenting with different neural architectures or additional datasets beyond image classification tasks. Furthermore, advancing these contrastive learning techniques under unsupervised or semi-supervised scenarios could enhance their applicability across a broader array of real-world conditions.

In conclusion, Co$^2$L marks a significant step forward in continual learning, providing a scalable and effective solution for representation retention and transferability. As machine learning systems continue to evolve, leveraging self-supervised frameworks like Co$^2$L will likely become pivotal in designing adaptive, enduring models capable of learning continuously amid growing complexity.

