Clustered Cross-Covariance TD Control (C⁴)
- The paper introduces Clustered Cross-Covariance Control (C⁴), a novel continual learning policy that allocates buffer resources based on class vulnerability in TD learning.
- It employs proxy models to compute class-wise confidence variations, enabling adaptive sample selection and improved memory management.
- Empirical results demonstrate accuracy gains of 1–3% and reduced forgetting by targeting volatile samples, aligning with multiscale replay strategies in Markovian settings.
Clustered Cross-Covariance Control for TD (C⁴) refers to a class of continual learning policies that utilize partitioned or "clustered" sampling strategies to allocate memory buffer resources based on class-wise cross-covariance or related measures of learning dynamics, particularly in the context of temporal difference (TD) learning and experience replay schemes with non-i.i.d. data. This approach underpins recently proposed algorithms such as the Class-Adaptive Sampling Policy (CASP), which adapts buffer allocation to the vulnerability and contribution of classes, and is conceptually related to multiscale, non-uniform replay strategies developed for Markovian settings.
1. Motivation and Context
Continual learning (CL) addresses the challenge of maintaining the performance of learning systems as they acquire new knowledge sequentially, avoiding the catastrophic forgetting of earlier tasks or classes. Buffer-based rehearsal methods have been dominant in this domain, relying on storing and replaying a limited memory of past samples. However, a key issue remains: how best to allocate the finite buffer among diverse, possibly imbalanced or complex data classes, especially when class boundaries or sample influence is non-uniform and the data arrive in dependent sequences, as in TD learning.
The concept of clustered or partitioned control emerges as a principled solution: instead of uniformly storing and replaying samples irrespective of their provenance or representational difficulty, the system adapts the storage and replay probabilities to class-dependent measures of vulnerability, difficulty, or sample informativeness. This can be formalized via cross-covariance or variance-based metrics reflecting the temporal confidence fluctuations or sample-specific learning trajectories observed during proxy model training (Rezaei et al., 2023).
2. Class-Wise Vulnerability Scoring and Buffer Partitioning
The CASP methodology quantifies "class vulnerability" by training a proxy model on the most recent task data $D_t$, recording the average SoftMax confidence in the true label per class across $E$ epochs. The key statistics for class $c$ are:
- Class-confidence at epoch $e$: $p_c^{(e)} = \frac{1}{|D_c|} \sum_{i \in D_c} p_i^{(e)}(y_i)$, the mean SoftMax probability assigned to the true label $y_i$ over the samples $D_c$ of class $c$.
- Mean confidence: $\mu_c = \frac{1}{E} \sum_{e=1}^{E} p_c^{(e)}$
- Vulnerability (std. dev. of confidences): $\sigma_c = \sqrt{\frac{1}{E} \sum_{e=1}^{E} \big(p_c^{(e)} - \mu_c\big)^2}$
Classes with more volatile confidence curves, frequently those prone to forgetting, are deemed more vulnerable. The $b_t$ additional buffer slots for task $t$ are then partitioned among its classes proportionally:
$$b_c = b_t \cdot \frac{\sigma_c}{\sum_{c'} \sigma_{c'}},$$
where $b_c$ is rounded to ensure integer allocation summing to $b_t$.
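The scoring and allocation above can be sketched as follows; the function names and the largest-remainder rounding rule are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def class_vulnerability(confidences):
    """Per-class std. dev. of the epoch-wise mean true-label confidence.

    confidences: dict mapping class id -> array of shape (E,) holding the
    class-averaged SoftMax confidence at each proxy-training epoch.
    """
    return {c: float(np.std(traj)) for c, traj in confidences.items()}

def partition_slots(vulnerability, total_slots):
    """Split `total_slots` buffer slots proportionally to vulnerability,
    rounding so the integer allocations still sum to total_slots."""
    classes = list(vulnerability)
    sigma = np.array([vulnerability[c] for c in classes], dtype=float)
    raw = total_slots * sigma / sigma.sum()
    alloc = np.floor(raw).astype(int)
    # hand leftover slots to the largest fractional remainders
    remainder = raw - alloc
    for idx in np.argsort(-remainder)[: total_slots - alloc.sum()]:
        alloc[idx] += 1
    return dict(zip(classes, alloc.tolist()))
```

A class whose confidence curve barely moves receives few or no slots, which is exactly the intended bias toward volatile classes.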
3. Sample-Level Selection within Class Partitions
Within each class, sample-level vulnerability is similarly determined using SoftMax confidence trajectories, with sample $i$ scored as:
$$\sigma_i = \sqrt{\frac{1}{E} \sum_{e=1}^{E} \big(p_i^{(e)}(y_i) - \mu_i\big)^2}, \qquad \mu_i = \frac{1}{E} \sum_{e=1}^{E} p_i^{(e)}(y_i).$$
For each class $c$, the $b_c$ samples with highest $\sigma_i$ are selected for entry into the buffer. This results in a two-stage, clustered sampling probability:
- Class-selection: $P(c) = \sigma_c / \sum_{c'} \sigma_{c'}$, realized through the partition sizes $b_c$.
- Sample-selection within class: $P(i \mid c) \propto \sigma_i$, truncated to the top-$b_c$ samples.
The practical policy is to deterministically select top-vulnerability samples until each class allocation $b_c$ is filled (Rezaei et al., 2023).
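A minimal sketch of this two-stage selection, assuming per-sample confidence trajectories are stored as a NumPy array (function and argument names are hypothetical):

```python
import numpy as np

def select_buffer_samples(sample_traj, labels, alloc):
    """Two-stage clustered selection: within each class partition, keep the
    alloc[c] samples whose true-label confidence trajectory varies most.

    sample_traj: array (N, E) of per-sample true-label SoftMax confidences
                 across proxy-training epochs.
    labels:      array (N,) of class ids.
    alloc:       dict class id -> number of buffer slots b_c.
    Returns the selected sample indices.
    """
    sigma = sample_traj.std(axis=1)  # per-sample vulnerability score
    chosen = []
    for c, b_c in alloc.items():
        members = np.flatnonzero(labels == c)
        ranked = members[np.argsort(-sigma[members])]  # most volatile first
        chosen.extend(ranked[:b_c].tolist())
    return chosen
```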
4. Integration with Experience Replay and Algorithmic Workflow
The clustered cross-covariance policy integrates into experience replay (ER) as follows (Algorithm 1 in (Rezaei et al., 2023)):
- Initialization: Buffer $\mathcal{M} \leftarrow \emptyset$; model weights randomized.
- Online Phase (for each batch): Draw a random replay batch from $\mathcal{M}$, update the main model on current and replayed samples, tentatively insert the new batch into $\mathcal{M}$ via reservoir sampling.
- Partitioned Update (end of task):
- Compute class vulnerabilities.
- Partition new buffer slots accordingly.
- For each class, select top-vulnerability samples.
- Insert partitioned samples; if buffer exceeds its fixed size, discard surplus.
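The two buffer operations in this workflow can be sketched as below; `reservoir_insert` follows the standard reservoir-sampling rule, while `end_of_task_update` stands in for the partitioned update (the surplus-discard rule shown, dropping the oldest entries, is one plausible choice rather than one prescribed by the paper):

```python
import random

def reservoir_insert(buffer, item, capacity, n_seen):
    """Standard reservoir sampling: item number n_seen (0-indexed) lands in
    the buffer with probability capacity / (n_seen + 1)."""
    if len(buffer) < capacity:
        buffer.append(item)
    else:
        j = random.randrange(n_seen + 1)
        if j < capacity:
            buffer[j] = item

def end_of_task_update(buffer, capacity, partitioned_samples):
    """Insert the vulnerability-selected, class-partitioned samples and
    discard surplus (here: the oldest entries) to keep the fixed size."""
    buffer.extend(partitioned_samples)
    del buffer[:max(0, len(buffer) - capacity)]
```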
This mechanism replaces the uniform buffer update at the end of each task—the standard ER policy—with a class/cluster-wise, vulnerability-adaptive selection (Rezaei et al., 2023).
5. Empirical Rationale and Theoretical Justification
Empirical findings demonstrate several advantages for clustered-buffer strategies:
- Data Cartography Effect: Samples with fluctuating confidence (middle-variance) are those most likely to be forgotten and represent strong class structure; their selection boosts test accuracy by 4–6 points compared to easiest or hardest subsets in offline settings (CIFAR10) (Rezaei et al., 2023).
- Class-Level Forgetting Correlation: A strong positive Pearson correlation is observed between class vulnerability and forgetting, confirming that higher-variance classes lose more information under uniform replay (Rezaei et al., 2023).
- End-to-End Gains: Integrating CASP with rehearsal algorithms (ER, MIR, SCR, DVC, PCR) yields gains of 1–3 percentage points in accuracy and reductions of 1–3 points in forgetting across buffer capacities and benchmarks such as Split CIFAR100 and Split Mini-ImageNet.
- t-SNE Visualization: Standard ER yields buffers with over- or under-represented classes and outliers, while class-adaptive policies give better support coverage, particularly of informative decision boundaries (Rezaei et al., 2023).
The underlying theoretical justification recognizes that partitioning the buffer as a function of class-wise cross-variance targets the root cause of representation forgetting in continual learning.
6. Connections to Multiscale Replay and Markovian Settings
The clustered cross-covariance paradigm is conceptually linked to multiscale replay strategies in TD and Markovian buffer scenarios. The Multiscale Experience Replay (MER) approach partitions the buffer into geometric scales, emulating i.i.d. sampling at coarse scales (where Markov chain states are near-independent) and refining at finer scales as mixing dependencies emerge. The MER policy does not require knowledge of the chain mixing time, unlike fixed sample-skipping, and adaptively exploits independence where possible, matching the stochastic error rate of i.i.d. stochastic approximation (Nakul et al., 4 Jan 2026).
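As a toy illustration of the scale idea (not MER's actual partition construction or scale weighting), one can sub-sample a stored trajectory at geometrically growing strides, so that coarser scales yield temporally distant, near-independent states:

```python
import random

def multiscale_sample(trajectory, num_scales, batch_size):
    """Toy geometric-scale replay draw. Scale k reads the trajectory at
    stride 2**k, so larger k spaces samples further apart in time; under a
    mixing Markov chain such samples are closer to independent. The uniform
    choice over scales is purely illustrative."""
    batch = []
    for _ in range(batch_size):
        k = random.randrange(num_scales)    # pick a scale
        subsampled = trajectory[:: 2 ** k]  # stride-2**k view of the chain
        batch.append(random.choice(subsampled))
    return batch
```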
Key correspondences include:
- Partitioned Sampling: Both CASP and MER replace uniform buffer sampling with a structured, scale- (or class-)adaptive procedure.
- Dependency Control: The adjustment of replay probabilities, whether by class-variance or by scale, mitigates statistical inefficiency due to correlated samples.
- Theoretical Guarantees: MER can provably achieve the i.i.d. convergence rate without tuning to Markov chain parameters (Nakul et al., 4 Jan 2026). By analogy, CASP's class-wise rebalancing directly targets empirically validated sources of forgetting in continual learning.
A plausible implication is that extensions of clustered cross-covariance control could further harmonize buffer-based continual learning and TD-policy evaluation under Markovian data, yielding robust replay schemes suited for non-i.i.d. sequential tasks across multiple domains.