Multi-Domain Experience Replay
- The approach mitigates catastrophic forgetting by interleaving new samples with replayed data selected via techniques such as Bayesian uncertainty estimation and embedding-based clustering.
- It employs domain-sensitive strategies such as MK-MMD to optimize buffer management and sampling, enhancing both adaptation and retention.
- Empirical evaluations across supervised, unsupervised, and reinforcement learning settings demonstrate improved performance and reduced forgetting.
Multi-domain experience replay is a class of strategies within continual learning, reinforcement learning, and domain adaptation that addresses the challenge of catastrophic forgetting when a learner must sequentially adapt to data from distinct domains or tasks. This paradigm maintains a buffer of past experiences (i.e., data samples, trajectories, or feature embeddings) drawn from previously encountered domains and systematically reuses them during ongoing updates with new, potentially out-of-distribution data. Recent empirical and theoretical work elucidates optimized sampling methods, buffer management policies, and domain-sensitive prioritization mechanisms that collectively enhance knowledge retention and accelerate adaptation across varied domains (Yang et al., 2022, Xu et al., 23 Jun 2025, Rostami, 2024, Tirumala et al., 2023).
1. Core Concepts and Problem Setting
The multi-domain experience replay paradigm arises in settings where either: (a) a supervised or unsupervised learner encounters sequential domains with input or task distribution shift, (b) an edge/cloud learning architecture faces non-stationary environments, or (c) off-policy reinforcement learning is deployed across variable environments or experiments. In all cases, naively training on the current domain leads to catastrophic forgetting of prior knowledge. Experience replay mitigates this by maintaining a buffer of past examples for interleaved or prioritized rehearsal during ongoing training, thus stabilizing parametric updates and improving both forward and backward transfer.
Formally, given a sequence of domains $\mathcal{D}_1, \dots, \mathcal{D}_T$, the algorithm maintains a memory buffer $\mathcal{B}$ subject to a capacity constraint $|\mathcal{B}| \le M$. At each update, a mini-batch is constructed by mixing new samples from the current domain with replayed samples from prior domains, where selection may be random, diversity-driven, or domain-importance-weighted. This design has been adapted to supervised continual learning (class-incremental, domain-incremental) (Yang et al., 2022), unsupervised domain adaptation (Rostami, 2024), edge-cloud adaptation scenarios (Xu et al., 23 Jun 2025), and off-policy RL (Tirumala et al., 2023).
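The buffer-plus-mixed-mini-batch pattern can be sketched as follows. This is a minimal illustration, not code from the cited papers: the per-domain partitioning, reservoir-style insertion, and the 50/50 default mixing ratio are illustrative assumptions.

```python
import random

class MultiDomainReplayBuffer:
    """Sketch of a capacity-bounded buffer partitioned per domain,
    with reservoir-style random insertion (illustrative, not from the papers)."""

    def __init__(self, capacity_per_domain):
        self.capacity = capacity_per_domain
        self.partitions = {}   # domain_id -> list of stored samples
        self._seen = {}        # domain_id -> number of samples offered so far

    def add(self, domain_id, sample):
        part = self.partitions.setdefault(domain_id, [])
        n = self._seen.get(domain_id, 0)
        if len(part) < self.capacity:
            part.append(sample)
        else:
            # reservoir sampling keeps a uniform random subset of the stream
            j = random.randrange(n + 1)
            if j < self.capacity:
                part[j] = sample
        self._seen[domain_id] = n + 1

    def replay_batch(self, size):
        pool = [s for part in self.partitions.values() for s in part]
        return random.sample(pool, min(size, len(pool)))

def mixed_minibatch(buffer, new_samples, replay_ratio=0.5):
    """Interleave current-domain samples with replayed ones at a fixed ratio."""
    n_replay = int(len(new_samples) * replay_ratio)
    return list(new_samples) + buffer.replay_batch(n_replay)
```

Swapping `replay_batch`'s uniform draw for a scored or domain-weighted draw yields the prioritized variants discussed below.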
2. Replay Buffer Construction and Sampling Mechanisms
Replay buffer management is a dominant axis of method variation. The following sampling strategies are prominent in supervised continual learning contexts (Yang et al., 2022):
- Random Sampling (R): Candidate buffer samples are drawn uniformly at random, simple but ignores task or domain information.
- Confidence, Entropy, and Margin-Based Scoring (C/H/M): Samples are scored according to model prediction statistics—maximum class probability, output entropy, or the margin between the two largest predicted class probabilities—with the option to select "difficult" (high-entropy/low-confidence) or "simple" (low-entropy/high-confidence) exemplars.
- Bayesian Disagreement (B): Dropout or similar methods approximate predictive uncertainty for each candidate, enabling selection of samples where epistemic uncertainty is maximal.
- Embedding-Based Clustering (K-means/Core-set): Candidate features are clustered or greedily maximized for coverage in embedding space, targeting representational diversity.
- Maximally Interfered Retrieval (MIR): Buffer samples whose losses increase most under a virtual new-task update are preferentially replayed.
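The confidence/entropy/margin family of scoring rules above can be written compactly. This is one plausible reading of those criteria, not the reference implementation from Yang et al. (2022); the margin here is the standard top-1 minus top-2 probability.

```python
import numpy as np

def selection_scores(probs, strategy):
    """Score N candidate samples from their softmax outputs `probs` (N x C)."""
    if strategy == "confidence":      # maximum class probability
        return probs.max(axis=1)
    if strategy == "entropy":         # predictive entropy
        return -(probs * np.log(probs + 1e-12)).sum(axis=1)
    if strategy == "margin":          # top-1 minus top-2 probability
        part = np.sort(probs, axis=1)
        return part[:, -1] - part[:, -2]
    raise ValueError(strategy)

def pick_difficult(probs, k):
    """'Difficult' exemplars: the k highest-entropy (lowest-confidence) samples."""
    return np.argsort(-selection_scores(probs, "entropy"))[:k]
```

Selecting "simple" exemplars instead amounts to sorting the same scores in the opposite direction.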
In multi-domain reinforcement learning and continual domain adaptation, buffer update and selection additionally consider explicit domain dissimilarity. For instance, in edge model adaptation, new buffer entries are selected per-domain using a domain quota, and domain distance is quantified via a multi-kernel MMD (MK-MMD) to enable discrepancy-weighted prioritization of historical domains (Xu et al., 23 Jun 2025). In unsupervised continual domain adaptation, representative samples are chosen per class based on their proximity to current Gaussian mixture components in feature space (“mean-of-features” selection) (Rostami, 2024).
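The MK-MMD distance used for domain prioritization can be estimated from feature samples. The sketch below uses an equally weighted sum of Gaussian kernels with hand-picked bandwidths; ER-EMU's exact kernel family and weights may differ.

```python
import numpy as np

def mk_mmd(X, Y, bandwidths=(0.5, 1.0, 2.0)):
    """Biased multi-kernel MMD^2 estimate between sample sets X (n x d)
    and Y (m x d), averaging Gaussian kernels over several bandwidths.
    Illustrative sketch of the MK-MMD idea, not ER-EMU's exact estimator."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return sum(np.exp(-d2 / (2.0 * s ** 2)) for s in bandwidths) / len(bandwidths)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()
```

In a replay system, `X` would hold features from the current domain and `Y` a historical domain's buffer partition; larger values mark the domain as more dissimilar.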
3. Domain Discrepancy Quantification and Prioritized Replay
Domain-sensitive replay leverages measures of inter-domain divergence to optimize both buffer composition and replay batch formation. In ER-EMU (Xu et al., 23 Jun 2025), a multi-kernel MMD between the current domain's empirical distribution and each historical domain's buffer partition serves as a dissimilarity metric. At each update, mini-batches sample from the most MK-MMD-distant domains, and loss terms are weighted by a sigmoid of the discrepancy score to emphasize retention of knowledge from outlying, underrepresented domains. This approach demonstrably outperforms random replay in persistent, cyclically varying environments such as traffic monitoring under day/night cycles.
Empirical evidence confirms the robustness of this strategy: ablations replacing MK-MMD with pure random selection reduce mean average precision by 1–2 points, while the method remains insensitive to the number of replayed domains per batch over the range 1–10 (Xu et al., 23 Jun 2025).
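The select-most-distant-then-sigmoid-weight step described above admits a compact sketch. This is a hypothetical rendering of ER-EMU's prioritization, assuming raw discrepancy scores feed directly into a standard sigmoid; the paper's scaling constants are not reproduced here.

```python
import math

def prioritize_domains(distances, k):
    """Given {domain_id: MK-MMD distance to the current domain}, pick the
    k most distant historical domains and weight each domain's replay loss
    by a sigmoid of its discrepancy (illustrative sketch of ER-EMU)."""
    ranked = sorted(distances, key=distances.get, reverse=True)[:k]
    return {d: 1.0 / (1.0 + math.exp(-distances[d])) for d in ranked}
```

Domains with larger discrepancy thus both enter the replay batch and receive larger loss weights, emphasizing retention of outlying knowledge.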
4. Integration with Learning Objectives and Theoretical Guarantees
Multifaceted loss functions integrate replay mechanisms with objectives reflecting both current adaptation and knowledge preservation. In continual unsupervised domain adaptation (Rostami, 2024), the total loss comprises:
- A classification loss on pseudo-samples drawn from an up-to-date internal Gaussian mixture model (GMM) of feature distributions.
- A replayed-sample classification loss covering buffer contents from all past domains.
- Sliced Wasserstein distance (SWD) terms encouraging alignment of current and replay buffer feature representations with the internal GMM.
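The SWD alignment term can be estimated by Monte-Carlo projection. The sketch below assumes equal sample counts for the two feature sets and a fixed number of random directions; it illustrates the sliced Wasserstein-2 idea rather than reproducing the paper's implementation.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_projections=64, seed=0):
    """Monte-Carlo sliced Wasserstein-2 distance between feature sets
    X, Y (n x d): project onto random unit directions, then average the
    1-D Wasserstein distances of the sorted projections (sketch only)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, X.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    px = np.sort(X @ theta.T, axis=0)   # each column: sorted 1-D projection
    py = np.sort(Y @ theta.T, axis=0)
    return np.sqrt(((px - py) ** 2).mean())
```

In the continual UDA loss, such a term would be evaluated between current/replayed features and pseudo-samples drawn from the internal GMM, pulling both toward the shared embedding distribution.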
Theoretical analysis shows that the expected error for the current domain is bounded by contributions from UDA alignment, accumulated distributional shifts (mitigated by buffer updates), and the inverse of the buffer/pseudo-sample sizes. Catastrophic forgetting is thus directly controlled by replay buffer size and alignment fidelity.
Practical implementations allocate the buffer equally per class (“mean-of-features”) or per domain (FIFO, random-sampling-based updates), and regulate per-batch mixing ratios to balance new and old samples.
5. Empirical Evaluation and Cross-Domain Performance
Comprehensive benchmarks demonstrate the impact of both buffer management and sampling policy across domains and protocols:
| Dataset/Protocol | Best Acc / mAP | Best Forgetting | Notes |
|---|---|---|---|
| MNIST (class-inc) | 0.876 (Margin/Bayes) | 0.096 (Core-set) | All methods similar; task simple |
| CIFAR-10 (class-inc) | 0.409 (MIR) | 0.221 (Bayes) | Difficult replay consistently best |
| MiniImagenet (class-inc) | 0.158 (Bayes) | 0.280 (Bayes) | Bayesian strategy best, at high compute cost |
| OpenLORIS (dom-inc) | 0.966 (Ent/Conf) | 0.014 (Ent) | Large ; Entropy optimal |
| Bellevue Traffic (obj det.) | 63.7 (DCC+ER-EMU) | – | MK-MMD stratification boosts performance |
Key findings include:
- Difficult sample replay (high-entropy/low-confidence) is superior to simple sample replay for class-incremental tasks.
- Diversity-oriented replay is more critical as domain shift increases.
- Generative replay (e.g., VAE-based) is inferior to experience replay when real-data storage is available.
- Buffer size and per-domain quota trade off memory against knowledge retention; larger buffers yield systematic forgetting reduction (2-3% per doubling on digit recognition), but benefits saturate.
- In domain-incremental settings, replaying data from more “distant” domains (per MK-MMD) enhances both adaptation and retention (Xu et al., 23 Jun 2025).
- In RL, fixed-ratio mixing (e.g., between offline and online buffers) delivers robust performance gains, especially under data-scarce or domain-shift conditions (Tirumala et al., 2023).
6. Practical Recommendations and Hyperparameter Guidelines
Cross-domain experience replay entails configuration choices grounded in empirical evaluation:
- Use experience replay over generative replay when possible (memory cost permitting).
- For small-to-moderate class counts, Bayesian uncertainty-based strategies minimize forgetting; for high-class-count tasks, entropy- or confidence-based scoring is computationally preferable.
- Embedding-based clustering is computationally intensive and offers limited accuracy gains except when explicit embedding diversity is required.
- For edge/cloud adaptation, partition buffer capacity per domain, update partitions with random sampling, and select the most MK-MMD-distant domains for replay, weighting loss terms accordingly (Xu et al., 23 Jun 2025).
- In reinforcement learning, persistently aggregate all prior experiment experience in a cumulative buffer and mix new and old samples at a fixed ratio, which is robust in most domains; when resetting, reinitialize network weights but preserve the experience storage (Tirumala et al., 2023).
- Replay “difficult” samples in class-incremental learning; in domain-incremental, prefer early presentation of “easy” domains.
- Buffer and batch sizes, kernel choice for MK-MMD (e.g., $3$–$5$ kernels), the per-batch domain ratio, and the replay ratio should be tuned within empirically validated ranges.
7. Limitations and Outlook
While multi-domain experience replay techniques deliver significant gains in continual learning and adaptation, their efficacy depends on the memory budget, the representational overlap of domains, and precise measurement of domain discrepancy. Computational overhead for complex sampling (e.g., Bayesian, MIR, K-means) can be prohibitive in large-scale problems; for many practical settings, simple entropy- or confidence-based strategies suffice. Future work aims to unify dynamic buffer allocation, optimize discrepancy metrics for high-dimensional distributions, and develop architectural extensions that support replay with minimal storage, particularly in resource-constrained deployments (Yang et al., 2022, Xu et al., 23 Jun 2025, Rostami, 2024, Tirumala et al., 2023).