Streaming Federated Continual Learning
- Streaming Federated Continual Learning is a framework that combines federated and continual learning to collaboratively train models on evolving, non-IID data streams.
- It addresses challenges such as spatio-temporal heterogeneity and catastrophic forgetting using methods like replay-based buffers, gradient matching, and analytic solutions.
- SFCL employs synchronous and asynchronous protocols with adaptive buffering to optimize the stability-plasticity trade-off while reducing communication and computation overhead.
Streaming Federated Continual Learning (SFCL) is a paradigm that unifies the challenges of federated learning (FL) and continual learning (CL) under the constraint of data streams: multiple distributed clients receive non-stationary, possibly non-IID data in a temporally evolving manner, and must collaboratively learn a global model that generalizes across all seen tasks without catastrophic forgetting. SFCL explicitly models spatio-temporal heterogeneity—statistical variation across clients (spatial) and across tasks (temporal)—as data and label distributions shift arbitrarily over time and across the federation. This setting subsumes and extends batch-based FCL, introducing significant new algorithmic and theoretical challenges.
1. Formal Setting and Problem Statement
The SFCL setting assumes a federation of $K$ clients. Each client $k$ observes its own private online sequence of datasets $\{\mathcal{D}_k^{(1)}, \mathcal{D}_k^{(2)}, \dots\}$, where each $\mathcal{D}_k^{(t)}$ may cover novel or overlapping classes, may not be temporally or semantically ordered, and, under the Limitless Task Pool (LTP) assumption, has no guaranteed relation to other clients' streams or temporal orderings (Nguyen et al., 22 May 2025). In each federated communication round, clients can access only the most recent batch of stream data, with past samples discarded or, at best, sparsely buffered.
The global objective is to minimize the cumulative risk over all seen tasks and clients, under strict privacy and single-pass storage constraints:

$$\min_{\theta}\; \sum_{k=1}^{K}\sum_{t=1}^{T}\mathbb{E}_{(x,y)\sim \mathcal{D}_k^{(t)}}\big[\ell(f_\theta(x),\, y)\big],$$

where direct access to all historical data is infeasible due to privacy, storage, and the streaming protocol.
Key elements:
- Spatial heterogeneity: each client’s data distribution can be arbitrary and often highly non-IID.
- Temporal heterogeneity: task sequence and class distributions for each client are non-stationary.
- Catastrophic forgetting: compounded across temporal (per-client) and spatial (global) axes (Yang et al., 2023).
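The single-pass constraint above can be made concrete with a minimal client-side sketch: each round trains on the current stream batch plus a small replay buffer, then maintains the buffer by reservoir sampling. All names here are illustrative and not taken from any specific SFCL paper.

```python
import random

def client_round(model_update, stream_batch, buffer, buffer_size=200, seen=0):
    """One SFCL client step (illustrative): train on the current stream batch
    plus a small replay buffer, then maintain the buffer with reservoir
    sampling so it stays a uniform sample of the whole stream seen so far."""
    replay = random.sample(buffer, min(len(buffer), len(stream_batch)))
    update = model_update(stream_batch + replay)  # local gradient step(s)
    for x in stream_batch:
        seen += 1
        if len(buffer) < buffer_size:
            buffer.append(x)
        else:
            j = random.randrange(seen)
            if j < buffer_size:
                buffer[j] = x          # evict a random slot with prob. buffer_size/seen
    return update, seen
```

The buffer never exceeds `buffer_size`, so memory stays constant no matter how long the stream runs, which is the storage constraint SFCL imposes.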
2. Principal Algorithmic Frameworks
Synchronous and Asynchronous Protocols
Most SFCL frameworks instantiate either synchronous or asynchronous protocols for streaming tasks (Yang et al., 2023):
- Synchronous SFCL: all clients jointly increment a common task index, e.g., each global communication round corresponds to completion of a new task on all clients.
- Asynchronous SFCL: clients may proceed through their local task streams at independent rates, with the central server aggregating updates opportunistically; suitable for dynamic federations and partial client participation.
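The difference between the two protocols shows up mainly at the server. A minimal aggregation sketch, assuming FedAvg-style sample weighting plus a simple staleness discount for the asynchronous case (the discount rule is illustrative, not drawn from a specific paper):

```python
def aggregate(updates, staleness=None):
    """Server-side aggregation sketch. `updates` maps client id ->
    (weight_vector, n_samples). Synchronous SFCL: all clients report at the
    same task index (staleness=None). Asynchronous SFCL: a staleness discount
    down-weights clients lagging behind the global task index."""
    total, acc = 0.0, None
    for cid, (w, n) in updates.items():
        s = 1.0 if staleness is None else 1.0 / (1 + staleness.get(cid, 0))
        coef = n * s                                  # sample count x freshness
        total += coef
        acc = ([coef * wi for wi in w] if acc is None
               else [a + coef * wi for a, wi in zip(acc, w)])
    return [a / total for a in acc]                   # weighted average
```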
Core Methodological Classes
Extant SFCL methods fit into several broad algorithmic categories:
| Method Class | Mechanism | Typical Examples |
|---|---|---|
| Replay-based | Buffer or generator for old examples/pseudo-examples | Experience Replay (Serra et al., 2024), Generative Replay (Qi et al., 2023), Buffer Gradient Projection (Dai et al., 2024), C-FLAG (Keshri et al., 2024) |
| Gradient Matching | Align gradients across tasks/clients | STAMP (Nguyen et al., 22 May 2025), Fed-A-GEM (Dai et al., 2024) |
| Parameter/Adapter | Decompose or freeze subnetworks, use adapters | LoRA/ViT Adapters (DOLFIN (Moussadek et al., 15 Oct 2025)), FedWeIT, parameter isolation |
| Regularization | Knowledge distillation, EWC, SI, or KL penalties | FedBNN (Yao et al., 2024), SMCF (Fed-LSCL (Yu et al., 13 Aug 2025)) |
| Analytic/Closed-form | Gradient-free, recursive least-squares solutions | AFCL (Tang et al., 18 May 2025) |
| Knowledge Fusion & Distillation | Aggregate prototypes, heads, or adapter weights | OBO Distillation (Fed-LSCL (Yu et al., 13 Aug 2025)), Prototype fusion (Yang et al., 2023) |
| Model-level Replay | Send models for rehearsal on historic data | FedRewind (Palazzo et al., 2024) |
Extensions include one-shot distillation for continual segmentation (Peng et al., 19 Mar 2025), streaming federated recommendation (Lim et al., 6 Aug 2025), and Bayesian continual learning (Yao et al., 2024).
3. Core Challenges and Catastrophic Forgetting
Spatial-temporal catastrophic forgetting is a central obstacle in SFCL (Yang et al., 2023). This encompasses both:
- Temporal catastrophic forgetting (TemporalCF): knowledge erasure within a client as it processes new tasks, unable to revisit old data.
- Spatial catastrophic forgetting (SpatialCF): loss of client-specific specialization or global generalization when aggregating heterogeneous updates, as global model quality on past tasks decays due to inter-client conflicts.
Evaluation criteria typically include:
- Accuracy/Knowledge Retention: averaged over all tasks/classes at each round.
- Average Forgetting (AF): mean drop in task accuracy from acquisition to the end of stream.
- Communication and Computation Costs: overhead induced per client and per communication.
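Average Forgetting can be computed from the standard task-accuracy matrix; the sketch below uses the generic CL definition (maximum earlier accuracy minus final accuracy, averaged over tasks), and per-paper variants may differ slightly.

```python
def average_forgetting(acc):
    """Average Forgetting (AF) from an accuracy matrix acc[i][j]: accuracy on
    task j after training through task i (defined for i >= j). AF averages,
    over all but the last task, the drop from a task's best earlier accuracy
    to its accuracy at the end of the stream."""
    T = len(acc)
    drops = []
    for j in range(T - 1):
        best = max(acc[i][j] for i in range(j, T - 1))  # best accuracy before the end
        drops.append(best - acc[T - 1][j])              # minus final accuracy
    return sum(drops) / len(drops)
```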
Emphasis is placed on designs that optimize the stability–plasticity trade-off: maximal preservation of prior knowledge (stability) with strong adaptation to non-stationary data (plasticity).
4. Representative Methods and Advances
Gradient Matching and Coreset Prototypes
STAMP employs spatio-temporal gradient matching: at the client, it aggregates and aligns gradients from current and historical data (or their prototypical summaries), while at the server, it solves an optimization for the invariant direction among aggregated client updates. Prototypical coresets are maintained per class as stable, memory-efficient surrogates for prior knowledge, and are selected via a convex program to preserve class means (Nguyen et al., 22 May 2025). This removes dependence on generative models and reduces both communication and storage requirements.
STAMP achieves superior average accuracy and lower forgetting than both classic federated and generative replay FCL baselines, especially under strong non-IID (Nguyen et al., 22 May 2025).
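The prototypical-coreset idea can be illustrated with a greedy mean-preserving selection; note STAMP's actual selection solves a convex program (Nguyen et al., 22 May 2025), so this is only a sketch of the objective, not the method.

```python
import numpy as np

def prototype_coreset(X, m):
    """Greedy sketch of a class-mean-preserving coreset: pick m rows of X
    whose running mean best tracks the full class mean. Illustrative only;
    STAMP selects prototypes via a convex program."""
    mu = X.mean(axis=0)
    chosen, avail = [], list(range(len(X)))
    for _ in range(m):
        best_i, best_err = None, None
        for i in avail:
            cand = chosen + [i]
            err = np.linalg.norm(X[cand].mean(axis=0) - mu)  # mean-matching error
            if best_err is None or err < best_err:
                best_i, best_err = i, err
        chosen.append(best_i)
        avail.remove(best_i)
    return chosen
```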
Replay, Buffer, and Gradient Constraints
Buffer-based methods maintain explicit class-balanced or uncertainty-selected buffers (e.g., scored via Bregman Information) (Serra et al., 2024), or use projection constraints (Fed-A-GEM) to ensure updates do not harm historical performance (Dai et al., 2024). C-FLAG (Keshri et al., 2024) combines incremental gradient aggregation with a fixed-size episodic memory to balance stability and plasticity, with non-convex convergence guarantees, making it a theoretically grounded buffer-based replay mechanism.
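The projection constraint used by A-GEM-style methods is simple to state: if the current gradient conflicts with the gradient computed on the buffer, project it onto the half-space where it no longer increases buffer loss. A minimal sketch (cf. Fed-A-GEM, Dai et al., 2024):

```python
import numpy as np

def project_gradient(g, g_ref):
    """A-GEM-style projection: if g conflicts with the reference (buffer)
    gradient g_ref (negative inner product), remove the conflicting component
    so the update cannot increase the loss on buffered past data."""
    dot = float(np.dot(g, g_ref))
    if dot >= 0:                       # no conflict: keep the gradient as-is
        return g
    return g - (dot / float(np.dot(g_ref, g_ref))) * g_ref
```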
Large–Small Model Collaboration
Fed-LSCL proposes a composite architecture: each client pairs a frozen Foundation Model with a mutable, lightweight small model that generates LoRA-style adapters for personalized continual adaptation. Continual knowledge is preserved with feature-level consistency loss and personalized head distillation. The server aggregates only the small adapter generators, enabling robust sharing across heterogeneous client architectures (Yu et al., 13 Aug 2025).
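The LoRA-style adapter structure underlying such designs can be sketched as a frozen weight plus a trainable low-rank update; this is a generic illustration, not Fed-LSCL's exact module.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA-style adapter sketch: a frozen weight W plus a trainable
    low-rank correction B @ A with rank r << min(d_in, d_out). Only A and B
    would be trained locally and aggregated, keeping communication small."""
    def __init__(self, W, r, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen foundation-model weight
        self.A = rng.normal(0, 0.01, (r, d_in))      # trainable down-projection
        self.B = np.zeros((d_out, r))                # trainable up-projection (zero init)

    def forward(self, x):
        # Zero-initialized B means the adapter starts as an exact identity
        # on top of the frozen layer, then drifts only as B is trained.
        return self.W @ x + self.B @ (self.A @ x)
```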
Closed-form and Analytic Solutions
AFCL bypasses the sensitivity of gradient-based approaches by employing feature extraction with a frozen backbone and recursive closed-form least-squares classifiers, updating global parameters in a single communication round. The method ensures spatio-temporal and order invariance, exactly matching joint centralized learning regardless of data arrangement (Tang et al., 18 May 2025).
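The order-invariance claim follows from the algebra of recursive least squares: each batch update is exact, so the final weights equal a joint fit regardless of arrival order. A sketch in the spirit of analytic FCL (details illustrative, not AFCL's exact formulation):

```python
import numpy as np

class RLSClassifier:
    """Recursive least-squares head on frozen features. Each update folds in a
    new feature batch exactly (block RLS via the Woodbury identity), so the
    final weights match a joint regularized least-squares fit no matter how
    the data stream is ordered or partitioned."""
    def __init__(self, d, c, lam=1e-2):
        self.P = np.eye(d) / lam      # running inverse of (X^T X + lam I)
        self.W = np.zeros((d, c))     # classifier weights

    def update(self, X, Y):
        K = self.P @ X.T @ np.linalg.inv(np.eye(len(X)) + X @ self.P @ X.T)
        self.W += K @ (Y - X @ self.W)   # correct weights toward new batch
        self.P -= K @ X @ self.P         # fold batch into the inverse Gram
```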
Bayesian Streaming
FedBNN uses variational Bayesian inference with global mean-field priors/posteriors, factorizing all local updates into Gaussian likelihoods and integrating information from all present and historic distributions. This method provides formally motivated mitigation of forgetting without explicit task boundary identification and excels in both class-incremental and domain-incremental regimes (Yao et al., 2024).
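Aggregating mean-field Gaussian posteriors at the server reduces, in the simplest case, to a precision-weighted product of Gaussians. The sketch below shows that standard fusion rule; FedBNN's actual variational scheme is more involved (Yao et al., 2024).

```python
def fuse_gaussians(mus, sigma2s):
    """Fuse per-client Gaussian posteriors N(mu_k, sigma2_k) for one parameter
    by a precision-weighted product: precisions add, and the fused mean is the
    precision-weighted average of client means."""
    prec = sum(1.0 / s2 for s2 in sigma2s)                     # summed precisions
    mu = sum(m / s2 for m, s2 in zip(mus, sigma2s)) / prec     # weighted mean
    return mu, 1.0 / prec                                      # fused mean, variance
```

The fused variance is always smaller than any client's variance, reflecting that the server pools evidence from all clients.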
Efficient Distillation and Asynchronicity
One-shot distillation approaches, such as MMDS for continual segmentation (Peng et al., 19 Mar 2025), aggregate models from streaming clients/distinct tasks using public unlabelled data and pseudo-label aggregation, performing knowledge fusion via distillation only once per streaming stage.
Asynchronous online FL frameworks permit client updates to synchronize at variable frequencies, leveraging continual-proximal regularization and attention-style central feature re-weighting for robust convergence in the presence of device/communication heterogeneity and lag (Chen et al., 2019).
5. Evaluation, Metrics, and Practical Benchmarks
Common experimental protocols reflect real-world streaming scenarios:
| Dimension | Options/Settings |
|---|---|
| Task Sequencing | Sequential (IID/non-IID), overlapping, class-incremental, orderless (LTP) |
| Datasets | CIFAR-10, CIFAR-100, ImageNet(-R, -A, 1K), Tiny-ImageNet, EMNIST, medical images |
| Client Distribution | Dirichlet($\alpha$) splits, disjoint class labels, sample imbalance |
| Communication | Frequent (every few mini-batches), infrequent (tasks/blocks), asynchronous |
| Privacy | No sharing of raw data; only partial, sometimes compressed, model/parameter exchange |
Metrics include:
- Per-task and average accuracy
- Average forgetting (AF)
- Knowledge retention, along both temporal and spatial axes (Yang et al., 2023)
- Communication and memory footprint
- Convergence rate and latency
6. Outstanding Issues and Future Directions
Notable open problems and frontiers include:
- Unified streaming benchmarks: There is a scarcity of standard, diverse, real-world SFCL testbeds reflecting true non-IID, streaming, partial participation, and dynamic task evolution.
- Task-free, identifierless streaming: Methods such as FedKACE support category overlap and absent task identifiers, employing adaptive model switching, kernel-spectral buffer maintenance, and gradient-balanced replay. These approaches deliver regret bounds showing federated aggregation always improves over local learning as coverage grows (Tan et al., 27 Jan 2026).
- Communication/Computation Efficiency: Single-round distillation (Peng et al., 19 Mar 2025), lightweight adapter exchange (Moussadek et al., 15 Oct 2025), and closed-form aggregation (Tang et al., 18 May 2025) present promising directions for reducing practical barriers to deployment.
- Privacy and security: Deepening integration of communication-efficient secure aggregation, differential privacy, and formal privacy-utility trade-offs.
- Modality and Application Expansion: Extending to text, recommendation (Lim et al., 6 Aug 2025), segmentation, detection, and multimodal streaming FCL.
- Theoretical understanding: While select methods have provable invariance/convergence (e.g., AFCL (Tang et al., 18 May 2025), C-FLAG (Keshri et al., 2024)), uniform theoretical guarantees under asynchronous, non-identifiable, and high-dimensional streaming are still rare.
- Adaptive buffering and continual hyperparameter tuning: Effective single-pass, bufferless, or generative continual replay for arbitrarily shaped data/label distributions remains an important area.
7. Summary Table: Selected Streaming FCL Methods
| Method / Paper | Core Idea | Forgetting Mitigation Approach | Notable Results / Remarks |
|---|---|---|---|
| STAMP (Nguyen et al., 22 May 2025) | Spatio-Temporal GM + Prototypical Coreset | Gradient matching via prototypes | Best AF vs. system cost, non-IID robust |
| FedRewind (Palazzo et al., 2024) | Decentralized model-level rehearsal | Periodic "rewind" to origin client | ~3–4pp accuracy gain over baselines |
| Fed-LSCL (Yu et al., 13 Aug 2025) | Large–small model adapters, OBO Distill. | Adapter-level feature/weight distill. | 97% acc. on CIFAR-100 (vs. 80% best prev) |
| AFCL (Tang et al., 18 May 2025) | Analytic, closed-form, gradient-free | Recursive least squares on frozen feat. | Perfect invariance, 100%+ gain over replay |
| DOLFIN (Moussadek et al., 15 Oct 2025) | ViT + dual-GPM LoRA, orth. adapters | Gradient-space orthogonality | Best FAA under all Dirichlet splits |
| Fed-A-GEM (Dai et al., 2024) | Buffer gradient projection shared global | Gradient projection wrt buffer | +27% acc. vs. vanilla baselines (CIFAR100) |
| FedBNN (Yao et al., 2024) | Variational Bayes global model | Continual Bayesian inference | State-of-art final forgetting mitigation |
| FedKACE (Tan et al., 27 Jan 2026) | Task-free overlap; kernel buffer select | Adaptive replay; inference switching | Regret bound: global always > local |
References
Please refer to the following arXiv papers for comprehensive protocols, formal analysis, and empirical results:
- (Nguyen et al., 22 May 2025, Palazzo et al., 2024, Yu et al., 13 Aug 2025, Yang et al., 2023, Serra et al., 2024, Peng et al., 19 Mar 2025, Tang et al., 18 May 2025, Qi et al., 2023, Moussadek et al., 15 Oct 2025, Chen et al., 2019, Tan et al., 27 Jan 2026, Dai et al., 2024, Keshri et al., 2024, Lim et al., 6 Aug 2025, Yao et al., 2024).
This literature collectively establishes SFCL as a distinct, multifaceted research area, with stability–plasticity trade-off, communication/memory constraints, and robustness to extreme heterogeneity at its methodological and theoretical core.