Streaming Federated Continual Learning
- Streaming Federated Continual Learning is a framework that combines federated and continual learning to collaboratively train models on evolving, non-IID data streams.
- It addresses challenges such as spatio-temporal heterogeneity and catastrophic forgetting using methods like replay-based buffers, gradient matching, and analytic solutions.
- SFCL employs synchronous and asynchronous protocols with adaptive buffering to optimize the stability-plasticity trade-off while reducing communication and computation overhead.
Streaming Federated Continual Learning (SFCL) is a paradigm that unifies the challenges of federated learning (FL) and continual learning (CL) under the constraint of data streams: multiple distributed clients receive non-stationary, possibly non-IID data in a temporally evolving manner, and must collaboratively learn a global model that generalizes across all seen tasks without catastrophic forgetting. SFCL explicitly models spatio-temporal heterogeneity—statistical variation across clients (spatial) and across tasks (temporal)—as data and label distributions shift arbitrarily over time and across the federation. This setting subsumes and extends batch-based FCL, introducing significant new algorithmic and theoretical challenges.
1. Formal Setting and Problem Statement
The SFCL setting assumes a federation of $K$ clients. Each client $k$ observes its own private online sequence of datasets $\{\mathcal{D}_k^{(1)}, \mathcal{D}_k^{(2)}, \dots\}$, where each $\mathcal{D}_k^{(t)}$ may cover novel or overlapping classes, may not be temporally or semantically ordered, and, under the Limitless Task Pool (LTP) assumption, has no guaranteed relation to other clients' streams or temporal orderings (Nguyen et al., 22 May 2025). In each federated communication round, clients can access only the most recent batch of stream data, with past samples discarded or, at best, sparsely buffered.
The global objective is to minimize the cumulative risk over all seen tasks and clients, under strict privacy and single-pass storage constraints:

$$\min_{\theta}\; \sum_{k=1}^{K}\sum_{t=1}^{T}\mathbb{E}_{(x,y)\sim \mathcal{D}_k^{(t)}}\big[\ell(f_\theta(x),\, y)\big],$$

where direct access to all historical data is infeasible due to privacy, storage, and the streaming protocol.
Key elements:
- Spatial heterogeneity: each client’s data distribution can be arbitrary and often highly non-IID.
- Temporal heterogeneity: task sequence and class distributions for each client are non-stationary.
- Catastrophic forgetting: compounded across temporal (per-client) and spatial (global) axes (Yang et al., 2023).
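The single-pass constraint above can be made concrete with a minimal client-side sketch: each round trains on the current stream batch plus a small replay buffer, then maintains the buffer by reservoir sampling. All names here are illustrative and not taken from any specific SFCL paper.

```python
import random

def client_round(model_update, stream_batch, buffer, buffer_size=200, seen=0):
    """One SFCL client step (illustrative): train on the current stream batch
    plus a small replay buffer, then maintain the buffer with reservoir
    sampling so it stays a uniform sample of the whole stream seen so far."""
    replay = random.sample(buffer, min(len(buffer), len(stream_batch)))
    update = model_update(stream_batch + replay)  # local gradient step(s)
    for x in stream_batch:
        seen += 1
        if len(buffer) < buffer_size:
            buffer.append(x)
        else:
            j = random.randrange(seen)
            if j < buffer_size:
                buffer[j] = x          # evict a random slot with prob. buffer_size/seen
    return update, seen
```

The buffer never exceeds `buffer_size`, so memory stays constant no matter how long the stream runs, which is the storage constraint SFCL imposes.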
2. Principal Algorithmic Frameworks
Synchronous and Asynchronous Protocols
Most SFCL frameworks instantiate either synchronous or asynchronous protocols for streaming tasks (Yang et al., 2023):
- Synchronous SFCL: all clients jointly increment a common task index, e.g., each global communication round corresponds to completion of a new task on all clients.
- Asynchronous SFCL: clients may proceed through their local task streams at independent rates, with the central server aggregating updates opportunistically; suitable for dynamic federations and partial client participation.
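The difference between the two protocols shows up mainly at the server. A minimal aggregation sketch, assuming FedAvg-style sample weighting plus a simple staleness discount for the asynchronous case (the discount rule is illustrative, not drawn from a specific paper):

```python
def aggregate(updates, staleness=None):
    """Server-side aggregation sketch. `updates` maps client id ->
    (weight_vector, n_samples). Synchronous SFCL: all clients report at the
    same task index (staleness=None). Asynchronous SFCL: a staleness discount
    down-weights clients lagging behind the global task index."""
    total, acc = 0.0, None
    for cid, (w, n) in updates.items():
        s = 1.0 if staleness is None else 1.0 / (1 + staleness.get(cid, 0))
        coef = n * s                                  # sample count x freshness
        total += coef
        acc = ([coef * wi for wi in w] if acc is None
               else [a + coef * wi for a, wi in zip(acc, w)])
    return [a / total for a in acc]                   # weighted average
```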
Core Methodological Classes
Extant SFCL methods fit into several broad algorithmic categories:
| Method Class | Mechanism | Typical Examples |
|---|---|---|
| Replay-based | Buffer or generator for old examples/pseudo-examples | Experience Replay (Serra et al., 2024), Generative Replay (Qi et al., 2023), Buffer Gradient Projection (Dai et al., 2024), C-FLAG (Keshri et al., 2024) |
| Gradient Matching | Align gradients across tasks/clients | STAMP (Nguyen et al., 22 May 2025), Fed-A-GEM (Dai et al., 2024) |
| Parameter/Adapter | Decompose or freeze subnetworks, use adapters | LoRA/ViT Adapters (DOLFIN (Moussadek et al., 15 Oct 2025)), FedWeIT, parameter isolation |
| Regularization | Knowledge distillation, EWC, SI, or KL penalties | FedBNN (Yao et al., 2024), SMCF (Fed-LSCL (Yu et al., 13 Aug 2025)) |
| Analytic/Closed-form | Gradient-free, recursive least-squares solutions | AFCL (Tang et al., 18 May 2025) |
| Knowledge Fusion & Distillation | Aggregate prototypes, heads, or adapter weights | OBO Distillation (Fed-LSCL (Yu et al., 13 Aug 2025)), Prototype fusion (Yang et al., 2023) |
| Model-level Replay | Send models for rehearsal on historic data | FedRewind (Palazzo et al., 2024) |
Extensions include one-shot distillation for continual segmentation (Peng et al., 19 Mar 2025), streaming federated recommendation (Lim et al., 6 Aug 2025), and Bayesian continual learning (Yao et al., 2024).
3. Core Challenges and Catastrophic Forgetting
Spatial-temporal catastrophic forgetting is a central obstacle in SFCL (Yang et al., 2023). This encompasses both:
- Temporal catastrophic forgetting (TemporalCF): knowledge erasure within a client as it processes new tasks, unable to revisit old data.
- Spatial catastrophic forgetting (SpatialCF): loss of client-specific specialization or global generalization when aggregating heterogeneous updates, as global model quality on past tasks decays due to inter-client conflicts.
Evaluation criteria typically include:
- Accuracy/Knowledge Retention: averaged over all tasks/classes at each round.
- Average Forgetting (AF): mean drop in task accuracy from acquisition to the end of stream.
- Communication and Computation Costs: overhead induced per client and per communication.
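Average Forgetting can be computed from the standard task-accuracy matrix; the sketch below uses the generic CL definition (maximum earlier accuracy minus final accuracy, averaged over tasks), and per-paper variants may differ slightly.

```python
def average_forgetting(acc):
    """Average Forgetting (AF) from an accuracy matrix acc[i][j]: accuracy on
    task j after training through task i (defined for i >= j). AF averages,
    over all but the last task, the drop from a task's best earlier accuracy
    to its accuracy at the end of the stream."""
    T = len(acc)
    drops = []
    for j in range(T - 1):
        best = max(acc[i][j] for i in range(j, T - 1))  # best accuracy before the end
        drops.append(best - acc[T - 1][j])              # minus final accuracy
    return sum(drops) / len(drops)
```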
Emphasis is placed on designs that optimize the stability–plasticity trade-off: maximal preservation of prior knowledge (stability) with strong adaptation to non-stationary data (plasticity).
4. Representative Methods and Advances
Gradient Matching and Coreset Prototypes
STAMP employs spatio-temporal gradient matching: at the client, it aggregates and aligns gradients from current and historical data (or their prototypical summaries), while at the server, it solves an optimization for the invariant direction among aggregated client updates. Prototypical coresets are maintained per class as stable, memory-efficient surrogates for prior knowledge, and are selected via a convex program to preserve class means (Nguyen et al., 22 May 2025). This removes dependence on generative models and reduces both communication and storage requirements.
STAMP achieves superior average accuracy and lower forgetting than both classic federated and generative replay FCL baselines, especially under strong non-IID (Nguyen et al., 22 May 2025).
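The prototypical-coreset idea can be illustrated with a greedy mean-preserving selection; note STAMP's actual selection solves a convex program (Nguyen et al., 22 May 2025), so this is only a sketch of the objective, not the method.

```python
import numpy as np

def prototype_coreset(X, m):
    """Greedy sketch of a class-mean-preserving coreset: pick m rows of X
    whose running mean best tracks the full class mean. Illustrative only;
    STAMP selects prototypes via a convex program."""
    mu = X.mean(axis=0)
    chosen, avail = [], list(range(len(X)))
    for _ in range(m):
        best_i, best_err = None, None
        for i in avail:
            cand = chosen + [i]
            err = np.linalg.norm(X[cand].mean(axis=0) - mu)  # mean-matching error
            if best_err is None or err < best_err:
                best_i, best_err = i, err
        chosen.append(best_i)
        avail.remove(best_i)
    return chosen
```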
Replay, Buffer, and Gradient Constraints
Buffer-based methods maintain explicit class-balanced or uncertainty-selected buffers (e.g., scored via Bregman Information) (Serra et al., 2024), or use projection constraints (Fed-A-GEM) to ensure updates do not harm historical performance (Dai et al., 2024). C-FLAG (Keshri et al., 2024) combines incremental gradient aggregation with a fixed-size episodic memory to balance stability and plasticity, with non-convex convergence guarantees, making it a theoretically grounded buffer-based replay mechanism.
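The projection constraint used by A-GEM-style methods is simple to state: if the current gradient conflicts with the gradient computed on the buffer, project it onto the half-space where it no longer increases buffer loss. A minimal sketch (cf. Fed-A-GEM, Dai et al., 2024):

```python
import numpy as np

def project_gradient(g, g_ref):
    """A-GEM-style projection: if g conflicts with the reference (buffer)
    gradient g_ref (negative inner product), remove the conflicting component
    so the update cannot increase the loss on buffered past data."""
    dot = float(np.dot(g, g_ref))
    if dot >= 0:                       # no conflict: keep the gradient as-is
        return g
    return g - (dot / float(np.dot(g_ref, g_ref))) * g_ref
```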
Large–Small Model Collaboration
Fed-LSCL proposes a composite architecture: each client pairs a frozen Foundation Model with a mutable, lightweight small model that generates LoRA-style adapters for personalized continual adaptation. Continual knowledge is preserved with feature-level consistency loss and personalized head distillation. The server aggregates only the small adapter generators, enabling robust sharing across heterogeneous client architectures (Yu et al., 13 Aug 2025).
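The LoRA-style adapter structure underlying such designs can be sketched as a frozen weight plus a trainable low-rank update; this is a generic illustration, not Fed-LSCL's exact module.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA-style adapter sketch: a frozen weight W plus a trainable
    low-rank correction B @ A with rank r << min(d_in, d_out). Only A and B
    would be trained locally and aggregated, keeping communication small."""
    def __init__(self, W, r, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen foundation-model weight
        self.A = rng.normal(0, 0.01, (r, d_in))      # trainable down-projection
        self.B = np.zeros((d_out, r))                # trainable up-projection (zero init)

    def forward(self, x):
        # Zero-initialized B means the adapter starts as an exact identity
        # on top of the frozen layer, then drifts only as B is trained.
        return self.W @ x + self.B @ (self.A @ x)
```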
Closed-form and Analytic Solutions
AFCL bypasses the sensitivity of gradient-based approaches by employing feature extraction with a frozen backbone and recursive closed-form least-squares classifiers, updating global parameters in a single communication round. The method ensures spatio-temporal and order invariance, exactly matching joint centralized learning regardless of data arrangement (Tang et al., 18 May 2025).
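The order-invariance claim follows from the algebra of recursive least squares: each batch update is exact, so the final weights equal a joint fit regardless of arrival order. A sketch in the spirit of analytic FCL (details illustrative, not AFCL's exact formulation):

```python
import numpy as np

class RLSClassifier:
    """Recursive least-squares head on frozen features. Each update folds in a
    new feature batch exactly (block RLS via the Woodbury identity), so the
    final weights match a joint regularized least-squares fit no matter how
    the data stream is ordered or partitioned."""
    def __init__(self, d, c, lam=1e-2):
        self.P = np.eye(d) / lam      # running inverse of (X^T X + lam I)
        self.W = np.zeros((d, c))     # classifier weights

    def update(self, X, Y):
        K = self.P @ X.T @ np.linalg.inv(np.eye(len(X)) + X @ self.P @ X.T)
        self.W += K @ (Y - X @ self.W)   # correct weights toward new batch
        self.P -= K @ X @ self.P         # fold batch into the inverse Gram
```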
Bayesian Streaming
FedBNN uses variational Bayesian inference with global mean-field priors/posteriors, factorizing all local updates into Gaussian likelihoods and integrating information from all present and historic distributions. This method provides formally motivated mitigation of forgetting without explicit task boundary identification and excels in both class-incremental and domain-incremental regimes (Yao et al., 2024).
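Aggregating mean-field Gaussian posteriors at the server reduces, in the simplest case, to a precision-weighted product of Gaussians. The sketch below shows that standard fusion rule; FedBNN's actual variational scheme is more involved (Yao et al., 2024).

```python
def fuse_gaussians(mus, sigma2s):
    """Fuse per-client Gaussian posteriors N(mu_k, sigma2_k) for one parameter
    by a precision-weighted product: precisions add, and the fused mean is the
    precision-weighted average of client means."""
    prec = sum(1.0 / s2 for s2 in sigma2s)                     # summed precisions
    mu = sum(m / s2 for m, s2 in zip(mus, sigma2s)) / prec     # weighted mean
    return mu, 1.0 / prec                                      # fused mean, variance
```

The fused variance is always smaller than any client's variance, reflecting that the server pools evidence from all clients.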
Efficient Distillation and Asynchronicity
One-shot distillation approaches, such as MMDS for continual segmentation (Peng et al., 19 Mar 2025), aggregate models from streaming clients/distinct tasks using public unlabelled data and pseudo-label aggregation, performing knowledge fusion via distillation only once per streaming stage.
Asynchronous online FL frameworks permit client updates to synchronize at variable frequencies, leveraging continual-proximal regularization and attention-style central feature re-weighting for robust convergence in the presence of device/communication heterogeneity and lag (Chen et al., 2019).
5. Evaluation, Metrics, and Practical Benchmarks
Common experimental protocols reflect real-world streaming scenarios:
| Dimension | Options/Settings |
|---|---|
| Task Sequencing | Sequential (IID/non-IID), overlapping, class-incremental, orderless (LTP) |
| Datasets | CIFAR-10, CIFAR-100, ImageNet(-R, -A, 1K), Tiny-ImageNet, EMNIST, medical images |
| Client Distribution | Dirichlet($\alpha$) splits, disjoint class labels, sample imbalance |
| Communication | Frequent (every few mini-batches), infrequent (tasks/blocks), asynchronous |
| Privacy | No sharing of raw data; only partial, sometimes compressed, model/parameter exchange |
Metrics include:
- Per-task and average accuracy
- Average forgetting (AF)
- Knowledge retention, along both temporal and spatial axes (Yang et al., 2023)
- Communication and memory footprint
- Convergence rate and latency
6. Outstanding Issues and Future Directions
Notable open problems and frontiers include:
- Unified streaming benchmarks: There is a scarcity of standard, diverse, real-world SFCL testbeds reflecting true non-IID, streaming, partial participation, and dynamic task evolution.
- Task-free, identifierless streaming: Methods such as FedKACE support category overlap and absent task identifiers, employing adaptive model switching, kernel-spectral buffer maintenance, and gradient-balanced replay. These approaches deliver regret bounds showing federated aggregation always improves over local learning as coverage grows (Tan et al., 27 Jan 2026).
- Communication/Computation Efficiency: Single-round distillation (Peng et al., 19 Mar 2025), lightweight adapter exchange (Moussadek et al., 15 Oct 2025), and closed-form aggregation (Tang et al., 18 May 2025) present promising directions for reducing practical barriers to deployment.
- Privacy and security: Deepening integration of communication-efficient secure aggregation, differential privacy, and formal privacy-utility trade-offs.
- Modality and Application Expansion: Extending to text, recommendation (Lim et al., 6 Aug 2025), segmentation, detection, and multimodal streaming FCL.
- Theoretical understanding: While select methods have provable invariance/convergence (e.g., AFCL (Tang et al., 18 May 2025), C-FLAG (Keshri et al., 2024)), uniform theoretical guarantees under asynchronous, non-identifiable, and high-dimensional streaming are still rare.
- Adaptive buffering and continual hyperparameter tuning: Effective single-pass, bufferless, or generative continual replay for arbitrarily shaped data/label distributions remains an important area.
7. Summary Table: Selected Streaming FCL Methods
| Method / Paper | Core Idea | Forgetting Mitigation Approach | Notable Results / Remarks |
|---|---|---|---|
| STAMP (Nguyen et al., 22 May 2025) | Spatio-Temporal GM + Prototypical Coreset | Gradient matching via prototypes | Best AF vs. system cost, non-IID robust |
| FedRewind (Palazzo et al., 2024) | Decentralized model-level rehearsal | Periodic "rewind" to origin client | ~3–4pp accuracy gain over baselines |
| Fed-LSCL (Yu et al., 13 Aug 2025) | Large–small model adapters, OBO Distill. | Adapter-level feature/weight distill. | 97% acc. on CIFAR-100 (vs. 80% best prev) |
| AFCL (Tang et al., 18 May 2025) | Analytic, closed-form, gradient-free | Recursive least squares on frozen feat. | Perfect invariance, 100%+ gain over replay |
| DOLFIN (Moussadek et al., 15 Oct 2025) | ViT + dual-GPM LoRA, orth. adapters | Gradient-space orthogonality | Best FAA under all Dirichlet splits |
| Fed-A-GEM (Dai et al., 2024) | Buffer gradient projection shared global | Gradient projection wrt buffer | +27% acc. vs. vanilla baselines (CIFAR100) |
| FedBNN (Yao et al., 2024) | Variational Bayes global model | Continual Bayesian inference | State-of-art final forgetting mitigation |
| FedKACE (Tan et al., 27 Jan 2026) | Task-free overlap; kernel buffer select | Adaptive replay; inference switching | Regret bound: global always > local |
References
Please refer to the following arXiv papers for comprehensive protocols, formal analysis, and empirical results:
- (Nguyen et al., 22 May 2025, Palazzo et al., 2024, Yu et al., 13 Aug 2025, Yang et al., 2023, Serra et al., 2024, Peng et al., 19 Mar 2025, Tang et al., 18 May 2025, Qi et al., 2023, Moussadek et al., 15 Oct 2025, Chen et al., 2019, Tan et al., 27 Jan 2026, Dai et al., 2024, Keshri et al., 2024, Lim et al., 6 Aug 2025, Yao et al., 2024).
This literature collectively establishes SFCL as a distinct, multifaceted research area, with stability–plasticity trade-off, communication/memory constraints, and robustness to extreme heterogeneity at its methodological and theoretical core.