
Streaming Federated Continual Learning

Updated 3 February 2026
  • Streaming Federated Continual Learning is a framework that combines federated and continual learning to collaboratively train models on evolving, non-IID data streams.
  • It addresses challenges such as spatio-temporal heterogeneity and catastrophic forgetting using methods like replay-based buffers, gradient matching, and analytic solutions.
  • SFCL employs synchronous and asynchronous protocols with adaptive buffering to optimize the stability-plasticity trade-off while reducing communication and computation overhead.

Streaming Federated Continual Learning (SFCL) is a paradigm that unifies the challenges of federated learning (FL) and continual learning (CL) under the constraint of data streams: multiple distributed clients receive non-stationary, possibly non-IID data in a temporally evolving manner, and must collaboratively learn a global model that generalizes across all seen tasks without catastrophic forgetting. SFCL explicitly models spatio-temporal heterogeneity—statistical variation across clients (spatial) and tasks (temporal)—as data and label distributions shift arbitrarily over time and across the federation. This setting subsumes and extends batch-based FCL, introducing significant new algorithmic and theoretical challenges.

1. Formal Setting and Problem Statement

The SFCL setting assumes a federation of $U$ clients. Each client $u \in \{1,\dots,U\}$ observes its own private online sequence of datasets $\{\mathcal{D}_u^t\}_{t=1}^{T_u}$, where each $\mathcal{D}_u^t$ may cover novel or overlapping classes, may not be temporally or semantically ordered, and, under the Limitless Task Pool (LTP) assumption, has no guaranteed relation to other clients' streams or temporal orderings (Nguyen et al., 22 May 2025). In each federated communication round, clients can access only the most recent batch of stream data, with past samples discarded or, at best, sparsely buffered.

The global objective is to minimize the cumulative risk over all seen tasks and clients, under strict privacy and single-pass storage constraints:

$$\min_{\theta^T} \; \frac{1}{U}\sum_{u=1}^{U}\sum_{i=1}^{T}\mathcal{L}(\theta^T;\mathcal{D}_u^i)$$

where direct access to all historical data is infeasible due to privacy, storage, and the streaming protocol.

Key elements:

  • Spatial heterogeneity: each client’s data distribution $p_u(x,y)$ can be arbitrary and is often highly non-IID.
  • Temporal heterogeneity: task sequence and class distributions for each client are non-stationary.
  • Catastrophic forgetting: compounded across temporal (per-client) and spatial (global) axes (Yang et al., 2023).
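To make the objective concrete, here is a minimal numeric sketch of the cumulative risk above, using a toy linear model and synthetic per-client streams whose means drift across clients and tasks to mimic spatio-temporal heterogeneity (all data, dimensions, and the squared-error loss are illustrative, not from any surveyed method):

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta, X, y):
    # Mean squared error of a linear model; stands in for L(theta; D_u^t).
    return float(np.mean((X @ theta - y) ** 2))

# Hypothetical federation: U clients, T tasks each. Shifting input means
# across u and t imitate spatial and temporal heterogeneity.
U, T, d, n = 3, 4, 5, 20
streams = [[(rng.normal(u + t, 1.0, (n, d)), rng.normal(0.0, 1.0, n))
            for t in range(T)] for u in range(U)]

def cumulative_risk(theta):
    # (1/U) * sum_u sum_t L(theta; D_u^t) -- the global SFCL objective.
    return sum(loss(theta, X, y) for tasks in streams for X, y in tasks) / U

theta = np.zeros(d)
risk = cumulative_risk(theta)
```

In the streaming setting this quantity can only be estimated, not evaluated directly, because past `streams` entries are discarded or sparsely buffered.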

2. Principal Algorithmic Frameworks

Synchronous and Asynchronous Protocols

Most SFCL frameworks instantiate either synchronous or asynchronous protocols for streaming tasks (Yang et al., 2023):

  • Synchronous SFCL: all clients jointly increment a common task index, e.g., each global communication round corresponds to completion of a new task on all clients.
  • Asynchronous SFCL: clients may proceed through their local task streams at independent rates, with the central server aggregating updates opportunistically; suitable for dynamic federations and partial client participation.
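The two protocols can be contrasted in a few lines of generic federated code. The staleness down-weighting in the asynchronous merge is a common heuristic for lagging clients, not a rule prescribed by the surveyed papers:

```python
import numpy as np

def local_update(model, data, lr=0.1):
    # One gradient step of a linear MSE model; stands in for local training.
    X, y = data
    grad = 2 * X.T @ (X @ model - y) / len(y)
    return model - lr * grad

def sync_round(global_model, client_batches):
    # Synchronous SFCL: every client advances one task in lockstep,
    # then the server averages all updates (FedAvg-style).
    updates = [local_update(global_model.copy(), b) for b in client_batches]
    return np.mean(updates, axis=0)

def async_merge(global_model, client_model, staleness, base_mix=0.5):
    # Asynchronous SFCL: the server mixes in whichever update arrives,
    # down-weighting stale contributions (an illustrative heuristic).
    alpha = base_mix / (1 + staleness)
    return (1 - alpha) * global_model + alpha * client_model
```

A fresh asynchronous update (`staleness=0`) is mixed at weight `base_mix`; updates that lag several rounds contribute proportionally less.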

Core Methodological Classes

Extant SFCL methods fit into several broad algorithmic categories:

| Method Class | Mechanism | Typical Examples |
| --- | --- | --- |
| Replay-based | Buffer or generator for old examples/pseudo-examples | Experience Replay (Serra et al., 2024); Generative Replay (Qi et al., 2023); Buffer Gradient Projection (Dai et al., 2024); C-FLAG (Keshri et al., 2024) |
| Gradient matching | Align gradients across tasks/clients | STAMP (Nguyen et al., 22 May 2025); Fed-A-GEM (Dai et al., 2024) |
| Parameter/adapter | Decompose or freeze subnetworks; use adapters | LoRA/ViT adapters in DOLFIN (Moussadek et al., 15 Oct 2025); FedWeIT; parameter isolation |
| Regularization | Knowledge distillation, EWC, SI, or KL penalties | FedBNN (Yao et al., 2024); SMCF in Fed-LSCL (Yu et al., 13 Aug 2025) |
| Analytic/closed-form | Gradient-free, recursive least-squares solutions | AFCL (Tang et al., 18 May 2025) |
| Knowledge fusion & distillation | Aggregate prototypes, heads, or adapter weights | OBO distillation in Fed-LSCL (Yu et al., 13 Aug 2025); prototype fusion (Yang et al., 2023) |
| Model-level replay | Send models for rehearsal on historic data | FedRewind (Palazzo et al., 2024) |

Extensions include one-shot distillation for continual segmentation (Peng et al., 19 Mar 2025), streaming federated recommendation (Lim et al., 6 Aug 2025), and Bayesian continual learning (Yao et al., 2024).

3. Core Challenges and Catastrophic Forgetting

Spatial-temporal catastrophic forgetting is a central obstacle in SFCL (Yang et al., 2023). This encompasses both:

  • Temporal catastrophic forgetting (TemporalCF): knowledge erasure within a client as it processes new tasks, unable to revisit old data.
  • Spatial catastrophic forgetting (SpatialCF): loss of client-specific specialization or global generalization when aggregating heterogeneous updates, as global model quality on past tasks decays due to inter-client conflicts.

Evaluation criteria typically include:

  • Accuracy/Knowledge Retention: averaged over all tasks/classes at each round.
  • Average Forgetting (AF): mean drop in each task’s accuracy from its acquisition to the end of the stream.
  • Communication and Computation Costs: overhead induced per client and per communication.
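A minimal sketch of the Average Forgetting metric, under one common definition (the exact formula varies across papers; here the most recently learned task is excluded, since it has had no opportunity to be forgotten):

```python
import numpy as np

def average_forgetting(acc):
    """acc[r][t] is accuracy on task t measured after round r, with tasks
    learned in order. AF averages, over all but the most recent task, the
    drop from each task's best-ever accuracy to its final-round accuracy."""
    acc = np.asarray(acc, dtype=float)
    peak = acc[:-1].max(axis=0)     # best accuracy before the final round
    drop = peak - acc[-1]           # how much was lost by the end
    return float(drop[:-1].mean())  # exclude the task just learned
```

For example, a client that peaks at 0.9 on tasks 1 and 2 but ends the stream at 0.7 and 0.8 has an AF of 0.15.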

Emphasis is placed on designs that optimize the stability–plasticity trade-off: maximal preservation of prior knowledge (stability) with strong adaptation to non-stationary data (plasticity).

4. Representative Methods and Advances

Gradient Matching and Coreset Prototypes

STAMP employs spatio-temporal gradient matching: at the client, it aggregates and aligns gradients from current and historical data (or their prototypical summaries), while at the server, it solves an optimization for the invariant direction among aggregated client updates. Prototypical coresets are maintained per class as stable, memory-efficient surrogates for prior knowledge, and are selected via a convex program to preserve class means (Nguyen et al., 22 May 2025). This removes dependence on generative models and reduces both communication and storage requirements.

STAMP achieves superior average accuracy and lower forgetting than both classic federated and generative replay FCL baselines, especially under strong non-IID (Nguyen et al., 22 May 2025).
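The class-mean-preserving idea behind prototypical coresets can be illustrated with a greedy stand-in (STAMP itself solves a convex program, which this sketch does not reproduce):

```python
import numpy as np

def class_mean_coreset(X, k):
    """Greedy prototypical-coreset sketch: pick k samples of one class whose
    running mean stays closest to the full class mean, giving a compact,
    memory-efficient surrogate for that class's prior knowledge."""
    mu = X.mean(axis=0)
    chosen, remaining = [], list(range(len(X)))
    for _ in range(k):
        best = min(remaining,
                   key=lambda i: np.linalg.norm(X[chosen + [i]].mean(axis=0) - mu))
        chosen.append(best)
        remaining.remove(best)
    return X[chosen]
```

The coreset stores `k` real feature vectors per class instead of a generative model, which is the source of STAMP's communication and storage savings.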

Replay, Buffer, and Gradient Constraints

Buffer-based methods maintain explicit, class-balanced, or uncertainty-selected buffers (e.g., using Bregman Information) (Serra et al., 2024), or use projection constraints (Fed-A-GEM) to ensure updates do not harm historical performance (Dai et al., 2024). C-FLAG (Keshri et al., 2024) combines incremental gradient aggregation with fixed-size episodic memory to optimize stability/plasticity with an $O(1/\sqrt{T})$ non-convex convergence rate, representing a theoretically grounded buffer-based replay mechanism.
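The core A-GEM-style projection constraint used by such methods fits in a few lines:

```python
import numpy as np

def agem_project(g, g_ref):
    """A-GEM-style projection: if the candidate gradient g conflicts with
    the reference gradient g_ref computed on the replay buffer, remove the
    conflicting component so the update does not increase the buffer loss
    to first order; otherwise keep g unchanged."""
    dot = g @ g_ref
    if dot >= 0:
        return g
    return g - (dot / (g_ref @ g_ref)) * g_ref
```

After projection the update is guaranteed to be non-decreasing on the buffer objective to first order, since its inner product with `g_ref` is non-negative.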

Large–Small Model Collaboration

Fed-LSCL proposes a composite architecture: each client pairs a frozen Foundation Model with a mutable, lightweight small model that generates LoRA-style adapters for personalized continual adaptation. Continual knowledge is preserved with feature-level consistency loss and personalized head distillation. The server aggregates only the small adapter generators, enabling robust sharing across heterogeneous client architectures (Yu et al., 13 Aug 2025).
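The kind of lightweight low-rank adapter such small models generate can be sketched generically (this is a standard LoRA layer, not Fed-LSCL's actual architecture):

```python
import numpy as np

class LoRALinear:
    """Frozen base weight plus a trainable low-rank correction B @ A.
    With B initialized to zero, the adapter starts as a no-op and the
    frozen foundation model's output is recovered exactly."""
    def __init__(self, W, r, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W                                      # frozen, never sent
        self.A = rng.normal(0, 0.02, (r, W.shape[1]))   # trainable, rank r
        self.B = np.zeros((W.shape[0], r))              # trainable, init 0
        self.scale = scale

    def forward(self, x):
        # Base path + scaled low-rank path; only A and B need aggregation.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)
```

Because only the small `A` and `B` matrices (or their generators) are exchanged, the server never touches the frozen foundation model, which is what enables sharing across heterogeneous client architectures.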

Closed-form and Analytic Solutions

AFCL bypasses the sensitivity of gradient-based approaches by employing feature extraction with a frozen backbone and recursive closed-form least-squares classifiers, updating global parameters in a single communication round. The method ensures spatio-temporal and order invariance, exactly matching joint centralized learning regardless of data arrangement (Tang et al., 18 May 2025).
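The order-invariance property rests on recursive least squares: each batch update yields exactly the joint ridge solution over all data seen so far, regardless of arrival order. A minimal single-node sketch of such an RLS head (omitting AFCL's federated aggregation step):

```python
import numpy as np

class RLSClassifier:
    """Recursive least-squares head over frozen features. After any sequence
    of update() calls, W equals the joint ridge-regression solution on all
    batches combined -- the invariance property analytic methods exploit."""
    def __init__(self, d, c, reg=1.0):
        self.P = np.eye(d) / reg        # inverse regularized Gram matrix
        self.W = np.zeros((d, c))

    def update(self, X, Y):
        # Woodbury identity: refresh P = (reg*I + sum X^T X)^(-1) using only
        # an n x n inverse, then correct W by the new batch's residual.
        S = np.linalg.inv(np.eye(len(X)) + X @ self.P @ X.T)
        self.P = self.P - self.P @ X.T @ S @ X @ self.P
        self.W = self.W + self.P @ X.T @ (Y - X @ self.W)
```

Streaming two batches through `update` reproduces, to numerical precision, the closed-form solution computed on both batches jointly.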

Bayesian Streaming

FedBNN uses variational Bayesian inference with global mean-field priors/posteriors, factorizing all local updates into Gaussian likelihoods and integrating information from all present and historic distributions. This method provides formally motivated mitigation of forgetting without explicit task boundary identification and excels in both class-incremental and domain-incremental regimes (Yao et al., 2024).
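The intuition behind Bayesian aggregation can be seen in the simplest fusion rule for mean-field Gaussian posteriors, a precision-weighted product of Gaussians (FedBNN's actual update additionally accounts for the shared prior and is more involved):

```python
import numpy as np

def fuse_gaussian_posteriors(means, variances):
    """Fuse per-client Gaussian posteriors by multiplying densities:
    the fused precision is the sum of client precisions, and the fused
    mean is the precision-weighted average of client means."""
    prec = sum(1.0 / v for v in variances)
    mean = sum(m / v for m, v in zip(means, variances)) / prec
    return mean, 1.0 / prec
```

Fusing N(0, 1) and N(2, 1) yields N(1, 0.5): every client's evidence tightens the global posterior rather than overwriting it, which is the sense in which Bayesian aggregation integrates present and historic distributions.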

Efficient Distillation and Asynchronicity

One-shot distillation approaches, such as MMDS for continual segmentation (Peng et al., 19 Mar 2025), aggregate models from streaming clients/distinct tasks using public unlabelled data and pseudo-label aggregation, performing knowledge fusion via distillation only once per streaming stage.

Asynchronous online FL frameworks permit client updates to synchronize at variable frequencies, leveraging continual-proximal regularization and attention-style central feature re-weighting for robust convergence in the presence of device/communication heterogeneity and lag (Chen et al., 2019).

5. Evaluation, Metrics, and Practical Benchmarks

Common experimental protocols reflect real-world streaming scenarios:

| Dimension | Options / Settings |
| --- | --- |
| Task sequencing | Sequential (IID/non-IID), overlapping, class-incremental, orderless (LTP) |
| Datasets | CIFAR-10, CIFAR-100, ImageNet(-R, -A, -1K), Tiny-ImageNet, EMNIST, medical images |
| Client distribution | Dirichlet($\alpha$) splits, disjoint class labels, sample imbalance |
| Communication | Frequent (every few mini-batches), infrequent (per task/block), asynchronous |
| Privacy | No sharing of raw data; only partial, sometimes compressed, model/parameter exchange |

Metrics include:

  • Per-task and average accuracy
  • Average forgetting (AF)
  • Knowledge retention (temporal $KR_t$, spatial $KR_s$) (Yang et al., 2023)
  • Communication and memory footprint
  • Convergence rate and latency

6. Outstanding Issues and Future Directions

Notable open problems and frontiers include:

  • Unified streaming benchmarks: There is a scarcity of standard, diverse, real-world SFCL testbeds reflecting true non-IID, streaming, partial participation, and dynamic task evolution.
  • Task-free, identifierless streaming: Methods such as FedKACE support category overlap and absent task identifiers, employing adaptive model switching, kernel-spectral buffer maintenance, and gradient-balanced replay. These approaches deliver regret bounds showing federated aggregation always improves over local learning as coverage grows (Tan et al., 27 Jan 2026).
  • Communication/Computation Efficiency: Single-round distillation (Peng et al., 19 Mar 2025), lightweight adapter exchange (Moussadek et al., 15 Oct 2025), and closed-form aggregation (Tang et al., 18 May 2025) present promising directions for reducing practical barriers to deployment.
  • Privacy and security: Deepening integration of communication-efficient secure aggregation, differential privacy, and formal privacy-utility trade-offs.
  • Modality and Application Expansion: Extending to text, recommendation (Lim et al., 6 Aug 2025), segmentation, detection, and multimodal streaming FCL.
  • Theoretical understanding: While select methods have provable invariance/convergence (e.g., AFCL (Tang et al., 18 May 2025), C-FLAG (Keshri et al., 2024)), uniform theoretical guarantees under asynchronous, non-identifiable, and high-dimensional streaming are still rare.
  • Adaptive buffering and continual hyperparameter tuning: Effective single-pass, bufferless, or generative continual replay for arbitrarily shaped data/label distributions remains an important area.

7. Summary Table: Selected Streaming FCL Methods

| Method / Paper | Core Idea | Forgetting Mitigation Approach | Notable Results / Remarks |
| --- | --- | --- | --- |
| STAMP (Nguyen et al., 22 May 2025) | Spatio-temporal gradient matching + prototypical coresets | Gradient matching via prototypes | Best AF vs. system cost; non-IID robust |
| FedRewind (Palazzo et al., 2024) | Decentralized model-level rehearsal | Periodic "rewind" to origin client | ~3–4 pp accuracy gain over baselines |
| Fed-LSCL (Yu et al., 13 Aug 2025) | Large–small model adapters, OBO distillation | Adapter-level feature/weight distillation | 97% acc. on CIFAR-100 (vs. 80% best previous) |
| AFCL (Tang et al., 18 May 2025) | Analytic, closed-form, gradient-free | Recursive least squares on frozen features | Exact invariance; 100%+ gain over replay |
| DOLFIN (Moussadek et al., 15 Oct 2025) | ViT + dual-GPM LoRA, orthogonal adapters | Gradient-space orthogonality | Best FAA under all Dirichlet splits |
| Fed-A-GEM (Dai et al., 2024) | Shared global buffer gradient projection | Gradient projection w.r.t. buffer | +27% acc. vs. vanilla baselines (CIFAR-100) |
| FedBNN (Yao et al., 2024) | Variational Bayes global model | Continual Bayesian inference | State-of-the-art final forgetting mitigation |
| FedKACE (Tan et al., 27 Jan 2026) | Task-free overlap; kernel buffer selection | Adaptive replay; inference switching | Regret bound: global always > local |


This literature collectively establishes SFCL as a distinct, multifaceted research area, with stability–plasticity trade-off, communication/memory constraints, and robustness to extreme heterogeneity at its methodological and theoretical core.
