
FedKACE: Streaming Federated Continual Learning

Updated 3 February 2026
  • FedKACE is a federated learning framework for streaming continual learning, designed to handle continual data acquisition with overlapping categories and no task identifiers.
  • It integrates adaptive model switching, gradient-balanced replay, and kernel spectral boundary buffering to efficiently balance new information with past knowledge.
  • Empirical results on benchmarks like CIFAR-100 and ImageNet-100 show that FedKACE reduces regret and improves accuracy compared to baseline methods.

FedKACE is a federated learning (FL) framework designed for streaming federated continual learning (FCL) settings, specifically addressing the challenge of continual data acquisition across clients when category overlap is present and task identifiers are absent. FedKACE introduces a knowledge-aware mechanism that combines adaptive model selection, gradient-balanced replay, and a kernel spectral boundary buffer for robust knowledge retention and low-regret continual adaptation (Tan et al., 27 Jan 2026).

1. Streaming Federated Continual Learning Problem Formulation

FedKACE operates in a multi-client, multi-round FL environment, where $K$ clients participate in $T$ rounds. In each round $t$, a client $k$ receives a data batch $\mathcal{D}_k^t = \{(x_i, y_i)\}_{i=1}^{n_{k,t}}$, with labels $y_i$ drawn from a time-dependent subset $\mathcal{C}_k^t \subseteq \{1,\ldots, C_\text{max}\}$. Across rounds, category sets may overlap ($\mathcal{C}_k^t \cap \mathcal{C}_k^{t'} \neq \emptyset$), but sample-level task identifiers are unavailable. After every round, clients' models must support inference over all seen categories: $\mathcal{C}_k^{\le t} = \bigcup_{\tau=1}^t \mathcal{C}_k^\tau$.

Each client maintains a local model $\theta_k^t$ (feature extractor $\phi$ and classifier $h$) and seeks to minimize the cumulative risk

$$\arg\min_{\theta_k^t}\; \sum_{c\in \mathcal{C}_k^{\le t}} \mathbb{E}_{x\sim p_k^t(x\mid c)} \bigl[\ell(h(\phi(x)), c)\bigr] + \mathcal{R}_k^t(\theta_k^t),$$

where $\mathcal{R}_k^t$ is a regularizer, typically enforcing aggregation consistency. This formulation is designed to address catastrophic forgetting and promote adaptation to continually evolving data without explicit task demarcations (Tan et al., 27 Jan 2026).
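As a small illustration of the streaming setup above, the per-round bookkeeping of category sets (overlap allowed, no task identifiers) might look like the following sketch; all class and variable names here are illustrative, not from the paper's code.

```python
# Minimal sketch of the streaming FCL bookkeeping: each round delivers a
# labeled batch, and inference must cover the union of all categories seen.
from typing import List, Set


class ClientStream:
    """Tracks the time-dependent category sets C_k^t for one client."""

    def __init__(self) -> None:
        self.seen: Set[int] = set()        # C_k^{<=t}: union over all rounds
        self.history: List[Set[int]] = []  # per-round category sets C_k^t

    def observe_round(self, labels: List[int]) -> Set[int]:
        """Record one round's batch; return all categories seen so far."""
        current = set(labels)              # C_k^t (no task identifiers given)
        self.history.append(current)
        self.seen |= current               # inference must cover this union
        return set(self.seen)


client = ClientStream()
client.observe_round([3, 5, 5, 7])   # round 1: classes {3, 5, 7}
seen = client.observe_round([5, 9])  # round 2 overlaps on class 5
print(sorted(seen))                  # → [3, 5, 7, 9]
```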

2. Core Components of FedKACE

FedKACE is characterized by three primary innovations, each serving a distinct role within the federated continual learning process:

a) Adaptive Inference Model Switching

FedKACE allows clients to dynamically choose between local and global models for inference. Initially, clients use their personalized local models, which better fit their replay buffers. As the system evolves, each client monitors the "generalization gap" on its replay buffer, the difference between the global model's accuracy and its predicted confidence:

$$\mathrm{gap}_k^t = \max\left(0,\, \mathrm{ACC}_{g,BF}^t - \mathrm{PROB}_{g,BF}^t\right).$$

The discrete difference $\Delta(\mathrm{gap}_k^t)$ is tracked, and a client switches to the global model for inference once two consecutive negative gap-changes are observed:

$$t_{k,\mathrm{switch}} = \min\left\{t \mid \Delta(\mathrm{gap}_k^t) < 0 \wedge \Delta(\mathrm{gap}_k^{t-1}) < 0\right\}.$$

This realizes a principled transition from personalization (local) to generalization (global), optimizing performance as knowledge consolidation progresses (Tan et al., 27 Jan 2026).
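The switching rule above can be sketched as follows; a minimal sketch with assumed variable names, where the gap sequence is taken as given rather than measured on a real buffer.

```python
# Hedged sketch of the dual-round model-switching rule.
from typing import List, Optional


def generalization_gap(acc_global: float, prob_global: float) -> float:
    """gap_k^t = max(0, ACC_{g,BF}^t - PROB_{g,BF}^t) on the replay buffer."""
    return max(0.0, acc_global - prob_global)


def switch_round(gaps: List[float]) -> Optional[int]:
    """Return the first round (0-based index into `gaps`) at which two
    consecutive negative gap-changes are observed, or None if none occur."""
    deltas = [b - a for a, b in zip(gaps, gaps[1:])]  # Δ(gap^t) per round
    for t in range(1, len(deltas)):
        if deltas[t] < 0 and deltas[t - 1] < 0:
            return t + 1  # round where the second consecutive drop lands
    return None


gaps = [0.30, 0.32, 0.28, 0.25, 0.24]  # gap shrinks from round 2 onward
print(switch_round(gaps))              # → 3
```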

b) Adaptive Gradient-Balanced Replay Scheme

To balance learning from new and old data in the presence of category overlap, FedKACE combines the new-task and replay losses at each local epoch $j$:

$$L_\text{total}^t(\theta_k^{t,j}) = L_\text{task}^t(\theta_k^{t,j}) + \lambda_k^{t,j} L_\text{rep}^t(\theta_k^{t,j}),$$

with the replay-balancing weight updated as

$$\lambda_k^{t,j+1} = \frac{\|\nabla_{h} L_\text{rep}^t(\theta_k^{t,j})\|_2^2}{\|\nabla_{h} L_\text{task}^t(\theta_k^{t,j})\|_2^2},$$

where $L_\text{task}^t$ and $L_\text{rep}^t$ are cross-entropy losses on current and replay data, respectively. The per-epoch weight $\lambda$ adapts automatically to the gradient magnitudes, driving the optimization towards a min-max saddle point that balances plasticity and stability (Tan et al., 27 Jan 2026).
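The weight update can be sketched directly from the formula above, assuming the classifier-head gradients are available as flat vectors (the epsilon guard is an assumption added here, not part of the paper's formulation):

```python
# Sketch of the adaptive replay-balancing weight λ^{t,j+1}.
import numpy as np


def next_lambda(grad_rep: np.ndarray, grad_task: np.ndarray,
                eps: float = 1e-12) -> float:
    """λ^{t,j+1} = ||∇_h L_rep||² / ||∇_h L_task||² (eps avoids div-by-zero,
    an implementation assumption)."""
    return float(np.sum(grad_rep ** 2) / (np.sum(grad_task ** 2) + eps))


# When replay gradients dominate, λ grows and the replay loss gets more
# weight in L_total on the next local epoch.
lam = next_lambda(np.array([0.2, -0.4]), np.array([0.1, 0.1]))
print(lam)  # (0.04 + 0.16) / (0.01 + 0.01) ≈ 10.0
```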

c) Kernel Spectral Boundary Buffer Maintenance

Given a fixed memory budget $M$, FedKACE's buffer selection combines feature-space dispersion and decision-boundary relevance. Using normalized logit vectors $\hat g(x)$ and a Gaussian kernel

$$K(x, x_i) = \exp\bigl(-\beta\|\hat g(x)-\hat g(x_i)\|^2\bigr),$$

the buffer selection involves:

  • Spectral Diversity (DS): Measures minimum distance in logit space to maintain diverse samples.
  • Category-wise Information-Diversity Value (IDV): Penalizes high-confidence, overrepresented samples.
  • Consistency-Diversity Value (CDV): Prioritizes points that cause shifts in category probabilities.

The buffer update follows a two-stage procedure: (1) select the top-$2Q$ samples by IDV, then (2) select the top-$Q$ of these by CDV (per category). For new classes, samples are chosen by IDV-weighted sampling. This results in lower regret and improved retention versus random sampling (Tan et al., 27 Jan 2026).
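The two-stage selection above can be sketched as follows; the IDV and CDV scores are taken as given arrays here, since the actual scores are computed from the kernel and logit statistics described in the paper.

```python
# Illustrative sketch of the two-stage per-category buffer update:
# rank by IDV, shortlist the top-2Q, then keep the top-Q of those by CDV.
import numpy as np


def select_buffer(idv: np.ndarray, cdv: np.ndarray, Q: int) -> np.ndarray:
    """Return (sorted) indices of the Q samples kept for one category."""
    # Stage 1: shortlist the 2Q samples with highest information-diversity.
    shortlist = np.argsort(-idv)[: 2 * Q]
    # Stage 2: among the shortlist, keep the Q highest consistency-diversity.
    keep = shortlist[np.argsort(-cdv[shortlist])[:Q]]
    return np.sort(keep)


idv = np.array([0.9, 0.1, 0.8, 0.7, 0.2, 0.6])
cdv = np.array([0.1, 0.9, 0.8, 0.2, 0.9, 0.5])
print(select_buffer(idv, cdv, Q=2))  # shortlist {0,2,3,5}, then CDV keeps [2 5]
```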

3. Federated Optimization and Global Aggregation

At the global level, FedKACE operates in rounds:

  1. The server sends the current global model $\theta_g^{t-1}$ to all clients.
  2. Each client trains locally as described above, updating $\theta_k^{t,J}$ and its buffer.
  3. Clients upload updated model parameters to the server.
  4. The server aggregates: $\theta_g^t = \frac{1}{K} \sum_k \theta_k^{t,J}$.
  5. Each client decides—based on the adaptive switching rule—whether to use its local or the new global model for inference in the next round.
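The round structure above can be sketched as follows, with local training stubbed out; all names are illustrative, and the uniform averaging matches the aggregation formula $\theta_g^t = \frac{1}{K}\sum_k \theta_k^{t,J}$.

```python
# Minimal sketch of one FedKACE-style global round with uniform averaging.
import numpy as np


def local_train(global_params: np.ndarray, client_id: int) -> np.ndarray:
    """Stand-in for J local epochs of replay-balanced training."""
    rng = np.random.default_rng(client_id)  # deterministic per-client stub
    return global_params + 0.01 * rng.standard_normal(global_params.shape)


def global_round(global_params: np.ndarray, num_clients: int) -> np.ndarray:
    # 1) broadcast, 2) local training, 3) upload, 4) uniform aggregation.
    updates = [local_train(global_params.copy(), k) for k in range(num_clients)]
    return np.mean(updates, axis=0)


theta_g = np.zeros(4)
theta_g = global_round(theta_g, num_clients=10)
print(theta_g.shape)  # (4,)
```

Step 5, the switch between local and global inference models, happens client-side after aggregation and is omitted here.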

This sequencing, coupled with the three core mechanisms, ensures distributed clients efficiently reconcile new information with past knowledge while mitigating communication overhead and catastrophic forgetting (Tan et al., 27 Jan 2026).

4. Theoretical Guarantees

FedKACE’s methodology is supported by formal analysis:

  • Local Saddle-Point Convergence: Under standard smoothness and step-size assumptions, the adaptive gradient-replay scheme converges to a saddle point $(\theta_k^{t,*}, \lambda_k^{t,*})$ of the loss surface.
  • Kernel Spectral Buffer Regret Bound: Selecting buffer samples via spectral criteria guarantees a regret bound of the form

$$\mathrm{Regret}_k(t) \leq C_\kappa\, O\!\left( \sqrt{ |\mathcal{C}_k^{\leq t}|/M } \right) + O\!\left(1/t^{\min(\alpha, 1)}\right)$$

with $C_\kappa < 1$, strictly improving over random replay.

  • Global Model Regret: The aggregated model achieves tighter average regret,

$$\mathbb{E}\bigl[\mathrm{Regret}^{\text{global}}_k(t)\bigr] \leq O(1/J) + C_\kappa\, O\!\left( \sqrt{ \frac{ |\mathcal{C}_\text{global}^{\leq t}| }{ |\mathcal{K}_{c_\text{min}^t}|\,M } } \right) + O\!\left(t^{1-\alpha}\right)$$

which guarantees improved long-term performance over either local-only or unbuffered baselines, at least for moderate $C_\text{max}$ and feasible $M$ (Tan et al., 27 Jan 2026).

5. Experimental Evaluation

FedKACE was validated on the CIFAR-100 and ImageNet-100 datasets ($C_\text{max}=100$) with $K=10$ clients and $T=100$ rounds. Data were distributed with classes arriving per round and varying class overlaps $O \in \{5, 4, 2, 0\}$. Key metrics included Average Accuracy (AA) and Average Regret (AR) relative to a client-centralized upper bound.

Summary of results:

  • CIFAR-100, no overlap (O=0): FedKACE AA ≈ 20.96%, AR ≈ 14.23%; best baseline (FedCBDR) AA ≈ 19.09%, AR ≈ 16.10%.
  • Static setting (O=5): FedKACE AA ≈ 26.59% vs. FedCBDR 21.91% (+4.68 pts); AR ≈ 24.95% vs. 29.62% (−4.67 pts).
  • Ablation: Removing the dual-round switching rule, or replacing the adaptive $\lambda$ or the spectral buffer with nonadaptive/random counterparts, reduces accuracy by 1–2 pts, confirming each component's criticality.
  • Scalability: Increasing the buffer size $M$ produces larger accuracy gains than doubling the number of clients, consistent with the regret theory (Tan et al., 27 Jan 2026).

6. Implementation Considerations and Limitations

The kernel spectral buffer selection imposes an $O(M(M+D))$ cost per client per round, with $D$ the feature dimension. This can be reduced using random Fourier features or landmark selection, trading some theoretical tightness for computational efficiency. FedKACE requires no tuning of $\lambda$ or model-switching thresholds; essential hyperparameters include the buffer size $M$, the number of local epochs $J$, and standard optimizer settings. A key limitation is that an extremely large $C_\text{max}$ or streaming duration $T$ may outpace buffer representativeness, necessitating larger or more sophisticated replay mechanisms (Tan et al., 27 Jan 2026).
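As one hedged illustration of the random-Fourier-feature shortcut mentioned above (a standard approximation, not the paper's implementation; $\beta$ and all names here are assumptions), the Gaussian kernel $K(x, x') = \exp(-\beta\|x - x'\|^2)$ can be approximated with an explicit feature map, so pairwise kernel evaluations become inner products and the quadratic-in-$M$ cost is avoided:

```python
# Random Fourier features approximating exp(-β||x - x'||²).
import numpy as np


def rff_map(X: np.ndarray, num_features: int, beta: float,
            seed: int = 0) -> np.ndarray:
    """Map rows of X to `num_features` random Fourier features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # exp(-β||x-x'||²) is a Gaussian kernel with σ² = 1/(2β), so the
    # frequency matrix is sampled with variance 1/σ² = 2β per entry.
    W = rng.normal(scale=np.sqrt(2.0 * beta), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


X = np.random.default_rng(1).standard_normal((5, 8))
Z = rff_map(X, num_features=2048, beta=0.5)
approx = Z @ Z.T                       # ≈ exact M×M kernel matrix
exact = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
print(np.max(np.abs(approx - exact)))  # small approximation error
```

The trade-off is exactly the one noted above: the approximation loosens the spectral guarantees slightly but drops the per-round cost from quadratic in $M$ to linear in $M$ times the number of features.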

FedKACE extends the family of knowledge-driven FL methods by addressing continual, task-agnostic learning with streaming, overlapping categories. In contrast to batch FCL or methods relying on static task boundaries, its streaming buffer, replay, and adaptive model switching enable robust generalization and retention. Empirical and theoretical analyses show that FedKACE outperforms nonadaptive replay, fixed-buffer, and task-identifier-reliant approaches for streaming federated scenarios. The approach is closely related to, but distinct from, server-driven knowledge cache methods (e.g., FedCache (Wu et al., 2023)): FedKACE focuses on evolving local replay and model generalization in the continual learning regime, whereas FedCache emphasizes sample-grained, hash-based client knowledge routing for personalized edge intelligence (Wu et al., 2023, Tan et al., 27 Jan 2026).
