FedKACE: Streaming Federated Continual Learning
- FedKACE is a federated learning framework for streaming continual learning, designed to handle continual data acquisition with overlapping categories and no task identifiers.
- It integrates adaptive model switching, gradient-balanced replay, and kernel spectral boundary buffering to efficiently balance new information with past knowledge.
- Empirical results on benchmarks like CIFAR-100 and ImageNet-100 show that FedKACE reduces regret and improves accuracy compared to baseline methods.
FedKACE is a federated learning (FL) framework designed for streaming federated continual learning (FCL) settings, specifically addressing the challenge of continual data acquisition across clients when category overlap is present and task identifiers are absent. FedKACE introduces a knowledge-aware mechanism that combines adaptive model selection, gradient-balanced replay, and a kernel spectral boundary buffer for robust knowledge retention and low-regret continual adaptation (Tan et al., 27 Jan 2026).
1. Streaming Federated Continual Learning Problem Formulation
FedKACE operates in a multi-client, multi-round FL environment. In each round, every participating client receives a data batch whose labels are drawn from a time-dependent category subset. Across rounds these category sets may overlap, but sample-level task identifiers are unavailable. After every round, each client's model must support inference over the union of all categories seen so far.
Each client maintains a local model (a feature extractor plus a classifier) and seeks to minimize a cumulative risk over rounds, augmented by a regularizer that typically enforces consistency with the aggregated global model. This formulation targets catastrophic forgetting and promotes adaptation to continually evolving data without explicit task demarcations (Tan et al., 27 Jan 2026).
2. Core Components of FedKACE
FedKACE is characterized by three primary innovations, each serving a distinct role within the federated continual learning process:
a) Adaptive Inference Model Switching
FedKACE allows clients to dynamically choose between local and global models for inference. Initially, clients use their customized local models, which better fit their replay buffers. As the system evolves, each client monitors the generalization gap of the global model on its replay buffer (the difference between the global model's accuracy and its confidence there) and tracks the discrete difference of this gap across rounds. A client switches to the global model for inference once two consecutive negative gap-changes are observed. This realizes a principled transition from personalization (local) to generalization (global) as knowledge consolidation progresses (Tan et al., 27 Jan 2026).
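The switching rule above can be sketched as a small stateful controller. This is an illustrative reconstruction, not the paper's code; the class name and the exact definition of the per-round `gap` value are assumptions.

```python
class SwitchController:
    """Tracks the generalization gap on the replay buffer and flips to the
    global model after two consecutive negative gap-changes (hypothetical
    sketch of the FedKACE switching rule)."""

    def __init__(self):
        self.prev_gap = None   # gap observed in the previous round
        self.neg_streak = 0    # consecutive rounds with a negative gap-change
        self.use_global = False

    def update(self, gap: float) -> bool:
        """Feed the current-round gap; returns True once the client should
        use the global model for inference."""
        if self.prev_gap is not None:
            delta = gap - self.prev_gap          # discrete difference of the gap
            self.neg_streak = self.neg_streak + 1 if delta < 0 else 0
            if self.neg_streak >= 2:             # two consecutive negative changes
                self.use_global = True
        self.prev_gap = gap
        return self.use_global
```

Once the controller fires, the switch is sticky: the client keeps serving with the global model even if later gap-changes turn positive, matching the one-way personalization-to-generalization transition described above.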
b) Adaptive Gradient-Balanced Replay Scheme
To balance learning from new and old data in the presence of category overlap, FedKACE combines the new-task loss and the replay loss in each local epoch, with a replay-balancing weight that is updated from the gradient magnitudes of the two cross-entropy losses (computed on current and replay data, respectively). The per-epoch weight therefore adapts automatically to the gradients, driving the optimization towards a min-max saddle point that balances plasticity and stability (Tan et al., 27 Jan 2026).
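One concrete way to realize a gradient-balanced weight is to set it so that both losses contribute gradients of equal magnitude. The sketch below uses a toy linear classifier so the gradients are analytic; the weight formula is an illustrative instance of gradient balancing, not the exact FedKACE update.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grad(W, X, y, n_classes):
    """Cross-entropy loss and gradient for a linear classifier W (d x C)."""
    p = softmax(X @ W)
    loss = -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    grad = X.T @ (p - np.eye(n_classes)[y]) / len(y)
    return loss, grad

def gradient_balanced_step(W, new_batch, replay_batch, n_classes, lr=0.1):
    (Xn, yn), (Xo, yo) = new_batch, replay_batch
    _, gn = ce_grad(W, Xn, yn, n_classes)   # gradient of the new-task loss
    _, go = ce_grad(W, Xo, yo, n_classes)   # gradient of the replay loss
    # Choose the replay weight lam so both terms contribute equally:
    # (1 - lam) * ||gn|| == lam * ||go||.
    lam = np.linalg.norm(gn) / (np.linalg.norm(gn) + np.linalg.norm(go) + 1e-12)
    g = (1 - lam) * gn + lam * go
    return W - lr * g, lam
```

Because `lam` is recomputed every step from the current gradient norms, neither the new-data objective nor the replay objective can dominate the update, which is the stability/plasticity balance the scheme targets.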
c) Kernel Spectral Boundary Buffer Maintenance
Given a fixed memory budget $Q$, FedKACE's buffer selection combines feature-space dispersion and decision-boundary relevance, using a Gaussian kernel computed over normalized logit vectors. The selection relies on three scores:
- Spectral Diversity (DS): measures minimum distance in logit space so as to maintain diverse samples.
- Category-wise Information-Diversity Value (IDV): Penalizes high-confidence, overrepresented samples.
- Consistency-Diversity Value (CDV): Prioritizes points that cause shifts in category probabilities.
The buffer update follows a two-stage procedure: (1) select the top-$2Q$ samples by IDV, then (2) select the top-$Q$ of these by CDV (per category). For new classes, samples are chosen by IDV-weighted sampling. This results in lower regret and improved retention versus random sampling (Tan et al., 27 Jan 2026).
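The two-stage structure can be sketched as follows. The exact DS/IDV/CDV formulas are not reproduced here; the scores below are illustrative stand-ins (low confidence and kernel dissimilarity for IDV, prediction shift for CDV), and only the top-$2Q$ then top-$Q$ pipeline follows the description above.

```python
import numpy as np

def gaussian_kernel(Z, sigma=1.0):
    """Gram matrix of a Gaussian kernel over (normalized) logit vectors."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def select_buffer(logits, prev_logits, Q, sigma=1.0):
    def normed(L):
        Z = L / (np.linalg.norm(L, axis=1, keepdims=True) + 1e-12)
        E = np.exp(Z - Z.max(axis=1, keepdims=True))
        return Z, E / E.sum(axis=1, keepdims=True)

    Z, p_now = normed(logits)
    _, p_prev = normed(prev_logits)

    K = gaussian_kernel(Z, sigma)
    diversity = 1.0 - K.mean(axis=1)                 # stand-in DS: far from the crowd
    idv = (1.0 - p_now.max(axis=1)) * diversity      # stand-in IDV: penalize confident,
                                                     # redundant samples
    cand = np.argsort(-idv)[: 2 * Q]                 # stage 1: top-2Q by IDV
    cdv = np.abs(p_now[cand] - p_prev[cand]).sum(axis=1)  # stand-in CDV: prediction shift
    return cand[np.argsort(-cdv)[:Q]]                # stage 2: top-Q by CDV
```

The first stage over-selects a $2Q$-sized candidate pool so the second, boundary-sensitive stage has slack to discard samples that are diverse but uninformative about the decision boundary.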
3. Federated Optimization and Global Aggregation
At the global level, FedKACE operates in rounds:
- The server sends the current global model to all clients.
- Each client trains locally as described above, updating its model parameters and its buffer.
- Clients upload updated model parameters to the server.
- The server aggregates the uploaded parameters into a new global model.
- Each client decides—based on the adaptive switching rule—whether to use its local or the new global model for inference in the next round.
This sequencing, coupled with the three core mechanisms, ensures distributed clients efficiently reconcile new information with past knowledge while mitigating communication overhead and catastrophic forgetting (Tan et al., 27 Jan 2026).
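The round structure above can be sketched as a single loop. The aggregation step below assumes a FedAvg-style weighted parameter average, which the summary does not specify; models are plain parameter vectors for illustration, and `local_train`/`switch_rules` stand in for the local training and switching logic.

```python
import numpy as np

def run_round(global_w, clients, local_train, switch_rules):
    """One FL round. clients: list of (local_params, n_samples);
    local_train(i, w) returns client i's trained parameters starting from w;
    switch_rules[i]() returns True if client i should serve with the global model."""
    updates, sizes = [], []
    for i, (_, n) in enumerate(clients):
        w_new = local_train(i, global_w.copy())  # start local training from the global model
        clients[i] = (w_new, n)
        updates.append(w_new)
        sizes.append(n)
    total = sum(sizes)
    # Assumed FedAvg-style aggregation: sample-size-weighted parameter average.
    new_global = sum(n / total * w for w, n in zip(updates, sizes))
    # Each client decides locally whether to serve with its own or the global model.
    serving = [new_global if switch_rules[i]() else clients[i][0]
               for i in range(len(clients))]
    return new_global, serving
```

Note that the switching decision only affects which model a client *serves* with; all clients still upload their locally trained parameters, so aggregation always sees every update.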
4. Theoretical Guarantees
FedKACE’s methodology is supported by formal analysis:
- Local Saddle-Point Convergence: Under standard smoothness and step-size assumptions, the adaptive gradient-replay scheme converges to a saddle point of the loss surface.
- Kernel Spectral Buffer Regret Bound: selecting buffer samples via the spectral criteria yields a regret bound that strictly improves over random replay.
- Global Model Regret: the aggregated global model achieves tighter average regret than either local-only or unbuffered baselines, at least for moderate client counts and feasible buffer sizes (Tan et al., 27 Jan 2026).
5. Experimental Evaluation
FedKACE was validated on the CIFAR-100 and ImageNet-100 datasets across multiple clients and rounds. Data were distributed with a fixed number of classes per round and varying class overlaps $O$. Key metrics included Average Accuracy (AA) and Average Regret (AR) relative to a client-centralized upper bound.
Summary of results:
- CIFAR-100, no overlap (O=0): FedKACE AA ≈ 20.96%, AR ≈ 14.23%; best baseline (FedCBDR) AA ≈ 19.09%, AR ≈ 16.10%.
- Static setting (O=5): FedKACE AA ≈ 26.59% vs. FedCBDR 21.91% (+4.68 pts); AR ≈ 24.95% vs. 29.62% (−4.67 pts).
- Ablation: removing the dual-round switching rule, or replacing the adaptive replay weight or spectral buffer with nonadaptive/random variants, reduces accuracy by 1–2 pts, confirming that each component is essential.
- Scalability: increasing the buffer size produces larger accuracy gains than doubling the number of clients, consistent with the regret theory (Tan et al., 27 Jan 2026).
6. Implementation Considerations and Limitations
The kernel spectral buffer selection imposes a per-client, per-round kernel-computation cost that scales with the feature dimension and the square of the number of candidate samples (the Gaussian Gram matrix). This can be reduced using random Fourier features or landmark selection, trading some theoretical tightness for computational efficiency. FedKACE requires no tuning of the replay-balancing weight or the model-switching thresholds; the essential hyperparameters are the buffer size $Q$, the number of local epochs, and standard optimizer settings. A key limitation is that an extremely large category space or streaming duration may outpace buffer representativeness, necessitating a larger buffer or more sophisticated replay mechanisms (Tan et al., 27 Jan 2026).
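As a sketch of the random-Fourier-feature reduction mentioned above (Rahimi–Recht style), the Gaussian kernel $k(x,y)=\exp(-\lVert x-y\rVert^2/2\sigma^2)$ can be approximated by an inner product of $D$-dimensional random features, replacing the quadratic Gram matrix with a linear-size feature matrix. The function name and defaults are illustrative.

```python
import numpy as np

def rff(X, D=256, sigma=1.0, seed=0):
    """Random Fourier features for the Gaussian kernel:
    k(x, y) ~= rff(x) . rff(y), with error shrinking as O(1/sqrt(D))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))  # frequencies ~ N(0, sigma^-2 I)
    b = rng.uniform(0, 2 * np.pi, size=D)           # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

With features in hand, every kernel evaluation in the buffer-selection scores becomes a cheap dot product, at the cost of the $O(1/\sqrt{D})$ approximation error that loosens the regret guarantee.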
7. Context, Related Work, and Positioning
FedKACE extends the family of knowledge-driven FL methods by addressing continual, task-agnostic learning with streaming, overlapping categories. In contrast to batch FCL or methods relying on static task boundaries, its streaming buffer, replay, and adaptive model switching enable robust generalization and retention. Empirical and theoretical analyses show that FedKACE outperforms nonadaptive replay, fixed-buffer, and task-identifier-reliant approaches for streaming federated scenarios. The approach is closely related to, but distinct from, server-driven knowledge cache methods (e.g., FedCache (Wu et al., 2023)): FedKACE focuses on evolving local replay and model generalization in the continual learning regime, whereas FedCache emphasizes sample-grained, hash-based client knowledge routing for personalized edge intelligence (Wu et al., 2023, Tan et al., 27 Jan 2026).