FedKACE: Streaming Federated Continual Learning
- FedKACE is a federated learning framework for streaming continual learning, designed to handle continual data acquisition with overlapping categories and no task identifiers.
- It integrates adaptive model switching, gradient-balanced replay, and kernel spectral boundary buffering to efficiently balance new information with past knowledge.
- Empirical results on benchmarks like CIFAR-100 and ImageNet-100 show that FedKACE reduces regret and improves accuracy compared to baseline methods.
FedKACE is a federated learning (FL) framework designed for streaming federated continual learning (FCL) settings, specifically addressing the challenge of continual data acquisition across clients when category overlap is present and task identifiers are absent. FedKACE introduces a knowledge-aware mechanism that combines adaptive model selection, gradient-balanced replay, and a kernel spectral boundary buffer for robust knowledge retention and low-regret continual adaptation (Tan et al., 27 Jan 2026).
1. Streaming Federated Continual Learning Problem Formulation
FedKACE operates in a multi-client, multi-round FL environment. In each round, every participating client receives a data batch whose labels are drawn from a time-dependent category subset. Across rounds these category sets may overlap, but sample-level task identifiers are unavailable. After every round, each client's model must support inference over the union of all categories seen so far.
Each client maintains a local model (a feature extractor plus a classifier) and seeks to minimize a cumulative risk over rounds, augmented by a regularizer that typically enforces consistency with the aggregated global model. This formulation targets catastrophic forgetting and promotes adaptation to continually evolving data without explicit task demarcations (Tan et al., 27 Jan 2026).
2. Core Components of FedKACE
FedKACE is characterized by three primary innovations, each serving a distinct role within the federated continual learning process:
a) Adaptive Inference Model Switching
FedKACE allows clients to dynamically choose between local and global models for inference. Initially, clients use their customized local models, which better fit their replay buffers. As the system evolves, each client monitors the generalization gap of the global model on its replay buffer (the difference between the global model's accuracy and its confidence there) and tracks the discrete difference of this gap across rounds. A client switches to the global model for inference once two consecutive negative gap-changes are observed. This realizes a principled transition from personalization (local) to generalization (global) as knowledge consolidation progresses (Tan et al., 27 Jan 2026).
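The switching rule above can be sketched as a small stateful controller. This is an illustrative reconstruction, not the paper's code; the class name and the exact definition of the per-round `gap` value are assumptions.

```python
class SwitchController:
    """Tracks the generalization gap on the replay buffer and flips to the
    global model after two consecutive negative gap-changes (hypothetical
    sketch of the FedKACE switching rule)."""

    def __init__(self):
        self.prev_gap = None   # gap observed in the previous round
        self.neg_streak = 0    # consecutive rounds with a negative gap-change
        self.use_global = False

    def update(self, gap: float) -> bool:
        """Feed the current-round gap; returns True once the client should
        use the global model for inference."""
        if self.prev_gap is not None:
            delta = gap - self.prev_gap          # discrete difference of the gap
            self.neg_streak = self.neg_streak + 1 if delta < 0 else 0
            if self.neg_streak >= 2:             # two consecutive negative changes
                self.use_global = True
        self.prev_gap = gap
        return self.use_global
```

Once the controller fires, the switch is sticky: the client keeps serving with the global model even if later gap-changes turn positive, matching the one-way personalization-to-generalization transition described above.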
b) Adaptive Gradient-Balanced Replay Scheme
To balance learning from new and old data in the presence of category overlap, FedKACE combines the new-task loss and the replay loss in each local epoch, with a replay-balancing weight that is updated from the gradient magnitudes of the two cross-entropy losses (computed on current and replay data, respectively). The per-epoch weight therefore adapts automatically to the gradients, driving the optimization towards a min-max saddle point that balances plasticity and stability (Tan et al., 27 Jan 2026).
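One concrete way to realize a gradient-balanced weight is to set it so that both losses contribute gradients of equal magnitude. The sketch below uses a toy linear classifier so the gradients are analytic; the weight formula is an illustrative instance of gradient balancing, not the exact FedKACE update.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grad(W, X, y, n_classes):
    """Cross-entropy loss and gradient for a linear classifier W (d x C)."""
    p = softmax(X @ W)
    loss = -np.log(p[np.arange(len(y)), y] + 1e-12).mean()
    grad = X.T @ (p - np.eye(n_classes)[y]) / len(y)
    return loss, grad

def gradient_balanced_step(W, new_batch, replay_batch, n_classes, lr=0.1):
    (Xn, yn), (Xo, yo) = new_batch, replay_batch
    _, gn = ce_grad(W, Xn, yn, n_classes)   # gradient of the new-task loss
    _, go = ce_grad(W, Xo, yo, n_classes)   # gradient of the replay loss
    # Choose the replay weight lam so both terms contribute equally:
    # (1 - lam) * ||gn|| == lam * ||go||.
    lam = np.linalg.norm(gn) / (np.linalg.norm(gn) + np.linalg.norm(go) + 1e-12)
    g = (1 - lam) * gn + lam * go
    return W - lr * g, lam
```

Because `lam` is recomputed every step from the current gradient norms, neither the new-data objective nor the replay objective can dominate the update, which is the stability/plasticity balance the scheme targets.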
c) Kernel Spectral Boundary Buffer Maintenance
Given a fixed memory budget $Q$, FedKACE's buffer selection combines feature-space dispersion and decision-boundary relevance, using a Gaussian kernel computed over normalized logit vectors. The selection relies on three scores:
- Spectral Diversity (DS): measures minimum distance in logit space so as to maintain diverse samples.
- Category-wise Information-Diversity Value (IDV): Penalizes high-confidence, overrepresented samples.
- Consistency-Diversity Value (CDV): Prioritizes points that cause shifts in category probabilities.
The buffer update follows a two-stage procedure: (1) select the top-$2Q$ samples by IDV, then (2) select the top-$Q$ of these by CDV (per category). For new classes, samples are chosen by IDV-weighted sampling. This results in lower regret and improved retention versus random sampling (Tan et al., 27 Jan 2026).
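The two-stage structure can be sketched as follows. The exact DS/IDV/CDV formulas are not reproduced here; the scores below are illustrative stand-ins (low confidence and kernel dissimilarity for IDV, prediction shift for CDV), and only the top-$2Q$ then top-$Q$ pipeline follows the description above.

```python
import numpy as np

def gaussian_kernel(Z, sigma=1.0):
    """Gram matrix of a Gaussian kernel over (normalized) logit vectors."""
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def select_buffer(logits, prev_logits, Q, sigma=1.0):
    def normed(L):
        Z = L / (np.linalg.norm(L, axis=1, keepdims=True) + 1e-12)
        E = np.exp(Z - Z.max(axis=1, keepdims=True))
        return Z, E / E.sum(axis=1, keepdims=True)

    Z, p_now = normed(logits)
    _, p_prev = normed(prev_logits)

    K = gaussian_kernel(Z, sigma)
    diversity = 1.0 - K.mean(axis=1)                 # stand-in DS: far from the crowd
    idv = (1.0 - p_now.max(axis=1)) * diversity      # stand-in IDV: penalize confident,
                                                     # redundant samples
    cand = np.argsort(-idv)[: 2 * Q]                 # stage 1: top-2Q by IDV
    cdv = np.abs(p_now[cand] - p_prev[cand]).sum(axis=1)  # stand-in CDV: prediction shift
    return cand[np.argsort(-cdv)[:Q]]                # stage 2: top-Q by CDV
```

The first stage over-selects a $2Q$-sized candidate pool so the second, boundary-sensitive stage has slack to discard samples that are diverse but uninformative about the decision boundary.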
3. Federated Optimization and Global Aggregation
At the global level, FedKACE operates in rounds:
- The server sends the current global model to all clients.
- Each client trains locally as described above, updating its model parameters and its buffer.
- Clients upload updated model parameters to the server.
- The server aggregates the uploaded parameters into a new global model.
- Each client decides—based on the adaptive switching rule—whether to use its local or the new global model for inference in the next round.
This sequencing, coupled with the three core mechanisms, ensures distributed clients efficiently reconcile new information with past knowledge while mitigating communication overhead and catastrophic forgetting (Tan et al., 27 Jan 2026).
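The round structure above can be sketched as a single loop. The aggregation step below assumes a FedAvg-style weighted parameter average, which the summary does not specify; models are plain parameter vectors for illustration, and `local_train`/`switch_rules` stand in for the local training and switching logic.

```python
import numpy as np

def run_round(global_w, clients, local_train, switch_rules):
    """One FL round. clients: list of (local_params, n_samples);
    local_train(i, w) returns client i's trained parameters starting from w;
    switch_rules[i]() returns True if client i should serve with the global model."""
    updates, sizes = [], []
    for i, (_, n) in enumerate(clients):
        w_new = local_train(i, global_w.copy())  # start local training from the global model
        clients[i] = (w_new, n)
        updates.append(w_new)
        sizes.append(n)
    total = sum(sizes)
    # Assumed FedAvg-style aggregation: sample-size-weighted parameter average.
    new_global = sum(n / total * w for w, n in zip(updates, sizes))
    # Each client decides locally whether to serve with its own or the global model.
    serving = [new_global if switch_rules[i]() else clients[i][0]
               for i in range(len(clients))]
    return new_global, serving
```

Note that the switching decision only affects which model a client *serves* with; all clients still upload their locally trained parameters, so aggregation always sees every update.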
4. Theoretical Guarantees
FedKACE’s methodology is supported by formal analysis:
- Local Saddle-Point Convergence: Under standard smoothness and step-size assumptions, the adaptive gradient-replay scheme converges to a saddle point of the loss surface.
- Kernel Spectral Buffer Regret Bound: selecting buffer samples via the spectral criteria yields a regret bound that strictly improves over random replay.
- Global Model Regret: the aggregated global model achieves tighter average regret than either local-only or unbuffered baselines, at least for moderate client counts and feasible buffer sizes (Tan et al., 27 Jan 2026).
5. Experimental Evaluation
FedKACE was validated on the CIFAR-100 and ImageNet-100 datasets across multiple clients and rounds. Data were distributed with a fixed number of classes per round and varying class overlaps $O$. Key metrics included Average Accuracy (AA) and Average Regret (AR) relative to a client-centralized upper bound.
Summary of results:
- CIFAR-100, no overlap (O=0): FedKACE AA ≈ 20.96%, AR ≈ 14.23%; best baseline (FedCBDR) AA ≈ 19.09%, AR ≈ 16.10%.
- Static setting (O=5): FedKACE AA ≈ 26.59% vs. FedCBDR 21.91% (+4.68 pts); AR ≈ 24.95% vs. 29.62% (−4.67 pts).
- Ablation: removing the dual-round switching rule, or replacing the adaptive replay weight or spectral buffer with nonadaptive/random variants, reduces accuracy by 1–2 pts, confirming that each component is essential.
- Scalability: increasing the buffer size produces larger accuracy gains than doubling the number of clients, consistent with the regret theory (Tan et al., 27 Jan 2026).
6. Implementation Considerations and Limitations
The kernel spectral buffer selection imposes a per-client, per-round kernel-computation cost that scales with the feature dimension and the square of the number of candidate samples (the Gaussian Gram matrix). This can be reduced using random Fourier features or landmark selection, trading some theoretical tightness for computational efficiency. FedKACE requires no tuning of the replay-balancing weight or the model-switching thresholds; the essential hyperparameters are the buffer size $Q$, the number of local epochs, and standard optimizer settings. A key limitation is that an extremely large category space or streaming duration may outpace buffer representativeness, necessitating a larger buffer or more sophisticated replay mechanisms (Tan et al., 27 Jan 2026).
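As a sketch of the random-Fourier-feature reduction mentioned above (Rahimi–Recht style), the Gaussian kernel $k(x,y)=\exp(-\lVert x-y\rVert^2/2\sigma^2)$ can be approximated by an inner product of $D$-dimensional random features, replacing the quadratic Gram matrix with a linear-size feature matrix. The function name and defaults are illustrative.

```python
import numpy as np

def rff(X, D=256, sigma=1.0, seed=0):
    """Random Fourier features for the Gaussian kernel:
    k(x, y) ~= rff(x) . rff(y), with error shrinking as O(1/sqrt(D))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, D))  # frequencies ~ N(0, sigma^-2 I)
    b = rng.uniform(0, 2 * np.pi, size=D)           # random phases
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

With features in hand, every kernel evaluation in the buffer-selection scores becomes a cheap dot product, at the cost of the $O(1/\sqrt{D})$ approximation error that loosens the regret guarantee.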
7. Context, Related Work, and Positioning
FedKACE extends the family of knowledge-driven FL methods by addressing continual, task-agnostic learning with streaming, overlapping categories. In contrast to batch FCL or methods relying on static task boundaries, its streaming buffer, replay, and adaptive model switching enable robust generalization and retention. Empirical and theoretical analyses show that FedKACE outperforms nonadaptive replay, fixed-buffer, and task-identifier-reliant approaches for streaming federated scenarios. The approach is closely related to, but distinct from, server-driven knowledge cache methods (e.g., FedCache (Wu et al., 2023)): FedKACE focuses on evolving local replay and model generalization in the continual learning regime, whereas FedCache emphasizes sample-grained, hash-based client knowledge routing for personalized edge intelligence (Wu et al., 2023, Tan et al., 27 Jan 2026).