You Are Your Own Best Teacher: Achieving Centralized-level Performance in Federated Learning under Heterogeneous and Long-tailed Data
Published 10 Mar 2025 in cs.LG | arXiv:2503.06916v1
Abstract: Data heterogeneity, stemming from local non-IID data and global long-tailed distributions, is a major challenge in federated learning (FL), leading to significant performance gaps compared to centralized learning. Previous research found that poor representations and biased classifiers are the main problems and proposed neural-collapse-inspired synthetic simplex ETF to help representations be closer to neural collapse optima. However, we find that the neural-collapse-inspired methods are not strong enough to reach neural collapse and still have huge gaps to centralized training. In this paper, we rethink this issue from a self-bootstrap perspective and propose FedYoYo (You Are Your Own Best Teacher), introducing Augmented Self-bootstrap Distillation (ASD) to improve representation learning by distilling knowledge between weakly and strongly augmented local samples, without needing extra datasets or models. We further introduce Distribution-aware Logit Adjustment (DLA) to balance the self-bootstrap process and correct biased feature representations. FedYoYo nearly eliminates the performance gap, achieving centralized-level performance even under mixed heterogeneity. It enhances local representation learning, reducing model drift and improving convergence, with feature prototypes closer to neural collapse optimality. Extensive experiments show FedYoYo achieves state-of-the-art results, even surpassing centralized logit adjustment methods by 5.4% under global long-tailed settings.
The paper presents FedYoYo, which achieves centralized-level performance by using augmented self-bootstrap distillation to align representations in heterogeneous federated learning.
It introduces a distribution-aware logit adjustment method that calibrates classifier outputs to mitigate client drift in non-IID, long-tailed data scenarios.
Extensive experiments on CIFAR-10/100-LT and ImageNet-LT demonstrate superior top-1 accuracy and efficient computational performance compared to state-of-the-art methods.
Achieving Centralized-Level Performance in Federated Learning
Introduction to the Challenges
Federated Learning (FL) is an emerging paradigm that enables collaborative model training across distributed data sources while preserving data privacy. However, a significant bottleneck in FL is data heterogeneity, particularly when local data is non-IID (not independent and identically distributed) and the global distribution is long-tailed. This heterogeneity impairs model convergence and generalization, creating a substantial performance gap between federated and centralized learning (Figure 1).
Figure 1: Our method substantially closes the gap between centralized training and federated learning under heterogeneous data. Left: non-IID data with global long-tailed distribution. Right: non-IID data with global balanced distribution.
FedYoYo Methodology
Augmented Self-bootstrap Distillation
FedYoYo introduces Augmented Self-bootstrap Distillation (ASD), inspired by self-supervised learning paradigms such as BYOL. The local model's predictions on weakly augmented samples act as self-teacher targets for its predictions on strongly augmented views of the same samples: minimizing the KL divergence between the strong and weak views yields more robust feature representations without any extra datasets or teacher models.
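A minimal sketch of such a self-bootstrap distillation loss, assuming `z_weak` and `z_strong` are the local model's logits on the weak and strong views of the same batch (the function names, the temperature `tau`, and the stop-gradient handling are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def asd_loss(z_strong, z_weak, tau=1.0):
    """KL(weak || strong): the weak view's (detached) predictions
    teach the strong view's predictions, averaged over the batch."""
    p_teacher = softmax(z_weak / tau)      # weak view = self-teacher
    log_p_student = np.log(softmax(z_strong / tau))
    kl = (p_teacher * (np.log(p_teacher) - log_p_student)).sum(axis=-1)
    return float(kl.mean())

# Identical views incur zero distillation loss.
z = np.array([[2.0, 0.5, -1.0]])
print(round(asd_loss(z, z), 6))  # → 0.0
```

In a real training loop the teacher branch would be wrapped in a stop-gradient so that only the strong-view branch receives the distillation gradient.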
The importance of ASD lies in effectively utilizing local models as their best teachers, improving alignment across heterogeneous client data.
Distribution-aware Logit Adjustment
To combat biases in feature representations and classifier outputs under heterogeneous distributions, FedYoYo employs Distribution-aware Logit Adjustment (DLA). DLA applies a balanced-softmax-style correction, informed by a mix of the local and estimated global label distributions, to calibrate classifier outputs.
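A common way to realize such an adjustment, and a plausible reading of DLA, is to shift the logits by the log of a blended label prior before the softmax cross-entropy, as in balanced softmax (the mixing weight `alpha` and the exact blending rule are assumptions for illustration):

```python
import numpy as np

def dla_adjust(logits, local_dist, global_dist, alpha=0.5):
    """Shift logits by the log of a blended label prior.

    During training, head classes receive a larger additive term, so the
    cross-entropy loss pushes the model to produce larger raw logits for
    tail classes; at inference the raw logits are used unadjusted,
    yielding a classifier debiased toward the tail.
    """
    mixed = alpha * np.asarray(local_dist) + (1 - alpha) * np.asarray(global_dist)
    mixed = mixed / mixed.sum()          # renormalize the blended prior
    return np.asarray(logits) + np.log(mixed + 1e-12)

# With uniform logits, the adjustment simply ranks classes by prior mass.
adjusted = dla_adjust([0.0, 0.0, 0.0],
                      local_dist=[0.7, 0.2, 0.1],   # this client's labels
                      global_dist=[0.5, 0.3, 0.2])  # estimated global labels
print(adjusted)  # head class gets the largest (least negative) term
```

Because the prior enters only as an additive log term, the correction costs essentially nothing per forward pass.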
Incorporating the global distribution in the adjustment helps to address client drift and improve classifier balance.
Experimental Results
FedYoYo's superiority is demonstrated through extensive experiments on datasets like CIFAR-10/100-LT and ImageNet-LT. It consistently outperforms state-of-the-art methods, showing significant improvements in top-1 test accuracy under various conditions (Table 2).
Visualization Insights
Figure 2: Visualization of neural collapse degrees and accuracy for global models on CIFAR-10. Our FedYoYo method reaches better neural collapse conditions and achieves the best performance.
This visualization confirms FedYoYo's ability to achieve superior representation alignment, reducing discrepancies between local and global models and exhibiting enhanced feature separability (Figure 3).
Figure 3: Overview of our proposed FedYoYo framework. The estimated client distributions are aggregated to obtain an approximate global distribution.
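The aggregation step described in Figure 3 can be sketched as summing per-client label histograms and normalizing (a simplification: this assumes clients report exact label counts, whereas the paper works with estimated distributions):

```python
import numpy as np

def aggregate_global_distribution(client_label_counts):
    """Sum per-client label counts and normalize to approximate
    the global label distribution across all clients."""
    total = np.sum(client_label_counts, axis=0).astype(float)
    return total / total.sum()

counts = [
    [90, 8, 2],   # client A: head-heavy
    [10, 30, 5],  # client B
    [5, 2, 48],   # client C: locally tail-heavy
]
g = aggregate_global_distribution(counts)
print(g)  # class totals [105, 40, 55] over 200 samples
```

This sample-weighted average is what makes the estimated global prior long-tail-aware even when individual clients see very different local distributions.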
Ablation Study
To provide comprehensive insight into FedYoYo's effectiveness, ablation studies highlight the pivotal role of ASD and DLA components (Table 3). Interestingly, the augmentation strategy showed limited impact on overall performance, reinforcing the robustness of the proposed method.
Computational Efficiency
FedYoYo maintains competitive accuracy with reduced computational overhead, requiring fewer GFLOPs than competing federated strategies and demonstrating that robust learning and computational efficiency can be achieved together.
Future Implications
The efficacy of FedYoYo offers promising insights into addressing data heterogeneity in federated settings, suggesting potential pathways for improving FL's applicability in real-world scenarios. The method provides a robust framework for aligning local and global models, potentially influencing advancements in other distributed machine learning contexts.
Conclusion
FedYoYo presents a novel federated learning approach that integrates self-bootstrap distillation and distribution-aware logit adjustment to effectively handle heterogeneous data. Extensive experimentation confirms its capability to close performance gaps with centralized systems and achieve state-of-the-art results across challenging datasets. FedYoYo's strategic design and promising outcomes pave the way for more robust federated solutions addressing intricate data distributions.