- The paper introduces FedGM, a novel framework that generalizes momentum schemes to address client heterogeneity in federated learning.
- It employs a multistage "constant and drop" hyperparameter scheduler to accelerate convergence under both full and partial client participation.
- Experiments on ResNet and VGG using CIFAR-10 confirm that FedGM outperforms traditional FedAvg in convergence speed and stability.
On the Role of Server Momentum in Federated Learning
Federated Learning (FL) has emerged as a robust approach for training models across distributed devices without moving the data to a central server. However, traditional Federated Averaging (FedAvg) schemes face significant challenges, particularly when there is a high degree of heterogeneity among client devices. This paper addresses these challenges by exploring the application of server momentum within FL.
Motivation and Challenges in Federated Learning
FedAvg, a widely adopted algorithm in FL, runs multiple epochs of local Stochastic Gradient Descent (SGD) on client devices and then aggregates the resulting models on the server. Despite its empirical success, FedAvg suffers from slow and unstable convergence under client heterogeneity—a condition in which client models drift away from the global objective because local data distributions differ.
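The local-training-then-average loop can be sketched as follows. This is a minimal illustration on a toy least-squares problem, not the paper's implementation; all function and variable names are assumptions.

```python
import numpy as np

def local_sgd(w, data, lr=0.1, steps=5):
    """Run a few steps of local SGD on one client (toy quadratic loss)."""
    w = w.copy()
    x, y = data
    for _ in range(steps):
        grad = x.T @ (x @ w - y) / len(y)  # gradient of 0.5/n * ||Xw - y||^2
        w -= lr * grad
    return w

def fedavg_round(w_global, client_datasets):
    """One communication round: broadcast, local training, server average."""
    client_models = [local_sgd(w_global, d) for d in client_datasets]
    return np.mean(client_models, axis=0)

rng = np.random.default_rng(0)
w = np.zeros(3)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
for _ in range(10):
    w = fedavg_round(w, clients)
```

When client datasets are drawn from different distributions, the averaged model can oscillate between rounds—the instability the paper attributes to client drift.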
The paper identifies three primary challenges in existing research on server momentum within FL:
- Limited Exploration of Momentum Schemes: Existing work largely focuses on stochastic heavy ball momentum (SHB), leaving other potentially effective momentum schemes unexplored.
- Lack of Hyperparameter Scheduling: Proper tuning of hyperparameters like learning rates is critical in training deep models and ensuring optimal convergence, yet existing work often neglects dynamic scheduling.
- Neglect of System Heterogeneity: Assumptions of client homogeneity in computational capability and synchronicity are unrealistic in practical federated settings.
Federated General Momentum (FedGM)
To address these limitations, the paper introduces Federated General Momentum (FedGM), a framework that generalizes server momentum schemes by integrating varying learning rates and decay factors. FedGM's key characteristics include:
- Support for different momentum schemes beyond SHB, such as Nesterov's accelerated gradient (NAG).
- Incorporation of non-constant learning rates and hyperparameter decay strategies akin to step decay schedules, popular in large-scale neural network training.
- Adaptation to client and system heterogeneity with minimal coordination, allowing asynchronous participation and varying client local computation intensities.
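One way to see how a single update rule can cover multiple momentum schemes is the quasi-hyperbolic form below, where the server treats the aggregated client update as a pseudo-gradient. This is an illustrative sketch, not the paper's exact formulation; parameter names (`eta`, `beta`, `nu`) are assumptions.

```python
import numpy as np

def fedgm_server_update(w, pseudo_grad, buf, eta=1.0, beta=0.9, nu=1.0):
    """One generalized server momentum step on the aggregated client
    update (pseudo-gradient). In this quasi-hyperbolic form, nu = 1
    recovers SHB-style heavy ball momentum, while nu = beta recovers a
    Nesterov (NAG)-style step."""
    buf = beta * buf + (1.0 - beta) * pseudo_grad  # momentum buffer (EMA)
    step = (1.0 - nu) * pseudo_grad + nu * buf     # blend raw update and buffer
    return w - eta * step, buf
```

Exposing `nu` and `beta` as free parameters is what lets a single analysis cover SHB, NAG, and interpolations between them, rather than proving convergence for each scheme separately.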
Multistage FedGM and Its Convergence
Multistage FedGM introduces a "constant and drop" hyperparameter scheduler: learning rates are held constant within each stage of training and dropped at stage boundaries. This strategy is empirically shown to converge faster than single-stage training with constant hyperparameters.
Theoretical analysis demonstrates that multistage FedGM achieves convergence under both full and partial client participation scenarios. It effectively adjusts stage lengths and hyperparameters to balance earlier exploration with later precision in convergence, accommodating the non-i.i.d. nature of data across clients.
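A "constant and drop" schedule is simply a piecewise-constant step size over communication rounds. A minimal sketch (stage lengths and rates here are illustrative, not values from the paper):

```python
def constant_and_drop(round_idx, stage_lengths, etas):
    """Return the learning rate for a given round under a piecewise-constant
    schedule: hold eta within each stage, drop it at stage boundaries."""
    boundary = 0
    for length, eta in zip(stage_lengths, etas):
        boundary += length
        if round_idx < boundary:
            return eta
    return etas[-1]  # past the last boundary, keep the final rate

# Two stages of 100 rounds each, with a 10x drop at the boundary.
schedule = [constant_and_drop(t, [100, 100], [0.1, 0.01]) for t in range(200)]
```

Longer early stages with a larger rate favor exploration; later stages with smaller rates tighten convergence, matching the exploration/precision trade-off described above.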
Autonomous Multistage FedGM under System Heterogeneity
Autonomous Multistage FedGM extends the framework to asynchronous updates, addressing the real-world setting where clients do not participate uniformly and their updates arrive out of sync. The analysis establishes convergence under these more realistic conditions of client availability and computation lag, showing robustness to variable system performance without a significant loss in convergence speed.
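The asynchronous setting can be pictured as a server that applies client updates one at a time as they arrive, each possibly computed against a stale copy of the global model. The sketch below is a hypothetical simplification (names and the momentum form are assumptions, not the paper's algorithm):

```python
import numpy as np
from collections import deque

def autonomous_server(w, update_queue, eta=0.5, beta=0.9):
    """Apply client updates in arrival order with server momentum.
    Each update may be stale (computed against an older global model),
    and clients may have done different amounts of local work; no
    synchronization barrier is required."""
    buf = np.zeros_like(w)
    while update_queue:
        delta = update_queue.popleft()  # possibly stale pseudo-gradient
        buf = beta * buf + delta        # momentum accumulated over arrivals
        w = w - eta * buf
    return w

# Three updates arrive asynchronously from different clients.
arrivals = deque(0.1 * np.ones(2) for _ in range(3))
w_final = autonomous_server(np.zeros(2), arrivals)
```

Here the momentum buffer smooths over the noise introduced by staleness and uneven participation, which is intuitively why the scheme tolerates system heterogeneity.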
Experimental Validation
Extensive experiments on models such as ResNet and VGG on the CIFAR-10 dataset underscore the efficacy of FedGM. Results show improved performance and stability over traditional FedAvg and existing momentum-based adaptations, particularly under conditions of client and data heterogeneity.
Conclusion
The paper's development of FedGM and its derivatives marks a significant step towards robust federated learning in heterogeneous environments. By addressing key limitations in current frameworks and providing rigorous theoretical and empirical analysis, it paves the way for future research on adaptable and scalable FL systems.