Resilient Two-Time-Scale Local Stochastic Gradient Descent for Byzantine Federated Learning
Abstract: We study local stochastic gradient descent methods for solving federated optimization over a network of agents communicating indirectly through a centralized coordinator. We are interested in the Byzantine setting, where a subset of $f$ malicious agents can observe the entire network and send arbitrary values to the coordinator to disrupt the performance of the non-faulty agents. The objective of the non-faulty agents is to collaboratively compute the optimizer of their respective local functions in the presence of Byzantine agents. In this setting, prior works show that the local stochastic gradient descent method can only return an approximation of the desired solutions due to the impact of Byzantine agents; whether this method can find an exact solution remains an open question. In this paper, we address this open question by proposing a new variant of the local stochastic gradient descent method. Under conditions similar to those considered in existing works, we show that the proposed method converges exactly to the desired solutions. We provide theoretical results characterizing the convergence properties of our method; in particular, the proposed method converges at the optimal rate $\mathcal{O}(1/k)$ in both the strongly convex and non-convex settings, where $k$ is the number of iterations. Finally, we present a number of simulations to illustrate our theoretical results.
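For illustration only: the paper's two-time-scale update rule is not given in this abstract, so the sketch below shows a generic Byzantine-tolerant local SGD round in the setting described, with a coordinator aggregating local updates via a coordinate-wise trimmed mean. All names, the quadratic local objectives, and the aggregation rule are assumptions, not the paper's method.

```python
import numpy as np

def trimmed_mean(updates, f):
    # Coordinate-wise trimmed mean: in each coordinate, drop the f largest
    # and f smallest values, then average the rest. A standard robust
    # aggregator (assumed here; not necessarily the paper's choice).
    s = np.sort(updates, axis=0)
    return s[f:len(updates) - f].mean(axis=0)

def byzantine_local_sgd(minima, f, rounds=200, local_steps=5, lr=0.1, seed=0):
    # Hypothetical setup: honest agent i minimizes 0.5 * ||x - minima[i]||^2,
    # so its stochastic gradient at x is (x - minima[i]) plus noise.
    # f Byzantine agents send arbitrary large vectors each round.
    rng = np.random.default_rng(seed)
    n, d = minima.shape
    x = np.zeros(d)                      # coordinator's global model
    for _ in range(rounds):
        honest = []
        for i in range(n):
            xi = x.copy()
            for _ in range(local_steps):  # local SGD steps between communications
                g = xi - minima[i] + 0.01 * rng.standard_normal(d)
                xi -= lr * g
            honest.append(xi)
        # Byzantine agents report arbitrary values to the coordinator.
        byz = [100.0 * rng.standard_normal(d) for _ in range(f)]
        x = trimmed_mean(np.array(honest + byz), f)
    return x
```

With quadratic local losses, the honest agents' minimizers differ, so plain local SGD with robust averaging settles near (but not exactly at) the average minimizer; this residual bias is the kind of inexactness the abstract attributes to prior methods.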