Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity
Abstract: This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity. A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages the geometric median for aggregation and allows the number of local update rounds to be chosen freely. Unlike most existing resilient approaches, whose convergence analyses assume a strongly convex loss function or homogeneously distributed data, we conduct convergence analysis for both strongly convex and non-convex loss functions over heterogeneous datasets. Our theoretical analysis shows that, as long as the fraction of the dataset held by malicious users is less than half, RAGA converges at rate $\mathcal{O}(1/T^{2/3-\delta})$ for non-convex loss functions, where $T$ is the iteration number and $\delta \in (0, 2/3)$, and at a linear rate for strongly convex loss functions. Moreover, a stationary point (for non-convex losses) or the global optimal solution (for strongly convex losses) is proved to be attainable as data heterogeneity vanishes. Experimental results corroborate the robustness of RAGA to Byzantine attacks and verify its convergence advantage over baselines under various intensities of Byzantine attacks on heterogeneous datasets.
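To make the aggregation step concrete, the following is a minimal sketch of geometric-median aggregation, the core robust operation the abstract attributes to RAGA. It computes the median via Weiszfeld's iteration, a standard method for this problem; the function name, tolerances, weighting scheme, and toy data below are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch: geometric-median aggregation of client updates via
# Weiszfeld's iteration. All names and parameters are illustrative.
import numpy as np

def geometric_median(points, weights=None, max_iter=100, tol=1e-6, eps=1e-12):
    """Find z minimizing sum_i w_i * ||z - x_i|| (weighted geometric median)."""
    points = np.asarray(points, dtype=float)      # shape (n_clients, dim)
    n = points.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    z = np.average(points, axis=0, weights=w)     # initialize at weighted mean
    for _ in range(max_iter):
        dist = np.linalg.norm(points - z, axis=1)
        dist = np.maximum(dist, eps)              # guard against division by zero
        alpha = w / dist                          # inverse-distance weights
        z_new = (alpha[:, None] * points).sum(axis=0) / alpha.sum()
        if np.linalg.norm(z_new - z) < tol:
            return z_new
        z = z_new
    return z

# Toy usage: a minority of Byzantine clients sending large arbitrary vectors
# drags the plain mean far from the honest updates, while the geometric
# median stays close to them.
rng = np.random.default_rng(0)
honest = rng.normal(0.0, 0.1, size=(8, 5))        # 8 honest updates near 0
byzantine = rng.normal(50.0, 1.0, size=(3, 5))    # 3 malicious updates
updates = np.vstack([honest, byzantine])
print("mean:            ", np.mean(updates, axis=0))
print("geometric median:", geometric_median(updates))
```

Note that the Byzantine fraction here (3 of 11 clients) is below one half, matching the regime in which the abstract's convergence guarantees hold.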