High Dimensional Distributed Gradient Descent with Arbitrary Number of Byzantine Attackers
Abstract: Robust distributed learning under Byzantine failures has attracted extensive research interest in recent years. However, most existing methods suffer from the curse of dimensionality, which becomes increasingly serious as modern machine learning models grow in complexity. In this paper, we design a new method that is suitable for high dimensional problems and tolerates an arbitrary number of Byzantine attackers. The core of our design is a direct high dimensional semi-verified mean estimation method. The idea is to first identify a subspace: the components of the mean perpendicular to this subspace can be estimated from the gradient vectors uploaded by the worker machines, while the components within the subspace are estimated using an auxiliary dataset. We then use this estimator as the aggregator in distributed learning. Our theoretical analysis shows that the new method achieves minimax optimal statistical rates; in particular, its dependence on the dimensionality is significantly improved compared with previous works.
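To make the construction concrete, below is a minimal NumPy sketch of the semi-verified aggregation idea described in the abstract. It is an illustrative reading, not the paper's exact algorithm: the function name `semi_verified_mean`, the subspace dimension `r`, and the use of a plain top-r SVD of the centered worker gradients to identify the suspicious subspace are all assumptions introduced here for exposition.

```python
import numpy as np

def semi_verified_mean(worker_grads, aux_grads, r):
    """Hypothetical sketch of semi-verified mean estimation.

    worker_grads: (m, d) array of gradients uploaded by workers,
                  an arbitrary fraction of which may be Byzantine.
    aux_grads:    (n, d) array of gradients computed on a small
                  trusted auxiliary dataset.
    r:            assumed dimension of the identified subspace
                  (must satisfy r <= min(m, d)).
    """
    # Center the worker gradients; adversarial perturbations tend to
    # concentrate in the high-variance directions of this matrix.
    mu_workers = worker_grads.mean(axis=0)
    centered = worker_grads - mu_workers

    # Top-r right singular vectors span the identified subspace V.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    V = vt[:r].T               # (d, r) orthonormal basis
    P = V @ V.T                # projector onto the subspace

    # Components perpendicular to V: estimated from the plentiful
    # worker gradients, where the attack has little leverage.
    perp_part = mu_workers - P @ mu_workers

    # Components within V: estimated from the trusted auxiliary data.
    in_part = P @ aux_grads.mean(axis=0)

    return perp_part + in_part
```

Under this reading, each round of distributed gradient descent would replace the naive average of uploaded gradients with this estimator, e.g. `theta -= lr * semi_verified_mean(uploaded_grads, aux_grads, r)`, so that the trusted auxiliary data only needs to resolve the low-dimensional subspace rather than all d coordinates.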