
Performance Improvement of Federated Learning Server using Smart NIC

Published 13 Jul 2023 in cs.DC | (2307.06561v2)

Abstract: Federated learning is a distributed machine learning approach in which weight parameters trained locally by clients are aggregated into global parameters by a server. The global parameters can thus be trained without uploading the clients' privacy-sensitive raw data to the server. Aggregation on the server is done simply by averaging the local weight parameters, so it is an I/O-intensive task in which network processing accounts for a large share of the work compared with computation. The network processing workload grows further as the number of clients increases. To mitigate this workload, in this paper the federated learning server is offloaded to an NVIDIA BlueField-2 DPU, a smart NIC (Network Interface Card) with eight processing cores. DPDK (Data Plane Development Kit) is used to assign dedicated processing cores to receiving the local weight parameters and sending the global parameters. The aggregation task is parallelized across the multiple cores available on the DPU. To further improve performance, an approximated design that eliminates exclusive access control between the computation threads is also implemented. Evaluation results show that the proposed DPDK-based federated learning server on the DPU, with the approximation, achieves a 1.39x speedup in execution time with negligible accuracy loss compared with a baseline server on the host CPU.
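
The averaging step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, assuming an unweighted mean over equal-length weight vectors; the function name `aggregate` is hypothetical, and the sketch does not reproduce the paper's DPDK-based, multi-core implementation on the DPU.

```python
from typing import List, Sequence


def aggregate(local_weights: Sequence[Sequence[float]]) -> List[float]:
    """Average the clients' local weight vectors element by element.

    Assumption: every client contributes equally (unweighted mean).
    """
    if not local_weights:
        raise ValueError("at least one client update is required")

    num_clients = len(local_weights)
    dim = len(local_weights[0])
    if any(len(w) != dim for w in local_weights):
        raise ValueError("all client updates must have the same length")

    # Accumulate a running sum: one addition per parameter per client.
    sums = [0.0] * dim
    for weights in local_weights:
        for i, value in enumerate(weights):
            sums[i] += value

    # Divide once at the end to obtain the global parameters.
    return [s / num_clients for s in sums]


if __name__ == "__main__":
    # Two clients, two parameters each.
    print(aggregate([[1.0, 2.0], [3.0, 4.0]]))  # prints [2.0, 3.0]
```

Each received parameter costs a single addition, which is consistent with the abstract's point that the server's work is dominated by receiving and sending parameters rather than by arithmetic.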
