A Quantitative Analysis and Guidelines of Data Streaming Accelerator in Modern Intel Xeon Scalable Processors

Published 4 May 2023 in cs.AR and cs.PF (arXiv:2305.02480v5)

Abstract: As semiconductor power density no longer stays constant as process technology scales down, modern CPUs integrate capable on-chip data accelerators to improve performance and efficiency across a wide range of applications and usages. One such accelerator is the Intel Data Streaming Accelerator (DSA), introduced in Intel 4th Generation Xeon Scalable CPUs (Sapphire Rapids). DSA targets data-movement operations in memory, a common source of overhead in datacenter workloads and infrastructure. It is also far more versatile than prior copy engines, supporting a wider range of operations on streaming data, such as CRC32 calculation, delta record creation/merging, and data integrity field (DIF) operations. This paper introduces the latest features supported by DSA, takes a deep dive into its versatility, and analyzes its throughput benefits through a comprehensive evaluation. Building on this characterization and on DSA's rich software ecosystem, we distill several insights and guidelines that help programmers make the most of DSA, and use an in-depth case study of DPDK Vhost to demonstrate how these guidelines benefit a real application.
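To make two of the operations named in the abstract concrete, the sketch below models CRC32 over streaming data and delta record creation/merging in plain software. This is an illustrative model only, not the DSA programming interface: on real hardware these run as work descriptors submitted to the accelerator (e.g. via the Intel DML library or the Linux idxd driver), and the 8-byte comparison unit and all helper names here are assumptions made for the sketch.

```python
import zlib

def crc32_stream(chunks, seed=0):
    """Fold CRC32 across a stream of byte chunks, as a software model
    of DSA's CRC generation over a source buffer."""
    crc = seed
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)
    return crc

def delta_create(reference: bytes, src: bytes, unit=8):
    """Toy model of delta record creation: record (offset, data) for each
    fixed-size unit where src differs from reference. Assumes equal-length
    buffers whose length is a multiple of the comparison unit."""
    assert len(reference) == len(src) and len(src) % unit == 0
    return [(off, src[off:off + unit])
            for off in range(0, len(src), unit)
            if src[off:off + unit] != reference[off:off + unit]]

def delta_merge(reference: bytes, delta):
    """Toy model of delta record merging: patch the reference buffer with
    the recorded units to reconstruct the source buffer."""
    out = bytearray(reference)
    for off, data in delta:
        out[off:off + len(data)] = data
    return bytes(out)
```

The delta pair is useful when two large buffers differ in only a few spots: shipping the (offset, data) records is far cheaper than copying the whole buffer, which is the same trade-off the hardware operation exploits.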
