Systolic Array Data Flows for Efficient Matrix Multiplication in Deep Neural Networks

Published 29 Oct 2024 in cs.AR | arXiv:2410.22595v1

Abstract: The paper discusses how systolic arrays can improve matrix multiplication for deep neural networks (DNNs). With AI models such as OpenAI's GPT now containing trillions of parameters, efficient matrix multiplication is more critical than ever. The paper examines the three main systolic array data flows: Weight Stationary (WS), Input Stationary (IS), and Output Stationary (OS). The energy consumption and efficiency of each data flow are evaluated across various matrix sizes using the SCALE-Sim simulator. The results show that selecting the right data flow for a given matrix configuration can drastically reduce energy consumption. These conclusions offer practical insights into optimizing hardware for AI and machine learning workloads and point to improvements in the design of energy-efficient DNN accelerators.
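
The three data flows differ in which operand stays pinned inside each processing element (PE) while the others stream through the array. As a concrete illustration, and not code from the paper, the following Python sketch simulates an output-stationary array cycle by cycle: each PE holds the accumulator for one output element, rows of A enter from the left edge, columns of B enter from the top edge, and operands hop one PE per cycle. The function name, the NumPy formulation, and the single-tile assumption (PE grid sized exactly M x N) are ours.

```python
import numpy as np

def systolic_matmul_os(A, B):
    """Cycle-level sketch of an output-stationary (OS) systolic array.

    Assumes one output tile: the PE grid is exactly M x N, so PE (i, j)
    owns the accumulator for C[i, j]. Rows of A enter from the left edge
    and columns of B from the top edge, each skewed by one cycle per
    row/column so that matching operands meet at the right PE.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"

    acc = np.zeros((M, N))    # stationary accumulators, one per PE
    a_reg = np.zeros((M, N))  # A operand currently held by each PE
    b_reg = np.zeros((M, N))  # B operand currently held by each PE

    # The last useful MAC fires at PE (M-1, N-1) on cycle K + M + N - 3.
    for t in range(K + M + N - 2):
        # Operands hop one PE per cycle: A moves right, B moves down.
        a_reg[:, 1:] = a_reg[:, :-1].copy()
        b_reg[1:, :] = b_reg[:-1, :].copy()
        # Inject skewed inputs at the array edges (zeros pad the skew).
        for i in range(M):
            k = t - i  # row i lags the wavefront by i cycles
            a_reg[i, 0] = A[i, k] if 0 <= k < K else 0.0
        for j in range(N):
            k = t - j  # column j lags by j cycles
            b_reg[0, j] = B[k, j] if 0 <= k < K else 0.0
        # Every PE performs one multiply-accumulate per cycle; by
        # construction a_reg[i, j] and b_reg[i, j] share the same k.
        acc += a_reg * b_reg
    return acc

rng = np.random.default_rng(0)
A, B = rng.random((4, 6)), rng.random((6, 5))
assert np.allclose(systolic_matmul_os(A, B), A @ B)
```

A weight-stationary variant would instead pin a tile of B in the PEs and stream A past it while partial sums move through the array; the arithmetic is identical, but which operand must be re-fetched each cycle changes, and that is what drives the energy differences the paper measures.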

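To see why the best data flow depends on the matrix shape, it helps to count operand traffic explicitly. The sketch below is a first-order model of our own, not the paper's methodology and not SCALE-Sim's internal cost model: it assumes the stationary operand is fetched once, each streamed operand is re-fetched once per tile of the stationary operand, and partial sums cost one write per accumulation pass. The function name, tiling scheme, and example sizes are illustrative assumptions.

```python
from math import ceil

def operand_traffic(M, K, N, R, C, dataflow):
    """First-order SRAM-traffic model (element accesses) for one
    M x K @ K x N matrix multiplication on an R x C systolic array.

    Simplified assumptions (ours): the stationary operand is fetched
    exactly once, each streamed operand is re-fetched once per tile of
    the stationary operand, and each accumulation pass writes partial
    sums once.
    """
    if dataflow == "ws":  # weights (K x N) pinned in the PEs
        tiles_k, tiles_n = ceil(K / R), ceil(N / C)
        return {"A_reads": M * K * tiles_n,
                "B_reads": K * N,
                "C_writes": M * N * tiles_k}
    if dataflow == "is":  # inputs (M x K) pinned in the PEs
        tiles_m, tiles_k = ceil(M / R), ceil(K / C)
        return {"A_reads": M * K,
                "B_reads": K * N * tiles_m,
                "C_writes": M * N * tiles_k}
    if dataflow == "os":  # outputs (M x N) pinned in the PEs
        tiles_m, tiles_n = ceil(M / R), ceil(N / C)
        return {"A_reads": M * K * tiles_n,
                "B_reads": K * N * tiles_m,
                "C_writes": M * N}
    raise ValueError("dataflow must be 'ws', 'is', or 'os'")

# A tall-skinny GEMM (many activations, few weights) illustrates how
# the cheapest data flow depends on the matrix shape:
for df in ("ws", "is", "os"):
    counts = operand_traffic(M=4096, K=64, N=64, R=32, C=32, dataflow=df)
    print(df, counts, "total:", sum(counts.values()))
```

Weighting such counts with per-access energy numbers yields a shape-dependent ranking of WS, IS, and OS, which is consistent with the paper's conclusion that matching the data flow to the matrix configuration can drastically reduce energy.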
