NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator
Abstract: Graph Neural Networks (GNNs) are emerging as a formidable tool for processing non-Euclidean data across various domains, ranging from social network analysis to bioinformatics. Despite their effectiveness, their adoption has not been pervasive because of scalability challenges associated with large-scale graph datasets, particularly when leveraging message passing. To tackle these challenges, we introduce NeuraChip, a novel GNN spatial accelerator based on Gustavson's algorithm. NeuraChip decouples the multiplication and addition computations in sparse matrix multiplication; this separation allows their distinct data dependencies to be exploited independently, facilitating efficient resource allocation. We introduce a rolling eviction strategy to mitigate data idling in on-chip memory and to address the prevalent issue of memory bloat in sparse graph computations. Furthermore, load balancing across compute resources is achieved through dynamic reseeding of a hash-based mapping, ensuring uniform utilization regardless of sparsity patterns. Finally, we present NeuraSim, an open-source, cycle-accurate, multi-threaded, modular simulator for comprehensive performance analysis. Overall, NeuraChip delivers an average speedup of 22.1x over Intel's MKL, 17.1x over NVIDIA's cuSPARSE, 16.7x over AMD's hipSPARSE, 1.5x over a prior state-of-the-art SpGEMM accelerator, and 1.3x over a state-of-the-art GNN accelerator. The source code for our simulator and performance visualizer is publicly available at https://neurachip.us
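To make the abstract's core idea concrete, the following is a minimal software sketch of Gustavson's row-wise SpGEMM, the dataflow NeuraChip builds on. The two phases it separates — generating partial products (multiplication) and merging them into an output row via a hash-based accumulator (addition) — are the phases the accelerator decouples in hardware. This sketch is illustrative only; the function name, the dict-of-dicts sparse format, and the use of a Python dict as the hash accumulator are assumptions of this example, not NeuraChip's implementation.

```python
def spgemm_gustavson(a_rows, b_rows):
    """Row-wise sparse matrix product C = A * B.

    a_rows, b_rows: dicts mapping row index -> {col: value} of nonzeros.
    Returns C in the same format.
    """
    c_rows = {}
    for i, a_row in a_rows.items():
        acc = {}  # hash-based accumulator for row i of C (addition phase)
        for k, a_ik in a_row.items():
            # Multiplication phase: scale row k of B by A[i, k],
            # producing partial products destined for row i of C.
            for j, b_kj in b_rows.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            c_rows[i] = acc
    return c_rows
```

Because the multiplication loop only streams rows of B while the accumulator absorbs collisions independently, the two phases have different data dependencies — which is precisely why decoupling them, as the abstract describes, permits separate resource provisioning for each.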