HALO-CAT: A Hidden Network Processor with Activation-Localized CIM Architecture and Layer-Penetrative Tiling

Published 11 Dec 2023 in cs.AR (arXiv:2312.06086v1)

Abstract: To address the 'memory wall' problem in neural network (NN) hardware acceleration, we introduce HALO-CAT, a software-hardware co-design optimized for Hidden Neural Network (HNN) processing. HALO-CAT integrates Layer-Penetrative Tiling (LPT), which improves algorithmic efficiency by shrinking intermediate results, and an activation-localized computing-in-memory architecture that minimizes data movement. Compared with conventional HNN processors, the design reduces activation memory capacity by 14.2x and energy consumption by 17.8x, at a cost of only 1.5% accuracy.
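
The abstract does not detail how Layer-Penetrative Tiling works internally. As a rough illustration only, the Python sketch below shows the general fused-layer / depth-first tiling idea that such schemes build on: one spatial tile is pushed through the whole layer chain, so only tile-sized intermediate activations are ever alive instead of full feature maps. All shapes, the tile size, and the use of pointwise layers (which need no halo exchange) are illustrative assumptions, not details taken from the paper.

# Minimal sketch of depth-first (layer-penetrative) tiling; not the paper's implementation.
# Assumed: pointwise layers, 64x64x8 input, 8x8 spatial tiles.
import numpy as np

H, W, C = 64, 64, 8          # input feature map: height x width x channels (assumed)
TILE = 8                     # spatial tile edge (assumed)
rng = np.random.default_rng(0)

# Three pointwise layers so tiles need no halo exchange (simplifying assumption).
weights = [rng.standard_normal((C, C)).astype(np.float32) for _ in range(3)]

def run_layers(x):
    """Apply the layer chain to an (h, w, C) block, with ReLU between layers."""
    for w_mat in weights:
        x = np.maximum(x @ w_mat, 0.0)
    return x

x = rng.standard_normal((H, W, C)).astype(np.float32)

# Baseline: layer-by-layer execution materializes a full H x W x C map
# at every layer boundary.
baseline = run_layers(x)
full_intermediate_elems = H * W * C

# Depth-first tiling: each TILE x TILE patch traverses all layers before the
# next patch starts, so the live intermediate is only TILE x TILE x C.
tiled = np.empty_like(baseline)
for i in range(0, H, TILE):
    for j in range(0, W, TILE):
        tiled[i:i+TILE, j:j+TILE] = run_layers(x[i:i+TILE, j:j+TILE])
tile_intermediate_elems = TILE * TILE * C

assert np.allclose(baseline, tiled, atol=1e-4)
print(f"intermediate footprint: {full_intermediate_elems} -> "
      f"{tile_intermediate_elems} elements "
      f"({full_intermediate_elems / tile_intermediate_elems:.0f}x smaller)")

In this toy setting the live intermediate footprint shrinks by the ratio of the full map to one tile (64x here); the paper's reported 14.2x activation-memory reduction comes from its own LPT scheme and HNN workload, not from this sketch.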
