HALO-CAT: A Hidden Network Processor with Activation-Localized CIM Architecture and Layer-Penetrative Tiling
Abstract: To address the 'memory wall' problem in neural-network hardware acceleration, we introduce HALO-CAT, a software-hardware co-design optimized for Hidden Neural Network (HNN) processing. HALO-CAT integrates Layer-Penetrative Tiling (LPT), an algorithmic technique that shrinks the intermediate activations that must be buffered between layers, with an activation-localized computing-in-memory architecture that minimizes activation movement. Compared with a traditional HNN processor, the design reduces activation memory capacity by 14.2x and energy consumption by 17.8x, with only a 1.5% loss in accuracy.
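The abstract does not spell out HALO-CAT's LPT algorithm or CIM circuitry, but the scheduling idea behind layer-penetrative (depth-first) tiling can be sketched briefly. The Python/NumPy sketch below pushes one spatial tile through every layer before fetching the next tile, so the live activation footprint is bounded by the tile plus its receptive-field halo rather than by a full feature map. The 1-D valid convolutions, tile width, and zero-padding policy are illustrative assumptions, not HALO-CAT's actual implementation.

```python
import numpy as np

def conv1d_valid(x, w):
    # 'valid' 1-D convolution: output length = len(x) - len(w) + 1
    k = len(w)
    return np.array([np.dot(x[i:i + k], w) for i in range(len(x) - k + 1)])

def breadth_first(x, weights):
    # Conventional layer-by-layer execution: every full layer output is buffered.
    halo = sum(len(w) - 1 for w in weights)
    t = np.pad(x, (halo // 2, halo - halo // 2))
    for w in weights:
        t = conv1d_valid(t, w)
    return t

def depth_first_tiled(x, weights, tile=8):
    # Layer-penetrative (depth-first) schedule: each spatial tile traverses all
    # layers before the next tile starts, so only one tile of activations
    # (plus its receptive-field halo) is ever live at a time.
    halo = sum(len(w) - 1 for w in weights)        # receptive-field growth over depth
    xp = np.pad(x, (halo // 2, halo - halo // 2))  # assumed zero-padding policy
    out = []
    for start in range(0, len(x), tile):
        t = xp[start:start + tile + halo]          # input window covering one output tile
        for w in weights:                          # "penetrate" every layer with this tile
            t = conv1d_valid(t, w)
        out.append(t)
    return np.concatenate(out)

# Both schedules compute identical results; only the buffering differs.
rng = np.random.default_rng(0)
x = rng.standard_normal(32)
ws = [rng.standard_normal(3) for _ in range(4)]
assert np.allclose(depth_first_tiled(x, ws), breadth_first(x, ws))
```

For context, an HNN stores no trained weights at all: the weights are frozen random values (regenerable on chip from a seed, which is what makes on-chip model construction possible), and a trained binary supermask selects which connections are active. A minimal sketch of one such layer, with the seeded generator standing in for on-chip weight generation as an assumption:

```python
def hnn_layer(x, seed, mask):
    # Frozen random weights regenerated from a seed (a stand-in for an on-chip
    # weight generator); only the binary supermask is learned and stored.
    w = np.random.default_rng(seed).standard_normal(mask.shape)
    return np.maximum(x @ (w * mask), 0.0)  # masked matmul + ReLU
```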