OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload
Abstract: Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads that comprise multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity, which current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller must explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration and combine it with a highly accurate performance estimator, observing an average throughput boost of 4.6× compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.