OmniBoost: Boosting Throughput of Heterogeneous Embedded Devices under Multi-DNN Workload

Published 6 Jul 2023 in cs.LG and cs.PF | arXiv:2307.03290v1

Abstract: Modern Deep Neural Networks (DNNs) exhibit profound efficiency and accuracy properties. This has introduced application workloads comprising multiple DNN applications, raising new challenges regarding workload distribution. Equipped with a diverse set of accelerators, newer embedded systems present architectural heterogeneity that current run-time controllers are unable to fully utilize. To enable high throughput in multi-DNN workloads, such a controller must explore hundreds of thousands of possible solutions to exploit the underlying heterogeneity. In this paper, we propose OmniBoost, a lightweight and extensible multi-DNN manager for heterogeneous embedded devices. We leverage stochastic space exploration combined with a highly accurate performance estimator, and observe a 4.6× average throughput boost compared to other state-of-the-art methods. The evaluation was performed on the HiKey970 development board.
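The core idea in the abstract — stochastically exploring the space of DNN-to-accelerator mappings and ranking candidates with a throughput estimator — can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: OmniBoost uses Monte Carlo Tree Search with a learned estimator, whereas here plain random search and a stub cost table stand in for both. All names (`ACCELERATORS`, `WORKLOAD`, the throughput numbers, the even-split contention model) are invented for the example.

```python
import random

# Hypothetical HiKey970-like accelerator pool and a multi-DNN workload.
ACCELERATORS = ["big_cpu", "little_cpu", "gpu"]
WORKLOAD = ["resnet50", "mobilenet", "vgg16", "squeezenet"]

# Stub estimator: assumed per-(model, accelerator) throughput in inferences/s.
# A real system would use measurements or a learned performance predictor.
THROUGHPUT = {
    ("resnet50", "gpu"): 40.0, ("resnet50", "big_cpu"): 12.0, ("resnet50", "little_cpu"): 3.0,
    ("mobilenet", "gpu"): 120.0, ("mobilenet", "big_cpu"): 60.0, ("mobilenet", "little_cpu"): 25.0,
    ("vgg16", "gpu"): 15.0, ("vgg16", "big_cpu"): 4.0, ("vgg16", "little_cpu"): 1.0,
    ("squeezenet", "gpu"): 90.0, ("squeezenet", "big_cpu"): 45.0, ("squeezenet", "little_cpu"): 20.0,
}

def estimate(mapping):
    """Estimated system throughput for a mapping {model: accelerator}.
    Accelerators shared by several DNNs split capacity evenly — a crude
    contention model standing in for the paper's learned estimator."""
    total = 0.0
    for acc in ACCELERATORS:
        models = [m for m, a in mapping.items() if a == acc]
        for m in models:
            total += THROUGHPUT[(m, acc)] / len(models)
    return total

def random_search(trials=2000, seed=0):
    """Stochastic space exploration: sample mappings, keep the best one."""
    rng = random.Random(seed)
    best, best_tp = None, -1.0
    for _ in range(trials):
        mapping = {m: rng.choice(ACCELERATORS) for m in WORKLOAD}
        tp = estimate(mapping)
        if tp > best_tp:
            best, best_tp = mapping, tp
    return best, best_tp

mapping, tp = random_search()
print(mapping, round(tp, 1))
```

Even this naive sampler beats the obvious "put everything on the GPU" policy under the stub model, which is the intuition behind searching the mapping space at all; MCTS makes the same search tractable when the space grows to hundreds of thousands of candidates.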
