
Real-time Neural Network Inference on Extremely Weak Devices: Agile Offloading with Explainable AI

Published 21 Dec 2023 in cs.LG and cs.AI | (2312.14229v1)

Abstract: With the wide adoption of AI applications, there is a pressing need to enable real-time neural network (NN) inference on small embedded devices, but deploying NNs and achieving high-performance NN inference on these devices is challenging due to their extremely weak capabilities. Although NN partitioning and offloading can contribute to such deployment, they are incapable of minimizing the local costs at embedded devices. Instead, we propose to address this challenge via agile NN offloading, which migrates the required computations in NN offloading from online inference to offline learning. In this paper, we present AgileNN, a new NN offloading technique that achieves real-time NN inference on weak embedded devices by leveraging eXplainable AI techniques to explicitly enforce feature sparsity during the training phase and minimize the online computation and communication costs. Experimental results show that AgileNN's inference latency is >6x lower than that of existing schemes, ensuring that sensory data on embedded devices can be consumed in a timely manner. It also reduces the local device's resource consumption by >8x without impairing inference accuracy.
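
To make the idea of enforcing feature sparsity during training more concrete, the following is a minimal sketch and not AgileNN's actual loss, which the abstract does not specify. It assumes that an explainability method has already produced one non-negative importance score per feature, and it uses a hypothetical hinge-style penalty that reaches zero once the `k` most important features hold at least a `target` share of the total importance. The function names, the example scores, and the values of `k` and `target` are all illustrative assumptions.

```python
def normalize(scores):
    """Scale non-negative importance scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return [s / total for s in scores]


def sparsity_penalty(scores, k, target):
    """Hypothetical hinge penalty on how concentrated the importance is.

    Returns 0.0 when the k most important features together hold at
    least `target` of the total importance, and the shortfall otherwise.
    """
    shares = sorted(normalize(scores), reverse=True)
    top_share = sum(shares[:k])
    return max(0.0, target - top_share)


# Hypothetical importance scores for eight features (assumed values).
spread = [1.0] * 8                                   # importance spread evenly
concentrated = [10.0, 8.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5]  # two dominant features

penalty_spread = sparsity_penalty(spread, k=2, target=0.8)
penalty_concentrated = sparsity_penalty(concentrated, k=2, target=0.8)
```

In this sketch the evenly spread scores incur a positive penalty, while the concentrated scores incur none. Adding such a term to a training objective would push importance toward a few features, which is one plausible way to shift work from online inference to offline learning.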

Authors (2)