
Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers

Published 2 May 2024 in cs.LG (arXiv:2405.01739v1)

Abstract: On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. Developers typically face a trade-off between model accuracy and power: they can run computationally intensive models on high-power cores or pared-down models on low-power cores, and either choice compromises user experience (UX). This work focuses on Gated Compression (GC) layers, which enhance ODML model performance while conserving power and maximizing cost-efficiency, especially for always-on use cases. GC layers dynamically regulate data flow by selectively gating neuron activations within the network, filtering out non-essential inputs. This reduces power needs without compromising accuracy and enables more efficient execution on heterogeneous compute cores. The resulting improvements enhance UX through prolonged battery life, improved device responsiveness, and greater user comfort. We integrate GC layers into vision and speech domain models, including the transformer-based ViT model. Our experiments demonstrate theoretical power-efficiency gains ranging from 158x to 30,000x for always-on scenarios, empowering ODML applications with substantial UX benefits.
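
To make the gating mechanism concrete, below is a minimal TensorFlow sketch of the idea the abstract describes: a small learned branch produces a sigmoid gate that scales a compressed projection of the input, and a sparsity penalty pushes the gate toward zero on non-essential inputs. The class name, dimensions, and penalty form here are illustrative assumptions, not the authors' exact GC formulation.

```python
import tensorflow as tf


class GatedCompressionLayer(tf.keras.layers.Layer):
    """Minimal sketch of a gated compression (GC) layer.

    A gate branch scores each example; a sigmoid maps the score into
    (0, 1) and scales a compressed projection of the input. An
    L1-style penalty on the gate encourages it to close on
    uninformative inputs, so downstream computation can be skipped.
    """

    def __init__(self, compressed_dim, gate_penalty=1e-3, **kwargs):
        super().__init__(**kwargs)
        # Compression branch: project features to a smaller dimension.
        self.compress = tf.keras.layers.Dense(compressed_dim)
        # Gate branch: one sigmoid unit per example.
        self.gate = tf.keras.layers.Dense(1, activation="sigmoid")
        self.gate_penalty = gate_penalty

    def call(self, inputs):
        g = self.gate(inputs)  # shape (batch, 1), values in (0, 1)
        # Sparsity pressure: push the gate toward zero so
        # non-essential inputs are filtered out.
        self.add_loss(self.gate_penalty * tf.reduce_mean(g))
        return g * self.compress(inputs)  # gated, compressed features


# Usage: gate a 128-d feature vector down to 32 dimensions.
layer = GatedCompressionLayer(compressed_dim=32)
features = tf.random.normal([8, 128])
gated = layer(features)  # shape (8, 32); near-zero rows can be dropped
```

In an always-on deployment of this kind, a closed gate lets the low-power core stop early instead of forwarding the input to a high-power core, which is the source of the power savings the abstract claims.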

