Enhancing User Experience in On-Device Machine Learning with Gated Compression Layers
Abstract: On-device machine learning (ODML) enables powerful edge applications, but power consumption remains a key challenge for resource-constrained devices. To address this, developers often face a trade-off between model accuracy and power consumption, employing either computationally intensive models on high-power cores or pared-down models on low-power cores. Both approaches typically compromise user experience (UX). This work focuses on the use of Gated Compression (GC) layers to enhance ODML model performance while conserving power and maximizing cost-efficiency, especially for always-on use cases. GC layers dynamically regulate data flow by selectively gating neuron activations within the neural network, effectively filtering out non-essential inputs. This reduces power needs without compromising accuracy and enables more efficient execution on heterogeneous compute cores. These improvements enhance UX through prolonged battery life, improved device responsiveness, and greater user comfort. In this work, we integrate GC layers into vision- and speech-domain models, including the transformer-based ViT model. Our experiments demonstrate theoretical power-efficiency gains ranging from 158x to 30,000x for always-on scenarios. This substantial improvement empowers ODML applications with enhanced UX benefits.
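The abstract's core mechanism (gating activations so that non-essential inputs are filtered out before expensive downstream compute runs) can be illustrated with a minimal sketch. This is not the paper's implementation; the function name, the sigmoid gate, and the fixed threshold are all illustrative assumptions:

```python
import numpy as np

def gated_compression(activations, gate_weights, threshold=0.5):
    """Hypothetical sketch of a Gated Compression step.

    A learned per-feature gate (sigmoid of gate_weights) scales the
    activations; features whose gate falls below the threshold are
    zeroed out. If every gate closes, the input is deemed non-essential
    and the rest of the network (e.g. on a high-power core) can be
    skipped entirely -- the always-on power saving the paper targets.
    """
    gates = 1.0 / (1.0 + np.exp(-gate_weights))  # gate values in (0, 1)
    mask = gates >= threshold                    # hard gating decision
    gated = activations * gates * mask           # suppress gated-off features
    early_exit = not mask.any()                  # nothing passed: skip downstream compute
    return gated, early_exit
```

In this sketch, an input whose gates are all closed triggers `early_exit`, which is where the power saving would come from: the low-power core handling the GC layer filters the input, and the high-power core never wakes up.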