DynaSplit: A Hardware-Software Co-Design Framework for Energy-Aware Inference on Edge
Abstract: The deployment of ML models on edge devices is challenged by limited computational resources and energy availability. While split computing enables the decomposition of large neural networks (NNs) and allows partial computation on both edge and cloud devices, identifying the most suitable split layer and hardware configuration is a non-trivial task. The process is hindered by the large configuration space, the non-linear dependencies between software and hardware parameters, the heterogeneous hardware and energy characteristics, and the dynamic workload conditions. To overcome this challenge, we propose DynaSplit, a two-phase framework that dynamically configures parameters across both software (i.e., split layer) and hardware (e.g., accelerator usage, CPU frequency). During the Offline Phase, we solve a multi-objective optimization problem with a meta-heuristic approach to discover optimal settings. During the Online Phase, a scheduling algorithm identifies the most suitable settings for an incoming inference request and configures the system accordingly. We evaluate DynaSplit using popular pre-trained NNs on a real-world testbed. Experimental results show a reduction in energy consumption of up to 72% compared to cloud-only computation, while meeting the latency threshold of ~90% of user requests compared to baselines.
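The two-phase idea can be illustrated with a minimal Python sketch. It is not the DynaSplit implementation: the abstract does not specify the scheduling policy, so the sketch assumes the Offline Phase yields a set of measured configurations, keeps only the non-dominated ones (a Pareto front over latency and energy), and that the Online Phase picks the lowest-energy configuration meeting a request's latency threshold. The names `Config`, `pareto_front`, and `select_config`, as well as all numeric values, are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Config:
    split_layer: int       # software parameter: layer at which the NN is split
    cpu_freq_mhz: int      # hardware parameter: edge CPU frequency
    use_accelerator: bool  # hardware parameter: edge accelerator on/off
    latency_ms: float      # illustrative measured latency (made-up value)
    energy_j: float        # illustrative measured energy (made-up value)


def pareto_front(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates."""
    def dominates(a: Config, b: Config) -> bool:
        no_worse = a.latency_ms <= b.latency_ms and a.energy_j <= b.energy_j
        better = a.latency_ms < b.latency_ms or a.energy_j < b.energy_j
        return no_worse and better

    return [c for c in configs if not any(dominates(o, c) for o in configs)]


def select_config(front: List[Config], latency_threshold_ms: float) -> Config:
    """Assumed online policy: lowest energy among configurations that meet
    the threshold; if none does, fall back to the fastest configuration."""
    feasible = [c for c in front if c.latency_ms <= latency_threshold_ms]
    if feasible:
        return min(feasible, key=lambda c: c.energy_j)
    return min(front, key=lambda c: c.latency_ms)


# Hypothetical offline results (split_layer 0 stands in for cloud-only).
candidates = [
    Config(0, 1800, False, 120.0, 9.0),
    Config(5, 1200, True, 200.0, 4.0),
    Config(10, 600, True, 450.0, 2.5),
    Config(5, 1800, False, 260.0, 6.0),  # dominated by the second entry
]
front = pareto_front(candidates)

# A request with a 250 ms threshold gets the cheapest feasible setting.
chosen = select_config(front, latency_threshold_ms=250.0)
print(chosen.split_layer, chosen.cpu_freq_mhz, chosen.use_accelerator)
```

In practice the offline search would use a meta-heuristic rather than enumerating every candidate, since the configuration space is too large to measure exhaustively. The brute-force filter here only shows what the resulting set looks like and how an online scheduler might query it.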