Low-Latency ML Inference by Grouping Correlated Data Objects and Computation

Published 30 Nov 2023 in cs.DC and cs.AI | (2312.11488v1)

Abstract: ML inference workflows often require low latency and high throughput, yet we lack good options for addressing this need. Techniques that reduce latency in other streaming settings (such as caching and optimization-driven scheduling) are of limited value because ML data dependencies are often very large and can change dramatically depending on the triggering event. In this work, we propose a novel correlation grouping mechanism that makes it easier for developers to express application-specific data access correlations, enabling coordinated management of data objects in server clusters hosting streaming inference tasks. Experiments based on a latency-sensitive ML-based application confirm the limitations of standard techniques while showing that our solution yields dramatically better performance. The proposed mechanism maintains significantly lower and more consistent latency, achieves higher node utilization as workload and scale-out increase, and requires only minor changes to the code implementing the application.

