Low-Latency ML Inference by Grouping Correlated Data Objects and Computation
Abstract: ML inference workflows often require low latency and high throughput, yet we lack good options for addressing this need. Techniques that reduce latency in other streaming settings (such as caching and optimization-driven scheduling) are of limited value because ML data dependencies are often very large and can change dramatically depending on the triggering event. In this work, we propose a novel correlation grouping mechanism that makes it easier for developers to express application-specific data access correlations, enabling coordinated management of data objects in server clusters hosting streaming inference tasks. Experiments based on a latency-sensitive ML-based application confirm the limitations of standard techniques while showing that our solution yields dramatically better performance. The proposed mechanism is able to maintain significantly lower and more consistent latency, achieves higher node utilization as workload and scale-out increase, and yet requires only minor changes to the code implementing the application.