Papers
Topics
Authors
Recent
Search
2000 character limit reached

Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning

Published 30 May 2018 in cs.DC and cs.LG | (1805.11877v1)

Abstract: In many domains, the previous decade was characterized by increasing data volumes and growing complexity of computational workloads, creating new demands for highly data-parallel computing in distributed systems. Effective operation of these systems is challenging when facing uncertainties about the performance of jobs and tasks under varying resource configurations, e.g., for scheduling and resource allocation. We survey predictive performance modeling (PPM) approaches to estimate performance metrics such as execution duration, required memory or wait times of future jobs and tasks based on past performance observations. We focus on non-intrusive methods, i.e., methods that can be applied to any workload without modification, since the workload is usually a black-box from the perspective of the systems managing the computational infrastructure. We classify and compare sources of performance variation, predicted performance metrics, required training data, use cases, and the underlying prediction techniques. We conclude by identifying several open problems and pressing research needs in the field.

Citations (53)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.