- The paper presents MLKAPS, an auto-tuning framework that leverages machine learning to optimize HPC kernel performance, achieving a geometric mean speedup of 1.30.
- It employs a two-phase approach using adaptive sampling and surrogate modeling to efficiently explore large design spaces.
- Genetic Algorithm optimization distilled into decision trees yields a scalable, lightweight solution for runtime HPC kernel tuning.
Overview of MLKAPS: Machine Learning and Adaptive Sampling for HPC Kernel Auto-tuning
The paper presents MLKAPS, an auto-tuning framework designed to optimize High-Performance Computing (HPC) kernels. As the design spaces of HPC kernels grow in size and complexity, manual tuning becomes inefficient or even infeasible: it is resource-intensive and prone to human bias. MLKAPS addresses these challenges by combining machine learning with adaptive sampling to automate and improve the tuning process. The framework targets better kernel performance through efficient exploration of large parameter spaces, distilling its results into compact decision trees for runtime use.
Methodology and Contributions
- Framework Architecture: MLKAPS employs a two-phase approach consisting of sampling/modeling and optimization. In the sampling phase, the framework uses adaptive strategies to explore the parameter space, focusing on regions likely to yield optimal performance. The data collected from sampling is used to train a surrogate model, which predicts kernel performance based on design parameters and inputs.
- Sampling Techniques: The paper introduces three sampling methods: space-filling, Hierarchical Variance Sampling (HVS), and GA-Adaptive. They range from uniform coverage of the space (space-filling), through extra sampling in high-variance regions (HVS), to optima-biased sampling (GA-Adaptive), which concentrates samples around configurations already known to perform well.
- Optimization and Decision Trees: Trained models guide the subsequent optimization phase, where Genetic Algorithms (GAs) are utilized to find optimal configurations on a discretized grid of input parameters. These results are distilled into decision trees, providing a lightweight, embedded solution for runtime tuning.
- Benchmarked Results: MLKAPS demonstrates significant performance improvement over traditional methods on benchmark kernels, such as Intel MKL’s dgetrf and dgeqrf. For instance, MLKAPS achieved a geometric mean speedup of 1.30 over the reference configurations, improving performance on 85% of inputs for specific kernels.
- Comparison with State-of-the-Art: When compared to other frameworks like GPTune and Optuna, MLKAPS showed superior scalability and efficiency in handling a large number of samples and tasks. Its decoupled approach to sampling and optimization was particularly advantageous for large design spaces.
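The variance-guided sampling idea behind HVS can be illustrated with a minimal sketch: partition the space, measure how much the objective varies in each cell, and draw the next batch of samples preferentially from high-variance (or unexplored) cells. This is a toy one-dimensional illustration of the principle, not MLKAPS's implementation; `kernel_runtime` is a hypothetical stand-in for a real kernel measurement.

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel_runtime(x):
    # Hypothetical stand-in for measuring a real kernel's runtime.
    return np.sin(3 * x) + 0.1 * rng.standard_normal()

def hvs_round(samples, values, n_bins=8, n_new=16, lo=0.0, hi=1.0):
    """One round of variance-guided sampling: split [lo, hi] into bins,
    then draw more points from bins where measured runtimes vary most.
    Empty bins get the largest observed variance, so they are explored too."""
    edges = np.linspace(lo, hi, n_bins + 1)
    bin_idx = np.clip(np.digitize(samples, edges) - 1, 0, n_bins - 1)
    var = np.array([values[bin_idx == b].var() if np.any(bin_idx == b) else np.inf
                    for b in range(n_bins)])
    weights = np.where(np.isfinite(var), var, var[np.isfinite(var)].max())
    weights = weights / weights.sum()
    chosen = rng.choice(n_bins, size=n_new, p=weights)
    # Place each new sample uniformly inside its chosen bin.
    return edges[chosen] + rng.random(n_new) * (hi - lo) / n_bins

# Bootstrap with a space-filling (uniform) design, then refine adaptively.
xs = rng.random(32)
ys = np.array([kernel_runtime(x) for x in xs])
new_xs = hvs_round(xs, ys)
```

In a real tuner the new points would be measured, appended to the dataset, and the round repeated until the sampling budget is exhausted.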
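The two-phase pipeline described above — fit a surrogate on sampled measurements, run a GA against the surrogate for each input on a grid, then distill the per-input optima into a decision tree — can be sketched end to end. This is a toy reconstruction under stated assumptions (a random-forest surrogate, a deliberately tiny GA, a one-dimensional input and design parameter); the paper's actual model choices and GA operators may differ, and `measured_runtime` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)

def measured_runtime(size, block):
    # Hypothetical kernel: the best block size grows with problem size.
    return (block - 0.1 * size) ** 2 + 0.05 * rng.standard_normal()

# Phase 1: sample (input, design-parameter) points and fit a surrogate.
sizes = rng.uniform(10, 100, 400)
blocks = rng.uniform(0, 12, 400)
X = np.column_stack([sizes, blocks])
y = np.array([measured_runtime(s, b) for s, b in zip(sizes, blocks)])
surrogate = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def ga_best_block(size, pop=32, gens=20, sigma=1.0):
    """Tiny GA: evolve candidate block sizes against the surrogate only,
    so no further (expensive) kernel measurements are needed."""
    cand = rng.uniform(0, 12, pop)
    for _ in range(gens):
        fit = surrogate.predict(np.column_stack([np.full(pop, size), cand]))
        elite = cand[np.argsort(fit)[: pop // 4]]   # keep the fastest quarter
        children = (rng.choice(elite, pop - elite.size)
                    + sigma * rng.standard_normal(pop - elite.size))
        cand = np.clip(np.concatenate([elite, children]), 0, 12)
    fit = surrogate.predict(np.column_stack([np.full(pop, size), cand]))
    return cand[np.argmin(fit)]

# Phase 2: optimize on a discretized grid of inputs,
# then distill the results into a shallow decision tree.
grid = np.linspace(10, 100, 30)
best = np.array([ga_best_block(s) for s in grid])
tuner = DecisionTreeRegressor(max_depth=4).fit(grid.reshape(-1, 1), best)

# The tree is the runtime artifact: a cheap input-size -> block-size lookup.
predicted = tuner.predict([[80.0]])[0]
```

The key design point the sketch captures is the decoupling: the expensive measurements happen once in phase 1, and the GA in phase 2 queries only the cheap surrogate.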
Implications and Future Work
MLKAPS contributes a robust framework combining machine learning and adaptive strategies, mitigating the curse of dimensionality in HPC kernel tuning. It exemplifies how auto-tuning can effectively bridge the gap between theoretical and practical performance in complex systems.
The implications extend to broader HPC applications, where MLKAPS can be adapted to rapidly evolving hardware landscapes and diverse objectives, including energy consumption and numerical accuracy. Its decision tree output offers a streamlined mechanism for integrating tuning solutions directly into software, minimizing runtime overhead.
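One common way such a decision tree is embedded with minimal overhead is to compile it into nested branches in the host language, so a tuning decision costs only a few comparisons. The sketch below shows the general technique on a toy scikit-learn tree; the data, the `tile_size` function name, and the emitted C shape are illustrative assumptions, not MLKAPS's actual export format.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Fit a toy tuning tree: input size -> recommended tile size (illustrative data).
X = np.array([[16], [32], [64], [128], [256], [512]], dtype=float)
y = np.array([8, 8, 16, 16, 32, 32], dtype=float)
tree = DecisionTreeRegressor(max_depth=2).fit(X, y)

def tree_to_c(t, feature_names=("n",)):
    """Emit the fitted tree as nested C if/else, suitable for pasting
    into a kernel so each tuning lookup costs a handful of comparisons."""
    left, right = t.tree_.children_left, t.tree_.children_right
    thresh, feat, value = t.tree_.threshold, t.tree_.feature, t.tree_.value

    def walk(node, indent):
        pad = "    " * indent
        if left[node] == -1:  # leaf: return the recommended parameter value
            return f"{pad}return {value[node][0][0]:.0f};\n"
        cond = f"{feature_names[feat[node]]} <= {thresh[node]:.1f}"
        return (f"{pad}if ({cond}) {{\n" + walk(left[node], indent + 1)
                + f"{pad}}} else {{\n" + walk(right[node], indent + 1)
                + pad + "}\n")

    return "int tile_size(double n) {\n" + walk(0, 1) + "}\n"

code = tree_to_c(tree)
```

The generated function has no dependency on the ML stack used for training, which is what makes the tree attractive as a deployment artifact.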
Future developments may focus on incorporating constraint handling for more sophisticated optimization problems and further improving decision tree interpretability and efficiency. The continued refinement of model metrics and integration with expert knowledge could enhance both the precision and usability of the framework in industrial settings.
MLKAPS positions itself as a pivotal tool in the ongoing evolution of HPC systems, offering insights and practical solutions that encourage further exploration of machine learning's role in performance optimization.