Papers
Topics
Authors
Recent
Search
2000 character limit reached

Stream-K++: Adaptive GPU GEMM Kernel Scheduling and Selection using Bloom Filters

Published 21 Aug 2024 in cs.DC | (2408.11417v1)

Abstract: General matrix multiplication (GEMM) operations are crucial in various computational fields. As GPU architectures evolve, optimizing GEMM performance becomes increasingly important. This paper introduces Stream-K++, an enhancement to the promising Stream-K GEMM scheduling algorithm. We expand Stream-K's scheduling policies from three to seven and implement an efficient solution selection mechanism using Bloom filters. Our approach rapidly eliminates up to 95.8% of unsuitable configurations while maintaining a 100% true-negative rate. Implemented using the AMD Composable Kernel library and evaluated on AMD Instinct MI250X GPUs, Stream-K++ demonstrates significant performance gains (up to 43%) in select scenarios. It remains competitive (within 20% of optimal) for 60-97.6% of problem sizes. Our flexible framework, implemented in the Opensieve C++ library, allows for easy adaptation to new problem sizes, scheduling policies, or additional tuning parameters, paving the way for future optimizations in GPU-based GEMM operations.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 1 like about this paper.