Automatic BLAS Offloading on Unified Memory Architecture: A Study on NVIDIA Grace-Hopper

Published 19 Apr 2024 in cs.DC | (2404.13195v5)

Abstract: Porting codes to GPUs often requires major effort. While several tools exist for automatically offloading numerical libraries such as BLAS and LAPACK, they often prove impractical due to the high cost of mandatory data transfer. The new unified memory architecture in NVIDIA Grace-Hopper allows high-bandwidth, cache-coherent access to all memory from both the CPU and the GPU, potentially eliminating the bottleneck faced in conventional architectures. This breakthrough opens up new avenues for application development and porting strategies. In this study, we introduce a new tool for automatic BLAS offload. The tool leverages the high-speed, cache-coherent NVLink C2C interconnect in Grace-Hopper and enables performant GPU offload for BLAS-heavy applications with no code changes or recompilation. The tool was tested on two quantum chemistry or physics codes, and substantial performance benefits were observed.
