Papers
Topics
Authors
Recent
Search
2000 character limit reached

CELLO: Co-designing Schedule and Hybrid Implicit/Explicit Buffer for Complex Tensor Reuse

Published 20 Mar 2023 in cs.DC and cs.AR | (2303.11499v2)

Abstract: Tensor algebra accelerators have been gaining popularity for running high-performance computing (HPC) workloads. Identifying optimal schedules for individual tensor operations and designing hardware to run these schedules is an active area of research. Unfortunately, operators in HPC workloads such as Conjugate Gradient often have operators with skewed shapes, fundamentally limiting the reuse any schedule can leverage. Moreover, the operators form a complex DAG of dependencies, making it challenging to apply simple fusion/pipelining techniques to extract inter-operation reuse. To address these challenges, this work proposes an accelerator CELLO. CELLO uses a novel on-chip buffer mechanism called CHORD co-designed with a novel scheduler called SCORE, which together enables identifying and exploiting reuse over complex DAGs of tensor operations. CELLO provides 4x geomean speedup and 4x energy efficiency over state-of-the-art accelerators across HPC workloads.

Citations (1)

Summary

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.