
Solving Attention Kernel Regression Problem via Pre-conditioner

Published 28 Aug 2023 in cs.LG (arXiv:2308.14304v2)

Abstract: The attention mechanism is the key to LLMs, and the attention matrix serves as an algorithmic and computational bottleneck for such a scheme. In this paper, we define two problems, motivated by designing fast algorithms for proxies of the attention matrix and solving regressions against them. Given an input matrix $A\in \mathbb{R}^{n\times d}$ with $n\gg d$ and a response vector $b$, we first consider the matrix exponential of $A^\top A$ as a proxy, and we in turn design algorithms for two types of regression problems: $\min_{x\in \mathbb{R}^d}\|(A^\top A)^j x-b\|_2$ and $\min_{x\in \mathbb{R}^d}\|A(A^\top A)^j x-b\|_2$ for any positive integer $j$. Studying algorithms for these regressions is essential, as the matrix exponential can be approximated term by term via these smaller problems. The second proxy applies the exponential entrywise to the Gram matrix, denoted by $\exp(AA^\top)$, and solves the regression $\min_{x\in \mathbb{R}^n}\|\exp(AA^\top)x-b\|_2$. We call this problem the attention kernel regression problem, as the matrix $\exp(AA^\top)$ can be viewed as a kernel function with respect to $A$. We design fast algorithms for these regression problems based on sketching and preconditioning. We hope these efforts will provide an alternative perspective on studying efficient approximation of attention matrices.

Citations (8)


Authors (3)
