Closed-form construction of compact keys C_k under Attention Matching

Establish whether a general closed-form construction exists for the compacted key matrix C_k in the Attention Matching framework for KV cache compaction, such that the resulting compact keys, together with fitted scalar biases and values, satisfy the attention-output and attention-mass matching objectives across a specified set of reference queries for arbitrary compaction budgets.

Background

The Attention Matching approach decomposes compaction into matching local attention outputs and attention mass using compacted keys, per-token scalar biases, and values. Once the compacted keys are fixed, the biases and values can be obtained in closed form via (nonnegative) least squares.
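To make the fitting step concrete, the following is a minimal sketch of how biases and values might be fitted once the compact keys are fixed. It is not the authors' implementation. It assumes scaled dot-product logits, that the biases enter as nonnegative weights w_i = exp(beta_i) fitted by nonnegative least squares against the attention mass, and that the values are then fitted by ordinary least squares against the attention outputs. The names Q, K, V, beta, C_v and both function names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls


def softmax_rows(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_biases_and_values(Q, K, V, C_k, eps=1e-12):
    """Fit biases beta (t,) and compact values C_v (t, d_v) for fixed compact keys C_k.

    Assumed objective (a sketch, not taken verbatim from the paper):
      mass:   sum_i exp(beta_i) * exp(q.c_i / sqrt(d)) ~= sum_j exp(q.k_j / sqrt(d))
      output: softmax(q.C_k^T / sqrt(d) + beta) @ C_v  ~= softmax(q.K^T / sqrt(d)) @ V
    """
    scale = 1.0 / np.sqrt(Q.shape[1])
    S_full = (Q @ K.T) * scale    # (n, T) logits against the original keys
    S_comp = (Q @ C_k.T) * scale  # (n, t) logits against the compact keys

    # One global shift rescales both sides equally, so the NNLS minimizer is unchanged.
    shift = max(S_full.max(), S_comp.max())
    mass = np.exp(S_full - shift).sum(axis=1)  # (n,) original attention mass
    A = np.exp(S_comp - shift)                 # (n, t)

    # Attention-mass matching: nonnegative weights w_i = exp(beta_i).
    w, _ = nnls(A, mass)
    beta = np.log(np.maximum(w, eps))  # a zero weight becomes a very negative bias

    # Attention-output matching: ordinary least squares for C_v with beta fixed.
    target = softmax_rows(S_full) @ V
    P = softmax_rows(S_comp + beta)
    C_v = np.linalg.lstsq(P, target, rcond=None)[0]
    return beta, C_v


def compact_attention(Q, C_k, beta, C_v):
    """Attention output computed from the compacted cache."""
    logits = (Q @ C_k.T) / np.sqrt(Q.shape[1]) + beta
    return softmax_rows(logits) @ C_v


# Toy usage: keep the first 4 of 16 keys (an arbitrary, hypothetical choice of C_k).
rng = np.random.default_rng(0)
Q = rng.normal(size=(64, 8))
K = rng.normal(size=(16, 8))
V = rng.normal(size=(16, 8))
beta, C_v = fit_biases_and_values(Q, K, V, K[:4])
```

In this sketch, both fits are convex problems in beta (through w) and C_v, which is why they are straightforward once C_k is given. The compact keys enter through the exponentiated logits, so the same treatment does not carry over to C_k itself.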

However, selecting the compact keys themselves is challenging. The authors state they lack a general closed-form solution for constructing C_k and therefore restrict to subset selection from the original keys (e.g., highest-attention or OMP-based selection), which trades off performance and efficiency. This explicit gap highlights a foundational algorithmic question.
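As an illustration of the subset-selection fallback, here is a minimal sketch of a highest-attention heuristic. It is not the paper's exact procedure. It assumes keys are scored by their total softmax attention summed over the reference queries (other aggregations such as max or mean are equally plausible), and the function name and the `budget` parameter are hypothetical.

```python
import numpy as np


def select_top_attention_keys(Q, K, budget):
    """Return the indices and rows of the `budget` keys with the largest
    total attention over the reference queries Q (assumed scoring rule)."""
    logits = (Q @ K.T) / np.sqrt(Q.shape[1])     # (n, T) scaled dot-product logits
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)    # row-wise softmax
    scores = probs.sum(axis=0)                   # (T,) total attention per key
    idx = np.sort(np.argsort(scores)[-budget:])  # top-`budget` keys, in original order
    return idx, K[idx]


# Toy usage: compact 16 keys down to a budget of 4.
rng = np.random.default_rng(0)
Q = rng.normal(size=(32, 8))
K = rng.normal(size=(16, 8))
idx, C_k = select_top_attention_keys(Q, K, budget=4)
```

Because the compact keys are restricted to rows of K, this search is cheap but cannot produce keys outside the original set, which is the restriction a closed-form construction of C_k would lift.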

References

"We do not have a closed-form solution for constructing C_k in general."

Fast KV Compaction via Attention Matching (2602.16284 - Zweiger et al., 18 Feb 2026) in Section 3.3 (Selecting C_k)