Effective one-shot KV context compaction
Determine efficient single-pass key–value (KV) cache compaction procedures for Transformer-based autoregressive language models. Such a procedure should reduce the size of the KV cache while preserving downstream model behavior when the compacted prefix is later concatenated with uncompacted tokens and with tokens generated in the future.
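To make the setting concrete, below is a minimal sketch of one naive single-pass baseline: heavy-hitter style eviction, which keeps the prefix positions that accumulated the most attention mass and drops the rest. This is an illustrative assumption for the problem setup, not the attention-matching method of the cited paper; `compact_kv_cache`, its tensor layout, and the scoring rule are all hypothetical choices.

```python
import torch


def compact_kv_cache(keys, values, attn_weights, keep_ratio=0.25):
    """Single-pass KV cache compaction by heavy-hitter token eviction (a
    hypothetical baseline, not the cited paper's method).

    keys, values:  [num_heads, seq_len, head_dim] cached prefix entries.
    attn_weights:  [num_heads, seq_len, seq_len] causal attention from the
                   prefix forward pass (row i attends to columns 0..i).
    Returns the compacted (keys, values) and the kept position indices.
    """
    _, seq_len, _ = keys.shape
    # Score each prefix position by the total attention mass it received,
    # summed over query rows and averaged over heads.
    scores = attn_weights.sum(dim=1).mean(dim=0)      # [seq_len]
    k = max(1, int(seq_len * keep_ratio))
    kept_idx = scores.topk(k).indices.sort().values   # keep original order
    return keys[:, kept_idx], values[:, kept_idx], kept_idx


if __name__ == "__main__":
    torch.manual_seed(0)
    num_heads, seq_len, head_dim = 4, 128, 64
    keys = torch.randn(num_heads, seq_len, head_dim)
    values = torch.randn(num_heads, seq_len, head_dim)
    # Synthetic causal attention weights standing in for a real prefix pass.
    causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
    logits = torch.randn(num_heads, seq_len, seq_len)
    attn = torch.softmax(logits.masked_fill(causal, float("-inf")), dim=-1)
    ck, cv, idx = compact_kv_cache(keys, values, attn)
    print(ck.shape, cv.shape)  # torch.Size([4, 32, 64]) for both
```

Eviction of this kind only resizes the cache; whether downstream behavior is preserved once the compacted prefix is concatenated with uncompacted and future tokens also depends on details such as position re-indexing of the kept entries (e.g., under rotary embeddings), which is part of what makes the one-shot setting difficult.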
References
Effective context compaction—reducing KV cache size in a single pass while preserving downstream model behavior—remains an important open problem.
— Zweiger et al., "Fast KV Compaction via Attention Matching," arXiv:2602.16284 (18 Feb 2026), Section 1 (Introduction)