Bound KV Cache While Preserving Large Effective Context in Autoregressive Video Generation
Develop methods for autoregressive video diffusion models that maintain a large effective attention context window while strictly bounding the per-layer key–value (KV) cache size, so that long-range coherence is preserved without unbounded memory growth during minute-scale video generation.
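One common way to bound the per-layer cache while retaining long-range anchors is an attention-sink plus sliding-window policy: a few early entries are pinned permanently, and the rest live in a fixed-size FIFO window. The sketch below is a minimal illustration of that idea; the class and parameter names are hypothetical and not taken from the referenced paper.

```python
from collections import deque

class BoundedKVCache:
    """Hypothetical per-layer KV cache: a few pinned "sink" entries
    plus a fixed-size sliding window of recent entries, so memory is
    bounded at num_sink + window regardless of sequence length."""

    def __init__(self, num_sink: int, window: int):
        self.num_sink = num_sink              # earliest entries, kept forever
        self.sink = []                        # anchors for long-range coherence
        self.window = deque(maxlen=window)    # recent entries, oldest evicted

    def append(self, kv):
        if len(self.sink) < self.num_sink:
            self.sink.append(kv)
        else:
            self.window.append(kv)            # deque drops the oldest entry

    def context(self):
        # Entries visible to attention at the current step.
        return self.sink + list(self.window)

    def __len__(self):
        return len(self.sink) + len(self.window)


cache = BoundedKVCache(num_sink=2, window=3)
for t in range(10):
    cache.append(t)

print(len(cache))        # 5: bounded, independent of the 10 steps generated
print(cache.context())   # [0, 1, 7, 8, 9]
```

This caps memory but shrinks the *effective* context to the retained entries; the open problem is keeping the effective context large (e.g. via compression or learned retention) under exactly this kind of hard bound.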
References
Maintaining a large effective context window while strictly bounding the KV cache size remains a critical open problem.
— PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
(2603.25730 - Mao et al., 26 Mar 2026) in Section 1 (Introduction)