Learned Importance Predictors for Dynamic Context Selection
Develop learned importance prediction models for PackForcing’s dynamic context selection that more reliably capture visual saliency than attention-based affinity scoring, thereby improving which compressed mid-range blocks are retrieved during generation.
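To make the contrast concrete, the following is a minimal sketch of the two scoring strategies, not PackForcing's actual implementation. It assumes each compressed mid-range block is represented by a small set of key vectors and that the current generation step supplies query vectors. The function names (`affinity_scores`, `learned_scores`, `select_top_k`), the pooled feature construction, and the two-layer MLP are all hypothetical choices made for illustration.

```python
import numpy as np

def affinity_scores(query, block_keys):
    """Attention-style affinity: mean scaled dot product per block.

    query:      (q, d) query vectors for the current step (assumed shape)
    block_keys: (n_blocks, t, d) key vectors of each compressed block (assumed shape)
    """
    d = query.shape[-1]
    logits = np.einsum("qd,ntd->nqt", query, block_keys) / np.sqrt(d)
    return logits.mean(axis=(1, 2))  # one score per block, shape (n_blocks,)

def learned_scores(query, block_keys, w1, b1, w2, b2):
    """Hypothetical learned predictor: a tiny MLP over pooled features.

    The weights (w1, b1, w2, b2) would have to be trained against some
    saliency or quality signal; how to obtain that signal is the open question.
    """
    q_pool = query.mean(axis=0)        # (d,)
    k_pool = block_keys.mean(axis=1)   # (n_blocks, d)
    feats = np.concatenate(
        [k_pool, np.broadcast_to(q_pool, k_pool.shape), k_pool * q_pool],
        axis=-1,
    )                                  # (n_blocks, 3d)
    hidden = np.maximum(feats @ w1 + b1, 0.0)  # ReLU
    return (hidden @ w2 + b2).squeeze(-1)      # (n_blocks,)

def select_top_k(scores, k):
    """Return indices of the k highest-scoring blocks, in temporal order."""
    k = min(k, scores.shape[0])
    idx = np.argpartition(-scores, k - 1)[:k]
    return np.sort(idx)

# Toy usage with random data and untrained (random) predictor weights.
rng = np.random.default_rng(0)
n_blocks, t, q, d, h = 6, 4, 3, 8, 16
query = rng.normal(size=(q, d))
block_keys = rng.normal(size=(n_blocks, t, d))
w1, b1 = rng.normal(size=(3 * d, h)), np.zeros(h)
w2, b2 = rng.normal(size=(h, 1)), np.zeros(1)

by_affinity = select_top_k(affinity_scores(query, block_keys), k=2)
by_learned = select_top_k(learned_scores(query, block_keys, w1, b1, w2, b2), k=2)
```

Both scorers plug into the same top-k selection step, so in this sketch a learned predictor would be a drop-in replacement for the affinity score; only the source of the per-block score changes.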
References
Several directions remain open: (i) the fixed compression ratio ($128\times$ volume / ${\sim}32\times$ token) could be made adaptive to scene complexity; (ii) attention-based importance scoring may not capture all aspects of visual saliency, and learned importance predictors could help; (iii) scaling to higher resolutions (e.g., $1920{\times}1080$) requires investigating the interaction between spatial compression and quality.
— PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
(2603.25730 - Mao et al., 26 Mar 2026) in Appendix, Extended Discussion on Limitations