Scaling PackForcing to High-Resolution Video and Characterizing Compression–Quality Interactions
Investigate and characterize the interaction between spatial compression and visual quality when scaling PackForcing to higher-resolution video generation (e.g., 1920×1080), and develop compression strategies that preserve quality at such resolutions.
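To make the scaling concern concrete, the following rough sketch estimates how many tokens one latent frame holds at two resolutions, and how many remain after applying the roughly 32× token compression quoted in the source. Only the 32× ratio and the 1920×1080 target come from the source. The spatial VAE stride of 8, the patch size of 2, the 832×480 baseline resolution, and both function names are illustrative assumptions, not details confirmed by the paper.

```python
from math import ceil


def latent_tokens_per_frame(width, height, vae_stride=8, patch=2):
    """Tokens in one latent frame.

    vae_stride and patch are ASSUMED values for illustration; the source
    does not state the tokenizer geometry.
    """
    lat_w = ceil(width / vae_stride)   # latent width after VAE downsampling
    lat_h = ceil(height / vae_stride)  # latent height after VAE downsampling
    return ceil(lat_w / patch) * ceil(lat_h / patch)


def compressed_tokens(tokens, token_ratio=32):
    """Tokens kept after a fixed token compression ratio (~32x in the source)."""
    return ceil(tokens / token_ratio)


if __name__ == "__main__":
    # 832x480 is a HYPOTHETICAL baseline; 1920x1080 is the target named in the source.
    for name, (w, h) in {"832x480": (832, 480), "1920x1080": (1920, 1080)}.items():
        full = latent_tokens_per_frame(w, h)
        print(f"{name}: {full} tokens/frame, {compressed_tokens(full)} after compression")
```

Under these assumptions, a 1920×1080 frame carries roughly five times as many tokens as the hypothetical baseline, so a fixed ratio keeps the compressed context proportionally larger as well. Whether that ratio still preserves fine spatial detail at the higher resolution is the question this problem leaves open.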
References
Several directions remain open: (i) the fixed compression ratio ($128\times$ volume / ${\sim}32\times$ token) could be made adaptive to scene complexity; (ii) attention-based importance scoring may not capture all aspects of visual saliency (learned importance predictors could help); (iii) scaling to higher resolutions (e.g., $1920{\times}1080$) requires investigating the interaction between spatial compression and quality.
— PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
(2603.25730 - Mao et al., 26 Mar 2026) in Appendix, Extended Discussion on Limitations