Scaling PackForcing to High-Resolution Video and Characterizing Compression–Quality Interactions

Investigate and characterize the interaction between spatial compression and visual quality when scaling PackForcing to higher-resolution video generation (e.g., 1920×1080), and develop compression strategies that preserve quality at such resolutions.

Background

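PackForcing’s compression strategy is demonstrated at 832×480 resolution. A 1920×1080 frame has about 5.2 times as many pixels, so moving to that resolution raises the spatial token count roughly in proportion (for a fixed tokenizer) and may change how compression affects fidelity.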
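
To make the growth in spatial tokens concrete, the following is a minimal sketch of the arithmetic. The `Tokenizer` class, its `vae_stride` of 8, and its `patch` size of 2 are illustrative assumptions, not values taken from the PackForcing paper; only the two resolutions come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tokenizer:
    """Hypothetical spatial tokenization: latent encoder stride, then patchify."""

    vae_stride: int = 8  # ASSUMED spatial downsampling of the latent encoder
    patch: int = 2       # ASSUMED patch size applied to the latent grid

    def tokens_per_frame(self, width: int, height: int) -> int:
        step = self.vae_stride * self.patch
        # Ceiling division: a partial patch at the border still costs a token.
        cols = -(-width // step)
        rows = -(-height // step)
        return cols * rows


def pixel_ratio(high: tuple[int, int], low: tuple[int, int]) -> float:
    """Ratio of per-frame pixel counts between two (width, height) pairs."""
    return (high[0] * high[1]) / (low[0] * low[1])


low = (832, 480)     # resolution at which the method is demonstrated
high = (1920, 1080)  # target resolution named as an open direction

tok = Tokenizer()
low_tokens = tok.tokens_per_frame(*low)
high_tokens = tok.tokens_per_frame(*high)

print(f"pixel ratio:      {pixel_ratio(high, low):.3f}")
print(f"tokens per frame: {low_tokens} -> {high_tokens}")
print(f"token ratio:      {high_tokens / low_tokens:.3f}")
```

Under these assumed values the per-frame count rises from 1560 to 8160 tokens. The token ratio is slightly above the pixel ratio because 1080 is not a multiple of the assumed 16-pixel step, so the last row of patches is padded.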

The paper explicitly lists the compression–quality trade-off at 1080p as an open direction. Addressing it calls for both a methodology and an empirical characterization of how quality can be maintained under the higher spatial demands.
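
One possible shape for such an empirical characterization is a sweep of compression factor against a quality metric. The sketch below is a toy stand-in only: average pooling replaces PackForcing's actual compressor, PSNR replaces whatever quality metrics a real study would use, and the helper names (`avg_pool_roundtrip`, `psnr`, `sweep`, `checkerboard`) are hypothetical.

```python
import math


def avg_pool_roundtrip(img, s):
    """Average-pool a 2D list by factor s, then upsample back by repetition."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for by in range(0, h, s):
        for bx in range(0, w, s):
            ys = range(by, min(by + s, h))
            xs = range(bx, min(bx + s, w))
            mean = sum(img[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


def psnr(ref, test, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite when the images match."""
    h, w = len(ref), len(ref[0])
    mse = sum(
        (ref[y][x] - test[y][x]) ** 2 for y in range(h) for x in range(w)
    ) / (h * w)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def sweep(img, factors):
    """Map each pooling factor to the PSNR of its compress/restore roundtrip."""
    return {s: psnr(img, avg_pool_roundtrip(img, s)) for s in factors}


def checkerboard(h, w, cell):
    """Synthetic test image whose finest detail is `cell` pixels wide."""
    return [
        [float(((y // cell) + (x // cell)) % 2) for x in range(w)]
        for y in range(h)
    ]


curve = sweep(checkerboard(32, 32, cell=4), factors=[1, 2, 4, 8])
for factor, quality in curve.items():
    print(f"factor {factor}: PSNR = {quality:.2f} dB")
```

In this toy setting the roundtrip is lossless while the pooling window fits inside the 4-pixel detail scale and collapses to about 6 dB once the window grows to 8 pixels. A real study would repeat such a sweep at both 832×480 and 1920×1080 with the actual compressor and perceptual metrics.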

References

Several directions remain open: (i) the fixed compression ratio (128× volume / ~32× token) could be made adaptive to scene complexity; (ii) attention-based importance scoring may not capture all aspects of visual saliency: learned importance predictors could help; (iii) scaling to higher resolutions (e.g., 1920×1080) requires investigating the interaction between spatial compression and quality.

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference (2603.25730 - Mao et al., 26 Mar 2026) in Appendix, Extended Discussion on Limitations