Adaptive Compression Ratio According to Scene Complexity
Develop an adaptive spatiotemporal compression mechanism for the mid-partition tokens in PackForcing. Instead of the current fixed ratio (128× volume reduction, ≈32× token reduction), the mechanism would adjust the compression ratio to scene complexity, so that quality is better preserved as content dynamics vary.
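As a rough illustration of what "adaptive to scene complexity" could mean, the sketch below scores a clip by its mean absolute frame-to-frame difference and maps that score to a token-reduction ratio. It is not the PackForcing method: the pixel-difference proxy, the function names, the thresholds, and every candidate ratio other than the fixed ≈32× are assumptions made for illustration only.

```python
from typing import Sequence

# Hypothetical (upper_bound, ratio) pairs. Only the 32x ratio appears in the
# source; the other ratios and all thresholds are illustrative assumptions.
THRESHOLDS = ((0.02, 64), (0.08, 32), (0.20, 16))
MIN_RATIO = 8  # assumed lightest compression, used for the most dynamic scenes


def motion_complexity(frames: Sequence[Sequence[float]]) -> float:
    """Mean absolute difference between consecutive frames.

    Each frame is a flat sequence of values in [0, 1]. This is a crude
    stand-in for scene complexity, not a measure defined by the paper.
    """
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for a, b in zip(prev, curr):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0


def pick_ratio(complexity: float) -> int:
    """Map a complexity score to a ratio: calmer scenes are compressed harder."""
    for upper_bound, ratio in THRESHOLDS:
        if complexity <= upper_bound:
            return ratio
    return MIN_RATIO


if __name__ == "__main__":
    static_clip = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
    dynamic_clip = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    print(pick_ratio(motion_complexity(static_clip)))
    print(pick_ratio(motion_complexity(dynamic_clip)))
```

A step function keeps the set of ratios discrete, which matters if each ratio needs its own compression configuration. A real mechanism would more plausibly score complexity in latent or attention space rather than on raw pixel differences.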
References
Several directions remain open: (i)~the fixed compression ratio ($128\times$ volume / ${\sim}32\times$ token) could be made adaptive to scene complexity; (ii)~attention-based importance scoring may not capture all aspects of visual saliency---learned importance predictors could help; (iii)~scaling to higher resolutions (e.g., $1920{\times}1080$) requires investigating the interaction between spatial compression and quality.
— PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference
(2603.25730 - Mao et al., 26 Mar 2026) in Appendix, Extended Discussion on Limitations