Adaptive Compression Ratio According to Scene Complexity

Develop an adaptive spatiotemporal compression mechanism for the mid-partition tokens in PackForcing that adjusts the compression ratio according to scene complexity, in place of the current fixed 128× volume (≈32× token) reduction, so that quality is better preserved across content with varying dynamics.

Background

PackForcing compresses mid-range historical tokens using a dual-branch module that achieves a fixed 128× volume (≈32× token) reduction to bound memory while retaining information. The authors note that a fixed ratio may not suit diverse scenes with varying motion and texture complexity.

They explicitly list making the compression ratio adaptive to scene complexity as an open direction, suggesting that dynamic adjustment could improve quality–compression trade-offs.
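
As a rough illustration of what "adaptive to scene complexity" could mean in practice, the sketch below scores a clip with two simple proxies (frame-to-frame change for motion, spatial gradients for texture) and maps the score to one of a few candidate token-reduction ratios. This is not the PackForcing authors' method: the function names `scene_complexity` and `choose_ratio`, the candidate ratios other than 32, the equal weighting, and the thresholds are all assumptions made for the example, and a real system would more likely learn the mapping.

```python
import numpy as np

# Hypothetical candidate token-reduction ratios. Only 32 corresponds to the
# fixed ratio reported for PackForcing; 64 and 16 are illustrative assumptions.
CANDIDATE_RATIOS = (64, 32, 16)


def scene_complexity(frames: np.ndarray) -> float:
    """Return a heuristic complexity score in [0, 1] for one clip.

    frames: array of shape (T, H, W) with values in [0, 1], H >= 2, W >= 2.
    """
    frames = np.asarray(frames, dtype=np.float64)
    # Motion proxy: mean absolute difference between consecutive frames.
    if frames.shape[0] > 1:
        motion = np.abs(np.diff(frames, axis=0)).mean()
    else:
        motion = 0.0
    # Texture proxy: mean absolute spatial gradient inside each frame.
    grad_y = np.abs(np.diff(frames, axis=1)).mean()
    grad_x = np.abs(np.diff(frames, axis=2)).mean()
    texture = 0.5 * (grad_x + grad_y)
    # Equal weighting of motion and texture is an arbitrary choice here.
    return float(np.clip(0.5 * motion + 0.5 * texture, 0.0, 1.0))


def choose_ratio(score: float, thresholds=(0.05, 0.15)) -> int:
    """Map a complexity score to a ratio: simpler scenes compress harder.

    The thresholds are placeholders, not tuned values.
    """
    low, high = thresholds
    if score < low:
        return CANDIDATE_RATIOS[0]  # near-static, low-texture content
    if score < high:
        return CANDIDATE_RATIOS[1]  # fall back to the fixed baseline ratio
    return CANDIDATE_RATIOS[2]      # complex content keeps more tokens
```

With these placeholder settings, a completely static, uniform clip scores 0 and is assigned the most aggressive ratio, while content with strong motion or texture is compressed less. Whether such hand-crafted proxies track the quality loss that actually matters for generation is exactly the open question.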

References

Several directions remain open: (i) the fixed compression ratio (128× volume / ~32× token) could be made adaptive to scene complexity; (ii) attention-based importance scoring may not capture all aspects of visual saliency (learned importance predictors could help); (iii) scaling to higher resolutions (e.g., 1920×1080) requires investigating the interaction between spatial compression and quality.

PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference (2603.25730 - Mao et al., 26 Mar 2026) in Appendix, Extended Discussion on Limitations