Extreme Video Compression with Pre-trained Diffusion Models
Abstract: Diffusion models have achieved remarkable success in generating high-quality image and video data. More recently, they have also been used for image compression with high perceptual quality. In this paper, we present a novel approach to extreme video compression that leverages the predictive power of diffusion-based generative models at the decoder. A conditional diffusion model takes several neurally compressed frames and generates the subsequent frames. When the reconstruction quality drops below a desired level, new frames are encoded to restart prediction. The entire video is encoded sequentially to achieve a visually pleasing reconstruction, measured by perceptual quality metrics such as the learned perceptual image patch similarity (LPIPS) and the Fréchet video distance (FVD), at bit rates as low as 0.02 bits per pixel (bpp). Experimental results demonstrate the effectiveness of the proposed scheme compared to standard codecs such as H.264 and H.265 in the low-bpp regime. The results showcase the potential of exploiting the temporal relations in video data using generative models. Code is available at: https://github.com/ElesionKyrie/Extreme-Video-Compression-With-Prediction-Using-Pre-trainded-Diffusion-Models-
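The sequential coding loop described above (send compressed frames, let the diffusion model predict the rest, and restart once perceptual quality degrades) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict`, `encode`, and `lpips` are hypothetical stand-ins for the conditional diffusion predictor, the neural image codec, and the LPIPS metric, and the threshold and context-window size are assumed values.

```python
def compress_video(frames, predict, encode, lpips, threshold=0.25, context=2):
    """Decide per frame whether to transmit it or predict it at the decoder.

    frames    : sequence of video frames
    predict   : hypothetical diffusion predictor, maps a context window
                to the next frame
    encode    : hypothetical neural codec (encode + decode round trip)
    lpips     : hypothetical perceptual distance (lower is better)
    threshold : assumed quality cutoff that triggers re-encoding
    context   : number of conditioning frames the predictor expects
    """
    decisions = []
    window = []  # decoder-side context: reconstructed or predicted frames
    for frame in frames:
        if len(window) < context:
            # Bootstrap: the first `context` frames must be transmitted.
            window.append(encode(frame))
            decisions.append("sent")
            continue
        pred = predict(window)
        if lpips(pred, frame) <= threshold:
            # Prediction is good enough: spend zero bits on this frame.
            window = window[1:] + [pred]
            decisions.append("predicted")
        else:
            # Quality dropped: encode the real frame to restart prediction.
            window = window[1:] + [encode(frame)]
            decisions.append("sent")
    return decisions
```

With a toy setup where frames are scalars, the predictor repeats the last frame, and the "perceptual" distance is an absolute difference, a scene change forces a re-encode while static stretches are predicted for free, which is the source of the sub-0.02 bpp rates the paper targets.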