
L-C4: Language-Based Video Colorization for Creative and Consistent Color

Published 7 Oct 2024 in cs.CV (arXiv:2410.04972v2)

Abstract: Automatic video colorization is inherently an ill-posed problem because each monochrome frame admits multiple plausible color candidates. Previous exemplar-based video colorization methods restrict the user's creativity through their elaborate exemplar retrieval process. Alternatively, conditional image colorization methods combined with post-processing algorithms still struggle to maintain temporal consistency. To address these issues, we present Language-based video Colorization for Creative and Consistent Colors (L-C4), which guides the colorization process with user-provided language descriptions. Our model is built upon a pre-trained cross-modality generative model, leveraging its comprehensive language understanding and robust color representation abilities. We introduce a cross-modality pre-fusion module to generate instance-aware text embeddings, enabling the application of creative colors. Additionally, we propose temporally deformable attention to prevent flickering and color shifts, and cross-clip fusion to maintain long-term color consistency. Extensive experimental results demonstrate that L-C4 outperforms relevant methods, achieving semantically accurate colors, unrestricted creative correspondence, and temporally robust consistency.
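The abstract's core consistency idea is attention that, for each frame, samples features from temporally offset neighboring frames rather than a fixed window. The sketch below is not the paper's implementation; it is a minimal toy illustration of that sampling-and-attending pattern, assuming per-frame feature vectors and externally predicted integer temporal offsets (hypothetical names `frames` and `offsets`).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporally_deformable_attention(frames, offsets):
    """Toy sketch: each frame's query attends to a few temporally
    offset frames (offsets would be predicted by the network) and
    aggregates their features by attention weight.

    frames:  (T, D) array, one feature vector per frame
    offsets: (T, K) int array, K sampled temporal offsets per frame
    """
    T, D = frames.shape
    out = np.zeros_like(frames)
    for t in range(T):
        # clamp sampled temporal positions to the valid range [0, T-1]
        idx = np.clip(t + offsets[t], 0, T - 1)
        keys = frames[idx]                       # (K, D) sampled features
        scores = keys @ frames[t] / np.sqrt(D)   # query-key similarity
        weights = softmax(scores)                # (K,) attention weights
        out[t] = weights @ keys                  # weighted feature fusion
    return out
```

With all offsets zero, each frame attends only to itself and the output equals the input, which is a quick sanity check; in the paper's setting the offsets would instead track object motion across frames.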

