DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation
Abstract: Image restoration (IR) in real-world scenarios presents significant challenges due to the lack of high-capacity models and comprehensive datasets. To tackle these issues, we present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets, which typically comprise only a few thousand images and thus offer limited generalizability for larger models. GenIR streamlines the process into three stages: image-text pair construction, dual-prompt based fine-tuning, and data generation & filtering. This approach circumvents the laborious data crawling process, ensuring copyright compliance and providing a cost-effective, privacy-safe solution for IR dataset construction. The result is a large-scale dataset of one million high-quality images. Our second contribution, DreamClear, is a DiT-based image restoration model. It utilizes the generative priors of text-to-image (T2I) diffusion models and the robust perceptual capabilities of multi-modal LLMs (MLLMs) to achieve photorealistic restoration. To boost the model's adaptability to diverse real-world degradations, we introduce the Mixture of Adaptive Modulator (MoAM). It employs token-wise degradation priors to dynamically integrate various restoration experts, thereby expanding the range of degradations the model can address. Our exhaustive experiments confirm DreamClear's superior performance, underlining the efficacy of our dual strategy for real-world image restoration. Code and pre-trained models are available at: https://github.com/shallowdream204/DreamClear.
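The token-wise expert integration described for MoAM can be illustrated with a minimal sketch. This is not the authors' implementation: the linear gate, the softmax soft-routing, and the names `softmax`, `mix_experts_tokenwise`, and `gate_weights` are all assumptions made for illustration. The sketch only shows how a per-token degradation prior might weight the outputs of several restoration experts.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a short list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts_tokenwise(
    tokens: Sequence[Vector],
    degradation_priors: Sequence[Vector],
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
) -> List[Vector]:
    """Blend expert outputs per token (hypothetical soft routing).

    tokens:             one feature vector per token
    degradation_priors: one prior vector per token
    gate_weights:       one row per expert; each row has the prior's length
    experts:            callables mapping a token vector to a vector of equal length
    """
    mixed_tokens = []
    for token, prior in zip(tokens, degradation_priors):
        # Assumed linear gate: one logit per expert from this token's prior.
        logits = [sum(w * p for w, p in zip(row, prior)) for row in gate_weights]
        gates = softmax(logits)
        expert_outputs = [expert(token) for expert in experts]
        # Weighted sum of expert outputs, computed per feature dimension.
        mixed = [
            sum(g * out[i] for g, out in zip(gates, expert_outputs))
            for i in range(len(token))
        ]
        mixed_tokens.append(mixed)
    return mixed_tokens


# Toy usage: two stand-in "experts" and two tokens with different priors.
identity = lambda v: list(v)
doubler = lambda v: [2.0 * x for x in v]

result = mix_experts_tokenwise(
    tokens=[[1.0, 2.0], [1.0, 2.0]],
    degradation_priors=[[0.0, 0.0], [100.0, 0.0]],
    gate_weights=[[1.0, 0.0], [0.0, 1.0]],
    experts=[identity, doubler],
)
```

In this toy setup the first token has a neutral prior, so both experts contribute equally; the second token's prior strongly favours the first expert, so its output stays close to the input. Because each token carries its own prior, different image regions can lean on different experts within a single forward pass.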