Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Abstract: Diffusion models trained on large-scale datasets have achieved remarkable progress in image synthesis. However, due to the randomness in the diffusion process, they often struggle with diverse low-level tasks that require detail preservation. To overcome this limitation, we present Diff-Plugin, a new framework that enables a single pre-trained diffusion model to generate high-fidelity results across a variety of low-level tasks. Specifically, we first propose a lightweight Task-Plugin module with a dual-branch design that provides task-specific priors, guiding the diffusion process to preserve image content. We then propose a Plugin-Selector that automatically selects Task-Plugins based on a text instruction, allowing users to edit images by specifying multiple low-level tasks in natural language. We conduct extensive experiments on 8 low-level vision tasks. The results demonstrate the superiority of Diff-Plugin over existing methods, particularly in real-world scenarios. Our ablations further validate that Diff-Plugin is stable, schedulable, and supports robust training across different dataset sizes.