Controlled Training Data Generation with Diffusion Models
Abstract: We present a method to control a text-to-image generative model so that it produces training data useful for supervised learning. Unlike previous works that employ an open-loop approach and pre-define prompts to generate new data using either an LLM or human expertise, we develop an automated closed-loop system with two feedback mechanisms. The first uses feedback from a given supervised model to find adversarial prompts, i.e., prompts whose image generations maximize the model's loss. While these adversarial prompts yield diverse, model-informed data, they are not informed by the target distribution, which can be inefficient. We therefore introduce a second feedback mechanism that steers the generation process toward a target distribution. We call the method combining these two mechanisms Guided Adversarial Prompts. We evaluate on different tasks, datasets, and architectures, with different types of distribution shift (spuriously correlated data, unseen domains), and demonstrate the efficiency of the proposed feedback mechanisms compared to open-loop approaches.
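The closed-loop idea described above can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the paper's implementation: `generate_image`, `model_loss`, and `target_distance` are placeholder functions standing in for a text-to-image diffusion model, the supervised model's loss, and a distance to the target distribution, and the search is plain hill climbing over a prompt vector rather than any optimizer the paper may use. The objective combines the two feedback signals: maximize the supervised model's loss (adversarial feedback) minus a penalty for drifting from the target distribution (guidance feedback).

```python
import random

# Hypothetical stand-in for a text-to-image generator:
# maps a prompt vector to an "image" feature vector.
def generate_image(prompt):
    return [p * 2.0 for p in prompt]

# Hypothetical stand-in for the supervised model's loss on a generation.
def model_loss(image):
    return sum((x - 1.0) ** 2 for x in image)

# Hypothetical stand-in for the second feedback mechanism:
# distance between a generation and a target-distribution statistic.
def target_distance(image, target):
    return sum((x - t) ** 2 for x, t in zip(image, target))

def guided_adversarial_prompt(target, dim=4, iters=200, lam=2.0, seed=0):
    """Hill-climb a prompt vector to maximize the guided adversarial
    objective: model loss minus lam * distance to the target."""
    rng = random.Random(seed)
    prompt = [0.0] * dim
    best, best_score = None, float("-inf")
    for _ in range(iters):
        # Propose a small random perturbation of the current prompt.
        cand = [p + rng.gauss(0.0, 0.3) for p in prompt]
        img = generate_image(cand)
        score = model_loss(img) - lam * target_distance(img, target)
        if score > best_score:  # keep the candidate only if it improves
            best, best_score, prompt = cand, score, cand
    return best, best_score

guided, score = guided_adversarial_prompt(target=[0.0, 0.0, 0.0, 0.0])
```

Setting `lam=0` recovers purely adversarial prompts (first mechanism only); a positive `lam` trades off adversarialness against staying near the target distribution, which is the role of the second feedback mechanism.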