Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
Abstract: Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights is difficult and may cause over-optimization of certain metrics. To solve this, we propose Parrot, which treats the problem as multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate Pareto optimality. Using batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We use this novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, which significantly improves image quality and also allows the trade-off among rewards to be controlled with a reward-related prompt during inference. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to the user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.
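The abstract does not spell out how batch-wise Pareto-optimal selection is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each generated sample in a batch has one score per reward (higher is better) and keeps the samples that no other sample dominates. The function name `non_dominated_mask` and the example scores are hypothetical.

```python
import numpy as np

def non_dominated_mask(rewards: np.ndarray) -> np.ndarray:
    """Mark the non-dominated rows of `rewards`.

    `rewards` has shape (batch, num_rewards); higher is better in every column.
    Row j dominates row i if it is >= on every reward and > on at least one.
    """
    n = rewards.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        at_least_as_good = np.all(rewards >= rewards[i], axis=1)
        strictly_better = np.any(rewards > rewards[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False  # some other sample dominates sample i
    return mask

# Hypothetical scores for 4 samples on 3 rewards
# (e.g. aesthetics, human preference, text-image alignment).
batch_rewards = np.array([
    [0.9, 0.2, 0.5],
    [0.4, 0.8, 0.6],
    [0.3, 0.1, 0.4],  # dominated by samples 0 and 1
    [0.9, 0.2, 0.4],  # dominated by sample 0
])
selected = non_dominated_mask(batch_rewards)
print(selected)
```

In an RL fine-tuning loop, a mask like this could be used to restrict the policy update to the non-dominated samples of each batch, which avoids hand-tuning a weight per reward. How Parrot actually uses the selected set is not stated in the abstract.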