Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Published 11 Jan 2024 in cs.CV | (2401.05675v2)

Abstract: Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights poses challenges and may cause over-optimization in certain metrics. To solve this, we propose Parrot, which addresses the issue through multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate Pareto optimality. Utilizing batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We use the novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, resulting in a significant improvement in image quality and also allowing control over the trade-off between different rewards using a reward-related prompt during inference. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.

Summary

The paper presents a multi-reward reinforcement learning framework that balances various image quality metrics using Pareto-optimal selection.
It jointly optimizes text prompt expansion with image generation to improve alignment and maintain prompt fidelity.
Experimental evaluations show that Parrot outperforms baseline methods in aesthetics, human preference, and sentiment metrics.

Introduction

Text-to-image (T2I) generation has advanced rapidly thanks to diffusion models and pre-trained text encoders, which make it possible to generate detailed images from textual descriptions. Despite these advances, producing images that satisfy several quality criteria at once, such as aesthetic appeal, human preference, text-image alignment, and emotional resonance, remains difficult. To address this, the authors introduce Parrot, a multi-reward reinforcement learning (RL) framework that uses Pareto-optimal selection to balance multiple image quality rewards.

Fine-tuning T2I Models with Multiple Rewards

Earlier work used RL to fine-tune T2I models, typically with a single quality metric as the reward function. Extending this to several metrics usually means tuning reward weights by hand, which is laborious and can over-optimize some metrics at the expense of others. Parrot instead identifies the trade-off among rewards automatically. It focuses on the Pareto-optimal set: the images in a training batch that are not dominated, meaning no other image in the batch is at least as good on every reward and strictly better on at least one. Focusing on this set lets the model improve image quality on several criteria at once.
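The batch-wise selection described above can be illustrated with a small non-dominated filter. This is a minimal sketch rather than the paper's implementation: the function names, the reward ordering, and the example scores are hypothetical, and it assumes every reward is "higher is better".

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is at least as good as `b` on every reward
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(rewards: List[Sequence[float]]) -> List[int]:
    """Indices of samples that no other sample in the batch dominates."""
    return [
        i
        for i, r in enumerate(rewards)
        if not any(dominates(other, r) for j, other in enumerate(rewards) if j != i)
    ]


# Hypothetical batch: each tuple is (aesthetics, human preference, alignment).
batch_rewards = [
    (0.9, 0.2, 0.5),  # strong aesthetics, weak preference
    (0.4, 0.8, 0.6),  # weaker aesthetics, strong preference
    (0.3, 0.1, 0.4),  # worse than both samples above on every reward
    (0.9, 0.2, 0.4),  # ties the first sample twice, loses on alignment
]

front = pareto_front(batch_rewards)
print(front)  # indices of the non-dominated samples
```

The first two samples survive because each wins on a reward where the other loses, so no single weighting is needed to rank them. The pairwise comparison is quadratic in batch size, which is acceptable for an illustration on small batches.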

Joint Optimization and Prompt-Centered Guidance

Parrot also tunes the prompt expansion network (PEN) jointly with the T2I model, so that the expanded, more detailed prompts and the image generator are optimized with the same multi-reward algorithm. Because an expanded prompt can drift from what the user asked for, the framework applies original prompt-centered guidance at inference time to keep generated images faithful to the user's original input.
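One way to picture prompt-centered guidance is as a blend of two conditional noise predictions, one for the original prompt and one for the expanded prompt, around a shared unconditional prediction. The sketch below assumes a classifier-free-guidance-style linear combination; the function name, the weights `w_orig` and `w_expanded`, and the formula itself are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def prompt_centered_guidance(
    eps_uncond: List[float],
    eps_orig: List[float],
    eps_expanded: List[float],
    w_orig: float,
    w_expanded: float,
) -> List[float]:
    """Blend noise predictions (hypothetical formulation).

    Starts from the unconditional prediction and adds two guidance
    directions: one toward the original prompt, one toward the expanded
    prompt. A larger `w_orig` keeps the result closer to the user's input.
    """
    return [
        u + w_orig * (o - u) + w_expanded * (e - u)
        for u, o, e in zip(eps_uncond, eps_orig, eps_expanded)
    ]


# Toy two-element "noise predictions" standing in for real model outputs.
blended = prompt_centered_guidance(
    eps_uncond=[0.0, 0.0],
    eps_orig=[1.0, 2.0],
    eps_expanded=[2.0, 0.0],
    w_orig=5.0,
    w_expanded=3.0,
)
print(blended)
```

Under this assumed form, setting `w_expanded` to zero reduces to ordinary classifier-free guidance on the original prompt alone, which makes the role of the second term easy to see.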

Experimental Evaluation

Experiments and a user study show that Parrot outperforms the baselines it is compared against. Relative to methods that omit prompt expansion or fine-tune only part of the generation pipeline, Parrot improves text-image alignment, aesthetics, human preference, and sentiment. The user study corroborates these findings, with Parrot preferred over the baselines on all evaluated criteria.

Conclusion

Parrot combines multi-reward RL with Pareto optimization to improve T2I image quality on several criteria at once. Joint optimization of the prompt expansion network and the T2I model, together with original prompt-centered guidance, keeps the generated images relevant to the original text prompt. As T2I technology continues to evolve, frameworks like Parrot point toward image generation tools that can serve a range of quality metrics.

Further Considerations

Although the framework advances T2I generation, its performance depends on the quality and biases of the reward models it uses. Improvements to these reward models are expected to improve Parrot's output quality in turn. Additionally, given the potential for misuse in generating inappropriate content, ethical considerations around the user's influence on T2I generation remain critical, and responsible development and deployment of such technology are paramount.
