
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model

Published 31 Mar 2024 in cs.CL and cs.AI | (2404.01342v1)

Abstract: Text-to-image (T2I) generative models have attracted significant attention and found extensive applications within and beyond academic research. For example, the Civitai community, a platform for T2I innovation, currently hosts 74,492 distinct models. However, this diversity makes it difficult to select the most appropriate model and parameters, a process that typically requires numerous trials. Drawing inspiration from research on tool use in LLMs, we introduce DiffAgent, an LLM agent designed to identify an accurate selection within seconds via API calls. DiffAgent leverages a novel two-stage training framework, SFTA, that enables it to align T2I API responses with user input in accordance with human preferences. To train and evaluate DiffAgent, we present DABench, a comprehensive dataset covering a wide range of T2I APIs from the community. Our evaluations show that DiffAgent excels at identifying the appropriate T2I API and confirm the effectiveness of the SFTA training framework. Code is available at https://github.com/OpenGVLab/DiffAgent.

