Quality-Diversity through AI Feedback
Abstract: In many text-generation problems, users may prefer not only a single response but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs both to generate variation and to evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts shows reasonable agreement between AI and human judgments. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society's capacity for innovation.
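The abstract describes QDAIF as an evolutionary (MAP-Elites-style) loop in which an LM serves as both the mutation operator and the evaluator of quality and of position along a qualitative diversity axis. The following is a minimal sketch of that loop under stated assumptions: the functions `lm_mutate`, `lm_quality`, and `lm_diversity_bin` are hypothetical stubs standing in for natural-language prompts to a real LM, and the archive is a simple one-dimensional grid of bins; none of these names come from the paper.

```python
import random

def lm_mutate(text, rng):
    # Stub: a real system would prompt an LM to rewrite or extend `text`.
    return text + " " + rng.choice(["and", "then", "suddenly", "quietly"])

def lm_quality(text):
    # Stub: a real system would prompt an LM to rate writing quality in [0, 1].
    return min(len(text) / 100.0, 1.0)

def lm_diversity_bin(text, n_bins):
    # Stub: a real system would ask an LM where the text falls on a
    # qualitative axis (e.g. tone), then discretize the answer into a bin.
    return sum(ord(c) for c in text) % n_bins

def qdaif(seed_texts, iterations=200, n_bins=10, seed=0):
    rng = random.Random(seed)
    archive = {}  # bin index -> (quality, text); one elite per bin

    def try_insert(text):
        b = lm_diversity_bin(text, n_bins)
        q = lm_quality(text)
        if b not in archive or q > archive[b][0]:
            archive[b] = (q, text)

    for t in seed_texts:
        try_insert(t)
    for _ in range(iterations):
        # Pick an elite, mutate it via the LM, and insert the child
        # if it improves the quality of its diversity bin.
        parent = rng.choice(list(archive.values()))[1]
        try_insert(lm_mutate(parent, rng))
    return archive
```

The key design point, reflected in `try_insert`, is that a candidate never competes globally: it only displaces the elite occupying its own diversity bin, which is what lets the search accumulate high-quality samples across the whole space rather than converging to one optimum.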