LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play
Abstract: LLMs have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, we propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges and ensures convergence to creative answers. Moreover, we adopt a role-playing technique by assigning distinct roles to LLMs to combat the homogeneity of LLMs. We evaluate the efficacy of the proposed framework with the Alternative Uses Test, Similarities Test, Instances Test, and Scientific Creativity Test through both LLM evaluation and human study. The results show that our proposed framework outperforms single-LLM approaches and existing multi-LLM frameworks across various creativity metrics. The code is available at https://github.com/lawraa/LLM-Discussion.
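The abstract describes the framework only at a high level, so the following is a minimal sketch of how a three-phase, role-played discussion loop could be wired up, not the authors' implementation. The phase names and instructions, the role names, the `LLM` callable type, and the helper functions `build_prompt` and `run_discussion` are all assumptions introduced for illustration; the actual prompts and roles are in the linked repository.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to a reply string.
LLM = Callable[[str], str]

# Assumed phase labels and instructions; the abstract only says there are three phases
# that move from diverging idea exchange to convergence.
PHASES = [
    ("open", "Share your initial ideas."),
    ("diverge", "Build on or challenge the ideas so far and add new ones."),
    ("converge", "Agree on the most creative answers and state them."),
]


def build_prompt(role: str, instruction: str, question: str, transcript: List[str]) -> str:
    """Compose one turn's prompt from the role, the phase instruction, and prior turns."""
    history = "\n".join(transcript) if transcript else "(no prior turns)"
    return (
        f"Your role: {role}\n"
        f"Question: {question}\n"
        f"Discussion so far:\n{history}\n"
        f"Instruction: {instruction}"
    )


def run_discussion(llm: LLM, roles: List[str], question: str) -> List[str]:
    """Run every role through each phase in order and return the full transcript."""
    transcript: List[str] = []
    for _, instruction in PHASES:
        for role in roles:
            reply = llm(build_prompt(role, instruction, question, transcript))
            transcript.append(f"{role}: {reply}")
    return transcript


if __name__ == "__main__":
    # Stub model so the sketch runs without any API; swap in a real client to experiment.
    def stub_llm(prompt: str) -> str:
        return f"(reply to a {len(prompt)}-character prompt)"

    # Illustrative roles only; the abstract does not list the roles used.
    for line in run_discussion(stub_llm, ["artist", "engineer"], "What can a brick be used for?"):
        print(line)
```

Passing the model in as a plain callable keeps the loop independent of any particular LLM client, and giving each agent a distinct role string is the simplest way to express the role-play idea the abstract uses to counter homogeneous responses.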