Is Temperature the Creativity Parameter of Large Language Models?
Abstract: LLMs are applied to all sorts of creative tasks, and their outputs vary from beautiful, to peculiar, to pastiche, into plain plagiarism. The temperature parameter of an LLM regulates the amount of randomness in its output, leading to more diverse results at higher values; it is therefore often claimed to be the creativity parameter. Here, we investigate this claim using a narrative generation task with a fixed context, model, and prompt. Specifically, we present an empirical analysis of the LLM output for different temperature values using four necessary conditions for creativity in narrative generation: novelty, typicality, cohesion, and coherence. We find that temperature is weakly correlated with novelty and, unsurprisingly, moderately correlated with incoherence, but that there is no relationship with either cohesion or typicality. Overall, the results suggest that the LLM generates slightly more novel outputs as temperature increases, but the influence of temperature on creativity is far more nuanced and weak than the "creativity parameter" claim suggests. Finally, we discuss ideas for more controlled LLM creativity, rather than relying on chance via changes to the temperature parameter.
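To make the mechanism under study concrete: temperature rescales a model's logits before the softmax, flattening the token distribution at high values and sharpening it toward greedy decoding at low values. The following is a minimal self-contained sketch of this standard scheme (the function name and toy logits are illustrative, not taken from the paper):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Sample a token index from logits after temperature scaling.

    Dividing logits by the temperature before the softmax flattens
    the distribution (T > 1) or sharpens it (T < 1); as T -> 0 the
    sampler approaches greedy (argmax) decoding.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs, k=1)[0]

# Near-zero temperature: effectively always picks the argmax token.
random.seed(0)
print(sample_with_temperature([1.0, 5.0, 2.0], 0.01))
```

At very high temperature the same logits yield a near-uniform distribution, which is why raising temperature increases output diversity (and, as the paper measures, incoherence) rather than creativity per se.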