As Generative Models Improve, People Adapt Their Prompts

Published 19 Jul 2024 in cs.HC, econ.GN, and q-fin.EC | (2407.14333v2)

Abstract: In an online experiment with N = 1893 participants, we collected and analyzed over 18,000 prompts and over 300,000 images to explore how the importance of prompting will change as the capabilities of generative AI models continue to improve. Each participant in our experiment was randomly and blindly assigned to use one of three text-to-image diffusion models: DALL-E 2, its more advanced successor DALL-E 3, or a version of DALL-E 3 with automatic prompt revision. Participants were then asked to write prompts to reproduce a target image as closely as possible in 10 consecutive tries. We find that task performance was higher for participants using DALL-E 3 than for those using DALL-E 2. This performance gap corresponds to a noticeable difference in the similarity of participants' images to their target images, and was caused in equal measure by: (1) the increased technical capabilities of DALL-E 3, and (2) endogenous changes in participants' prompting in response to these increased capabilities. More specifically, despite being blind to the model they were assigned, participants assigned to DALL-E 3 wrote longer prompts that were more semantically similar to each other and contained a greater number of descriptive words. Furthermore, while participants assigned to DALL-E 3 with prompt revision still outperformed those assigned to DALL-E 2, automatic prompt revision reduced the benefits of using DALL-E 3 by 58%. Taken together, our results suggest that as models continue to progress, people will continue to adapt their prompts to take advantage of new models' capabilities.

Summary

The paper demonstrates that improved generative models, especially DALL-E 3, drive users to create longer, richer prompts, splitting performance gains evenly between model enhancements and adaptive prompting.
The experiment, with 1,893 participants and 18,000+ prompts, compares three model variants using metrics such as CLIP cosine similarity and DreamSim.
The findings imply that ongoing user training and interface upgrades are essential to fully harness both the technical and behavioral advances in AI-generated image recreation.

Adaptive Prompting Strategies in AI-Generated Image Recreation

The paper "As Generative Models Improve, People Adapt Their Prompts" examines how the capabilities of generative AI models shape user behavior in text-to-image generation with OpenAI's DALL-E models. Drawing on an online experiment in which 1,893 participants wrote over 18,000 prompts, the authors investigate the interplay between model capabilities and user prompting strategies.

Experimental Design and Model Comparison

The authors conducted a randomized controlled trial to systematically compare three variants of text-to-image generative models: DALL-E 2, DALL-E 3, and DALL-E 3 with automatic prompt revision. Each participant was tasked with replicating a target image as closely as possible across ten attempts within a 25-minute window. The participants were blind to the model they were assigned, allowing for an unbiased assessment of the effects of model capability on user behavior.
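The random, blind assignment described above can be illustrated with a short Python sketch. It assumes each participant is assigned independently and with equal probability to one of the three arms; the summary does not state the actual assignment probabilities or procedure, and the arm labels, function name, and participant IDs below are hypothetical.

```python
import random
from collections import Counter

# Hypothetical arm labels; the source names the three models but not how they were coded.
ARMS = ("dalle2", "dalle3", "dalle3_revised")


def assign_arms(participant_ids, seed=0):
    """Assign each participant to one arm, independently and uniformly at random.

    Equal assignment probabilities are an assumption made for this sketch.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return {pid: rng.choice(ARMS) for pid in participant_ids}


# Toy example with twelve made-up participant IDs.
assignment = assign_arms(range(12))
arm_counts = Counter(assignment.values())
```

Blinding is a property of the interface rather than of this code: participants would see the same task screen whichever arm `assignment` places them in.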

Analytical Framework

Performance was assessed using CLIP embedding cosine similarity and DreamSim, metrics that capture the semantic and perceptual similarity between the generated and target images. The study revealed a significant improvement in performance for users interacting with DALL-E 3 compared to those using DALL-E 2. This enhancement was due to a combination of the model's superior technical capabilities and adaptive changes in user prompting behavior. Specifically, users assigned to DALL-E 3 generated longer and more descriptively rich prompts, exhibiting a higher degree of prompt similarity over successive attempts.
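As a rough illustration of the cosine-similarity metric mentioned above, the following Python sketch scores two embedding vectors. It assumes the embeddings have already been computed by some image encoder; the three-dimensional vectors are toy stand-ins rather than real CLIP embeddings, and the function name is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy vectors standing in for the target and generated image embeddings.
target_embedding = [1.0, 0.0, 1.0]
generated_embedding = [1.0, 1.0, 0.0]
score = cosine_similarity(target_embedding, generated_embedding)
```

A score closer to 1 means the generated image's embedding points in nearly the same direction as the target's, which is the sense in which a higher value indicates a closer reproduction.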

Decomposition of Improvement into Model and Prompting Effects

By replaying prompts across different models, the study decomposes the observed performance gains into direct model effects and indirect prompting effects. The decomposition showed that approximately half of the performance gains were attributable to the enhanced technical capabilities of DALL-E 3, while the other half resulted from users' behavioral adaptations in response to these capabilities. This finding underscores the bidirectional learning process: as models improve, users refine their prompts to exploit these advancements effectively.
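The replay-based decomposition described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' estimation procedure: it takes mean similarity scores as the performance measure, and the function name and all numbers are made up, chosen so that the split comes out even.

```python
from statistics import mean


def decompose_gain(old_prompts_old_model, old_prompts_new_model, new_prompts_new_model):
    """Split the total gain into a model effect and a prompting effect.

    model effect     = same (old-model) prompts, replayed on the new model
    prompting effect = new model held fixed, prompts written for it vs. replayed prompts
    """
    baseline = mean(old_prompts_old_model)
    replayed = mean(old_prompts_new_model)
    adapted = mean(new_prompts_new_model)
    return {
        "model_effect": replayed - baseline,
        "prompting_effect": adapted - replayed,
        "total_gain": adapted - baseline,
    }


# Made-up similarity scores, for illustration only.
effects = decompose_gain(
    old_prompts_old_model=[0.50, 0.54],  # DALL-E 2 prompts run on DALL-E 2
    old_prompts_new_model=[0.55, 0.59],  # the same prompts replayed on DALL-E 3
    new_prompts_new_model=[0.60, 0.64],  # DALL-E 3 prompts run on DALL-E 3
)
model_share = effects["model_effect"] / effects["total_gain"]
```

By construction the two effects sum to the total gain, so `model_share` and its complement give the fraction attributable to each channel.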

Implications and Future Directions

The implications of this study are twofold. Practically, it highlights the necessity for continuous user training and interface improvements to maximize the benefits of advanced generative models. Theoretically, it suggests that the interaction between users and AI models is dynamic and evolving, with users playing a crucial role in realizing the potential of AI advancements.

Future research could extend this work by exploring how different user demographics adapt their prompting strategies, examining the extent to which these strategies can be generalized across different generative models, or evaluating the impact of additional training or automated prompt optimization features on user performance. Such studies could further elucidate the interplay between human ingenuity and machine intelligence, guiding the development of more intuitive and effective AI tools.

In conclusion, this paper provides valuable insights into how improvements in generative AI models influence user behavior, demonstrating that advancements in AI capabilities and human adaptations are mutually reinforcing. This dynamic interaction has significant implications for the design and deployment of future AI systems.