
Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among Large Language Models

Published 3 Mar 2025 in cs.CL, cs.AI, cs.LG, and cs.MA | (2503.01829v2)

Abstract: LLMs demonstrate persuasive capabilities that rival human-level persuasion. While these capabilities can be used for social good, they also present risks of potential misuse. Moreover, LLMs' susceptibility to persuasion raises concerns about alignment with ethical principles. To study these dynamics, we introduce Persuade Me If You Can (PMIYC), an automated framework for evaluating persuasion through multi-agent interactions. Here, Persuader agents engage in multi-turn conversations with the Persuadee agents, allowing us to measure LLMs' persuasive effectiveness and their susceptibility to persuasion. We conduct comprehensive evaluations across diverse LLMs, ensuring each model is assessed against others in both subjective and misinformation contexts. We validate the efficacy of our framework through human evaluations and show alignment with prior work. PMIYC offers a scalable alternative to human annotation for studying persuasion in LLMs. Through PMIYC, we find that Llama-3.3-70B and GPT-4o exhibit similar persuasive effectiveness, outperforming Claude 3 Haiku by 30%. However, GPT-4o demonstrates over 50% greater resistance to persuasion for misinformation compared to Llama-3.3-70B. These findings provide empirical insights into the persuasive dynamics of LLMs and contribute to the development of safer AI systems.

Summary

  • The paper introduces 'Persuade Me If You Can' (PMIYC), an automated framework for evaluating LLM persuasion effectiveness and susceptibility through multi-agent dialogues.
  • Larger models like GPT-4o and Llama-3.3-70B demonstrate higher persuasive effectiveness, with GPT-4o showing greater resistance to misinformation, and multi-turn interactions typically enhance persuasion.
  • The PMIYC framework indicates susceptibility varies significantly by claim type, particularly misinformation resistance in robust models, and its automated assessments align well with human evaluations.

Analyzing Persuasion Dynamics in LLMs

The paper "Persuade Me if You Can: A Framework for Evaluating Persuasion Effectiveness and Susceptibility Among LLMs" presents an automated framework, called "Persuade Me If You Can" (PMIYC), devised to evaluate the persuasive capabilities and susceptibilities of LLMs through multi-agent interactions.

PMIYC is designed to address two critical facets of AI persuasion: the ability of an LLM to influence (persuasive effectiveness) and its vulnerability to being convinced (susceptibility to persuasion). The framework facilitates this analysis by simulating dialogues between two roles: a "Persuader" that attempts to sway opinions and a "Persuadee" that responds to these attempts.

The PMIYC framework involves a structured conversation setup where models engage in multi-turn dialogues on both subjective claims and misinformation. These setups are intended to provide insights into how varying interaction turns and content domains impact the effectiveness and susceptibility metrics of LLMs. The research emphasizes the dynamics present in these simulated interactions, revealing how the Persuader presents arguments, and how the Persuadee models track and shift their stance on an issue throughout the dialogue.
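The dialogue setup described above can be sketched as a simple alternating loop. This is an illustrative, self-contained toy, not the paper's implementation: in the real framework the Persuader and Persuadee are LLM API calls, whereas here they are stand-in Python objects, and the 1-5 stance scale and all names (`run_dialogue`, `ToyPersuadee`, etc.) are assumptions for illustration.

```python
# Minimal sketch of a PMIYC-style multi-turn persuasion loop.
# The real framework queries LLMs; `persuader` and `persuadee` here are
# toy stand-in objects, and the 1-5 stance scale is an assumed convention.

def run_dialogue(persuader, persuadee, claim, n_turns=3):
    """Alternate Persuader arguments and Persuadee replies, recording the
    Persuadee's stance (agreement with the claim) after each exchange."""
    transcript = []
    stances = [persuadee.rate(claim, transcript)]  # initial stance before any argument
    for _ in range(n_turns):
        argument = persuader.argue(claim, transcript)
        transcript.append(("persuader", argument))
        reply, stance = persuadee.respond(claim, transcript)
        transcript.append(("persuadee", reply))
        stances.append(stance)  # track how the stance shifts turn by turn
    return transcript, stances

class ToyPersuadee:
    """Toy stand-in: grows one point more agreeable per argument, capped at 5."""
    def __init__(self, initial=2):
        self.stance = initial
    def rate(self, claim, transcript):
        return self.stance
    def respond(self, claim, transcript):
        self.stance = min(5, self.stance + 1)
        return "I see your point.", self.stance

class ToyPersuader:
    """Toy stand-in that emits a generic argument for the claim."""
    def argue(self, claim, transcript):
        return f"Consider this reason to accept: {claim}"

transcript, stances = run_dialogue(ToyPersuader(), ToyPersuadee(), "claim X", n_turns=3)
# stances traces the Persuadee's shifting agreement: [2, 3, 4, 5]
```

The recorded stance trajectory is what makes both metrics measurable: the Persuader's effectiveness and the Persuadee's susceptibility are two readings of the same stance shifts.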

Key Findings:

  1. Effectiveness and Susceptibility: Larger models such as Llama-3.3-70B and GPT-4o show substantial persuasive ability, outperforming Claude 3 Haiku by roughly 30%, while GPT-4o is markedly more resistant than Llama-3.3-70B to persuasion in misinformation contexts.
  2. Impact of Multi-Turn Interactions: Multi-turn conversations generally enhance persuasion effectiveness over single-turn interactions. The first two persuasive attempts carry the greatest influence, suggesting a critical early window in which Persuadees are most open to changing their stance.
  3. Domain Variability: Persuasive effectiveness is fairly consistent across contexts, but susceptibility varies significantly with claim type. Misinformation claims elicited lower susceptibility in more robust models; GPT-4o, for instance, showed over 50% greater resistance to misinformation than Llama-3.3-70B.
  4. Reliability of Automated Assessment: PMIYC was validated against human evaluations, showing a strong alignment with human assessments of persuasiveness and susceptibility, positioning it as a viable alternative to labor-intensive human evaluations.
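One plausible way to turn dialogue outcomes into the two headline metrics is to average the normalized stance shift across dialogues. This is a hedged sketch of such an aggregation, not the paper's exact scoring formula; the 1-5 stance scale, the `normalized_shift` definition, and all function names are assumptions.

```python
# Illustrative aggregation of (initial, final) stance pairs into scores.
# This is one reasonable operationalization, not the paper's exact metric.

def normalized_shift(initial, final, max_stance=5):
    """Fraction of the available room toward full agreement that was closed."""
    room = max_stance - initial
    return (final - initial) / room if room else 0.0

def effectiveness(dialogues):
    """Mean normalized stance shift a Persuader achieves across its dialogues."""
    return sum(normalized_shift(s0, s1) for s0, s1 in dialogues) / len(dialogues)

# Averaged instead over the dialogues a given Persuadee took part in, the
# same quantity reads as susceptibility: higher means more easily moved.
susceptibility = effectiveness

scores = [(2, 4), (1, 1), (3, 5)]  # (initial, final) stance pairs, 1-5 scale
print(round(effectiveness(scores), 3))  # → 0.556
```

Comparing such scores between models, rather than the raw values, is what supports relative claims like "30% more effective" or "50% more resistant".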

Implications and Further Developments:

The implementation of PMIYC brings significant implications for future development and safety assessments of LLMs. By providing a scalable and automated mechanism for evaluating persuasive dynamics, PMIYC contributes crucial insights relevant to ensuring the alignment of LLMs with ethical guidelines and effective resistance to harmful or misleading influences.

The paper suggests that future research could extend PMIYC to a broader array of scenarios beyond subjective and misinformation domains, including contexts where LLMs are designed to foster positive behavioral changes in users. Moreover, understanding the strategies behind effective persuasion and modeling control over an LLM's susceptibility to persuasion might elevate the ethical deployment of AI-mediated persuasive technologies.

In conclusion, as LLMs become more ingrained in everyday digital interactions, frameworks like PMIYC are essential to the ethical stewardship of AI technologies, helping ensure that models uphold integrity while wielding their persuasive potential responsibly.
