Prosocial Persuasion at Scale? Large Language Models Outperform Humans in Donation Appeals Across Levels of Personalization

Published 3 Apr 2026 in cs.CY | (2604.03202v1)

Abstract: LLMs are increasingly regarded as having the potential to generate persuasive content at scale. While previous studies have focused on the risks associated with LLM-generated misinformation, the role of LLMs in enabling prosocial persuasion is still underexplored. We investigate whether donation appeals authored by LLMs are as effective as those written by humans across degrees of personalization. Two preregistered online experiments (Study 1: N = 658; Study 2: N = 642) manipulated Personalization (generic vs. personalized vs. falsely personalized) and Content source (human vs. LLM) and presented participants with donation appeals for charities. We assessed how participants distributed their bonus money across the charities, how they engaged with the donation appeals, and how persuasive they found them. In both experiments, LLM-generated content yielded more donations, resulted in higher engagement, and was rated as more persuasive than human-authored content. There was a gain associated with personalization (Study 2) and a penalty for false personalization (Study 1). Our results suggest that LLMs may be a suitable technology for generating content that can encourage prosocial behavior.


Authors (3)

Summary

LLM-generated appeals achieved donation shares 1.5–2.8 percentage points higher than human-written messages, along with greater engagement.
The two preregistered, within-subjects experiments manipulated genuine, generic, and false personalization in donation appeals for U.S. cancer charities.
The findings imply that accurate personalization is key, as falsely personalized appeals significantly decrease persuasiveness, especially in LLM outputs.

LLMs Surpass Human Performance in Prosocial Persuasion: Efficacy, Personalization, and Implications

Introduction

The deployment of LLMs across persuasive communication tasks has reignited discussion of how effective they are relative to human communicators, especially in domains beyond misinformation such as prosocial influence. This paper evaluates whether an LLM (specifically, OpenAI’s gpt-4.5-preview) can elicit costly prosocial behavior, operationalized as charitable donations, more effectively than human writers. It also probes the moderating role of personalization by comparing genuinely tailored, generic, and falsely personalized appeals.

Experimental Design and Methodology

Two preregistered, quota-balanced, within-subjects factorial experiments (Study 1: N=658, Study 2: N=642) were conducted. Donation appeals for six U.S.-based cancer charities were written either by human writers or by the LLM. Personalization was manipulated at three levels: genuinely personalized (demographically congruent), generic, and falsely personalized (deliberate demographic mismatch). Study 1 used academically trained human authors, whereas Study 2 used incentivized lay writers, with an explicit focus on within-demographic targeting. Outcomes were the actual allocation of bonus money as donations, binary engagement responses, and composite ratings of persuasiveness.
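The factorial structure described above can be made concrete with a short sketch. It only enumerates the cells obtained by crossing the two manipulated factors; the identifier names are illustrative, and how the six charities were assigned to cells is not stated in this summary, so that is not modeled.

```python
from itertools import product

# Factor levels as named in the summary; the identifiers are illustrative.
PERSONALIZATION = ("generic", "personalized", "falsely_personalized")
CONTENT_SOURCE = ("human", "llm")


def design_cells():
    """Return every (personalization, source) cell of the 3 x 2 design."""
    return list(product(PERSONALIZATION, CONTENT_SOURCE))


cells = design_cells()
for personalization, source in cells:
    print(f"{source:>5} | {personalization}")
```

Crossing three personalization levels with two content sources gives six cells, each of which a participant encounters in a within-subjects design.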

Key Findings

LLMs Consistently Outperform Humans

Across both studies, LLM-generated appeals significantly outperformed human-authored messages on all primary outcome measures:

Donation Share: LLM messages received a 1.5–2.8 percentage point higher share of the bonus allocation than human-written messages (e.g., Study 2: LLM M=13.3%, Human M=10.5%, $p = .004$, $d = .12$).
Engagement: LLM content consistently elicited more user engagement (Study 2: LLM M=0.60–0.68, Human M=0.34–0.55).
Perceived Persuasiveness: LLM appeals received higher persuasiveness scores than human-written appeals across all conditions.
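To put the Study 2 donation figures in perspective, the following sketch works through the arithmetic behind the reported means. The implied standard deviation rests on an assumption: it treats $d$ as the mean difference divided by a single standard deviation. The summary does not say which variant of Cohen's $d$ the authors used, so that number is a rough orientation only.

```python
# Study 2 means reported in the summary (percent of bonus allocated).
llm_share = 13.3
human_share = 10.5
reported_d = 0.12


def percentage_point_gap(a, b):
    """Absolute difference between two percentages, in percentage points."""
    return a - b


def relative_lift(a, b):
    """Difference expressed as a fraction of the baseline b."""
    return (a - b) / b


def implied_sd(gap, d):
    """SD implied if d = gap / SD (an assumption, see the text above)."""
    return gap / d


gap = percentage_point_gap(llm_share, human_share)
lift = relative_lift(llm_share, human_share)
sd = implied_sd(gap, reported_d)
print(f"gap: {gap:.1f} pp, relative lift: {lift:.1%}, implied SD: {sd:.1f} pp")
# prints: gap: 2.8 pp, relative lift: 26.7%, implied SD: 23.3 pp
```

The contrast between a 2.8 percentage point gap and a relative lift of roughly 27% shows why the effect can be described as both modest (small $d$) and practically relevant at scale.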

These findings were robust to the skill, demographic representativeness, and motivational incentives of the human writers. The effect sizes observed, while modest, were consistent and statistically reliable.

Personalization Effects: Benefits and Risks

The incremental efficacy of personalization was context-dependent:

In Study 2 (demographic match between writers and recipients), genuine personalization outperformed generic and falsely personalized appeals regarding donation share and engagement.
False personalization (a mismatch between the message and the recipient's demographics) decreased both donation behavior and perceived persuasiveness, often performing worse than purely generic messaging; the abstract reports this penalty for Study 1. The penalty was more pronounced for LLM-generated content.
In conditions lacking demographic congruence between writer and recipient (Study 1), personalization provided no advantage over generic content.

Notably, for LLMs, the difference in effect between personalized and generic appeals was negligible, whereas falsely personalized content was penalized—indicating that while LLMs can scale personalization, the underlying data accuracy and inferential alignment are crucial.

Theoretical and Practical Implications

Mechanisms of LLM Superiority

Several candidate mechanisms may explain the LLM advantage:

Textual Quality: Higher fluency, coherence, and stylistic consistency in LLM outputs could increase perceived credibility and trust.
Optimized Rhetorical Strategies: LLMs, trained on large corpora, may deploy diverse, empirically effective rhetorical and emotional framings with greater frequency.
Affective and Moral Framing: LLMs may use morally charged language and emotional appeals in ways that match effective prosocial persuasion strategies.

However, as the studies did not decompose these effects experimentally, future work should apply causal mediation analysis or natural language feature attribution (e.g., psycholinguistic annotation, feature ablation) to isolate primary drivers.

Limitations

Cultural and Linguistic Constraints: All experiments were U.S.-centric; LLM advantage may attenuate in low-resource languages or cultural contexts outside the LLM’s training distribution, as suggested by evidence of WEIRD bias in LLM behavior modeling.
Disclosure Effects: The effect sizes assume recipient naïveté regarding message authorship; recent evidence indicates LLM advantages might be reversed with explicit source disclosure.
Unidirectional Communication: The studies tested one-shot persuasion in a social media context. Real-world fundraising can involve iterative, adaptive dialogue, which could amplify or reduce the LLM advantage.

Broader Impacts and Future Directions

The results point to a meaningful shift in the economics of persuasive messaging: LLMs may offer a scalable alternative to costly human labor for prosocial persuasion such as fundraising. For actual deployment, the accuracy of personalization inputs is crucial, because incorrectly personalized messages are counterproductive.
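One way to act on the point about input accuracy is to fall back to a generic appeal whenever the recipient's segment is unknown or uncertain. The sketch below is a hypothetical guard, not something described in the paper: the `Recipient` fields, the `choose_appeal` function, and the 0.9 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Recipient:
    # Hypothetical fields, not taken from the paper.
    segment: Optional[str]      # inferred demographic segment, if any
    segment_confidence: float   # 0.0-1.0 confidence in that inference


def choose_appeal(
    recipient: Recipient,
    generic: str,
    personalized: Dict[str, str],
    min_confidence: float = 0.9,  # arbitrary illustrative threshold
) -> str:
    """Return a personalized appeal only when the segment is known,
    trusted, and has a matching variant; otherwise return the generic one."""
    if recipient.segment is None:
        return generic
    if recipient.segment_confidence < min_confidence:
        return generic
    return personalized.get(recipient.segment, generic)


generic_text = "Your gift supports cancer research."
variants = {"segment_a": "Appeal tailored to segment A."}

confident = choose_appeal(Recipient("segment_a", 0.95), generic_text, variants)
uncertain = choose_appeal(Recipient("segment_a", 0.50), generic_text, variants)
unknown = choose_appeal(Recipient(None, 0.0), generic_text, variants)
```

The design choice here is conservative: since the summary reports that a mismatched appeal can perform worse than a generic one, the guard prefers the generic text in every doubtful case.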

Key future research directions include:

Systematic evaluation of LLM performance across diverse cultures, languages, and socio-cognitive segments.
Longitudinal field experiments with revealed-preference (actual) donation outcomes.
Direct investigation into the interaction between disclosure, perceived credibility, and persuasive efficacy in LLM-human comparisons.

Conclusion

This study demonstrates that LLMs not only match but significantly outperform human writers, both academically trained and lay, in generating social media appeals that mobilize costly prosocial giving. Practical deployment of LLMs in fundraising and other prosocial domains should prioritize accurate data for personalization, as false targeting can actively undermine persuasive intent. Theoretical advances will require dissecting the psycholinguistic and inferential mechanisms underlying the LLM advantage and testing it in socioculturally diverse contexts. These findings inform AI-driven behavioral intervention design and the responsible integration of LLMs in prosocial digital communication.

Reference:

"Prosocial Persuasion at Scale? Large Language Models Outperform Humans in Donation Appeals Across Levels of Personalization" (2604.03202)
