- The paper shows that LLM-written appeals earned donation shares 1.5–2.8 percentage points higher than human-written messages, along with greater engagement.
- Two preregistered, within-subjects experiments manipulated genuine, generic, and false personalization in appeals for U.S. cancer charities.
- The findings imply that accurate personalization is key, as falsely personalized appeals significantly decrease persuasiveness, especially in LLM outputs.
Introduction
The deployment of LLMs across persuasive communication tasks has reignited discussion regarding their relative efficacy compared to human agents—especially in domains beyond misinformation, such as prosocial influence. This paper systematically evaluates whether LLMs (specifically, OpenAI’s gpt-4.5-preview) can elicit costly prosocial behavior, operationalized as charitable donations, more effectively than human writers. Further, it probes the moderating role of personalization in message effectiveness, dissecting the impact of genuinely tailored, generic, and falsely personalized appeals.
Experimental Design and Methodology
Two preregistered, quota-balanced, within-subjects factorial experiments (Study 1: N=658, Study 2: N=642) were conducted. Personalized donation appeals were generated using both human writers and LLMs, targeting six U.S.-based cancer charities. Personalization was manipulated at three levels: genuinely personalized (demographically congruent), generic, and falsely personalized (deliberate demographic mismatch). Study 1 utilized academically trained human authors, whereas Study 2 leveraged incentivized lay writers, with an explicit focus on within-demographic targeting. Performance was assessed through actual bonus donation allocation, binary engagement responses, and composite ratings of persuasiveness.
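The design described above can be sketched as a small enumeration of conditions. This is a minimal illustration, not the authors' materials: it assumes message source and personalization level were fully crossed, which the summary does not state explicitly, and the group labels (`group_a`, `group_b`) and function names are hypothetical.

```python
from itertools import product

# Factor levels as described in the summary.
SOURCES = ("human", "llm")
PERSONALIZATION = ("genuine", "generic", "false")


def design_cells():
    """Return every source x personalization combination (assumes full crossing)."""
    return list(product(SOURCES, PERSONALIZATION))


def target_profile(recipient_group, level, all_groups):
    """Return the demographic group a message is written for.

    genuine -> the recipient's own group (demographically congruent)
    generic -> None (no demographic tailoring)
    false   -> a group other than the recipient's (deliberate mismatch)
    """
    if level == "genuine":
        return recipient_group
    if level == "generic":
        return None
    if level == "false":
        others = [g for g in all_groups if g != recipient_group]
        if not others:
            raise ValueError("false personalization needs at least two groups")
        return others[0]  # hypothetical rule: pick the first mismatching group
    raise ValueError(f"unknown personalization level: {level}")


# Hypothetical group labels, for illustration only.
groups = ["group_a", "group_b"]
cells = design_cells()
mismatch = target_profile("group_a", "false", groups)
```

Under the full-crossing assumption, `cells` holds the combinations a within-subjects participant would be exposed to, and `mismatch` shows how a falsely personalized appeal targets a profile that is not the recipient's own.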
Key Findings
Across both studies, LLM-generated appeals significantly outperformed human-authored messages on all primary outcome variables:
- Donation Share: LLM messages received a 1.5–2.8 percentage point higher share of the bonus allocation than human-written messages (e.g., Study 2: LLM M=13.3%, Human M=10.5%, p=.004, d=.12).
- Engagement: LLM content consistently elicited more user engagement (Study 2: LLM M=0.60–0.68, Human M=0.34–0.55).
- Perceived Persuasiveness: LLM appeals received higher persuasiveness scores than human-written appeals across all conditions.
These findings were robust to the skill, demographic representativeness, and motivational incentives of the human writers. The effect sizes observed, while modest, were consistent and statistically reliable.
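To make "modest" concrete, the reported Study 2 donation-share figures can be worked through as simple arithmetic. The sketch below assumes that d is the raw mean difference divided by some standardizer; the exact formula used for the within-subjects effect size is not given in this summary, so the implied standard deviation is only a rough back-of-the-envelope value.

```python
# Reported Study 2 donation-share figures (percent of bonus allocated).
llm_mean = 13.3
human_mean = 10.5
reported_d = 0.12

# Raw gap in percentage points.
raw_gap = llm_mean - human_mean

# Assumption: d = raw_gap / standardizer, so the implied standardizer is:
implied_sd = raw_gap / reported_d

print(round(raw_gap, 1))     # 2.8
print(round(implied_sd, 1))  # 23.3
```

Under that assumption, a gap of 2.8 percentage points sits against a spread of roughly 23 percentage points in donation shares, which is why the standardized effect is small even though the raw difference is reliably detected.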
Personalization Effects: Benefits and Risks
The incremental efficacy of personalization was context-dependent:
- In Study 2 (demographic match between writers and recipients), genuine personalization outperformed generic and falsely personalized appeals regarding donation share and engagement.
- False personalization (a mismatch between the message and the recipient's demographics) reliably decreased both donation behavior and perceived persuasiveness, often performing worse than purely generic messaging. This penalty was more pronounced for LLM-generated content.
- In conditions lacking demographic congruence between writer and recipient (Study 1), personalization provided no advantage over generic content.
Notably, for LLMs, the difference in effect between genuinely personalized and generic appeals was negligible, whereas falsely personalized content was penalized. This indicates that while LLMs can scale personalization, the accuracy of the underlying data and of the inferences drawn from it is crucial.
Theoretical and Practical Implications
Mechanisms of LLM Superiority
Several candidate mechanisms may explain the LLM advantage:
- Textual Quality: Higher fluency, coherence, and stylistic consistency in LLM outputs could increase perceived credibility and trust.
- Optimized Rhetorical Strategies: LLMs, trained on large corpora, may deploy diverse, empirically effective rhetorical and emotional framings with greater frequency.
- Affective and Moral Framing: LLMs may use morally charged language and emotional appeals in ways consistent with effective prosocial persuasion strategies.
However, as the studies did not decompose these effects experimentally, future work should apply causal mediation analysis or natural language feature attribution (e.g., psycholinguistic annotation, feature ablation) to isolate primary drivers.
Limitations
- Cultural and Linguistic Constraints: All experiments were U.S.-centric; LLM advantage may attenuate in low-resource languages or cultural contexts outside the LLM’s training distribution, as suggested by evidence of WEIRD bias in LLM behavior modeling.
- Disclosure Effects: The effect sizes assume recipient naïveté regarding message authorship; recent evidence indicates LLM advantages might be reversed with explicit source disclosure.
- Unidirectional Communication: The studies tested one-shot persuasion in a social media context. Real-world fundraising can involve iterative, adaptive dialog, which could amplify or reduce the LLM advantage.
Broader Impacts and Future Directions
The results point to a meaningful shift in the economics of persuasive messaging: LLMs now offer robust, scalable alternatives to costly human labor for prosocial persuasion, such as fundraising. For actual deployment, the accuracy of personalization inputs is crucial, because incorrectly personalized messages are counterproductive.
Key future research directions include:
- Systematic evaluation of LLM performance across diverse cultures, languages, and socio-cognitive segments.
- Longitudinal field experiments with revealed-preference (actual) donation outcomes.
- Direct investigation into the interaction between disclosure, perceived credibility, and persuasive efficacy in LLM-human comparisons.
Conclusion
These studies demonstrate that LLMs not only match but significantly outperform skilled humans in generating social media appeals that mobilize costly prosocial giving. The practical deployment of LLMs in fundraising and other prosocial domains should foreground the importance of accurate data for effective personalization, as false targeting can actively undermine persuasive intent. Theoretical advances will require dissecting the psycholinguistic and inferential mechanisms underlying the LLM advantage and testing whether it holds in socioculturally diverse contexts. These findings are foundational for AI-driven behavioral intervention design and the responsible integration of LLMs in prosocial digital communication.
Reference:
"Prosocial Persuasion at Scale? LLMs Outperform Humans in Donation Appeals Across Levels of Personalization" (2604.03202)