
Believing Anthropomorphism: Examining the Role of Anthropomorphic Cues on Trust in Large Language Models

Published 9 May 2024 in cs.HC (arXiv:2405.06079v1)

Abstract: People now regularly interface with LLMs via speech and text (e.g., Bard) interfaces. However, little is known about the relationship between how users anthropomorphize an LLM system (i.e., ascribe human-like characteristics to a system) and how they trust the information the system provides. Participants (n=2,165; ranging in age from 18-90 from the United States) completed an online experiment, where they interacted with a pseudo-LLM that varied in modality (text only, speech + text) and grammatical person ("I" vs. "the system") in its responses. Results showed that the "speech + text" condition led to higher anthropomorphism of the system overall, as well as higher ratings of accuracy of the information the system provides. Additionally, the first-person pronoun ("I") led to higher information accuracy and reduced risk ratings, but only in one context. We discuss these findings for their implications for the design of responsible, human-generative AI experiences.


Summary

  • The paper demonstrates that incorporating speech cues significantly increases anthropomorphism and enhances perceived accuracy in LLM interactions.
  • The study uses controlled experiments with 2,165 participants to measure the impacts of modality and grammatical person on trust and risk perception.
  • Key implications suggest designing LLM interfaces that use anthropomorphic elements selectively to boost engagement while mitigating misinformation risks.

Examining the Impact of Anthropomorphic Cues on LLM Trust

Introduction

The increasing prevalence of LLMs like ChatGPT and Google's Bard has transformed human-computer interaction, particularly through speech and text interfaces. This paper investigates the relationship between anthropomorphism—where users attribute human-like characteristics to machines—and trust in the accuracy and reliability of LLMs. The study systematically explores how users' perceptions of LLMs change with variations in communication mode (text only vs. text + speech) and the grammatical person employed by the model (first person "I" vs. third person "the system").

Methodology

The study recruited 2,165 participants who interacted with a computer-simulated LLM under controlled conditions (Figure 1). Participants were exposed to different modalities and grammatical persons to assess how these factors influence overall anthropomorphism and trust. Anthropomorphism was measured using an adapted version of the Godspeed Questionnaire, while trust was evaluated through trial-based assessments of perceived accuracy, risk, and validation necessity.

Figure 1: Procedure for the current study. All participants completed a technical qualifier and a comprehension assessment question.
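The 2 × 2 between-subjects design (modality × grammatical person) and the anthropomorphism composite can be sketched as follows. The random assignment, five-item structure, and 1-5 item scale here are illustrative assumptions inferred from the summary (Figure 2 describes the composite as "out of a total of 25"), not the paper's exact instrument:

```python
import random

# The two manipulated factors, as described in the study design.
MODALITIES = ("text only", "speech + text")
PERSONS = ('"I"', '"the system"')

def assign_condition(rng: random.Random) -> tuple[str, str]:
    """Randomly assign a participant to one cell of the 2x2 design."""
    return rng.choice(MODALITIES), rng.choice(PERSONS)

def anthropomorphism_score(item_ratings: list[int]) -> int:
    """Sum five 5-point Godspeed-style items into a composite out of 25.

    The five-item, 1-5 scoring is an assumption based on the
    "out of a total of 25" scale mentioned in the figure caption.
    """
    if len(item_ratings) != 5 or not all(1 <= r <= 5 for r in item_ratings):
        raise ValueError("expected five ratings, each from 1 to 5")
    return sum(item_ratings)

rng = random.Random(42)
modality, person = assign_condition(rng)
score = anthropomorphism_score([4, 3, 5, 4, 4])  # 4+3+5+4+4 == 20
```

Per-condition means of such composite scores are the kind of quantity plotted in Figure 2.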

Findings on Anthropomorphism

The results highlight a significant influence of modality on anthropomorphism. Systems that incorporated both speech and text were perceived as more human-like than text-only systems (Figure 2). However, grammatical person did not significantly alter anthropomorphism scores, challenging the assumption that first-person phrasing ("I found") increases human-like perceptions.

Figure 2: Mean anthropomorphism score (out of a total of 25) across Modality (speech + text = orange, text only = dark blue) and Grammatical Person ("I found", "the system found").

Trust and Perceived Accuracy

A key finding was that the inclusion of a text-to-speech (TTS) voice enhanced the perceived accuracy of the information the LLM provided (Figure 3). Participants in the speech + text condition rated responses as more accurate than those in the text-only condition. Intriguingly, while grammatical person (first vs. third) had limited overall impact, trust improved in specific contexts, such as medication questions, when the system used "I."

Figure 3: Mean ratings for "How accurate is the system’s response?" across question contexts.
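As a rough illustration of how trial-level accuracy ratings roll up into per-context means like those in Figure 3, here is a minimal aggregation sketch; the trial data, context labels, and 1-7 rating scale are fabricated for the example:

```python
from collections import defaultdict

# (context, modality, accuracy rating) -- fabricated example trials,
# rated on an assumed 1-7 scale.
trials = [
    ("medication", "speech + text", 6),
    ("medication", "text only", 4),
    ("weather", "speech + text", 5),
    ("weather", "text only", 5),
    ("medication", "speech + text", 7),
]

def mean_accuracy_by_cell(rows):
    """Average accuracy ratings within each (context, modality) cell."""
    ratings = defaultdict(list)
    for context, modality, rating in rows:
        ratings[(context, modality)].append(rating)
    return {cell: sum(r) / len(r) for cell, r in ratings.items()}

means = mean_accuracy_by_cell(trials)
# means[("medication", "speech + text")] == 6.5
```

Comparing such cell means across modalities, within each question context, is what underlies the modality effect on perceived accuracy reported above.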

Context-Dependent Trust and Risk Perception

While the presence of a voice increased perceived accuracy, it did not universally enhance trust across other dimensions such as perceived risk or need for validation. Notably, context mattered significantly, with health and medication topics considered riskier and more likely to be cross-verified by users (Appendix).

Discussion

These observations underscore the complexity of trust dynamics in LLM interactions. The increases in anthropomorphism and perceived accuracy with voice point to a potent heuristic: users rely on audio cues to infer trustworthiness. The more nuanced role of first-person language, however, suggests that while speech generally enhances trust, linguistic subtleties may shape how objective or subjective the information delivered by an LLM appears.

For user experience design, the implication is to implement speech in LLM systems cautiously, especially where accuracy is critical. Limiting anthropomorphic cues to contexts where the system's confidence is high can help curb the spread of misinformation.

Conclusion

This study contributes to our understanding of how anthropomorphic cues impact trust in LLMs, highlighting the influential role of speech over written text in perceived accuracy. These findings inform design strategies for responsible AI interaction, emphasizing nuanced use of language and delivery modes to balance enhanced user engagement with precautionary measures against unwarranted trust in automated systems.
