
HumT DumT: Measuring and controlling human-like language in LLMs

Published 18 Feb 2025 in cs.CL, cs.AI, and cs.CY (arXiv:2502.13259v2)

Abstract: Should LLMs generate language that makes them seem human? Human-like language might improve user experience, but might also lead to deception, overreliance, and stereotyping. Assessing these potential impacts requires a systematic way to measure human-like tone in LLM outputs. We introduce HumT and SocioT, metrics for human-like tone and other dimensions of social perceptions in text data based on relative probabilities from an LLM. By measuring HumT across preference and usage datasets, we find that users prefer less human-like outputs from LLMs in many contexts. HumT also offers insights into the perceptions and impacts of anthropomorphism: human-like LLM outputs are highly correlated with warmth, social closeness, femininity, and low status, which are closely linked to the aforementioned harms. We introduce DumT, a method using HumT to systematically control and reduce the degree of human-like tone while preserving model performance. DumT offers a practical approach for mitigating risks associated with anthropomorphic language generation.

Summary

  • The paper introduces a novel metric, HumT, to quantify human-like tone in LLM outputs by comparing the model-assigned likelihood of text under animate versus inanimate speaker framings, averaged over multiple runs.
  • It integrates SocioT metrics to capture dimensions like warmth, status, and gender, revealing that high human-likeness correlates with subservient and stereotyped language.
  • The proposed DumT method fine-tunes LLMs to generate less anthropomorphic text while maintaining core task performance, validated by user preferences and human annotations.

Measuring and Controlling Human-Like Language in LLMs

Introduction

The anthropomorphization of LLMs—the extent to which generated language mimics human communicative tone and persona—is a pivotal topic in AI alignment and deployment. While human-like language may facilitate engagement and improved user experience, it can also produce adverse social effects such as overreliance, unwarranted trust, stereotype perpetuation, and the risk of users misunderstanding LLM agency and capabilities. This paper introduces a novel metric, HumT, for the quantification of "human-like tone" in text generated by LLMs, augments it with the SocioT family of metrics for capturing related social perceptions, and proposes DumT, a direct preference optimization approach to steer model generations away from anthropomorphic tendencies while maintaining task performance.

Methodology: HumT and SocioT Metric Construction

HumT quantifies human-like tone by operationalizing the probability that a text is uttered by a specific individual (animate agent, "he/she said") versus by an inanimate entity ("it said"). Given a string $s$, HumT computes the log-likelihood ratio $T_D(s) = \log \frac{P(s \mid \text{animate context})}{P(s \mid \text{inanimate context})}$, where contexts are constructed by prepending animate or inanimate speaker frames and likelihoods are evaluated with an autoregressive LLM (GPT-2). To reduce variance, the likelihoods are averaged over $n = 100$ runs per variant, using fixed-length substrings to mitigate tokenization artifacts. The result reflects whether the model, trained on broad textual corpora, implicitly frames the utterance as congruent with human agency.
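A minimal sketch of this computation, assuming GPT-2 via Hugging Face transformers; the prefix strings and function names are illustrative, not the authors' released code, and the averaging over $n = 100$ prefix variants is omitted for brevity:

```python
# Minimal HumT-style sketch: conditional log-likelihood of a text under
# animate vs. inanimate speaker prefixes, scored with GPT-2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def conditional_logprob(prefix: str, text: str) -> float:
    """Sum of log P(token | preceding tokens) over the tokens of `text`, given `prefix`."""
    # Tokenize separately so the prefix boundary is known; this can differ
    # slightly from tokenizing the concatenated string.
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    text_ids = tokenizer(text, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, text_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Position i predicts token i+1.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    # Keep only the positions belonging to `text`, not the prefix.
    return token_lp[prefix_ids.size(1) - 1 :].sum().item()

def humt(text: str) -> float:
    """Log-likelihood ratio T_D(s): animate vs. inanimate speaker framing."""
    animate = conditional_logprob("she said, ", text)  # in practice, average over variants
    inanimate = conditional_logprob("it said, ", text)
    return animate - inanimate

print(humt("I completely understand how you feel."))  # expected: higher (more human-like)
print(humt("The function returns a sorted list."))    # expected: lower
```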

SocioT generalizes this approach to other socially salient dimensions: warmth, status, social distance, and gender. Each is operationalized via semantically matched pairs of phrase sets, e.g., "the [friend, lover, mentor, idol] said" vs "the [stranger, enemy, dictator, examiner] said" for warmth. Correlations computed between HumT and SocioT across large preference and usage datasets provide insight into the co-occurrence of human-like tone with implicit gendering, power, and social intimacy.
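As a hedged illustration, the same scorer generalizes by swapping in dimension-specific prefix pairs; the phrase sets below are abbreviated from the warmth example above, and the `sociot` helper (reusing `conditional_logprob` from the earlier sketch) is hypothetical:

```python
# Illustrative SocioT dimension pairs; phrase sets are abbreviated examples,
# not the paper's full design.
SOCIOT_PAIRS = {
    "warmth": (["the friend said, ", "the mentor said, "],
               ["the stranger said, ", "the examiner said, "]),
    # status, social distance, and gender follow the same paired-prefix pattern
}

def sociot(text: str, dimension: str) -> float:
    pos_prefixes, neg_prefixes = SOCIOT_PAIRS[dimension]
    pos = sum(conditional_logprob(p, text) for p in pos_prefixes) / len(pos_prefixes)
    neg = sum(conditional_logprob(p, text) for p in neg_prefixes) / len(neg_prefixes)
    return pos - neg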

Validation: Construct validity is established through human annotation studies, demonstrating that HumT and SocioT scores align with expert judgments on human-likeness and social perceptions across diverse corpora.

Empirical Findings

Quantitative Assessment of Human-Like Tone

  • Human vs LLM vs Web Data: LLM generations are less anthropomorphic than human-authored responses on matched prompts but more so than generic web text, indicating that the elevated HumT in LLMs is primarily a product of instruction tuning and RLHF steps rather than pretraining alone.
  • Linguistic Correlates: High HumT is reliably associated with features such as first-person pronoun use, conversational markers, affect, and lower analytical language—attributes identified as hallmarks of human-written dialogue and personal discourse by previous linguistic studies.

Social Correlates and Stereotype Perpetuation

  • Correlations with Social Dimensions: Across all tested LLM datasets, HumT exhibits strong positive correlations with social closeness ($r = 0.87$), warmth ($r = 0.45$), and femininity ($r = 0.47$), and a strong negative correlation with status ($r = -0.80$); the sketch after this list shows how such correlations can be computed.
  • Implications: LLM outputs that are more human-like are also more likely to reflect subservient, low-status, and feminized social language, supporting sociolinguistic concerns regarding stereotype reinforcement and anthropomorphization. This mirrors current designs where LLMs are tuned for warmth and approachability, often via narratives aligned with traditionally marginalized or subordinate roles.
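For concreteness, correlations of this kind could be reproduced over any corpus by scoring texts with the `humt()` and `sociot()` sketches above and computing Pearson's r; the corpus below is a toy placeholder:

```python
# Toy reproduction of the correlation analysis, reusing humt() and sociot()
# from the sketches above.
from scipy.stats import pearsonr

corpus = [
    "I'm so glad you asked, and I completely understand!",
    "The answer is 42.",
    "Error: file not found.",
    "I can imagine how stressful that must be for you.",
]
humt_scores = [humt(t) for t in corpus]
warmth_scores = [sociot(t, "warmth") for t in corpus]
r, p = pearsonr(humt_scores, warmth_scores)
print(f"HumT-warmth: r={r:.2f} (p={p:.3f})")
```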

User Preferences and Information Density

  • Strong Empirical Result: Contrary to industry practice, diverse user preference datasets (including RLHF, real-world usage, and participatory alignment studies) show that outputs with lower HumT are significantly preferred ($p < 0.05$) by annotators and end-users. Preferred responses are less verbose, more information-dense, more authentic to LLMs’ capabilities, less misleading, and minimize empty pleasantries or deceptive social cues regarding agency and empathy.
  • Domain and Demographic Effects: The preference for lower HumT is robust across demographics, but the magnitude varies by prompt topic (greater for high-value/controversial topics; weaker for greetings). No evidence was found that user age, gender, or familiarity with LLMs consistently modifies this preference.

DumT: Direct Optimization for Lower Human-Like Tone

Given the empirical findings above, the authors propose DumT: a method that fine-tunes LLMs to generate less human-like output by using HumT-differentiated output pairs as preference signals within direct preference optimization (DPO). The preference dataset is constructed from output pairs where the less anthropomorphic completion is also preferred by users; a minimal sketch of this pipeline follows the list below.

  • Training Regime: DumT is instantiated on Meta-Llama-3-8B-Instruct, with DPO fine-tuning on ~500 preference pairs (HumT-differentiated and user-preferred) using standard hyperparameters.
  • Performance: DumT achieves a statistically significant reduction in mean HumT relative to both the base model and a randomly DPO-fine-tuned baseline ($p < 0.001$).
  • Model Robustness: On RewardBench and human-annotated preference tasks, DumT matches or outperforms the baseline on core instruction-following metrics. When performance drops, it is restricted to domains (e.g., chat-focused datasets) where higher anthropomorphism is implicitly rewarded in the gold labels, reflecting a tension between standard benchmarking and user-aligned expectations.
  • Qualitative Analysis: Human annotators prefer DumT outputs for their conciseness, accuracy, and avoidance of misleading anthropomorphic claims. Annotators highlight discomfort with LLM phatic expressions ("I'm sorry", "I can imagine") that misrepresent the underlying computational nature of the agent.
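A minimal sketch of the pipeline referenced above, assuming Hugging Face's trl library (whose DPOTrainer API varies across versions) and reusing the illustrative `humt()` scorer from earlier; the example pair and hyperparameters are placeholders, not the authors' released training setup:

```python
# Hypothetical DumT-style pipeline: filter preference pairs by HumT, then run
# DPO with trl. Data and hyperparameters are placeholders.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

raw_pairs = [
    {
        "prompt": "How do I reverse a list in Python?",
        "chosen": "Use lst.reverse() in place, or lst[::-1] for a reversed copy.",
        "rejected": "Great question! I'd be so happy to help you with that!",
    },
    # ... on the order of 500 pairs in the paper's setup
]

# Keep only pairs where the user-preferred completion is also less human-like.
pairs = [p for p in raw_pairs if humt(p["chosen"]) < humt(p["rejected"])]

args = DPOConfig(output_dir="dumt-llama3-8b", beta=0.1,
                 per_device_train_batch_size=1, num_train_epochs=1)
trainer = DPOTrainer(model=model, args=args,
                     train_dataset=Dataset.from_list(pairs),
                     processing_class=tokenizer)  # older trl versions: tokenizer=
trainer.train()
```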

Theoretical and Practical Implications

Alignment, Stereotype Reification, and Sociotechnical Risks

This work offers a formal, scalable mechanism for quantifying and controlling human-likeness—a previously nebulous sociolinguistic construct—in LLM outputs. By demonstrating the association between anthropomorphic tone, gendered/status-related language, and user dispreference, the findings challenge prevailing industry assumptions that aim for assistant-like or "natural" language as a default optimum.

Notably, the propagation of submissive and feminine tones reinforces existing power hierarchies in ways that are opaque to model developers and users. As LLMs are deployed in increasingly intimate, high-trust applications (e.g., mental health, social support, legal or financial advice), unchecked anthropomorphism could lead to insincerity, overreliance, and societal harms, especially for marginalized groups.

Benchmarking and Model Evaluation

This work also questions the design of popular benchmarks that implicitly valorize human-likeness (e.g., through preference or "engagement" tasks) and suggests that alignment objectives must be reconsidered for different deployment domains. Overreliance on aggregate preference signals confounded by high HumT may misalign models with actual user needs for informativeness and transparency.

Future Directions

The HumT/SocioT framework enables further work in controllable text generation, dynamic persona steering, and the design of context-sensitive alignment criteria (see also [weng2024controllmcraftingdiversepersonalities], [li2024steerabilitylargelanguagemodels], [wang2024arithmeticcontrolllmsdiverse]). It provides an infrastructure for comparative studies on anthropomorphism across cultures, modalities, and interaction settings (including non-English and multimodal models). Further, the codebase and metrics offer a toolkit for multi-turn evaluation and human-subject experiments, informing future social and regulatory considerations.

Limitations

The analysis focuses on English and GPT-2-based measures of human-like tone. The definitions of anthropomorphism, femininity, warmth, and status are culturally and contextually situated, and model biases may differ substantially across languages, domains, and pretraining data. Also, HumT focuses on tone rather than fuller dimensions of “human-likeness,” such as embodiment, reasoning, or consciousness.

Conclusion

HumT, SocioT, and DumT provide a rigorous measurement and intervention framework for examining and mitigating human-like (anthropomorphic) tone in LLM outputs. The empirical evidence demonstrates that such anthropomorphic language is strongly associated with harmful social stereotypes and systematically dispreferred by users except in narrow, context-dependent scenarios. DumT enables model developers to optimize for output informativeness and authenticity, challenging the prevailing assumption that anthropomorphic dialog is universally desirable in user-facing language agents. These results call for a reevaluation of alignment, benchmarking, and deployment strategies to ensure LLMs align with both user needs and broader social good.
