
White Men Lead, Black Women Help? Benchmarking and Mitigating Language Agency Social Biases in LLMs

Published 16 Apr 2024 in cs.CL, cs.AI, and cs.CY | arXiv:2404.10508v5

Abstract: Social biases can manifest in language agency. However, very limited research has investigated such biases in LLM-generated content. In addition, previous works often rely on string-matching techniques to identify agentic and communal words within texts, falling short of accurately classifying language agency. We introduce the Language Agency Bias Evaluation (LABE) benchmark, which comprehensively evaluates biases in LLMs by analyzing agency levels attributed to different demographic groups in model generations. LABE tests for gender, racial, and intersectional language agency biases in LLMs on 3 text generation tasks: biographies, professor reviews, and reference letters. Using LABE, we unveil language agency social biases in 3 recent LLMs: ChatGPT, Llama3, and Mistral. We observe that: (1) LLM generations tend to demonstrate greater gender bias than human-written texts; (2) models demonstrate remarkably higher levels of intersectional bias than the other bias aspects; (3) prompt-based mitigation is unstable and frequently leads to bias exacerbation. Based on our observations, we propose Mitigation via Selective Rewrite (MSR), a novel bias mitigation strategy that leverages an agency classifier to identify and selectively revise parts of generated texts that demonstrate communal traits. Empirical results show MSR to be more effective and reliable than prompt-based mitigation methods, suggesting a promising research direction.
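The MSR strategy described above can be sketched as a simple sentence-level loop: classify each sentence's agency, then rewrite only the communal ones. This is a minimal illustration, not the paper's implementation; `classify_agency` and `rewrite_communal` below are hypothetical stubs standing in for the paper's trained BERT agency classifier and an LLM rewriting call.

```python
# Sketch of Mitigation via Selective Rewrite (MSR): classify each sentence,
# rewrite only those flagged as communal, and keep the rest untouched.
# The classifier and rewriter here are toy stand-ins, not the paper's models.

COMMUNAL_CUES = {"helped", "assisted", "supported"}  # illustrative heuristic only


def classify_agency(sentence: str) -> str:
    """Toy stand-in for the agency classifier: 'communal' or 'agentic'."""
    words = set(sentence.lower().rstrip(".").split())
    return "communal" if words & COMMUNAL_CUES else "agentic"


def rewrite_communal(sentence: str) -> str:
    """Toy stand-in for an LLM call that rewrites a sentence agentically."""
    return sentence.replace("helped lead", "led").replace("assisted with", "drove")


def mitigate_via_selective_rewrite(text: str) -> str:
    """Selectively revise only the sentences classified as communal."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    revised = [
        rewrite_communal(s) if classify_agency(s) == "communal" else s
        for s in sentences
    ]
    return ". ".join(revised) + "."
```

The key design point, per the abstract, is selectivity: only spans the classifier flags as communal are revised, which avoids the instability of rewriting (or re-prompting) the whole generation.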


Summary

  • The paper introduces a novel Language Agency Classification dataset and BERT-based classifier that accurately quantifies agency in texts.
  • The paper finds that LLM-generated texts exaggerate social biases, showing more pronounced disparities than human-written texts.
  • The paper reveals that texts about minority groups, particularly Black women, consistently exhibit reduced agency, highlighting systemic language bias.

Investigating Social Biases in Language Agency from LLMs and Human-Written Texts

Introduction to Language Agency and Social Biases

Language plays a pivotal role in reflecting and perpetuating social biases. These biases often manifest as differences in perceived levels of agency and communal behavior in written and spoken descriptions of diverse demographic groups. Traditionally, dominant social groups are characterized more agentically, displaying traits associated with leadership and autonomy, while minority groups are often described in more communal terms, which emphasize passivity and support roles. This study presents a comprehensive examination of these biases across several text types, both human-written and generated by LLMs, applying a novel dataset and classification approach to quantify language agency at a granular level.

Methodology and Data Overview

To address the nuances of measuring agency within text, the researchers devised a Language Agency Classification (LAC) dataset, enabling the training of models that discern agency in written language more accurately than prior tools. The agency classifier derived from this dataset was then used to analyze six text datasets spanning three genres—biographies, professor reviews, and reference letters—highlighting disparities based on gender, race, and intersectional identities.
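One plausible way to quantify the disparities this analysis surfaces is to compare, per demographic group, the fraction of sentences the classifier labels agentic. The exact metric below is an assumption for illustration, not taken from the paper:

```python
# Illustrative agency-gap computation, assuming each group's generations have
# been reduced to per-sentence labels ("agentic" or "communal") by a classifier.
from collections import Counter


def agency_ratio(labels):
    """Fraction of a group's sentences labeled agentic."""
    counts = Counter(labels)
    total = counts["agentic"] + counts["communal"]
    return counts["agentic"] / total if total else 0.0


def agency_gap(labels_a, labels_b):
    """Difference in agentic-language ratio between two demographic groups.
    A larger magnitude indicates a stronger language agency bias."""
    return agency_ratio(labels_a) - agency_ratio(labels_b)
```

Under this reading, "higher intersectional bias" would correspond to larger gaps when groups are defined by gender and race jointly (e.g., Black women) than by either attribute alone.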

LAC Dataset Construction and Classifier Development

The LAC dataset, built from sentence-level annotations, was designed to capture agentic and communal language more effectively than traditional string-matching techniques, which have shown substantial limitations in prior studies. Combining ChatGPT generation with human annotation and refinement, the dataset achieved high inter-annotator reliability (a Fleiss' Kappa of 0.90 after refinement). This dataset underpinned the development of an agency classifier, with BERT models outperforming other architectures such as RoBERTa and Llama2 on agency classification.
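A toy example shows why the string-matching baselines the paper criticizes fall short: a lexicon lookup cannot account for context such as negation, so a sentence that explicitly denies agency still matches an agentic word. The word list here is illustrative only, not any lexicon used in prior work.

```python
# Naive string-matching agency detection: label a sentence agentic if any
# lexicon word appears. Negation is invisible to this approach.
AGENTIC_WORDS = {"led", "initiated", "achieved"}  # illustrative lexicon only


def string_match_agentic(sentence: str) -> bool:
    """Return True if any agentic lexicon word appears in the sentence."""
    tokens = sentence.lower().replace(".", "").split()
    return any(t in AGENTIC_WORDS for t in tokens)


print(string_match_agentic("She led the research agenda."))        # True: correct
print(string_match_agentic("She never led and mostly observed."))  # True: wrong label
```

The second sentence is communal in meaning, yet string matching labels it agentic because "led" appears, which is the kind of error a trained sentence-level classifier can avoid.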

Text Data Collection and Analysis

The text samples analyzed were split between human-written sources and those synthesized by LLMs. For human-written texts, existing datasets were utilized wherever possible, while LLMs generated additional required data, especially where racial and intersectional information was crucial but absent in available resources. Each text type underwent rigorous processing to ensure that the examinations of agency were grounded in robust and representative sample data.

Key Findings on Agency Bias

The study unearthed several compelling findings:

  1. Alignment with Social Science Insights: The biases identified in human-written texts paralleled known social biases, confirming that agentic and communal descriptions in texts align with broader societal inequality.
  2. Exaggeration by LLMs: Texts generated by LLMs displayed more pronounced biases compared to human-authored texts, suggesting that without careful calibration, LLMs could augment existing social biases.
  3. Pronounced Impact on Minorities: Texts about minority groups, especially those at intersectional identities (e.g., Black women), consistently exhibited lower levels of agency. This pattern held across data types and sources, indicating a pervasive issue in both human-written and model-generated text.

Implications and Future Work

This investigation into language agency offers significant insights into the nuanced ways biases manifest in text. The findings suggest a critical need for more nuanced LLMs that account for and mitigate these biases. Future research should expand to cover more diverse data and continue improving classifier accuracy and applicability. Additionally, these insights are crucial for developers of LLMs and users of these models in applications, especially in sensitive contexts where biased language may have real-world consequences.

Concluding Remarks

By methodically analyzing the language of agency across diverse text sources and forms, this study illuminates the significant work that remains in understanding and addressing language-based social biases. The hope is that these efforts will lead to more informed and equitable applications of AI in natural language processing, reducing the perpetuation of bias and enhancing fairness in automated text generation and analysis.
