- The paper reveals significant biases in Bengali sentiment models, with 61% of models favoring male identities and clear skews along religious and national lines.
- The study conducts a detailed audit of 38 models, built by fine-tuning mBERT and BanglaBERT on 19 Bengali sentiment datasets, to assess bias patterns.
- The analysis shows that developer demographics do not significantly influence bias, pointing to underlying structural issues in dataset and model design.
Biases in NLP for Low-Resource Languages: Insights from Bengali Sentiment Analysis
This paper addresses the intricate issue of biases inherent in sociotechnical systems, focusing on Bengali sentiment analysis models and datasets. It empirically examines biases related to gender, religion, and nationality within Bengali, a widely spoken yet low-resourced language in NLP. Although multilingual pre-trained models and language-specific datasets have been proposed to mitigate such biases, the paper investigates their efficacy and finds that biases persist in sentiment analysis models fine-tuned with these resources.
The researchers conducted a thorough algorithmic audit of 38 sentiment analysis models, built on two distinct pre-trained models, mBERT and BanglaBERT, and fine-tuned with 19 Bengali sentiment analysis datasets. The results reveal substantial biases, with the models showing preferences across identity categories even when the test inputs are semantically equivalent and structurally similar.
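As a rough illustration of how such an audit grid can be assembled, the sketch below fine-tunes each pre-trained checkpoint on each dataset with the Hugging Face Trainer API. The dataset paths, CSV column names, and hyperparameters are placeholders, not the paper's actual configuration; the checkpoint IDs are the commonly used public ones.

```python
# Sketch: building the 2 x 19 grid of fine-tuned sentiment models.
# Dataset paths, column names, and hyperparameters are illustrative placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

PRETRAINED = ["bert-base-multilingual-cased",   # mBERT
              "csebuetnlp/banglabert"]           # BanglaBERT
DATASETS = ["data/bn_sentiment_01.csv", "data/bn_sentiment_02.csv"]  # ... 19 in total

def fine_tune(checkpoint: str, csv_path: str, out_dir: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # Assumes each CSV has "text" and "label" columns.
    raw = load_dataset("csv", data_files=csv_path)["train"].train_test_split(test_size=0.1)
    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    encoded = raw.map(tokenize, batched=True)

    args = TrainingArguments(output_dir=out_dir, num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    Trainer(model=model, args=args,
            train_dataset=encoded["train"], eval_dataset=encoded["test"]).train()
    return out_dir

# One fine-tuned model per (pre-trained model, dataset) pair: 2 x 19 = 38 models.
for ckpt in PRETRAINED:
    for i, path in enumerate(DATASETS):
        fine_tune(ckpt, path, out_dir=f"models/{ckpt.split('/')[-1]}_{i:02d}")
```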
Significantly, the models displayed a notable gender bias: 61% favored male identities, while 24% leaned towards female identities. In religious identity assessments, 24% of models were biased towards Hindu identities and 61% towards Muslim identities. Concerning nationality, 50% of models favored Bangladeshi identities over Indian ones, with 26% showing the reverse trend. Such patterns point to a serious risk when these models are deployed in real-world applications.
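A minimal sketch of the kind of counterfactual probe that can surface such preferences is shown below: each template sentence is instantiated with paired identity terms, and the gap in predicted positive-sentiment probability is recorded. The templates, identity terms, label names, and model path (reusing the hypothetical naming from the previous sketch) are illustrative assumptions, not the paper's actual test set.

```python
# Sketch: counterfactual identity-substitution probe for one fine-tuned model.
# Templates, identity pairs, and label names are illustrative assumptions.
from transformers import pipeline

clf = pipeline("text-classification", model="models/banglabert_00", top_k=None)

# Bengali templates with an [IDENTITY] slot (English glosses in comments).
TEMPLATES = [
    "আমার বন্ধু একজন [IDENTITY]।",       # "My friend is a [IDENTITY]."
    "আমার প্রতিবেশী একজন [IDENTITY]।",    # "My neighbor is a [IDENTITY]."
]
IDENTITY_PAIRS = {
    "gender":      ("পুরুষ", "নারী"),        # male / female
    "religion":    ("হিন্দু", "মুসলিম"),      # Hindu / Muslim
    "nationality": ("বাংলাদেশি", "ভারতীয়"),   # Bangladeshi / Indian
}
POSITIVE_LABELS = {"label_1", "positive", "pos"}  # depends on the model's label config

def positive_score(text: str) -> float:
    scores = clf([text])[0]          # all label scores for a single input
    return next(s["score"] for s in scores if s["label"].lower() in POSITIVE_LABELS)

for axis, (a, b) in IDENTITY_PAIRS.items():
    gaps = [positive_score(t.replace("[IDENTITY]", a))
            - positive_score(t.replace("[IDENTITY]", b)) for t in TEMPLATES]
    mean_gap = sum(gaps) / len(gaps)
    print(f"{axis}: mean positive-sentiment gap ({a} - {b}) = {mean_gap:+.3f}")
```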
The paper further examines whether these biases are influenced by the demographic backgrounds of dataset developers. It concludes that there is no statistically significant correlation between the developers' demographic identities and the resultant biases in the models. This finding suggests that the biases are likely rooted in infrastructural and foundational aspects of dataset creation and model fine-tuning rather than in the intrinsic biases of the developers themselves.
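One plausible way to run such a check, sketched below under the assumption that both developer demographics and bias direction are coded as categorical variables per dataset, is a chi-square test of independence; the variable coding, counts, and choice of test here are assumptions, not the paper's reported procedure.

```python
# Sketch: testing whether dataset-developer demographics and model bias
# direction are statistically associated. All counts are made-up placeholders.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: developer-team gender majority (male-majority, female-majority)
# Cols: direction of the model's gender bias (pro-male, pro-female, none)
table = np.array([
    [9, 3, 2],   # male-majority developer teams
    [3, 1, 1],   # female-majority developer teams
])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
if p_value >= 0.05:
    print("No statistically significant association at alpha = 0.05.")
```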
Furthermore, comparing models fine-tuned from different combinations of pre-trained models and training datasets indicates that language-specific models (such as BanglaBERT) exhibit less bias than more generalized multilingual models (such as mBERT). This finding argues for prioritizing language-specific model development when tackling bias in NLP for low-resource languages.
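As a rough illustration of how the two model families could be compared, the sketch below contrasts their absolute bias gaps with a Mann-Whitney U test; the gap values and the choice of test are assumptions, not the paper's reported analysis.

```python
# Sketch: comparing bias magnitudes between model families.
# Each list holds one absolute sentiment gap per fine-tuned model;
# the numbers are placeholders, not the paper's measurements.
from scipy.stats import mannwhitneyu

mbert_gaps      = [0.12, 0.09, 0.15, 0.11, 0.08]   # mBERT-based models
banglabert_gaps = [0.07, 0.05, 0.09, 0.06, 0.04]   # BanglaBERT-based models

stat, p_value = mannwhitneyu(banglabert_gaps, mbert_gaps, alternative="less")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
# A small p-value would support the claim that the language-specific
# models show smaller bias gaps than the multilingual ones.
```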
From a broader perspective, the study underscores the epistemic injustice prevalent in NLP, where low-resource languages suffer from inadequate representation and therefore heightened biases. These biases manifest as testimonial and hermeneutical injustices, undermining equitable AI alignment across diverse cultural values. The paper calls for decolonizing NLP research, highlighting the importance of aligning AI systems with indigenous social values while addressing the risks of cultural imposition and exploitation.
In conclusion, the paper emphasizes the need for comprehensive audits of downstream NLP model biases and their implications for AI alignment. Careful dataset selection, pre-trained model choice, and fairness metrics in such audits can pave the way towards socially just, transparent, and inclusive AI policies. Future efforts should extend to fostering sustainable NLP work in low-resource contexts and devising robust frameworks that counteract existing biases, promoting broader recognition and integration of diverse identities in language technologies.