- The paper reveals significant biases in Bengali sentiment models, with 61% of models favoring male identities and clear skews along religious and national lines.
- The study conducts a detailed audit of 38 models, built by fine-tuning mBERT and BanglaBERT on 19 Bengali sentiment datasets, to assess bias patterns.
- The analysis shows that developer demographics do not significantly influence bias, pointing to underlying structural issues in dataset and model design.
Biases in NLP for Low-Resource Languages: Insights from Bengali Sentiment Analysis
This paper addresses the intricate issue of biases inherent in sociotechnical systems, focusing on Bengali sentiment analysis models and datasets. It empirically examines biases related to gender, religion, and nationality within Bengali, a widely spoken yet low-resourced language in NLP. Although multilingual pre-trained models and language-specific datasets have been proposed to mitigate such biases, the paper investigates their efficacy and finds that biases persist in sentiment analysis models fine-tuned with these resources.
The researchers conducted a thorough algorithmic audit of 38 sentiment analysis models, built on two distinct pre-trained models, mBERT and BanglaBERT, and fine-tuned with 19 Bengali sentiment analysis datasets. The results reveal substantial biases, with the models showing preferences across identity categories even when the test inputs are semantically equivalent and structurally similar.
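As a rough illustration of how such an audit grid can be assembled, the sketch below fine-tunes each pre-trained checkpoint on each dataset with the Hugging Face Trainer API. The dataset paths, CSV column names, and hyperparameters are placeholders, not the paper's actual configuration; the checkpoint IDs are the commonly used public ones.

```python
# Sketch: building the 2 x 19 grid of fine-tuned sentiment models.
# Dataset paths, column names, and hyperparameters are illustrative placeholders.
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import load_dataset

PRETRAINED = ["bert-base-multilingual-cased",   # mBERT
              "csebuetnlp/banglabert"]           # BanglaBERT
DATASETS = ["data/bn_sentiment_01.csv", "data/bn_sentiment_02.csv"]  # ... 19 in total

def fine_tune(checkpoint: str, csv_path: str, out_dir: str) -> str:
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

    # Assumes each CSV has "text" and "label" columns.
    raw = load_dataset("csv", data_files=csv_path)["train"].train_test_split(test_size=0.1)
    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)
    encoded = raw.map(tokenize, batched=True)

    args = TrainingArguments(output_dir=out_dir, num_train_epochs=3,
                             per_device_train_batch_size=16, learning_rate=2e-5)
    Trainer(model=model, args=args,
            train_dataset=encoded["train"], eval_dataset=encoded["test"]).train()
    return out_dir

# One fine-tuned model per (pre-trained model, dataset) pair: 2 x 19 = 38 models.
for ckpt in PRETRAINED:
    for i, path in enumerate(DATASETS):
        fine_tune(ckpt, path, out_dir=f"models/{ckpt.split('/')[-1]}_{i:02d}")
```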
Significantly, the models displayed a notable gender bias: 61% favored male identities, while 24% leaned towards female identities. In religious identity assessments, 24% of models were biased towards Hindu identities and 61% towards Muslim identities. Concerning nationality, 50% of models favored Bangladeshi identities over Indian ones, with 26% showing the reverse trend. Such patterns point to a serious risk when these models are deployed in real-world applications.
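A minimal sketch of the kind of counterfactual probe that can surface such preferences is shown below: each template sentence is instantiated with paired identity terms, and the gap in predicted positive-sentiment probability is recorded. The templates, identity terms, label names, and model path (reusing the hypothetical naming from the previous sketch) are illustrative assumptions, not the paper's actual test set.

```python
# Sketch: counterfactual identity-substitution probe for one fine-tuned model.
# Templates, identity pairs, and label names are illustrative assumptions.
from transformers import pipeline

clf = pipeline("text-classification", model="models/banglabert_00", top_k=None)

# Bengali templates with an [IDENTITY] slot (English glosses in comments).
TEMPLATES = [
    "আমার বন্ধু একজন [IDENTITY]।",       # "My friend is a [IDENTITY]."
    "আমার প্রতিবেশী একজন [IDENTITY]।",    # "My neighbor is a [IDENTITY]."
]
IDENTITY_PAIRS = {
    "gender":      ("পুরুষ", "নারী"),        # male / female
    "religion":    ("হিন্দু", "মুসলিম"),      # Hindu / Muslim
    "nationality": ("বাংলাদেশি", "ভারতীয়"),   # Bangladeshi / Indian
}
POSITIVE_LABELS = {"label_1", "positive", "pos"}  # depends on the model's label config

def positive_score(text: str) -> float:
    scores = clf([text])[0]          # all label scores for a single input
    return next(s["score"] for s in scores if s["label"].lower() in POSITIVE_LABELS)

for axis, (a, b) in IDENTITY_PAIRS.items():
    gaps = [positive_score(t.replace("[IDENTITY]", a))
            - positive_score(t.replace("[IDENTITY]", b)) for t in TEMPLATES]
    mean_gap = sum(gaps) / len(gaps)
    print(f"{axis}: mean positive-sentiment gap ({a} - {b}) = {mean_gap:+.3f}")
```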
The paper further examines whether these biases are influenced by the demographic backgrounds of dataset developers. It concludes that there is no statistically significant correlation between the developers' demographic identities and the resultant biases in the models. This finding suggests that the biases are likely rooted in infrastructural and foundational aspects of dataset creation and model fine-tuning rather than in the intrinsic biases of the developers themselves.
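One plausible way to run such a check, sketched below under the assumption that both developer demographics and bias direction are coded as categorical variables per dataset, is a chi-square test of independence; the variable coding, counts, and choice of test here are assumptions, not the paper's reported procedure.

```python
# Sketch: testing whether dataset-developer demographics and model bias
# direction are statistically associated. All counts are made-up placeholders.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: developer-team gender majority (male-majority, female-majority)
# Cols: direction of the model's gender bias (pro-male, pro-female, none)
table = np.array([
    [9, 3, 2],   # male-majority developer teams
    [3, 1, 1],   # female-majority developer teams
])

chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
if p_value >= 0.05:
    print("No statistically significant association at alpha = 0.05.")
```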
Furthermore, comparing models fine-tuned from different combinations of pre-trained models and training datasets indicates that language-specific models (such as BanglaBERT) exhibit less bias than more generalized multilingual models (such as mBERT). This finding argues for prioritizing language-specific model development when tackling bias in NLP for low-resource languages.
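As a rough illustration of how the two model families could be compared, the sketch below contrasts their absolute bias gaps with a Mann-Whitney U test; the gap values and the choice of test are assumptions, not the paper's reported analysis.

```python
# Sketch: comparing bias magnitudes between model families.
# Each list holds one absolute sentiment gap per fine-tuned model;
# the numbers are placeholders, not the paper's measurements.
from scipy.stats import mannwhitneyu

mbert_gaps      = [0.12, 0.09, 0.15, 0.11, 0.08]   # mBERT-based models
banglabert_gaps = [0.07, 0.05, 0.09, 0.06, 0.04]   # BanglaBERT-based models

stat, p_value = mannwhitneyu(banglabert_gaps, mbert_gaps, alternative="less")
print(f"U = {stat:.1f}, p = {p_value:.3f}")
# A small p-value would support the claim that the language-specific
# models show smaller bias gaps than the multilingual ones.
```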
From a broader perspective, the study underscores the epistemic injustice prevalent in NLP, where low-resource languages suffer from inadequate representation and therefore heightened biases. These biases manifest as testimonial and hermeneutical injustices, undermining equitable AI alignment across diverse cultural values. The paper calls for decolonizing NLP research, highlighting the importance of aligning AI systems with indigenous social values while addressing the risks of cultural imposition and exploitation.
In conclusion, the paper emphasizes the need for comprehensive audits of downstream NLP model biases and their implications for AI alignment. Careful dataset selection, pre-trained model choice, and fairness metrics in such audits can pave the way towards socially just, transparent, and inclusive AI policies. Future efforts should extend to fostering sustainable NLP work in low-resource contexts and devising robust frameworks that counteract existing biases, promoting broader recognition and integration of diverse identities in language technologies.