Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages

Published 8 Jun 2025 in cs.CL | (2506.07249v1)

Abstract: Emerging research on bias attribution and interpretability has revealed how tokens contribute to biased behavior in LLMs processing English texts. We build on this line of inquiry by adapting the information-theoretic bias attribution score metric for implementation on models handling agglutinative languages, particularly Filipino. We then demonstrate the effectiveness of our adapted method by using it on a purely Filipino model and on three multilingual models: one trained on languages worldwide and two on Southeast Asian data. Our results show that Filipino models are driven towards bias by words pertaining to people, objects, and relationships, entity-based themes that stand in contrast to the action-heavy nature of bias-contributing themes in English (i.e., criminal, sexual, and prosocial behaviors). These findings point to differences in how English and non-English models process inputs linked to sociodemographic groups and bias.

Summary

  • The paper introduces an extended bias attribution metric to quantify token-specific bias in Filipino language models.
  • It employs Filipino CrowS-Pairs, adapting an information-theoretic framework to highlight the effects of agglutinative morphology.
  • Key findings reveal that bias-inducing tokens are predominantly entity-related, suggesting practical avenues for bias mitigation.

Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric

The examination of bias in LLMs, especially those handling non-English texts, is a crucial area of research as artificial intelligence technologies proliferate globally. In "Bias Attribution in Filipino Language Models: Extending a Bias Interpretability Metric for Application on Agglutinative Languages," Gamboa, Feng, and Lee propose an extended interpretability metric for understanding token-specific contributions to bias in models processing Filipino. This work aligns with the broader aim of explaining how LLMs come to exhibit unfair behaviors, which is essential for ensuring ethical AI applications.

Methodological Innovation

This research adapts an existing bias attribution metric, initially developed for English texts, to accommodate agglutinative languages, which present distinct morphological characteristics. The original bias attribution score relies on an information-theoretic framework to quantify how individual tokens influence model bias. By utilizing Filipino CrowS-Pairs, a dataset specifically designed to evaluate bias in Filipino models, the authors extend this metric's applicability to agglutinative languages. Explicitly accounting for morphological differences (Filipino's agglutinative morphology tends to pack multiple concepts into a single lexical item) allows for a more nuanced analysis of bias at the semantic level, distinct from English.
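The core idea of token-level bias attribution over CrowS-Pairs-style data can be illustrated with a toy sketch: score each token shared by a stereotyping sentence and its minimally edited counterpart by the difference in the log-probability the model assigns to it in each context. This is an illustrative simplification, not the paper's exact information-theoretic formulation; the function name and the toy log-probabilities below are assumptions for demonstration only.

```python
import math

def token_bias_attribution(logp_stereo, logp_counter):
    """Toy per-token bias attribution score.

    logp_stereo / logp_counter map each token to the natural-log
    probability a model assigns it in the stereotyping sentence and
    in its minimally different counterpart, respectively. Tokens
    shared by both sentences get a score in bits; positive scores
    mark tokens the model finds more likely in the stereotyping
    context. Illustrative only, not the paper's exact metric.
    """
    shared = logp_stereo.keys() & logp_counter.keys()
    return {
        tok: (logp_stereo[tok] - logp_counter[tok]) / math.log(2)
        for tok in shared
    }

# Hypothetical log-probabilities for a Filipino sentence pair.
scores = token_bias_attribution(
    {"babae": -1.0, "maganda": -2.0},   # stereotyping context
    {"babae": -1.5, "lalaki": -1.2},    # counterpart context
)
```

In this sketch, only the shared token "babae" receives a score (about 0.72 bits), since attribution is only meaningful for tokens that appear in both members of the pair; an agglutinative adaptation would additionally need to align morphologically complex tokens across the pair before comparing.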

Key Findings

Applying the adapted bias attribution metric to several large pre-trained models trained on Filipino and Southeast Asian languages yields a clear pattern: token-level analysis shows that bias-inducing tokens in Filipino are predominantly nouns linked to entities, such as people and objects, as opposed to the action-oriented themes typically seen in bias-inducing English contexts. This contrast sheds light on how societal and cultural structures are echoed differently across languages within AI systems.

Implications and Future Directions

Practically, this research provides a toolkit for understanding biases ingrained in AI systems handling agglutinative languages. The identification of entity-based semantic fields as bias-inducing in Filipino models points to areas where caution and mitigation may be necessary. Theoretically, the paper expands the frontier of bias interpretability by showing that adaptations of existing metrics can handle the intricacies of linguistic diversity in NLP.

A promising future direction is to extend this approach to other agglutinative languages, potentially enabling improved cross-lingual bias detection and mitigation strategies. Such work could lead to more equitable, bias-aware models capable of operating fairly in multilingual environments. The findings also invite a reevaluation of tokenization methods and LLM pretraining with respect to the interplay of morphology and semantics.

Conclusion

The paper lays the groundwork not only for exploring token-specific biases in Filipino LLMs but also for systematic examination across diverse linguistic typologies. By revealing disparities in how LLMs process bias-inducing inputs across cultural contexts, this research contributes both methodologically and theoretically to the growing body of work aimed at making AI technologies more inclusive and less discriminatory. As AI continues its global integration, the importance of such studies becomes increasingly evident, placing robust interpretability at the forefront of NLP research.
