From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Abstract: To date, toxicity mitigation in LLMs has focused almost entirely on single-language settings. As LLMs embrace multilingual capabilities, it is crucial that our safety measures keep pace. Recognizing this research gap, our approach expands the scope of conventional toxicity mitigation to address the complexities presented by multiple languages. In the absence of sufficient annotated datasets across languages, we employ translated data to evaluate and enhance our mitigation techniques. We also compare finetuning-based mitigation approaches against retrieval-augmented techniques under both static and continual toxicity mitigation scenarios. This allows us to examine the effects of translation quality and of cross-lingual transfer on toxicity mitigation. We further explore how model size and data quantity affect the success of these mitigation efforts. Covering nine languages, our study spans a broad array of linguistic families and levels of resource availability, ranging from high- to mid-resource languages. Through comprehensive experiments, we provide insights into the complexities of multilingual toxicity mitigation, offering guidance for future research in this increasingly important field. Code and data are available at https://github.com/for-ai/goodtriever.
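The retrieval-augmented mitigation the abstract contrasts with finetuning can be illustrated with a minimal sketch. The idea, in the style of kNN-LM ensembling with separate "toxic" and "non-toxic" datastores, is to retrieve neighbors of the current context from each store, turn them into next-token distributions, and combine them with the base model as a product of experts. Everything below is a toy illustration with made-up embeddings and a 4-token vocabulary, not the paper's implementation; the function names, the L2-distance retrieval, and the `alpha` weighting are all assumptions.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def knn_next_token_probs(query, datastore, vocab_size, temperature=1.0):
    """Build a kNN next-token distribution from a datastore of
    (context_embedding, next_token_id) pairs: each stored pair is
    weighted by the softmax of its negative L2 distance to the query,
    and weights are accumulated per vocabulary token."""
    keys = np.stack([k for k, _ in datastore])
    dists = np.linalg.norm(keys - query, axis=1)
    weights = softmax(-dists / temperature)
    probs = np.zeros(vocab_size)
    for w, (_, tok) in zip(weights, datastore):
        probs[tok] += w
    return probs + 1e-9  # floor to avoid log(0) in the ensemble

def ensemble_probs(base_probs, pos_probs, neg_probs, alpha=1.0):
    """Product-of-experts combination: boost tokens favored by the
    non-toxic datastore, penalize tokens favored by the toxic one."""
    logits = np.log(base_probs) + alpha * (np.log(pos_probs) - np.log(neg_probs))
    return softmax(logits)

# Toy setup: 2-d "embeddings", vocab of 4 tokens, token 3 is the
# continuation the toxic datastore has seen in similar contexts.
query = np.array([1.0, 0.0])
nontoxic_store = [(np.array([1.0, 0.1]), 0), (np.array([0.9, -0.1]), 1)]
toxic_store = [(np.array([1.1, 0.0]), 3), (np.array([0.8, 0.1]), 3)]

base = np.array([0.25, 0.25, 0.25, 0.25])  # uniform base LM for the demo
pos = knn_next_token_probs(query, nontoxic_store, vocab_size=4)
neg = knn_next_token_probs(query, toxic_store, vocab_size=4)
final = ensemble_probs(base, pos, neg)
assert final[3] < base[3]  # the toxic token is suppressed
```

One reason this family of methods suits the continual-mitigation scenario the abstract mentions is visible in the sketch: adapting to new toxic content only requires appending (embedding, token) pairs to a datastore, with no gradient updates to the base model.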