
Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models

Published 14 May 2024 in cs.CL (arXiv:2405.08477v1)

Abstract: Machine translation (MT) models are known to suffer from gender bias, especially when translating into languages with extensive gendered morphology. Accordingly, they still fall short in using gender-inclusive language, also representative of non-binary identities. In this paper, we look at gender-inclusive neomorphemes, neologistic elements that avoid binary gender markings as an approach towards fairer MT. In this direction, we explore prompting techniques with LLMs to translate from English into Italian using neomorphemes. So far, this area has been under-explored due to its novelty and the lack of publicly available evaluation resources. We fill this gap by releasing Neo-GATE, a resource designed to evaluate gender-inclusive en-it translation with neomorphemes. With Neo-GATE, we assess four LLMs of different families and sizes and different prompt formats, identifying strengths and weaknesses of each on this novel task for MT.

Summary

  • The paper introduces Neo-GATE, a novel benchmark for evaluating gender-inclusive English-to-Italian machine translation with neomorphemes.
  • It evaluates major LLMs including GPT-4 and Mixtral with zero-shot and few-shot strategies to improve gender-inclusive outputs.
  • Results highlight improved coverage and accuracy with GPT-4, paving the way for future advances in equitable translation systems.

Exploring Gender-Inclusive Machine Translation with Neo-GATE

Background and Motivation

Machine Translation (MT) systems have long been affected by gender bias, often defaulting to masculine forms or perpetuating gender stereotypes. This is especially challenging when translating from languages like English, which have limited gender markings, into languages with extensive gendered morphology, like Italian. This bias not only underrepresents women but also overlooks non-binary individuals entirely.

Given the rising need for more inclusive language technologies, there's growing interest in gender-inclusive solutions, particularly from grassroots efforts within the LGBTQ+ community. This paper explores the use of neomorphemes—innovative linguistic elements that break away from binary gender norms—as a potential solution for fairer machine translation.

The Neo-GATE Benchmark

A significant contribution of this research is the introduction of Neo-GATE, a dedicated benchmark for evaluating gender-inclusive English-to-Italian translation that utilizes neomorphemes. Neo-GATE is built on top of the GATE benchmark, which addresses gender bias in MT but does not account for non-binary or gender-neutral expressions.

Neo-GATE offers flexibility by treating neomorphemes as an open class, allowing for adaptability to various neologistic paradigms. This adaptability is crucial given the evolving nature of these linguistic elements. The benchmark includes tagged references that replace gendered morphemes and function words with placeholders, making it straightforward to test different neomorpheme paradigms.
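The placeholder mechanism described above can be illustrated with a minimal sketch. The tag names (`<MORPH>`, `<ART>`) and the paradigm mappings below are illustrative assumptions for demonstration purposes, not Neo-GATE's actual tag set or format:

```python
# Adapting a tagged reference to a chosen neomorpheme paradigm.
# Tag names and mappings are hypothetical, for illustration only.

ASTERISK = {"<MORPH>": "*", "<ART>": "l*"}   # asterisk paradigm
SCHWA = {"<MORPH>": "ə", "<ART>": "lə"}      # schwa paradigm

def adapt(tagged_reference: str, paradigm: dict) -> str:
    """Replace each placeholder tag with the paradigm's inclusive form."""
    for tag, form in paradigm.items():
        tagged_reference = tagged_reference.replace(tag, form)
    return tagged_reference

# "<ART> studios<MORPH>" stands for a reference where the article and
# the adjective/noun ending are left open to the chosen paradigm.
print(adapt("<ART> studios<MORPH> bravissim<MORPH>", ASTERISK))
# → l* studios* bravissim*
```

Because the references are stored once with placeholders, the same benchmark data can be re-instantiated for any new paradigm by supplying a different mapping, which is what makes the "open class" design adaptable.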

Experimental Setup

The experiments evaluated four prominent LLMs on gender-inclusive translation:

  1. Mixtral
  2. Tower
  3. Llama 2
  4. GPT-4

These models were tested against two popular Italian neomorpheme paradigms—Asterisk (*) and Schwa (ə/ɛ). Various prompting strategies were used, including zero-shot and few-shot learning prompts, to gauge the models' ability to produce accurate and inclusive translations.
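The few-shot setup can be sketched as a simple prompt assembler. The instruction wording and the demonstration pairs below are illustrative assumptions, not the paper's actual prompt templates:

```python
# Sketch of assembling a few-shot prompt for gender-inclusive en-it
# translation with the asterisk paradigm. Demos and wording are
# hypothetical examples, not taken from the paper.

DEMOS = [
    ("The teachers are tired.", "L* insegnant* sono stanch*."),
    ("My friend is very kind.", "L* mi* amic* è molto gentile."),
]

def build_prompt(source: str, n_shots: int = 2) -> str:
    lines = [
        "Translate from English into Italian, using the asterisk (*) "
        "in place of gendered morphemes when the referent's gender is unknown."
    ]
    for en, it in DEMOS[:n_shots]:
        lines.append(f"English: {en}\nItalian: {it}")
    lines.append(f"English: {source}\nItalian:")
    return "\n\n".join(lines)

print(build_prompt("The lawyer greeted the students."))
```

Setting `n_shots=0` yields the zero-shot variant: the model sees only the instruction and the source sentence, with no demonstrations of the target paradigm.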

Key Findings

Zero-Shot Results

  • GPT-4 and Mixtral were the strongest at generating neomorphemes accurately, with GPT-4 notably ahead.
  • Llama 2 and Tower struggled significantly: Llama 2 rarely generated neomorphemes at all, while Tower opted for fluent but gendered outputs.

Few-Shot Learning

Few-shot prompts generally improved the models' performance. Here are some highlights:

  • Coverage and Accuracy: Coverage of gender-specific words increased with more demonstrations; GPT-4 and Mixtral showed substantial improvements.
  • Coverage-Weighted Accuracy: GPT-4 scored the highest, achieving a notable improvement over its zero-shot performance.
  • Mis-Generation: While GPT-4 and Tower showed fewer mis-generations, Mixtral exhibited higher rates of incorrect neomorpheme use, particularly with fewer demonstrations.
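The metrics in the bullets above can be sketched with a back-of-the-envelope calculation. The exact definitions here are assumptions inferred from this summary (e.g. that coverage-weighted accuracy is the product of coverage and accuracy), not the paper's formal metric specification:

```python
# Illustrative scoring sketch; definitions are assumed, not official.

def scores(n_target: int, n_covered: int, n_correct: int, n_misgenerated: int) -> dict:
    """n_target: gender-marked words in the reference;
    n_covered: those the model rendered in an evaluable form;
    n_correct: those rendered with the desired neomorpheme;
    n_misgenerated: neomorphemes placed where none belongs."""
    coverage = n_covered / n_target
    accuracy = n_correct / n_covered if n_covered else 0.0
    return {
        "coverage": coverage,
        "accuracy": accuracy,
        # equals n_correct / n_target: rewards being both broad and right
        "coverage-weighted accuracy": coverage * accuracy,
        "mis-generation": n_misgenerated / n_target,
    }

print(scores(n_target=100, n_covered=80, n_correct=60, n_misgenerated=5))
```

Under these assumed definitions, a model that covers few words can still post high accuracy on the ones it does cover, which is why a coverage-weighted score is needed to compare systems fairly.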

Implications and Future Directions

This research underscores the potential of LLMs to adapt to tasks requiring gender-inclusive language, even though challenges remain. The positive results from GPT-4 and Mixtral suggest that, with suitable prompting, these models can be steered toward consistent neomorpheme use.

Practically, this work pushes the boundary in gender-inclusive MT and lays the groundwork for more robust and nuanced translation systems that can cater to evolving linguistic needs. The Neo-GATE benchmark itself is a step forward, providing a resource for future research to build on.

Theoretically, exploring how LLMs handle neomorphemes provides insights into the adaptability and limitations of these models in processing innovative linguistic constructs. This could spur further advancements in training LLMs with diverse and inclusive datasets.

Conclusion

The move towards gender-inclusive MT is not just a technological upgrade but a societal necessity, as it ensures fair representation of all gender identities. This paper presents some of the first significant efforts in this direction, providing both a conceptual framework and empirical evidence of what is achievable with current LLMs. Future work can build on these findings, refining models and expanding resources like Neo-GATE to cover more languages and neologistic paradigms.
