Achieving Human Parity on Automatic Chinese to English News Translation

Published 15 Mar 2018 in cs.CL | (1803.05567v2)

Abstract: Machine translation has made rapid advances in recent years. Millions of people are using it today in online translation systems and mobile applications in order to communicate across language barriers. The question naturally arises whether such systems can approach or achieve parity with human translations. In this paper, we first address the problem of how to define and accurately measure human parity in translation. We then describe Microsoft's machine translation system and measure the quality of its translations on the widely used WMT 2017 news translation task from Chinese to English. We find that our latest neural machine translation system has reached a new state-of-the-art, and that the translation quality is at human parity when compared to professional human translations. We also find that it significantly exceeds the quality of crowd-sourced non-professional translations.

Summary

The paper demonstrates that innovative techniques like dual learning and deliberation networks enable machine translations to reach human parity on professional news content.
Methodological advances including two-pass decoding, agreement regularization, and system combination significantly boost BLEU scores and overall translation quality.
Human evaluations confirm the system’s translations are statistically indistinguishable from professional human translations, indicating a major breakthrough in NMT.

Achieving Human Parity in Chinese-to-English News Translation

This paper presents a study by researchers at Microsoft AI & Research on achieving human parity in machine translation, specifically for Chinese-to-English news translation. Building on a state-of-the-art neural machine translation (NMT) system, the paper investigates methods to improve translation quality, ultimately reaching a level comparable to professional human translations on the WMT 2017 news translation task.

Defining Human Parity in Translation

The authors first define "human parity": a machine translation system reaches parity when human judges cannot distinguish the quality of its output from that of human translations. To measure this, the study has human evaluators score translations directly and applies statistical significance testing to those scores, rather than relying solely on automatic metrics such as BLEU.
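The idea of "no statistically significant difference" between two sets of human scores can be illustrated with a small sketch. The code below uses a two-sided permutation test on the difference of mean scores; this is a stand-in chosen for simplicity, not necessarily the test used in the paper, and the function names, the 0.05 threshold, and the score lists in the usage lines are hypothetical.

```python
import random
from statistics import mean


def permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided permutation test on the difference of mean scores.

    Returns an estimated p-value for the null hypothesis that both
    score lists come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(scores_a) - mean(scores_b))
    pooled = list(scores_a) + list(scores_b)
    n_a = len(scores_a)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled scores to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


def no_significant_difference(machine_scores, human_scores, alpha=0.05):
    """True when the test fails to separate the two score sets (assumed alpha)."""
    return permutation_test(machine_scores, human_scores) >= alpha


# Hypothetical quality scores on a 0-100 scale, for illustration only.
machine = [70, 68, 72, 69, 71, 70, 73, 67]
human = [71, 69, 70, 68, 72, 71, 72, 68]
print(no_significant_difference(machine, human))
```

Failing to find a significant difference is weaker than proving equivalence, so a check like this depends heavily on the number of judgments collected.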

Methodological Innovations

Several key innovations were introduced to overcome the challenges of achieving human parity. These include:

Dual Learning and Joint Training: The translation process is treated as a dual problem, leveraging both source-to-target (S2T) and target-to-source (T2S) translation to make full use of the available monolingual and bilingual corpora. Training iteratively updates both translation directions, so that each improves the other.
Deliberation Networks: Introducing two-pass decoding, this method allows the system to generate a draft translation initially, followed by a refinement phase that incorporates contextual information from both preceding and following words.
Agreement Regularization: This approach focuses on reducing exposure bias by training systems in both left-to-right and right-to-left decoding sequences and ensuring their outputs are consistent with each other.
Data Selection and Filtering: The researchers employed advanced techniques to filter out noisy data and select relevant data. Notably, a bilingual sentence vector representation was developed to map sentences across languages, which was instrumental in enhancing data quality for training.
System Combination and Re-ranking: By combining outputs from multiple models and re-ranking the candidates with features such as language model scores and cross-lingual sentence similarity, the researchers improved the final translations.
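The re-ranking step in the last item can be sketched as a weighted sum of per-candidate features. This is a minimal illustration under stated assumptions: the feature names (`model`, `lm`, `similarity`), the weights, and the candidate values are all hypothetical, and the paper's actual feature set and weight tuning are not reproduced here.

```python
def rerank(candidates, weights):
    """Return candidates sorted by weighted feature sum, best first.

    Each candidate is a dict with a "text" string and a "features" dict.
    Features missing from a candidate count as 0.0.
    """
    def score(candidate):
        features = candidate["features"]
        return sum(w * features.get(name, 0.0) for name, w in weights.items())

    return sorted(candidates, key=score, reverse=True)


# Hypothetical n-best list pooled from several systems.
# "model" and "lm" are log-probabilities; "similarity" is in [0, 1].
n_best = [
    {"text": "hypothesis A", "features": {"model": -1.2, "lm": -3.0, "similarity": 0.80}},
    {"text": "hypothesis B", "features": {"model": -1.0, "lm": -4.5, "similarity": 0.60}},
    {"text": "hypothesis C", "features": {"model": -1.5, "lm": -2.8, "similarity": 0.95}},
]

# Hypothetical weights; in practice these would be tuned on held-out data.
weights = {"model": 1.0, "lm": 0.5, "similarity": 2.0}

ranked = rerank(n_best, weights)
print(ranked[0]["text"])
```

In this toy list the translation model alone would prefer hypothesis B, while the combined score prefers hypothesis C, which is the kind of change re-ranking is meant to produce.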

Experimental Results

The results show significant improvements across the different systems, with BLEU scores surpassing previous benchmarks. For instance, the system combining dual learning and deliberation networks achieved a BLEU score of 27.40, demonstrating the efficacy of these combined methods. Agreement regularization and joint training yield further gains, showing that each technique improves on the baseline system.

Human Evaluation

Human evaluations confirmed that the machine translations were statistically indistinguishable from professional human translations, thus achieving human parity. The paper describes the evaluation process in detail; it relied on direct human assessment of translation quality.
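When several annotators score translations directly, some are consistently harsher or more lenient than others. One common way to make their scores comparable is to standardize each annotator's scores before averaging per system. The sketch below shows that idea; treating it as part of this paper's protocol is an assumption, and the annotator names, system names, and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def standardize_by_annotator(ratings):
    """Convert raw scores to per-annotator z-scores.

    ratings: list of (annotator, system, raw_score) tuples.
    Returns a list of (system, z_score) tuples in the same order.
    """
    by_annotator = defaultdict(list)
    for annotator, _, score in ratings:
        by_annotator[annotator].append(score)
    stats = {a: (mean(s), pstdev(s)) for a, s in by_annotator.items()}

    standardized = []
    for annotator, system, score in ratings:
        mu, sigma = stats[annotator]
        # An annotator who gives identical scores carries no ranking signal.
        z = (score - mu) / sigma if sigma > 0 else 0.0
        standardized.append((system, z))
    return standardized


def system_averages(standardized):
    """Average the standardized scores for each system."""
    per_system = defaultdict(list)
    for system, z in standardized:
        per_system[system].append(z)
    return {system: mean(zs) for system, zs in per_system.items()}


# Hypothetical ratings: "a1" scores harshly, "a2" leniently.
ratings = [
    ("a1", "sys_a", 40), ("a1", "sys_b", 50), ("a1", "sys_c", 30),
    ("a2", "sys_a", 85), ("a2", "sys_b", 90), ("a2", "sys_c", 75),
]
print(system_averages(standardize_by_annotator(ratings)))
```

The per-system averages produced this way could then feed a significance test between the machine system and the human reference translations.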

Implications and Future Prospects

The implications of achieving human parity in translation are significant, with potential applications extending well beyond news translation. The proposed techniques could improve machine translation for other language pairs and domains, provided that sufficient data is available. Future research may extend these approaches to low-resource languages and examine how well they scale. The authors highlight the need for continued advances in sequence-to-sequence models so that machine translation systems remain adaptable and robust across diverse translation tasks.

In summary, the paper makes substantial contributions to machine translation, presenting methods that enable systems to reach translation quality on par with professional human translators, while laying groundwork for future work on AI-driven language translation.