
Modeling Coverage for Neural Machine Translation

Published 19 Jan 2016 in cs.CL (arXiv:1601.04811v6)

Abstract: The attention mechanism has enhanced state-of-the-art Neural Machine Translation (NMT) by jointly learning to align and translate. It tends to ignore past alignment information, however, which often leads to over-translation and under-translation. To address this problem, we propose coverage-based NMT in this paper. We maintain a coverage vector to keep track of the attention history. The coverage vector is fed to the attention model to help adjust future attention, which lets the NMT system pay more attention to untranslated source words. Experiments show that the proposed approach significantly improves both translation quality and alignment quality over standard attention-based NMT.

Citations (731)

Summary

  • The paper introduces a coverage mechanism that mitigates over- and under-translation, boosting BLEU scores from 28.32 to 30.14 in Chinese-English tasks.
  • It integrates a coverage vector to track attention history, ensuring untranslated source words are effectively addressed during decoding.
  • Experimental results show improved alignment and translation quality over standard NMT and phrase-based systems, confirming the model's practical benefits.


"Modeling Coverage for Neural Machine Translation" addresses critical issues inherent in Neural Machine Translation (NMT) systems—specifically, over-translation and under-translation—by integrating a coverage mechanism. Authored by Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li from Huawei Technologies and Tsinghua University, the paper proposes an innovative approach to improve both translation and alignment quality in NMT models.

Introduction and Problem Definition

The paper begins by outlining the current landscape of NMT, emphasizing its advantages over traditional Statistical Machine Translation (SMT). NMT's use of a single, large neural network allows word representations to be learned directly from data and long-distance dependencies to be handled through mechanisms like Long Short-Term Memory (LSTM). Despite these advances, conventional attention-based NMT systems exhibit a significant deficiency: the lack of a coverage mechanism leads to over-translation (translating a word multiple times) and under-translation (failing to translate a word at all).

Proposed Approach: Coverage Mechanism for NMT

The authors introduce a coverage-based NMT mechanism (NMT-Coverage) to tackle these issues. The approach maintains a coverage vector that tracks the attention history throughout the translation process; this vector is fed back into the attention model so that future attention is steered toward source words that remain untranslated. The key idea is to attach a coverage vector to the source annotations in the NMT model and update it sequentially at each decoding step.
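A minimal NumPy sketch of how such a coverage vector can enter an additive attention step. The matrix and vector names (`Wa`, `Ua`, `Va`, `va`) are illustrative stand-ins, not the paper's notation, and the simple additive coverage update shown here is only one of the variants the paper explores:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_with_coverage(s_prev, H, coverage, Wa, Ua, Va, va):
    """One decoding step of coverage-aware additive attention.

    s_prev   : (d,)   previous decoder state
    H        : (n, d) encoder annotations for n source words
    coverage : (n,)   accumulated attention mass per source word
    Wa, Ua   : (d, d) projections; Va, va : (d,) coverage projection / scoring vector
    """
    # Coverage enters the attention score alongside the decoder
    # state and each source annotation, so heavily-attended words
    # can be down-weighted in future steps.
    scores = np.tanh(H @ Ua + s_prev @ Wa + np.outer(coverage, Va)) @ va
    alpha = softmax(scores)          # attention weights over source words
    coverage = coverage + alpha      # accumulate the attention history
    context = alpha @ H              # context vector for this step
    return context, alpha, coverage
```

After each step the updated coverage is passed back in, so the total attention mass spent on every source word is available when scoring the next target word.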

Experiments and Results

Translation Quality

Extensive experiments conducted on the Chinese-English translation task show that NMT-Coverage significantly outperforms the standard attention-based NMT and a state-of-the-art phrase-based SMT system (Moses). Notable numerical results include:

  • Baseline NMT (GroundHog): Achieved an average BLEU score of 28.32.
  • Linguistic Coverage with Fertility: Achieved an average BLEU score of 29.86.
  • NN-based Coverage with Gating (d=10): Achieved the highest average BLEU score of 30.14.

The improved BLEU scores across multiple configurations highlight the effectiveness of incorporating coverage vectors, whether through simpler linguistic models or more complex neural network-based models.
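The "linguistic coverage with fertility" variant normalizes accumulated attention by a predicted fertility per source word, in the spirit of fertility in SMT: a word expected to produce several target words needs proportionally more attention mass before it counts as covered. A hedged sketch of that idea (the weight `Uf` is a hypothetical stand-in for the learned fertility parameters; the paper predicts fertility roughly as N times a sigmoid of the source annotation):

```python
import numpy as np

def fertility(H, Uf, N=2.0):
    """Predicted fertility per source word, bounded in (0, N):
    a rough analogue of Phi_j = N * sigmoid(Uf . h_j)."""
    return N / (1.0 + np.exp(-(H @ Uf)))

def update_linguistic_coverage(coverage, alpha, phi):
    """Accumulate attention normalized by fertility: a word with
    fertility phi_j is 'fully covered' (coverage near 1) once it
    has absorbed about phi_j worth of attention mass."""
    return coverage + alpha / phi
```

This keeps coverage values on a comparable scale across source words regardless of how many target words each is expected to generate.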

Alignment Quality

Further evaluation of alignment quality using Alignment Error Rate (AER) and Soft Alignment Error Rate (SAER) metrics indicates that coverage models contribute to more accurate and coherent alignments. For example:

  • Baseline NMT (GroundHog): Recorded an AER of 54.67.
  • NN-based Coverage with Gating (d=10): Improved AER to 50.50.

The alignment improvements were achieved by utilizing coverage vectors to ensure that translated source words are less likely to be involved in generating subsequent target words, thus mitigating over-translation.
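For reference, AER is the standard alignment metric used here: it compares a candidate link set against human-annotated "sure" and "possible" links, with lower values indicating better alignments. A small self-contained implementation (link sets are represented as pairs of source/target positions, an illustrative convention):

```python
def alignment_error_rate(candidate, sure, possible):
    """AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|),
    where A is the candidate link set, S the sure gold links,
    and P the possible gold links (S is a subset of P)."""
    a = set(candidate)
    s = set(sure)
    p = set(possible) | s  # ensure S ⊆ P
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```

A perfect candidate that reproduces all sure links scores 0.0; the paper's SAER is a soft variant of this measure computed over the attention weight matrix rather than hard links.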

Theoretical and Practical Implications

The introduction of coverage mechanisms in NMT has several important implications:

  1. Theoretical Contributions: The paper extends the concept of coverage from SMT to NMT, providing a novel way to model and integrate the translation history. This framework paves the way for more accurate attention modeling.
  2. Practical Contributions: NMT systems utilizing coverage vectors demonstrate superior translation quality and alignment accuracy, directly addressing common shortcomings in current NMT approaches. These improvements are particularly significant in scenarios involving complex and lengthy translations.

Future Developments

While the current NMT-Coverage models significantly enhance performance, there are potential directions for future research:

  • Model Optimization: Exploring more sophisticated ways to model and update the coverage vectors could lead to further improvements.
  • Broader Applications: Extending the coverage mechanism to other NMT tasks, including low-resource languages, multi-modal translations, and domain-specific applications.
  • Optimization Techniques: Investigating how different optimization techniques, such as reinforcement learning or adversarial training, could further refine coverage-based attention models.

Conclusion

"Modeling Coverage for Neural Machine Translation" proposes a robust solution to fundamental issues in NMT systems by leveraging coverage mechanisms. The experimental results confirm that integrating these mechanisms significantly enhances the quality of translations and alignments. This study lays a solid foundation for future advancements in NMT, promising improved accuracy and efficiency in machine translation tasks across diverse languages and domains.
