
Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction

Published 3 May 2020 in cs.CL (arXiv:2005.00987v2)

Abstract: This paper investigates how to effectively incorporate a pre-trained masked language model (MLM), such as BERT, into an encoder-decoder (EncDec) model for grammatical error correction (GEC). The answer to this question is not as straightforward as one might expect because the previous common methods for incorporating a MLM into an EncDec model have potential drawbacks when applied to GEC. For example, the distribution of the inputs to a GEC model can be considerably different (erroneous, clumsy, etc.) from that of the corpora used for pre-training MLMs; however, this issue is not addressed in the previous methods. Our experiments show that our proposed method, where we first fine-tune a MLM with a given GEC corpus and then use the output of the fine-tuned MLM as additional features in the GEC model, maximizes the benefit of the MLM. The best-performing model achieves state-of-the-art performances on the BEA-2019 and CoNLL-2014 benchmarks. Our code is publicly available at: https://github.com/kanekomasahiro/bert-gec.

Citations (133)

Summary

  • The paper introduces a novel method that fine-tunes pre-trained BERT on a GEC corpus to enrich Encoder-Decoder models with additional contextual features.
  • The approach mitigates issues like catastrophic forgetting and domain mismatch, outperforming both Init and Fuse strategies on BEA-2019 and CoNLL-2014 benchmarks.
  • Results show robust generalizability, indicating that integrating fine-tuned MLM outputs can significantly advance automated grammatical error correction tools.

Overview of "Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction"

The paper explores the integration of pre-trained masked language models (MLMs), such as BERT, into Encoder-Decoder (EncDec) architectures for the task of Grammatical Error Correction (GEC). The motivation derives from the transformational impact MLMs have had across various NLP tasks, yet applying them to GEC poses a distinct challenge: the inputs to a GEC model (erroneous, often clumsy learner text) differ markedly in distribution from the clean corpora used to pre-train MLMs.

Key Contributions

The authors propose a novel method that involves first fine-tuning a pre-trained MLM with a GEC-specific corpus, followed by utilizing the output of the fine-tuned MLM as additional features in the GEC model. This method is designed to address the limitations of two common strategies:

  1. Initialization (Init): This method initializes the downstream task model using the parameters from a pre-trained MLM. However, it is prone to catastrophic forgetting, impairing the model’s ability to retain pre-trained knowledge.
  2. Fusion (Fuse): Here, pre-trained representations are incorporated as supplemental features. While this preserves the pre-trained knowledge, it limits adaptability to the GEC-specific distribution of erroneous inputs.
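The fusion strategy can be illustrated with a toy sketch, not the authors' implementation: each encoder position attends over the MLM's output representations, and the attended context is combined with the encoder state via a residual connection. The weight matrices `Wq`, `Wk`, `Wv` and the single-head, unnormalized layout are assumptions for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bert_fuse_layer(enc_states, mlm_states, Wq, Wk, Wv):
    """Toy sketch of the 'fuse' idea: encoder positions attend over the
    (fine-tuned) MLM's output features, and the attended context is added
    back to the encoder states as a residual."""
    Q = enc_states @ Wq                              # (T_enc, d)
    K = mlm_states @ Wk                              # (T_mlm, d)
    V = mlm_states @ Wv                              # (T_mlm, d)
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (T_enc, T_mlm)
    return enc_states + attn @ V                     # residual fusion

rng = np.random.default_rng(0)
d = 8
enc = rng.normal(size=(5, d))    # encoder states for 5 source tokens
mlm = rng.normal(size=(5, d))    # MLM features for the same sentence
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
fused = bert_fuse_layer(enc, mlm, Wq, Wk, Wv)
print(fused.shape)  # (5, 8)
```

Because the MLM features enter only through this side channel, the pre-trained parameters themselves are never overwritten by GEC training, which is what protects them from catastrophic forgetting at the cost of adaptability.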

Experimental Validation

Experiments demonstrate the efficacy of the proposed method. Specifically, fine-tuning BERT on a GEC corpus and feeding its output into the EncDec model outperforms the standard Init and Fuse strategies. The model achieves superior results on established benchmarks such as BEA-2019 and CoNLL-2014, reaching state-of-the-art performance.

Methodological Integration

The researchers introduce two specific configurations for fine-tuning:

  • BERT-fuse mask: Continues BERT's masked-language-model training on GEC data, adapting it to the task's erroneous input distribution.
  • BERT-fuse GED: Fine-tunes BERT on the intermediate task of Grammatical Error Detection (GED), labeling each token as correct or erroneous, so that the features passed to the EncDec model are explicitly tuned to locating grammatical errors.
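The GED variant treats error detection as per-token binary classification on top of the MLM's hidden states. A minimal sketch, assuming a single linear-plus-sigmoid head (`W`, `b` are hypothetical parameters, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ged_head(hidden_states, W, b):
    """Toy GED head: per-token binary classification (erroneous vs.
    correct) on top of MLM hidden states — a sketch of the intermediate
    GED task used by the BERT-fuse GED variant."""
    logits = hidden_states @ W + b        # (T, 1)
    return sigmoid(logits).squeeze(-1)    # one error probability per token

rng = np.random.default_rng(1)
T, d = 6, 8
h = rng.normal(size=(T, d))               # hidden states for 6 tokens
W, b = rng.normal(size=(d, 1)), np.zeros(1)
probs = ged_head(h, W, b)
print(probs.shape)  # (6,)
```

Training this head on labeled GEC data pushes the underlying representations toward distinguishing erroneous from correct tokens, which is exactly the signal the downstream corrector benefits from.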

Implications and Future Directions

The results underscore the potential of fine-tuned MLM outputs as enriched contextual features within GEC models, pointing to practical gains for automated language tools that must handle the diverse error types found in learner corpora. The consistent improvements across multiple benchmarks indicate robust generalizability, which is vital for GEC applications.

Future work could explore ways to refine transfer learning approaches further, optimizing for even more diverse linguistic tasks beyond GEC. There is also an avenue to explore additional pre-trained models like RoBERTa and ALBERT to gauge their efficacy in similar integration methodologies.

In conclusion, the study provides a detailed examination of the nuances involved in deploying MLMs within an EncDec framework for GEC, offering valuable insights and methodologies that can inspire further research and development at the intersection of pre-trained language models and linguistic error correction tasks.
