Building a Multi-domain Neural Machine Translation Model using Knowledge Distillation

Published 15 Apr 2020 in cs.CL (arXiv:2004.07324v1)

Abstract: Lack of specialized data makes building a multi-domain neural machine translation tool challenging. Although the emerging literature on low-resource languages shows promising results, most state-of-the-art models rely on millions of sentences. Today, most multi-domain adaptation techniques are based on complex, sophisticated architectures that are ill-suited to real-world applications. So far, no scalable method performs better than the simple yet effective mixed finetuning, i.e., finetuning a generic model on a mix of all specialized and generic data. In this paper, we propose a new training pipeline in which knowledge distillation from multiple specialized teachers allows us to efficiently finetune a model without adding any cost at inference time. Our experiments demonstrate that this pipeline improves multi-domain translation over plain finetuning by up to 2 BLEU points in configurations with 2, 3, and 4 domains.
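The abstract does not spell out the distillation objective, but the standard formulation it builds on blends a hard-label cross-entropy term with a KL-divergence term toward a teacher's temperature-softened distribution. The sketch below is a minimal NumPy illustration under that assumption; the function names, the temperature `T`, the mixing weight `alpha`, and the averaging across multiple domain teachers are all hypothetical choices for illustration, not the paper's exact recipe.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to a
    (domain-specialized) teacher's softened output distribution.
    `T` and `alpha` are illustrative hyperparameters."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # KL(teacher || student), scaled by T^2 as in standard distillation
    kd = np.mean(np.sum(
        p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)),
        axis=-1)) * (T ** 2)
    # Ordinary cross-entropy against the gold (hard) labels
    p_hard = softmax(student_logits)
    ce = -np.mean(np.log(p_hard[np.arange(len(hard_labels)), hard_labels] + 1e-12))
    return alpha * kd + (1 - alpha) * ce

def multi_teacher_loss(student_logits, teacher_logits_list, hard_labels, T=2.0, alpha=0.5):
    """One hypothetical way to use several specialized teachers:
    average the per-teacher distillation losses."""
    losses = [distillation_loss(student_logits, t, hard_labels, T, alpha)
              for t in teacher_logits_list]
    return float(np.mean(losses))
```

Because the teacher signals enter only the training loss, the student keeps its original architecture, which is why this style of distillation adds no cost at inference time.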

Citations (14)
