Selecting the best transfer language for a given task language

Determine, for a specified natural language processing task and a given low-resource task language, which high-resource transfer language maximizes performance when used for cross-lingual transfer to the task language.

Background

Cross-lingual transfer leverages high-resource languages to improve performance on low-resource languages across tasks such as machine translation, part-of-speech tagging, dependency parsing, and entity linking. Although the technique is widely used, the transfer language has typically been chosen heuristically, often based on the experimenter's intuition.

Multiple factors may influence transfer effectiveness, including genetic/typological similarity, lexical overlap, geographic proximity, and the amount of available training data. The paper motivates a principled approach to this selection problem and proposes a learning-to-rank method (LANGRANK) as a step toward addressing it.
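
To make the selection problem concrete, the sketch below ranks candidate transfer languages by a weighted sum of the four factors listed above. It is only an illustration of the problem setup, not the LANGRANK method or any result from the paper: the candidate names, the feature values (assumed to be pre-normalized to [0, 1]), the equal weights, and the helper names `Candidate`, `score`, and `rank_candidates` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate transfer language with placeholder features in [0, 1]."""
    name: str
    typological_similarity: float  # higher = more similar to the task language
    lexical_overlap: float         # higher = more shared vocabulary
    geographic_proximity: float    # higher = geographically closer
    data_size: float               # higher = more available training data


# Hypothetical, hand-picked weights; a learned ranker would estimate
# the importance of each factor from data instead.
WEIGHTS = {
    "typological_similarity": 0.25,
    "lexical_overlap": 0.25,
    "geographic_proximity": 0.25,
    "data_size": 0.25,
}


def score(candidate, weights=WEIGHTS):
    """Weighted sum of the candidate's features."""
    return sum(w * getattr(candidate, feat) for feat, w in weights.items())


def rank_candidates(candidates, weights=WEIGHTS):
    """Return candidates sorted from highest to lowest score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


# Made-up placeholder candidates, not real measurements.
example = [
    Candidate("cand_a", 0.9, 0.8, 0.7, 0.2),  # similar but little data
    Candidate("cand_b", 0.3, 0.2, 0.4, 1.0),  # dissimilar but lots of data
    Candidate("cand_c", 0.5, 0.5, 0.5, 0.5),
]

for c in rank_candidates(example):
    print(c.name, round(score(c), 3))
```

Changing the weights changes the ranking, which is the crux of the open question: without a principled way to set or learn them, the choice falls back on intuition.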

References

However, determining the best transfer language for any particular task language remains an open question - the choice of transfer language has traditionally been done in a heuristic manner, often based on the intuition of the experimenter.

Choosing Transfer Languages for Cross-Lingual Learning (1905.12688 - Lin et al., 2019) in Section 1 (Introduction)