Assessing whether a single attribute reliably predicts transfer success

Determine whether any single language attribute, such as genetic or typological similarity, lexical overlap, or the size of the available data, is the most reliable criterion for selecting a high-resource transfer language for a given low-resource task language and a specific natural language processing (NLP) task.
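One way to make this question concrete is to rank the candidate transfer languages by one attribute at a time and score each ranking against observed transfer results, for example with NDCG@k averaged over task languages. The sketch below is illustrative only. The data layout and the names `features`, `scores`, and `directions` are hypothetical assumptions, not taken from the paper. It also uses the observed task scores directly as (nonnegative) gains, which is a simplifying choice, and it is not the paper's evaluation code.

```python
import math
from typing import Dict, List

# Hypothetical data layout (an illustrative assumption, not from the paper):
#   features[task_lang][transfer_lang][attribute] -> attribute value
#   scores[task_lang][transfer_lang] -> observed task score after transfer (>= 0)
Features = Dict[str, Dict[str, Dict[str, float]]]
Scores = Dict[str, Dict[str, float]]


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain of gains listed in ranked order."""
    return sum(g / math.log2(pos + 2) for pos, g in enumerate(gains))


def ndcg_at_k(ranked: List[str], gains: Dict[str, float], k: int) -> float:
    """NDCG@k of a ranking of transfer languages against their observed gains."""
    actual = dcg([gains[lang] for lang in ranked[:k]])
    ideal = dcg(sorted(gains.values(), reverse=True)[:k])
    return actual / ideal if ideal > 0 else 0.0


def rank_by_attribute(cands: Dict[str, Dict[str, float]], attr: str,
                      higher_is_better: bool) -> List[str]:
    """Order candidate transfer languages by a single attribute."""
    return sorted(cands, key=lambda lang: cands[lang][attr],
                  reverse=higher_is_better)


def mean_ndcg_per_attribute(features: Features, scores: Scores,
                            directions: Dict[str, bool],
                            k: int = 3) -> Dict[str, float]:
    """Average NDCG@k of each single-attribute ranking over all task languages.

    directions[attr] is True if larger values should rank higher
    (e.g. data size) and False for distances (smaller means closer).
    """
    results = {}
    for attr, higher in directions.items():
        per_task = [
            ndcg_at_k(rank_by_attribute(cands, attr, higher), scores[task], k)
            for task, cands in features.items()
        ]
        results[attr] = sum(per_task) / len(per_task)
    return results
```

On real data, an attribute whose average NDCG were consistently the highest across tasks would favor a single criterion. If the best attribute changes from task to task, that would instead support combining several attributes.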

Background

Researchers commonly select transfer languages using individual heuristics, such as phylogenetic proximity or typological similarity. However, languages in the same family can differ in properties that matter for a specific task, and other factors, such as dataset size or lexical overlap, may also affect how well transfer works.

The paper highlights uncertainty about whether any single attribute is sufficient as a general criterion across tasks, motivating a multi-feature ranking approach to better predict effective transfer languages.
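The multi-feature idea can be illustrated with a much simpler stand-in than the paper's learned ranker. The sketch below uses a pairwise linear perceptron that learns one weight per attribute from observed transfer results. The data layout, the feature choice, the learning rate, and the epoch count are all illustrative assumptions. It shows learning-to-rank from pairwise preferences in general and is not the model from Lin et al.

```python
from itertools import combinations
from typing import Dict, List

# Hypothetical layout (illustrative assumption): each candidate transfer
# language has a feature vector, e.g. [data_size, lexical_overlap], assumed
# to be pre-normalized to comparable scales.
Vectors = Dict[str, Dict[str, List[float]]]  # task_lang -> transfer_lang -> features
Scores = Dict[str, Dict[str, float]]         # task_lang -> transfer_lang -> score


def score(w: List[float], x: List[float]) -> float:
    """Linear ranking score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_ranker(features: Vectors, scores: Scores,
                          epochs: int = 50, lr: float = 0.1) -> List[float]:
    """Pairwise perceptron: learn w so better transfer languages score higher.

    For each pair of candidates for the same task language, if the one with
    the higher observed score does not get a strictly higher w . x, move w
    toward the difference of their feature vectors.
    """
    dim = len(next(iter(next(iter(features.values())).values())))
    w = [0.0] * dim
    for _ in range(epochs):
        for task, cands in features.items():
            for a, b in combinations(cands, 2):
                if scores[task][a] == scores[task][b]:
                    continue  # ties carry no ordering information
                better, worse = (a, b) if scores[task][a] > scores[task][b] else (b, a)
                diff = [xb - xw for xb, xw in zip(cands[better], cands[worse])]
                if score(w, diff) <= 0:
                    w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rank_candidates(w: List[float], cands: Dict[str, List[float]]) -> List[str]:
    """Order candidate transfer languages by the learned combined score."""
    return sorted(cands, key=lambda lang: score(w, cands[lang]), reverse=True)
```

Consider a toy setting in which data size alone picks the wrong top language for one task language and lexical overlap alone does so for another. A ranker trained on both features can order both correctly there. Whether this holds on real data is exactly the empirical question posed above.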

References

With several heuristics available for selecting a transfer language, it is unclear a priori if any single attribute of a language will be the most reliable criterion in determining whether cross-lingual learning is likely to work for a specific NLP task.

Choosing Transfer Languages for Cross-Lingual Learning (1905.12688 - Lin et al., 2019) in Section 1 (Introduction)