Cross-neutralising: Probing for joint encoding of linguistic information in multilingual models
Abstract: Multilingual sentence encoders are widely used to transfer NLP models across languages. The success of this transfer, however, depends on the model's ability to encode the patterns of cross-lingual similarity and variation. Yet little is known about how these models do this. We propose a simple method to study how relationships between languages are encoded in two state-of-the-art multilingual models (i.e. M-BERT and XLM-R). The results provide insight into their information-sharing mechanisms and suggest that linguistic properties are encoded jointly across typologically similar languages in these models.