Papers
Topics
Authors
Recent
Search
2000 character limit reached

Statistically significant detection of semantic shifts using contextual word embeddings

Published 8 Apr 2021 in cs.CL | (2104.03776v2)

Abstract: Detecting lexical semantic change in smaller data sets, e.g. in historical linguistics and digital humanities, is challenging due to a lack of statistical power. This issue is exacerbated by non-contextual embedding models that produce one embedding per word and, therefore, mask the variability present in the data. In this article, we propose an approach to estimate semantic shift by combining contextual word embeddings with permutation-based statistical tests. We use the false discovery rate procedure to address the large number of hypothesis tests being conducted simultaneously. We demonstrate the performance of this approach in simulation where it achieves consistently high precision by suppressing false positives. We additionally analyze real-world data from SemEval-2020 Task 1 and the Liverpool FC subreddit corpus. We show that by taking sample variation into account, we can improve the robustness of individual semantic shift estimates without degrading overall performance.

Citations (16)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.