Global Entity Ranking Across Multiple Languages

Published 17 Mar 2017 in cs.IR, cs.CL, and cs.SI | (1703.06108v1)

Abstract: We present work on building a global long-tailed ranking of entities across multiple languages using the Wikipedia and Freebase knowledge bases. We identify multiple features and build a model to rank entities using a ground-truth dataset of more than 10,000 labels. The final system ranks 27 million entities with 75% precision and a 48% F1 score. We provide a performance evaluation and empirical evidence of the quality of the ranking across languages, and make the final ranked lists openly available for future research.
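
The abstract reports precision and F1 but not recall. As a rough reading aid, the sketch below back-solves the recall those two figures would imply. It assumes that F1 here is the standard harmonic mean of precision and recall and that both numbers come from the same evaluation set; the abstract confirms neither, and the function name `implied_recall` is purely illustrative.

```python
def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1).

    Assumes the standard (balanced) F1 definition; this is an assumption,
    not something stated in the abstract.
    """
    denom = 2 * precision - f1
    if denom <= 0:
        raise ValueError("no valid recall exists for these precision/F1 values")
    return f1 * precision / denom


# Figures quoted in the abstract: 75% precision, 48% F1.
recall = implied_recall(0.75, 0.48)
print(f"implied recall: {recall:.3f}")
```

Under these assumptions the implied recall is roughly 35%, which would suggest a system tuned for precision rather than coverage.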