Papers
Topics
Authors
Recent
Search
2000 character limit reached

Mahānāma: A Unique Testbed for Literary Entity Discovery and Linking

Published 24 Sep 2025 in cs.CL | (2509.19844v1)

Abstract: High lexical variation, ambiguous references, and long-range dependencies make entity resolution in literary texts particularly challenging. We present Mah={a}n={a}ma, the first large-scale dataset for end-to-end Entity Discovery and Linking (EDL) in Sanskrit, a morphologically rich and under-resourced language. Derived from the Mah={a}bh={a}rata, the world's longest epic, the dataset comprises over 109K named entity mentions mapped to 5.5K unique entities, and is aligned with an English knowledge base to support cross-lingual linking. The complex narrative structure of Mah={a}n={a}ma, coupled with extensive name variation and ambiguity, poses significant challenges to resolution systems. Our evaluation reveals that current coreference and entity linking models struggle when evaluated on the global context of the test set. These results highlight the limitations of current approaches in resolving entities within such complex discourse. Mah=an=ama thus provides a unique benchmark for advancing entity resolution, especially in literary domains.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 4 tweets with 5 likes about this paper.