Mitigating the Impact of Reference Quality on Evaluation of Summarization Systems with Reference-Free Metrics

Published 8 Oct 2024 in cs.CL, cs.AI, cs.IR, and cs.MM | (2410.10867v1)

Abstract: Automatic metrics are used as proxies to evaluate abstractive summarization systems when human annotations are too expensive. To be useful, these metrics should be fine-grained, show a high correlation with human annotations, and ideally be independent of reference quality. However, most standard evaluation metrics for summarization are reference-based, and existing reference-free metrics correlate poorly with relevance, especially on summaries of longer documents. In this paper, we introduce a reference-free metric that correlates well with human-evaluated relevance while being very cheap to compute. We show that this metric can also be used alongside reference-based metrics to improve their robustness in low-quality reference settings.