Open-Domain Dialog Evaluation using Follow-Ups Likelihood

Published 12 Sep 2022 in cs.CL | (2209.05185v1)

Abstract: Automatic evaluation of open-domain dialogs remains an unsolved problem, and existing methods do not correlate strongly with human annotations. This paper presents a new automated evaluation method using follow-ups: we measure the probability that an LLM will continue the conversation with a fixed set of follow-ups (e.g., "not really relevant here", "what are you trying to say"). When compared against twelve existing methods, our new evaluation achieves the highest correlation with human evaluations.
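
The follow-up idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `token_log_probs` interface, the length normalization, the averaging over follow-ups, and the toy scorer are all assumptions made for illustration. A real setup would replace the toy scorer with a language model that returns per-token log-probabilities of the follow-up given the dialog.

```python
import math
from typing import Callable, Sequence

# Hypothetical interface (an assumption): given a dialog context and a candidate
# continuation, return one log-probability per token of the continuation.
TokenLogProbs = Callable[[str, str], Sequence[float]]

# The two example follow-ups quoted in the abstract.
FOLLOW_UPS = ["not really relevant here", "what are you trying to say"]


def follow_up_log_likelihood(
    context: str, follow_up: str, token_log_probs: TokenLogProbs
) -> float:
    """Length-normalized log-likelihood of one follow-up given the dialog."""
    log_probs = token_log_probs(context, follow_up)
    if not log_probs:
        raise ValueError("follow-up produced no tokens")
    # Averaging per token is an assumption; it keeps long follow-ups comparable.
    return sum(log_probs) / len(log_probs)


def follow_up_score(
    context: str, follow_ups: Sequence[str], token_log_probs: TokenLogProbs
) -> float:
    """Mean length-normalized log-likelihood over a fixed set of follow-ups."""
    scores = [
        follow_up_log_likelihood(context, f, token_log_probs) for f in follow_ups
    ]
    return sum(scores) / len(scores)


def toy_token_log_probs(context: str, continuation: str) -> list[float]:
    """Stand-in scorer, NOT a language model: words already seen in the
    context get probability 0.5, all other words get 0.1."""
    seen = set(context.lower().split())
    return [
        math.log(0.5) if word in seen else math.log(0.1)
        for word in continuation.lower().split()
    ]


if __name__ == "__main__":
    dialog = "A: Do you like hiking? B: The stock market closed early."
    print(follow_up_score(dialog, FOLLOW_UPS, toy_token_log_probs))
```

Both example follow-ups signal a problem with the preceding turn, so under this sketch a higher score would suggest a lower-quality response. How scores are combined and mapped to a quality rating is left open here, since the abstract does not specify it.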