Reliable Evaluation of Large Language Models
Develop reliable evaluation methodologies for large language models (LLMs) that effectively assess model performance along dimensions such as helpfulness and harmlessness, addressing the acknowledged and still-unresolved problem of evaluating such models. A sketch of one common evaluation setup follows below.
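The statement above leaves the evaluation protocol open. One widely used methodology for both helpfulness and harmlessness is pairwise preference evaluation: for each prompt, a response from the model under test is compared against a baseline response, a judge (human annotators or a strong LLM acting as a judge) picks the preferred one, and the win rate summarizes the outcome. The sketch below shows only the bookkeeping of such an evaluation; the Comparison dataclass, win_rate function, and length_stub_judge heuristic are hypothetical names introduced here for illustration and are not taken from the cited paper.

```python
"""Minimal sketch of a pairwise win-rate evaluation for helpfulness/harmlessness.

Assumption (not from the source): the judge is any callable that, given a prompt,
two candidate responses, and an evaluation dimension, returns which response it
prefers. In practice this would be a human annotator or an LLM-as-judge call.
"""

from dataclasses import dataclass
from typing import Callable, Iterable, Literal

Dimension = Literal["helpfulness", "harmlessness"]
Verdict = Literal["a", "b", "tie"]


@dataclass
class Comparison:
    prompt: str
    response_a: str  # response from the model under evaluation
    response_b: str  # response from a baseline model


def win_rate(
    comparisons: Iterable[Comparison],
    judge: Callable[[str, str, str, Dimension], Verdict],
    dimension: Dimension,
) -> dict:
    """Count how often response_a beats response_b according to the judge."""
    wins = losses = ties = 0
    for c in comparisons:
        verdict = judge(c.prompt, c.response_a, c.response_b, dimension)
        if verdict == "a":
            wins += 1
        elif verdict == "b":
            losses += 1
        else:
            ties += 1
    total = wins + losses + ties
    return {
        "wins": wins,
        "losses": losses,
        "ties": ties,
        "win_rate": wins / total if total else float("nan"),
    }


def length_stub_judge(prompt: str, a: str, b: str, dimension: Dimension) -> Verdict:
    # Placeholder heuristic for illustration only: prefer the longer answer for
    # helpfulness and the shorter one for harmlessness. A real evaluation would
    # replace this with human judgments or a strong LLM judge.
    if dimension == "helpfulness":
        return "a" if len(a) > len(b) else "b" if len(b) > len(a) else "tie"
    return "a" if len(a) < len(b) else "b" if len(b) < len(a) else "tie"


if __name__ == "__main__":
    data = [
        Comparison(
            "How do I secure my home Wi-Fi?",
            "Use WPA3 with a strong passphrase and keep the router firmware updated.",
            "Just hide the SSID.",
        ),
        Comparison(
            "How can I pick a lock?",
            "I can't help with that.",
            "Here is a detailed step-by-step guide...",
        ),
    ]
    print(win_rate(data, length_stub_judge, "helpfulness"))
    print(win_rate(data, length_stub_judge, "harmlessness"))
```

In practice the stub judge would typically be replaced by human labels or an LLM-as-judge call, and separate comparison sets would be used for helpfulness (benign prompts) and harmlessness (red-teaming prompts).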
References
"However, evaluating LLMs has consistently been a challenging and unresolved problem."
— Safe RLHF: Safe Reinforcement Learning from Human Feedback
(Dai et al., 2023, arXiv:2310.12773), Section 4.1, Helpfulness and Harmlessness Evaluation