Performance of Unsupervised Elicitation and Easy-to-Hard Generalization on Superhuman Tasks
Determine whether unsupervised elicitation techniques and easy-to-hard generalization methods for steering large language models maintain the performance observed on standard datasets when applied to real tasks that are beyond human capabilities.
References
Though each of these approaches has been found to perform well on a variety of datasets, it is unclear whether they will perform as well when applied to real tasks which are beyond human capabilities.
— Three Concrete Challenges and Two Hopes for the Safety of Unsupervised Elicitation
(2602.20400 - Canavan et al., 23 Feb 2026) in Section 1 (Introduction)