Eliciting and Scoring Multi-Guess Probability Distributions in Open-Ended Forecasting
Develop evaluation protocols that elicit a forecaster’s full probability distribution over multiple semantically distinct candidate answers and apply an appropriate multi-class Brier scoring rule to score those distributions in the open-ended setting.
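To make the scoring step concrete, here is a minimal sketch of the standard multi-class Brier score applied to a reported set of guesses. It is an illustration under stated assumptions, not the protocol from the cited paper. The names `normalize` and `multi_guess_brier` are hypothetical. Semantic matching is approximated by case- and whitespace-insensitive string equality. Any probability mass the forecaster leaves unassigned is ignored. The score is written as a loss (lower is better), which may differ in sign or scaling from the paper's convention.

```python
from typing import Dict


def normalize(answer: str) -> str:
    # Hypothetical stand-in for semantic matching: lowercase and collapse whitespace.
    return " ".join(answer.lower().split())


def multi_guess_brier(guesses: Dict[str, float], truth: str) -> float:
    """Sum over candidates of (p_k - o_k)^2, where o_k is 1 for the true answer, else 0."""
    # Merge guesses that normalize to the same answer so each candidate is distinct.
    merged: Dict[str, float] = {}
    for answer, prob in guesses.items():
        if prob < 0:
            raise ValueError("probabilities must be non-negative")
        key = normalize(answer)
        merged[key] = merged.get(key, 0.0) + prob
    if sum(merged.values()) > 1.0 + 1e-9:
        raise ValueError("probabilities must sum to at most 1")

    target = normalize(truth)
    score = sum(
        (prob - (1.0 if answer == target else 0.0)) ** 2
        for answer, prob in merged.items()
    )
    # If the true answer was never guessed, it implicitly received probability 0.
    if target not in merged:
        score += 1.0
    return score


if __name__ == "__main__":
    forecast = {"Paris": 0.6, "Lyon": 0.3, "Nice": 0.1}
    print(multi_guess_brier(forecast, "paris"))  # true answer among the guesses
    print(multi_guess_brier(forecast, "Lille"))  # true answer missing from the guesses
```

With this forecast, the loss is roughly 0.26 when the true answer is the top guess and roughly 1.46 when it is absent. A missed answer is penalized both for the mass placed on wrong guesses and for the zero mass on the truth.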
References
This is technically incorrect to assume as the forecaster may have non-zero probability for guesses other than y. Ideally the forecaster should report all its guesses which have non-zero probability (with the multi-class brier scoring rule still being applicable) but we leave exploring this direction for future work.
— Scaling Open-Ended Reasoning to Predict the Future
(2512.25070 - Chandak et al., 31 Dec 2025) in Appendix, Section “Adapting Brier Score to free-form responses”