Eliciting and Scoring Multi-Guess Probability Distributions in Open-Ended Forecasting

Develop evaluation protocols that elicit a forecaster’s full probability distribution over multiple semantically distinct candidate answers, and score those distributions in the open-ended setting with an appropriate multi-class Brier scoring rule.

Background

The paper adapts the multi-class Brier score to a simplified setting in which the forecaster reports a single guess with a single probability. The authors acknowledge that this simplification is technically incorrect, because the forecaster may assign non-zero probability to answers other than the reported guess.

A more complete approach would have forecasters report every guess to which they assign non-zero probability and would score the full distribution with the multi-class Brier rule, which remains applicable. The authors leave this extension to future work.
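
As a rough illustration of what scoring a multi-guess forecast could look like, the sketch below computes a multi-class Brier score for one question. It is not the paper's method. The function name multiclass_brier and the dictionary input format are hypothetical, and it rests on three assumptions: candidate answers have already been canonicalized so that semantically equivalent answers share one string, an answer missing from the forecast has probability 0, and any probability mass the forecaster leaves unreported is spread so thinly over unnamed answers that its squared-error contribution is negligible.

```python
from typing import Dict


def multiclass_brier(forecast: Dict[str, float], truth: str) -> float:
    """Multi-class Brier score for one question (lower is better).

    forecast maps each distinct candidate answer to its reported probability.
    ASSUMPTION: answers are canonicalized strings, so exact matching stands in
    for a semantic-equivalence check.
    """
    if any(p < 0 for p in forecast.values()):
        raise ValueError("probabilities must be non-negative")
    if sum(forecast.values()) > 1 + 1e-9:
        raise ValueError("probabilities must sum to at most 1")

    # Squared error on each reported guess; the outcome indicator is 1 only
    # for the guess that matches the true answer.
    score = sum(
        (p - (1.0 if answer == truth else 0.0)) ** 2
        for answer, p in forecast.items()
    )

    # ASSUMPTION: an unreported true answer has implicit probability 0,
    # which contributes (0 - 1)^2 = 1 to the score.
    if truth not in forecast:
        score += 1.0

    # ASSUMPTION: unreported residual mass is ignored (treated as diffuse).
    return score
```

For example, a forecast of {"A": 0.6, "B": 0.3} scores approximately 0.25 if the true answer is "A", and approximately 1.45 if the true answer is some unreported "C". Under these assumptions, hedging across several plausible answers is penalized less than a confident miss.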

References

This is technically incorrect to assume as the forecaster may have non-zero probability for guesses other than y. Ideally the forecaster should report all its guesses which have non-zero probability (with the multi-class brier scoring rule still being applicable) but we leave exploring this direction for future work.

Scaling Open-Ended Reasoning to Predict the Future (2512.25070 - Chandak et al., 31 Dec 2025) in Appendix, Section “Adapting Brier Score to free-form responses”