Persistence of provider-level characteristics across model generations
Determine whether provider-level characteristics are consistently expressed across model generations in the Prediction Arena evaluation by extending the Cohort 2 assessment from the initial 3-day snapshot to a full 30-day period.
References
Whether provider-level characteristics are consistently expressed across generations cannot be concluded from this window alone; a full 30-day evaluation of Cohort 2 is planned.
— Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets
(2604.07355 - Zhang et al., 28 Mar 2026) in Section 11, Conclusion