Family-level pattern behind GLM-5’s early underperformance
Determine whether the initial underperformance of GLM-5 relative to its predecessor GLM-4.7 in the 3-day Kalshi paper-trading snapshot reflects a persistent model-family pattern within the GLM series or instead results from short-horizon variance, by evaluating GLM-5 over a sufficiently long horizon.
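One way to make "sufficiently long horizon" operational is to check whether the gap between the two models' daily returns can be distinguished from zero at the available sample size. The following is a minimal sketch of a paired percentile bootstrap, not the procedure used in the paper. The function names, the daily-return inputs, and the defaults (2000 resamples, a 95% interval) are illustrative assumptions.

```python
import random


def bootstrap_diff_ci(returns_a, returns_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean daily return gap (a minus b).

    Hypothetical helper: returns_a and returns_b are daily returns for two
    models over the same trading days, paired by position.
    """
    if len(returns_a) != len(returns_b) or not returns_a:
        raise ValueError("need two non-empty series of equal length")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = [a - b for a, b in zip(returns_a, returns_b)]
    n = len(diffs)
    # Resample the paired daily differences with replacement and record
    # the mean of each resample.
    stats = sorted(sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples))
    lo = stats[int((alpha / 2) * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def is_inconclusive(returns_a, returns_b, **kwargs):
    """True when the bootstrap interval for the gap still contains zero."""
    lo, hi = bootstrap_diff_ci(returns_a, returns_b, **kwargs)
    return lo <= 0.0 <= hi
```

Under these assumptions, a 3-day snapshot supplies only three paired differences, so the interval will usually be wide. Rerunning the check as more trading days accumulate indicates when the gap stops being attributable to short-horizon variance. Daily returns are treated as independent here, which is a simplification.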
References
Whether this represents a family-level pattern is unclear: its predecessor GLM-4.7 was tied for the best Phase 1 return among non-Grok models ($-7.2\%$) and ultimately finished first in overall Kalshi standings ($-16.0\%$).
— Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets
(2604.07355 - Zhang et al., 28 Mar 2026) in Section 8, Cross-Generation Preliminary Comparison