Attribution of provider-specific EvoScore preferences to training strategies
Determine whether the observed cross-provider differences in how SWE-CI EvoScore rankings respond to the gamma parameter are causally attributable to differences in the providers’ model training strategies. In these differences, some providers’ language models are favored when gamma < 1 (emphasizing short-term gains), while others are favored when gamma > 1 (emphasizing long-term gains). Also determine whether the stability of these preferences within each provider actually indicates stable internal training pipelines.
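The ranking sensitivity described above can be made concrete with a small sketch. It assumes, hypothetically, that EvoScore aggregates per-iteration scores as a weighted mean with weight gamma**t at iteration t. The real SWE-CI definition may differ. The model names, provider names, and score trajectories are invented for illustration. The sketch only measures the descriptive pattern: each model's preference sign and how uniform that sign is within a provider. It cannot by itself attribute the pattern to training strategies, and that attribution is exactly the open question.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def gamma_weighted_score(scores: Sequence[float], gamma: float) -> float:
    # Hypothetical EvoScore-style aggregate (an assumption, not the paper's formula):
    # weighted mean of per-iteration scores with weight gamma**t at iteration t.
    # gamma < 1 favors early iterations, gamma > 1 favors later ones, gamma == 1 is a plain mean.
    weights = [gamma ** t for t in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


def ranks(trajectories: Dict[str, List[float]], gamma: float) -> Dict[str, int]:
    # Rank 0 is the best model (highest weighted score) at this gamma.
    order = sorted(trajectories,
                   key=lambda m: gamma_weighted_score(trajectories[m], gamma),
                   reverse=True)
    return {model: position for position, model in enumerate(order)}


def preference(trajectories: Dict[str, List[float]],
               gamma_low: float, gamma_high: float) -> Dict[str, str]:
    # A model is labeled "short-term" if it ranks better under gamma_low than
    # under gamma_high, "long-term" in the opposite case, else "neutral".
    low, high = ranks(trajectories, gamma_low), ranks(trajectories, gamma_high)
    result = {}
    for model in trajectories:
        if low[model] < high[model]:
            result[model] = "short-term"
        elif low[model] > high[model]:
            result[model] = "long-term"
        else:
            result[model] = "neutral"
    return result


def provider_consistency(prefs: Dict[str, str],
                         providers: Dict[str, List[str]]) -> Dict[str, Tuple[str, float]]:
    # For each provider: its most common preference label and the fraction
    # of its models that share that label (1.0 means fully consistent).
    out = {}
    for provider, models in providers.items():
        label, count = Counter(prefs[m] for m in models).most_common(1)[0]
        out[provider] = (label, count / len(models))
    return out


# Hypothetical per-iteration scores for four models from two invented providers.
trajectories = {
    "a1": [0.80, 0.70, 0.60, 0.50],  # strong early, decays
    "a2": [0.75, 0.70, 0.60, 0.55],
    "b1": [0.40, 0.60, 0.80, 0.90],  # weak early, improves
    "b2": [0.45, 0.55, 0.75, 0.85],
}
providers = {"ProviderA": ["a1", "a2"], "ProviderB": ["b1", "b2"]}

prefs = preference(trajectories, gamma_low=0.5, gamma_high=2.0)
print(provider_consistency(prefs, providers))
# {'ProviderA': ('short-term', 1.0), 'ProviderB': ('long-term', 1.0)}
```

Here the ranking flips between gamma = 0.5 and gamma = 2.0, and each invented provider is internally uniform. A consistency score of 1.0 is the kind of within-provider stability the quoted conjecture refers to; establishing that it stems from training pipelines would require additional evidence beyond such a ranking analysis.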
References
We conjecture that this reflects differences in training strategies adopted by different providers, while the relative consistency within each provider suggests that their internal training pipelines remain largely stable.
— SWE-CI: Evaluating Agent Capabilities in Maintaining Codebases via Continuous Integration
(2603.03823 - Chen et al., 4 Mar 2026) in Observation 2, Section 4 (Results)