Sufficiency of soft verification at larger scales
Soft verification selects synthetic coding trajectories by line-level patch recall, without executing unit tests. Determine whether it remains sufficient to improve coding agent performance as base models grow larger and training data volumes increase, or whether hard verification based on unit tests becomes necessary for further gains.
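To make "line-level patch recall" concrete, here is a minimal sketch of one way such a filter could be computed. It assumes recall means the fraction of changed lines in a reference patch that also appear in the candidate trajectory's patch, and that both patches are unified diffs. The function names (`changed_lines`, `line_recall`, `soft_verify`) and the 0.5 threshold are hypothetical and chosen for illustration; they are not taken from the SERA paper, whose exact scoring may differ.

```python
from collections import Counter


def changed_lines(patch: str) -> Counter:
    """Count added/removed lines in a unified diff, skipping file headers.

    Heuristic: any line starting with '+++' or '---' is treated as a file
    header, so a changed line whose content begins with '++' or '--' is missed.
    """
    lines = Counter()
    for raw in patch.splitlines():
        if raw.startswith(("+++", "---")):
            continue  # file header, not a content change
        if raw.startswith(("+", "-")):
            # Keep the sign so an added line never matches a removed one.
            lines[(raw[0], raw[1:].strip())] += 1
    return lines


def line_recall(reference_patch: str, candidate_patch: str) -> float:
    """Fraction of the reference patch's changed lines found in the candidate."""
    ref = changed_lines(reference_patch)
    cand = changed_lines(candidate_patch)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # assumption: an empty reference yields no credit
    matched = sum((ref & cand).values())  # multiset intersection
    return matched / total


def soft_verify(reference_patch: str, candidate_patch: str,
                threshold: float = 0.5) -> bool:
    """Keep a trajectory if its recall meets a (hypothetical) threshold.

    No unit tests are executed; only patch text is compared.
    """
    return line_recall(reference_patch, candidate_patch) >= threshold
```

Under these assumptions, a candidate that reproduces two of three changed reference lines scores 2/3 and passes a 0.5 threshold. Hard verification would instead apply the candidate patch and run the repository's unit tests, which is the comparison this open problem asks about at larger scales.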
References
We could not test this hypothesis at our scale. It is possible that with larger models or more training data, soft verification no longer suffices and hard verification with correct code becomes essential.
— SERA: Soft-Verified Efficient Repository Agents
(2601.20789 - Shen et al., 28 Jan 2026) in Section 9 (Limitations), Hard vs soft verification