Sufficiency of soft verification at larger scales

Determine whether soft verification, defined as selecting synthetic coding trajectories by line-level patch recall without executing unit tests, remains sufficient to improve coding-agent performance as base model size and training-data volume grow, or whether unit test–based hard verification becomes necessary for further gains.

Background

The paper introduces Soft Verified Generation (SVG), which uses line-level patch recall to select training trajectories instead of unit test execution. Ablations show little difference among verification thresholds at the tested scales, suggesting verification may not be essential for early gains.

In the Limitations section, the authors note that they could not test whether this conclusion holds at larger scales, and they speculate that hard verification might become necessary as models and datasets grow.

References

We could not test this hypothesis at our scale. It is possible that with larger models or more training data, soft verification no longer suffices and hard verification with correct code becomes essential.

SERA: Soft-Verified Efficient Repository Agents (2601.20789 - Shen et al., 28 Jan 2026) in Section 9 (Limitations), Hard vs soft verification