Effectiveness of Unit Tests in SWE-Gym Raw
Determine whether the unit tests accompanying instances in the SWE-Gym Raw dataset effectively evaluate the correctness of proposed solutions. In other words, determine whether these tests provide reliable and sufficient validation signals when executable environments are not available.
References
And it's unclear if the unit tests are effective in evaluating the correctness of a solution.
— Training Software Engineering Agents and Verifiers with SWE-Gym
(2412.21139 - Pan et al., 2024) in Section: Dataset Construction — Extract Training Instances from Repositories