Do RLVR benefits extend from math/coding to MCQA?
Determine whether the benefits of reinforcement learning from verifiable rewards (RLVR) observed on mathematics and coding tasks extend to multiple-choice question answering (MCQA), which has a much smaller and more constrained answer space than open-ended math and coding outputs.
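To make the contrast in answer spaces concrete, the following is a minimal sketch of what a verifiable reward for MCQA could look like. It is an illustration under stated assumptions, not the method of any cited paper: the "Answer: <letter>" output convention and the names `extract_choice`, `mcqa_reward`, and `chance_reward` are hypothetical.

```python
import re
from typing import Optional

# Assumption: the model ends its response with "Answer: <letter>".
# This output convention is hypothetical and chosen only for illustration.
ANSWER_PATTERN = re.compile(r"(?i:answer):\s*\(?([A-Z])\)?(?![A-Za-z])")


def extract_choice(response: str) -> Optional[str]:
    """Return the last option letter stated in the response, or None."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None


def mcqa_reward(response: str, gold: str) -> float:
    """Binary verifiable reward: 1.0 if the extracted letter equals the gold label."""
    choice = extract_choice(response)
    return 1.0 if choice is not None and choice == gold.upper() else 0.0


def chance_reward(num_options: int) -> float:
    """Expected reward of a uniform random guess over num_options options."""
    return 1.0 / num_options
```

With four options, `chance_reward(4)` is 0.25, so a response can earn the full reward without any correct reasoning a quarter of the time under uniform guessing. An open-ended math or coding answer is far less likely to be verified as correct by chance, which is one way the smaller answer space might change what the reward signal teaches.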
References
It is unclear whether the benefits observed in math and coding will translate to MCQA.
— Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
(2502.19655 - Zhang et al., 27 Feb 2025) in Section 1 (Introduction)