Effect of GRPO Weighting Inversion on Distribution Sharpening
Determine whether the inversion in the population-level weighting function of Group Relative Policy Optimization (GRPO), where the weight increases as an input's pass rate approaches one, contributes to distribution sharpening when the training dataset contains a substantial fraction of very easy inputs.
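To make the inversion concrete, here is a minimal sketch under assumptions that are ours, not necessarily the paper's. We assume binary rewards, GRPO advantages normalized by the group mean and standard deviation in the infinite-group (population) limit, and a maximum-likelihood (ML) reference that maximizes log p for each input. Under these assumptions, the GRPO gradient on an input with pass rate p is w(p)·∇p with w(p) = 1/sqrt(p(1 − p)), while maximizing log p gives w(p) = 1/p. The names grpo_weight, ml_weight, and grpo_population_gradient are hypothetical and exist only for illustration. The last function checks the closed form exactly on a one-parameter Bernoulli policy.

```python
import math

# Illustrative assumptions (not taken from the paper): binary rewards r in {0, 1},
# pass rate p in (0, 1), and GRPO advantages A = (r - mean) / std computed in the
# infinite-group limit, so mean = p and std = sqrt(p * (1 - p)).


def grpo_weight(p: float) -> float:
    """Coefficient w(p) in E[A * grad log pi] = w(p) * grad p for GRPO."""
    return 1.0 / math.sqrt(p * (1.0 - p))


def ml_weight(p: float) -> float:
    """Coefficient w(p) for maximum likelihood: grad log p = (1 / p) * grad p."""
    return 1.0 / p


def grpo_population_gradient(theta: float) -> float:
    """Exact population GRPO gradient for a one-parameter Bernoulli policy.

    The policy is correct with probability p = sigmoid(theta), so
    d/dtheta log pi(correct) = 1 - p and d/dtheta log pi(wrong) = -p.
    """
    p = 1.0 / (1.0 + math.exp(-theta))
    std = math.sqrt(p * (1.0 - p))
    adv_correct = (1.0 - p) / std  # advantage of a correct sample (r = 1)
    adv_wrong = (0.0 - p) / std  # advantage of an incorrect sample (r = 0)
    # Expectation over the two outcomes of A * d/dtheta log pi.
    return p * adv_correct * (1.0 - p) + (1.0 - p) * adv_wrong * (-p)


# Check the closed form: the exact gradient equals w(p) * dp/dtheta.
for theta in (-2.0, 0.0, 2.0, 4.0):
    p = 1.0 / (1.0 + math.exp(-theta))
    dp_dtheta = p * (1.0 - p)  # derivative of sigmoid(theta)
    assert math.isclose(grpo_population_gradient(theta), grpo_weight(p) * dp_dtheta)

# Compare how the two weightings treat hard and easy inputs.
for p in (0.1, 0.5, 0.9, 0.99):
    print(f"p={p:.2f}  GRPO w={grpo_weight(p):6.2f}  ML w={ml_weight(p):6.2f}")
```

Under these assumptions the GRPO weight is symmetric under p ↦ 1 − p, so an input solved 90% of the time gets the same weight (about 3.33) as one solved 10% of the time, and the weight keeps growing as p → 1 while the ML weight falls toward 1. When many inputs sit near p = 1, this is one way they could carry outsized weight in the update, which is the mechanism the question asks about; the sketch itself does not show that sharpening follows.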
References
We conjecture that this inversion may contribute to distribution sharpening~\citep{yue2025doesreinforcementlearningreally,wu2026invisibleleashrlvrescape} when datasets contain a substantial fraction of overly easy inputs, and leave a detailed analysis to future work.
— Maximum Likelihood Reinforcement Learning
(2602.02710 - Tajwar et al., 2 Feb 2026) in Section: A Unifying Weight-Function View (footnote in the paragraph discussing GRPO weighting)