Interaction between problems-per-batch and rollout saturation
Characterize the precise relationship between the number of unique problems per batch ($B_\text{p}$) and the saturation point of the compute-optimal number of parallel rollouts per problem ($n$) when allocating sampling compute for rollout-based, on-policy reinforcement learning (RL) of large language models (LLMs).
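A simple budget identity frames the tradeoff. This is a minimal sketch under a stated assumption: every rollout costs roughly the same sampling compute, so a step's sampling cost scales with the total number of rollouts, B_p × n. Under a fixed per-step rollout budget, raising n then forces B_p down, and vice versa. The function name iso_compute_splits and the budget value below are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Tuple


def iso_compute_splits(rollouts_per_step: int) -> List[Tuple[int, int]]:
    """Return every (B_p, n) pair with B_p * n == rollouts_per_step.

    Hypothetical helper. It assumes per-step sampling cost is proportional
    to the total rollout count, i.e. all rollouts cost the same.
    Pairs are ordered by increasing n (rollouts per problem).
    """
    if rollouts_per_step < 1:
        raise ValueError("rollouts_per_step must be a positive integer")
    splits = []
    for n in range(1, rollouts_per_step + 1):
        # n must divide the budget evenly so that B_p is an integer
        if rollouts_per_step % n == 0:
            splits.append((rollouts_per_step // n, n))
    return splits


if __name__ == "__main__":
    # Illustrative budget of 512 rollouts per update step
    for b_p, n in iso_compute_splits(512):
        print(f"B_p={b_p:4d}  n={n:4d}")
```

Each pair is one point on a fixed-budget frontier. Under this assumption, a saturation point for n would appear as the place along the frontier beyond which shifting budget from more problems to more rollouts per problem stops paying off.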
References
The precise interaction between $B_\text{p}$ and the saturation point of $n$ remains an open question for future investigation.
— IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL
(2603.12151 - Cheng et al., 12 Mar 2026) in Appendix: Additional Compute-Optimal Results