Interaction between problems-per-batch and rollout saturation

Characterize the precise relationship between the number of unique problems per batch (B_p) and the saturation point of the compute-optimal number of parallel rollouts per problem (n) when allocating sampling compute for rollout-based on-policy reinforcement learning of large language models.

Background

The paper studies compute-optimal allocation of sampling compute across the number of unique problems per batch (B_p), the number of parallel rollouts per problem (n), and the number of update steps (M) in on-policy RL post-training of LLMs.
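To make the three-way trade-off concrete, the sketch below lists the ways a fixed sampling budget can be split between n and M once B_p is held constant. It assumes that sampling compute is counted as the total number of rollouts, B_p × n × M. That accounting is an illustrative assumption, not a definition taken from the paper, and the function name `iso_compute_allocations` is hypothetical.

```python
def iso_compute_allocations(total_rollouts: int, problems_per_batch: int) -> list[tuple[int, int]]:
    """Enumerate (n, M) pairs that spend exactly `total_rollouts` samples.

    Assumption (not from the paper): sampling compute equals
    problems_per_batch * n * M, i.e. the total number of rollouts generated.
    """
    if total_rollouts <= 0 or problems_per_batch <= 0:
        raise ValueError("both arguments must be positive")
    if total_rollouts % problems_per_batch != 0:
        raise ValueError("total_rollouts must be a multiple of problems_per_batch")

    # With B_p fixed, the remaining budget per problem is the product n * M.
    per_problem_budget = total_rollouts // problems_per_batch

    # Every divisor n of that budget gives one exact (n, M) split.
    return [
        (n, per_problem_budget // n)
        for n in range(1, per_problem_budget + 1)
        if per_problem_budget % n == 0
    ]


if __name__ == "__main__":
    # Hypothetical budget: 1024 rollouts in total, 32 unique problems per batch.
    for n, steps in iso_compute_allocations(1024, 32):
        print(f"n={n:>2}  M={steps:>2}")
```

Under this accounting, raising B_p at a fixed budget shrinks the product n × M available to each problem, which is one reason the compute-optimal n could depend on B_p.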

In additional analyses that hold B_p fixed at several different values, the authors observe that the compute-optimal n appears to saturate at smaller values when B_p is larger, potentially because of batch size constraints. They do not establish the exact relationship, and they explicitly state that the precise interaction between B_p and the saturation point of n remains an open question.

References

The precise interaction between $B_\text{p}$ and the saturation point of $n$ remains an open question for future investigation.

IsoCompute Playbook: Optimally Scaling Sampling Compute for LLM RL (2603.12151 - Cheng et al., 12 Mar 2026) in Appendix: Additional Compute-Optimal Results