Refined theoretical framework for SMC performance on math reasoning benchmarks

Develop a refined theoretical framework that explains and predicts the performance of Sequential Monte Carlo (SMC) on mathematical reasoning benchmarks (e.g., AIME and MATH500). The framework should cover regimes where a larger divergence between the intermediate distributions induced by the process reward model and those induced by the true value function correlates with higher empirical accuracy, an effect that total-variation-based sampling guarantees do not capture.

Background

Empirical results show that SMC often uniformly improves over Best-of-N on math benchmarks. However, the measured divergences between the intermediate target distributions induced by the process reward model and the true distributions do not consistently predict performance: in some cases, larger divergence coincides with better accuracy.

This suggests that total-variation-style sampling error does not fully capture task utility in these settings. The authors therefore call for a more nuanced theoretical framework tailored to such benchmarks.
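
The gap between sampling error and task utility can be seen in a toy calculation. The sketch below is purely illustrative and is not taken from the paper: the three-answer distributions `target`, `q_close`, and `q_far`, and the choice of answer 0 as the correct one, are hypothetical numbers picked to show that a sampler farther from the target in total variation can still place more mass on the correct answer.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def accuracy(q, correct_index):
    """Probability that a single sample from q is the correct answer."""
    return q[correct_index]


# Hypothetical distributions over three candidate answers; answer 0 is correct.
CORRECT = 0
target = [0.50, 0.30, 0.20]   # stand-in for the "true" distribution
q_close = [0.45, 0.35, 0.20]  # close to target, slightly less mass on answer 0
q_far = [0.90, 0.05, 0.05]    # far from target, concentrated on answer 0

tv_close = total_variation(target, q_close)
tv_far = total_variation(target, q_far)

print(f"q_close: TV={tv_close:.2f}, accuracy={accuracy(q_close, CORRECT):.2f}")
print(f"q_far:   TV={tv_far:.2f}, accuracy={accuracy(q_far, CORRECT):.2f}")
```

In this toy setting `q_far` has the larger total variation distance yet the higher accuracy, because total variation penalizes any deviation from the target, including deviations that move mass toward the correct answer.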

References

We leave as an intriguing open question the development of a more refined framework that captures performance on such benchmarks.

Reject, Resample, Repeat: Understanding Parallel Reasoning in Language Model Inference (2603.07887 - Golowich et al., 9 Mar 2026) in Section 1, Subsection "Empirical Contributions: Does the Theory Predict the Performance of SMC in LLMs?"