Improve DC-CoT performance on Minerva Math

Improve the accuracy of Divide-and-Conquer CoT (DC-CoT) and its high length penalty variant (DC-CoT-HLP) on the Minerva Math benchmark, where current evaluations show both methods underperforming the DeepScaleR-1.5B-Preview baselines. Specifically, determine training or inference strategies that enable DC-CoT to handle problems that appear to require purely sequential calculations and are therefore less amenable to parallelization, while preserving DC-CoT's reduced longest path length.

Background

The paper introduces Divide-and-Conquer CoT (DC-CoT), an approach trained with reinforcement learning to reduce inference latency by identifying parallelizable subtasks and spawning worker threads to execute them concurrently. Across several benchmarks, DC-CoT matches the accuracy of a long-CoT baseline while substantially reducing longest path length.
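The latency metric can be made concrete with a small sketch. The following is a hypothetical illustration, not the paper's implementation: a DC-CoT trace is modeled as a DAG of subtasks with per-subtask token costs (the node names, costs, and DAG structure below are invented for illustration), and the longest path is computed by topological dynamic programming. Sequential CoT latency corresponds to the total token count, while parallel latency with one worker thread per independent subtask corresponds to the critical path.

```python
from collections import defaultdict

def longest_path_tokens(costs, edges):
    """Critical-path length (in tokens) of a subtask DAG.

    costs: dict mapping subtask -> token count
    edges: list of (u, v) pairs meaning v depends on u
    """
    children, parents = defaultdict(list), defaultdict(list)
    indegree = {n: 0 for n in costs}
    for u, v in edges:
        children[u].append(v)
        parents[v].append(u)
        indegree[v] += 1
    # Process nodes in topological order (Kahn's algorithm).
    ready = [n for n in costs if indegree[n] == 0]
    finish = {}
    while ready:
        n = ready.pop()
        # A subtask finishes after its slowest dependency plus its own cost.
        finish[n] = costs[n] + max((finish[p] for p in parents[n]), default=0)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return max(finish.values())

# A plan step fans out to two parallel workers, then a merge step:
costs = {"plan": 50, "w1": 200, "w2": 120, "merge": 30}
edges = [("plan", "w1"), ("plan", "w2"), ("w1", "merge"), ("w2", "merge")]
print(sum(costs.values()))                 # sequential CoT: 400 tokens
print(longest_path_tokens(costs, edges))   # critical path: 50+200+30 = 280

# A purely sequential chain, by contrast, offers no speedup:
chain_costs = {"s1": 40, "s2": 60, "s3": 80}
chain_edges = [("s1", "s2"), ("s2", "s3")]
print(longest_path_tokens(chain_costs, chain_edges))  # 180 == sum of costs
```

The chain case is exactly the failure mode conjectured for Minerva Math: when every calculation depends on the previous one, the critical path equals the total token count, so spawning workers cannot reduce the longest path.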

However, on the Minerva Math benchmark, the authors observe that DC-CoT and DC-CoT-HLP yield worse accuracy than the baselines. They speculate that Minerva Math problems may predominantly involve sequential calculations that offer fewer opportunities for parallelization, and they leave improving DC-CoT's performance on this benchmark to future work.

References

On Minerva Math (MM), DC-CoT and DC-CoT-HLP obtain worse accuracy than the baselines. We speculate that problems in MM involve applying several calculations in a purely sequential manner, making them less amenable to parallelization --- we leave improving DC-CoT's performance on MM to future work.

Divide-and-Conquer CoT: RL for Reducing Latency via Parallel Reasoning  (2601.23027 - Mahankali et al., 30 Jan 2026) in Section 5.2 Results