Improve DC-CoT performance on Minerva Math
Improve the accuracy of Divide-and-Conquer CoT (DC-CoT) and its high-length-penalty variant (DC-CoT-HLP) on the Minerva Math benchmark, where current evaluations show both methods reaching lower accuracy than the DeepScaleR-1.5B-Preview baselines. Specifically, identify training or inference strategies that let DC-CoT handle problems that appear to require purely sequential calculations, and are therefore less amenable to parallelization, while preserving its reduced longest path length.
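To make the tension between sequential problems and longest path length concrete, here is a minimal sketch. It assumes that longest path length is the token count along the critical path of a reasoning trace when branches within a stage run concurrently. The stage/branch representation, the function names, and the token counts are hypothetical illustrations, not DC-CoT's actual trace format or measured values.

```python
from typing import List

# Hypothetical representation (an assumption, not DC-CoT's real format):
# a trace is a list of stages, and each stage is a list of token counts,
# one per branch that runs in parallel within that stage.
Trace = List[List[int]]


def total_tokens(trace: Trace) -> int:
    """Total tokens generated across all stages and branches."""
    return sum(sum(stage) for stage in trace)


def longest_path_length(trace: Trace) -> int:
    """Tokens on the critical path: the slowest branch of each stage, summed."""
    return sum(max(stage) for stage in trace if stage)


def path_reduction(trace: Trace) -> float:
    """Fraction of total tokens removed from the critical path by parallelism."""
    total = total_tokens(trace)
    return 0.0 if total == 0 else 1.0 - longest_path_length(trace) / total


# Illustrative token counts only. Both traces contain 900 tokens in total.
# A decomposable problem: three independent sub-tasks in the middle stage.
parallel_trace: Trace = [[100], [300, 250, 200], [50]]
# A purely sequential problem: every step depends on the previous one.
sequential_trace: Trace = [[100], [300], [250], [200], [50]]

if __name__ == "__main__":
    for name, trace in [("parallel", parallel_trace),
                        ("sequential", sequential_trace)]:
        print(name, total_tokens(trace), longest_path_length(trace),
              path_reduction(trace))
```

Under these assumptions, the sequential trace has a longest path equal to its total token count, so decomposition has nothing to shorten. Any strategy for such problems would have to recover accuracy without relying on parallel branches to reduce the path.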
References
On Minerva Math (MM), DC-CoT and DC-CoT-HLP obtain worse accuracy than the baselines. We speculate that problems in MM involve applying several calculations in a purely sequential manner, making them less amenable to parallelization. We leave improving DC-CoT's performance on MM to future work.