Assess whether the PCCoT student task regularizes the teacher CoT task during training

Investigate whether, in CODI-style joint training of Parallel Continuous Chain-of-Thought (PCCoT), the student task regularizes the teacher chain-of-thought (CoT) task, and characterize the mechanism by which such regularization could improve the learned reasoning paths. The motivating observation is that PCCoT with standard CoT decoding outperforms standard CoT.
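One way to make the hypothesis concrete is to write down the joint objective. The sketch below assumes a CODI-style loss whose parameters θ are shared between the teacher (explicit CoT) task and the student (continuous-thought) task. The weights α, β, γ and the distillation term are modeled on CODI and are assumptions here, not the exact PCCoT objective.

$$\mathcal{L}(\theta) = \beta\,\mathcal{L}_{\text{teacher}}(\theta) + \alpha\,\mathcal{L}_{\text{student}}(\theta) + \gamma\,\mathcal{L}_{\text{distill}}(\theta)$$

$$\theta \leftarrow \theta - \eta\left(\beta\,\nabla_\theta \mathcal{L}_{\text{teacher}} + \alpha\,\nabla_\theta \mathcal{L}_{\text{student}} + \gamma\,\nabla_\theta \mathcal{L}_{\text{distill}}\right)$$

Because θ is shared, every update on the teacher term also carries α∇L_student + γ∇L_distill, and with α = γ = 0 the objective reduces to (scaled) standard CoT fine-tuning. Under this framing, a direct test of the regularization hypothesis is to sweep α and γ, always decode in standard CoT mode, and check whether accuracy rises above the α = γ = 0 baseline; tracking the teacher's training and test loss across the sweep would show whether any gain looks like reduced overfitting.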

Background

In comparisons using GPT-2 Small on GSM8K variants, the authors report two counterintuitive results: (1) on GSM8K-NL, PCCoT outperforms PCCoT with standard CoT decoding, and (2) PCCoT with standard CoT decoding outperforms standard CoT. To explain (2), they hypothesize that the student task in PCCoT's CODI-style training may act as a regularizer for the teacher CoT task and thereby improve the learned reasoning paths.
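Both comparisons (1) and (2) are paired: the systems answer the same test questions. Before attributing a gap to regularization, it may help to check that the gap is not within noise. The following is a minimal sketch of an exact McNemar test on per-question correctness; the function name `mcnemar_exact` and the correctness lists are hypothetical, and the example flags are toy values, not results from the paper.

```python
from math import comb


def mcnemar_exact(correct_a: list[bool], correct_b: list[bool]) -> tuple[int, int, float]:
    """Exact two-sided McNemar test on paired per-question correctness.

    correct_a / correct_b: hypothetical correctness flags for two systems scored
    on the same questions (e.g. PCCoT decoded with standard CoT vs. a standard
    CoT baseline). Returns (b, c, p_value), where b counts questions only
    system A solves and c counts questions only system B solves.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both systems must be scored on the same questions")
    # Only discordant pairs carry information about which system is better.
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under H0 each discordant pair favors A or B with probability 0.5,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail for two sides.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy flags for illustration only (not results from the paper).
pccot_cot = [True, True, True, True, False]
std_cot = [True, False, False, False, False]
print(mcnemar_exact(pccot_cot, std_cot))  # (3, 0, 0.25)
```

Only questions on which the two systems disagree enter the test, which is why it suits paired accuracy comparisons better than comparing two aggregate accuracies directly.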

The authors explicitly express uncertainty about this hypothesis and call for further investigation to understand the phenomenon.

References

For the second observation, it might indicate that during the training of PCCoT, the student task serves as a regularizer for the teacher CoT task and helps the model to learn better reasoning paths. We are not sure about this and further investigation is needed to understand this phenomenon.

Parallel Continuous Chain-of-Thought with Jacobi Iteration (2506.18582 - Wu et al., 23 Jun 2025) in Appendix B.5 (Comparison with Standard CoT)