Assess whether the PCCoT student task regularizes the teacher CoT task during training
Investigate whether, during CODI-style joint training of Parallel Continuous Chain-of-Thought (PCCoT), the student task regularizes the teacher chain-of-thought (CoT) task. If it does, characterize the mechanism by which this regularization could lead the model to learn better reasoning paths. The question is motivated by the observation that a PCCoT-trained model decoded with standard CoT outperforms the standard CoT baseline.
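One way to make the regularization hypothesis concrete is to look at the shared parameters. Assume a generic joint objective L = L_teacher + α·L_student + β·L_distill. The form, the term names, and the weights α and β are illustrative assumptions, and the exact losses in CODI/PCCoT may differ. Because both tasks update the same weights, each step on the teacher task is perturbed by α·∇L_student + β·∇L_distill. A common multi-task-learning diagnostic is to measure how that auxiliary gradient aligns with the teacher gradient and how large it is relative to it. The sketch below is hypothetical and is not the paper's method. It computes both quantities from flattened gradient vectors. All function names are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    if len(u) != len(v):
        raise ValueError("gradient vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # direction undefined; report no alignment
    return dot / (norm_u * norm_v)


def combined_gradient(g_teacher: Sequence[float], g_student: Sequence[float],
                      g_distill: Sequence[float], alpha: float, beta: float) -> List[float]:
    """Gradient of the ASSUMED objective L_teacher + alpha*L_student + beta*L_distill."""
    return [t + alpha * s + beta * d for t, s, d in zip(g_teacher, g_student, g_distill)]


def auxiliary_effect(g_teacher: Sequence[float], g_student: Sequence[float],
                     g_distill: Sequence[float], alpha: float, beta: float) -> Tuple[float, float]:
    """Summarize how the student-side terms perturb the teacher update.

    Returns (cosine, norm_ratio):
      cosine > 0  -> auxiliary gradient roughly agrees with the teacher direction
      cosine < 0  -> auxiliary gradient pulls against the teacher direction
      norm_ratio  -> auxiliary gradient norm relative to the teacher gradient norm
    """
    g_aux = [alpha * s + beta * d for s, d in zip(g_student, g_distill)]
    cos = cosine_similarity(g_teacher, g_aux)
    norm_t = math.sqrt(sum(x * x for x in g_teacher))
    norm_a = math.sqrt(sum(x * x for x in g_aux))
    ratio = norm_a / norm_t if norm_t > 0.0 else float("inf")
    return cos, ratio


if __name__ == "__main__":
    # Toy 2-D gradients: the student gradient is orthogonal to the teacher gradient.
    print(auxiliary_effect([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], alpha=1.0, beta=1.0))
    # Toy 2-D gradients: the student gradient directly opposes the teacher gradient.
    print(auxiliary_effect([1.0, 0.0], [-1.0, 0.0], [0.0, 0.0], alpha=0.5, beta=1.0))
```

Tracking these two quantities across training checkpoints is one possible probe. It could be paired with an ablation that sets α = β = 0 and compares held-out accuracy with standard CoT decoding. The probe would help separate two explanations: a small but consistently aligned auxiliary signal, and a noise-like, regularizing one. This suggests an experiment and is not a reported result.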
References
For the second observation, it might indicate that during the training of PCCoT, the student task serves as a regularizer for the teacher CoT task and helps the model to learn better reasoning paths. We are not sure about this and further investigation is needed to understand this phenomenon.
— Parallel Continuous Chain-of-Thought with Jacobi Iteration
(2506.18582 - Wu et al., 23 Jun 2025) in Appendix B.5 (Comparison with Standard CoT)