Causal role of reasoning bonds in Long CoT learning and why imitation-based distillation fails
Determine whether the three reasoning bond types (Deep-Reasoning, Self-Reflection, and Self-Exploration) causally drive the learning of Long Chain-of-Thought (Long CoT) structure in large language models, and explain why explicit human imitation or random in-context learning (ICL) distillation of these bond markers often fails to induce this structure.
References
However, a key open question remains: do these bonds drive Long CoT structure learning, and if so, why do explicit human imitation or random ICL distillation of these markers often fail?
— The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning
(2601.06002 - Chen et al., 9 Jan 2026) in Verification: Molecular Structure — Subsection “SFT actually learns these bond structures rather than keywords.”