Explain the interleaved similarity pattern among PCCoT latent thought tokens and its implications

Determine the underlying cause of the interleaved similarity pattern observed among latent thought tokens in Parallel Continuous Chain-of-Thought (PCCoT) models, in which odd-indexed tokens are more similar to one another and even-indexed tokens are more similar to one another. Ascertain what this pattern implies about interdependencies among latent thought tokens, and how it might affect the scalability of PCCoT to larger models and its extension to more general and complex tasks.

Background

The paper analyzes similarities between latent thought tokens in PCCoT by computing the mean squared error (MSE) between each pair of tokens, where a lower MSE indicates greater similarity. In trained models, especially with T = 6 extra iterations (and also when increasing iterations in T = 3 settings), an interleaved pattern emerges: odd-indexed latent thought tokens are more similar to each other and less similar to even-indexed tokens, and vice versa. This pattern does not appear in randomly initialized models and is less clear at T = 12.
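As a rough illustration of the kind of analysis described above, the sketch below computes a pairwise MSE matrix over a set of token vectors and summarizes how strongly similarity depends on index parity. It is not the authors' code: the function names `pairwise_mse` and `parity_gap`, the gap statistic itself, and the synthetic token vectors are all assumptions made for this example, and the summary here does not specify which hidden states the paper compares or how it aggregates over examples.

```python
import numpy as np


def pairwise_mse(tokens: np.ndarray) -> np.ndarray:
    """Return a (c, c) matrix of MSE values between all pairs of token vectors.

    tokens has shape (c, d): c latent thought tokens, each of dimension d.
    """
    diff = tokens[:, None, :] - tokens[None, :, :]  # shape (c, c, d)
    return (diff ** 2).mean(axis=-1)


def parity_gap(mse: np.ndarray) -> float:
    """Hypothetical summary statistic (not from the paper).

    Mean MSE over pairs with different index parity minus mean MSE over
    distinct pairs with the same parity. A clearly positive value means
    same-parity tokens are closer to each other, i.e. an interleaved pattern.
    """
    c = mse.shape[0]
    idx = np.arange(c)
    same_parity = (idx[:, None] % 2) == (idx[None, :] % 2)
    off_diagonal = ~np.eye(c, dtype=bool)  # exclude each token's self-comparison
    same_mean = mse[same_parity & off_diagonal].mean()
    cross_mean = mse[~same_parity].mean()
    return float(cross_mean - same_mean)


# Synthetic stand-in for latent thought tokens: even- and odd-indexed tokens
# are noisy copies of two different base vectors. This is NOT model output.
rng = np.random.default_rng(0)
c, d = 8, 16
base_even = rng.normal(size=d)
base_odd = rng.normal(size=d)
tokens = np.stack([
    (base_even if i % 2 == 0 else base_odd) + 0.1 * rng.normal(size=d)
    for i in range(c)
])

mse = pairwise_mse(tokens)
gap = parity_gap(mse)
print(np.round(mse, 2))
print("parity gap:", gap)
```

On real data, `tokens` would be replaced by the latent thought token representations extracted from a trained model. Tracking a single scalar such as the gap across iteration counts and model sizes is one possible way to quantify when the pattern appears or fades, though it does not by itself explain the cause.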

The authors note they have no clear explanation for this phenomenon and explicitly state uncertainty about what it signifies regarding token interdependencies and how it might impact PCCoT’s scalability and applicability to more complex tasks.

References

Up to now, we have not found a clear explanation for this phenomenon. Perhaps this indicates that the latent thought tokens have some interdependencies, but we still do not know what does this mean and how it may potentially affect PCCoT in terms of scaling up and extending to more general and complex tasks.

Parallel Continuous Chain-of-Thought with Jacobi Iteration (2506.18582 - Wu et al., 23 Jun 2025) in Appendix B.3 (Similarities between Latent Thought Tokens)