
Do Latent-CoT Models Think Step-by-Step? A Mechanistic Study on Sequential Reasoning Tasks

Published 31 Jan 2026 in cs.AI and cs.LG (arXiv:2602.00449v1)

Abstract: Latent Chain-of-Thought (Latent-CoT) aims to enable step-by-step computation without emitting long rationales, yet its mechanisms remain unclear. We study CODI, a continuous-thought teacher-student distillation model, on strictly sequential polynomial-iteration tasks. Using logit-lens decoding, linear probes, attention analysis, and activation patching, we localize intermediate-state representations and trace their routing to the final readout. On two- and three-hop tasks, CODI forms the full set of bridge states that become decodable across latent-thought positions, while the final input follows a separate near-direct route; predictions arise via late fusion at the end-of-thought boundary. For longer hop lengths, CODI does not reliably execute a full latent rollout, instead exhibiting a partial latent reasoning path that concentrates on late intermediates and fuses them with the last input at the answer readout position. Ablations show that this partial pathway can collapse under regime shifts, including harder optimization. Overall, we delineate when CODI-style latent-CoT yields faithful iterative computation versus compressed or shortcut strategies, and highlight challenges in designing robust latent-CoT objectives for sequential reasoning.

Summary

  • The paper demonstrates that CODI forms explicit intermediate states in its latent channels, with decodability metrics up to 0.709 on two-hop tasks.
  • It employs logit-lens decoding, linear probing, attention analysis, and causal activation patching to uncover the underlying sequential reasoning mechanisms.
  • The study reveals that deeper tasks induce compressed, late-stage strategies, with performance gaps on prime modulus tasks due to strict full-history propagation requirements.

Mechanistic Analysis of Latent Chain-of-Thought Computation in CODI

Overview and Motivation

This paper presents a comprehensive mechanistic analysis of Latent Chain-of-Thought (Latent-CoT) reasoning models, with a particular focus on the CODI (Continuous Chain-of-Thought via Self-Distillation) architecture. The study investigates whether Latent-CoT models solve strictly sequential, multi-hop algorithmic tasks in a genuinely stepwise manner or predominantly rely on shortcut strategies, compressed rollouts, or late-stage fusion of information. The analysis leverages polynomial-iteration tasks over modular arithmetic as a controlled framework, enabling granular examination of intermediate state propagation, representation, and reasoning fidelity. The empirical inquiry synthesizes four mechanistic interpretability techniques: logit-lens decoding, linear probing, attention analysis, and causal activation patching (Figure 1).

Figure 1: Mechanistic study of CODI on sequential reasoning tasks. Top: the CODI training setup for polynomial iteration; bottom: four mechanistic interpretability methods used for the analysis.

Experimental Framework and Mechanistic Probing

Task and Model Setup

CODI is trained on sequential polynomial tasks of variable hop-length, where each intermediate state is deterministically defined by modular arithmetic updates:

s_t = s_{t-1} \cdot x_t + b \pmod{m}

This setup makes state propagation completely transparent, allowing precise localization of the formation and utilization of intermediate "bridge" states. The teacher model produces explicit, textual Chain-of-Thought traces; the student model learns to predict answers after generating a sequence of latent thought vectors—without emitting textual rationales. The key objective for the student is feature-space self-distillation: minimizing the \ell_1 distance between teacher and student hidden states at a designated pre-answer boundary ([Ans]), thus aligning the internal latent computation with explicit stepwise reasoning (Figure 2).
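The update rule above can be sketched directly; this is a minimal reference implementation of the task's state recurrence, with illustrative constants (the paper's exact moduli and hop lengths may differ):

```python
def rollout(xs, s0, b, m):
    """Return the intermediate states s_1..s_T of s_t = s_{t-1} * x_t + b (mod m)."""
    states = []
    s = s0
    for x in xs:
        s = (s * x + b) % m
        states.append(s)
    return states

# Two-hop instance: s_1 is the bridge state, s_2 is the answer.
print(rollout([2, 3], s0=2, b=1, m=7))  # → [5, 2]
```

Because each state depends only on its predecessor and the current input, any correct solver must either carry the running state forward hop by hop or exploit structure in the modulus to compress it.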

Figure 2: Polynomial tasks for training CODI, depicting explicit teacher CoT trace and student latent-thought trajectory. Feature-space distillation aligns internal representations at the [Ans] boundary.

Mechanistic Interpretability Protocols

Four main diagnostic tools are employed:

  • Logit lens: decodes residual-stream activations into probability distributions over task states at each latent position, revealing when and where correct intermediate states emerge.
  • Linear probes: trains classifiers on hidden states at various layers/positions to predict intermediate and input states.
  • Attention analysis: visualizes token-to-token routing of information, identifying direct-copy and latent update pathways.
  • Activation patching: causally swaps hidden states at specified positions or tokens between clean and corrupted runs, quantifying necessity and sufficiency for answer prediction.
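The first of these, the logit lens, can be sketched in a few lines: project a residual-stream activation through an unembedding matrix restricted to the task's state tokens and read off a distribution. The shapes and values below are synthetic stand-ins, not the paper's actual model weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, m = 16, 7                       # toy hidden width and modulus
W_U = rng.standard_normal((d_model, m))  # unembedding restricted to the m state tokens
resid = rng.standard_normal(d_model)     # residual stream at one (layer, position)

logits = resid @ W_U
probs = np.exp(logits - logits.max())    # stable softmax over candidate states
probs /= probs.sum()
decoded_state = int(probs.argmax())      # state the logit lens "reads off" here
```

Applied at each layer and latent-thought position, the peak probability of the correct intermediate state gives the decodability scores reported in the paper.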

Empirical Findings: The Structure of Internal Reasoning

Formation, Propagation, and Utilization of Bridge States

On two-hop tasks, logit-lens and probe analyses conclusively show that CODI forms explicit representations of the required intermediate state (s_2) inside the latent computation channel; decoding confidence ranges from 0.359 to 0.709 across layers and positions, with linear probe accuracy approaching 1 on these intermediates (Figure 3).

Figure 3: Logit lens analysis on intermediate states (s_1, s_2, s_3) for the two-hop task. s_2 achieves strong decodability during the latent steps, confirming explicit intermediate state formation.

Figure 4: Linear probing validates that s_2 is decodable throughout latent computation, supporting bridge state formation and sequential internal reasoning.
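The probing methodology can be illustrated with a self-contained sketch: fit a linear map from hidden states to the bridge-state label and measure accuracy. The data here is synthetic (planted class directions standing in for CODI activations); in the paper, a probe is trained per layer and position:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d_model, n = 7, 32, 500
labels = rng.integers(0, m, size=n)             # true s_2 values for each example
directions = rng.standard_normal((m, d_model))  # planted per-class directions
H = directions[labels] + 0.1 * rng.standard_normal((n, d_model))  # "hidden states"

Y = np.eye(m)[labels]                           # one-hot probe targets
W, *_ = np.linalg.lstsq(H, Y, rcond=None)       # closed-form least-squares probe
acc = ((H @ W).argmax(axis=1) == labels).mean()
print(f"probe accuracy: {acc:.3f}")             # near 1.0 for well-separated data
```

High probe accuracy at a given layer/position indicates the intermediate state is linearly decodable there, which is the evidence used for bridge-state formation.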

Attention maps further reveal a specialized head configuration in which the final input token (x_3) is routed directly to the [Ans] token, rather than being represented or propagated through the latent channel. This creates a bifurcated computational pathway: latent tokens construct and maintain intermediate states; a copy-like mechanism delivers the final input to the output boundary (Figure 5).

Figure 5: Attention maps for three-layer, two-head transformer on the two-hop polynomial task. [Ans] token attends strongly to x_3, demonstrating a copy-like direct input routing strategy.
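The attention maps inspected here are ordinary causal attention patterns; a minimal single-head sketch with synthetic query/key vectors shows what is being read off. A "copy head" of the kind described would place most of the last ([Ans]) row's mass on the x_3 column:

```python
import numpy as np

rng = np.random.default_rng(2)
d_head, seq = 8, 6                      # toy head width and sequence length
Q = rng.standard_normal((seq, d_head))  # synthetic queries
K = rng.standard_normal((seq, d_head))  # synthetic keys

scores = Q @ K.T / np.sqrt(d_head)
mask = np.tril(np.ones((seq, seq), dtype=bool))     # causal mask
scores = np.where(mask, scores, -np.inf)
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)      # each row is a probability distribution
ans_pattern = A[-1]                     # attention pattern of the last position
```

In the paper's analysis, a strongly peaked `ans_pattern` on the final input token is the signature of the direct-copy route.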

Activation patching causally corroborates these findings. Patching clean activations into x_2-corrupted runs at latent positions reinstates high accuracy, implicating latent intermediate propagation. For x_3-corruptions, patching at the [Ans] position uniquely restores performance, consistent with late-stage direct routing (Figure 6).

Figure 6: Activation patching for x_2-corrupted runs shows substantial accuracy recovery when patching latent thought positions, identifying the latent channel as causally essential for s_2.
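The patching logic can be made concrete on the task itself rather than on a real transformer: run a clean and a corrupted rollout, splice the clean intermediate state into the corrupted run, and check whether the final answer is restored. All constants below are illustrative:

```python
def step(s, x, b=1, m=7):
    """One hop of the polynomial update s -> s*x + b (mod m)."""
    return (s * x + b) % m

def run(xs, s0=2, patch=None):
    """patch = (hop_index, state) overwrites s at that hop, mimicking activation patching."""
    s = s0
    for t, x in enumerate(xs, start=1):
        s = step(s, x)
        if patch and patch[0] == t:
            s = patch[1]
    return s

clean, corrupted = [2, 3, 4], [2, 5, 4]        # corrupt x_2 only
s2_clean = run(clean[:2])                      # clean bridge state after hop 2
restored = run(corrupted, patch=(2, s2_clean)) # splice clean s_2 into corrupted run
print(run(clean), run(corrupted), restored)    # → 2 0 2
```

Restoration of the clean answer after patching the bridge state is exactly the causal evidence used to localize where s_2 lives in the model.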

Depth-Induced Bottleneck and Compression Phenomena

For tasks exceeding three hops, CODI does not reliably instantiate a full sequential latent rollout. Instead, mechanistic analyses show that only the final intermediate states (typically s_n, sometimes s_{n-1} and s_n) are strongly decodable, and attention continues to route the last input (x_{n+1}) directly to [Ans]. Earlier intermediates vanish from the latent trajectory, and activation patching recovery concentrates on late latent tokens and answer boundaries (Figure 7).

Figure 7: Logit lens on four-hop task reveals collapse of earlier bridge state decodability; only the final intermediate s_4 persists in latent computation.

Figure 8: Five-hop task: mechanistic pattern of partial latent reasoning and late-stage fusion remains, with only terminal intermediates being decodable.

The Prime–Composite Modulus Split and Theoretical Explanation

The mechanistic signature of late-bottleneck partial rollouts is robust across composite moduli but fails for prime moduli. When m is prime, model accuracy drops, and intermediate states are not decodable in the latent stream. Theoretical analysis attributes this to the algebraic structure of the update rule:

  • Composite moduli: update steps can collapse state information (many-to-one contractions), rendering late suffixes of intermediates predictive enough for accurate answers; the label s_T depends mainly on a compressed summary of a short terminal suffix.
  • Prime moduli: each update acts as a permutation, preserving complete history dependence; thus, faithful step-by-step computation is required, exceeding the fixed latent computational budget of CODI.
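The contrast between the two regimes reduces to a standard number-theoretic fact: for fixed b, the map s → s·x + b (mod m) is a bijection exactly when gcd(x, m) = 1, which holds for every nonzero x when m is prime. A short sketch (with illustrative moduli 7 and 8) makes the contraction visible:

```python
from math import gcd

def image_size(x, b, m):
    """Number of distinct outputs of s -> s*x + b (mod m) over all m states."""
    return len({(s * x + b) % m for s in range(m)})

# Composite modulus: some multipliers collapse many states onto one.
print([image_size(x, 1, 8) for x in range(1, 8)])  # → [8, 4, 8, 2, 8, 4, 8]
# Prime modulus: every nonzero multiplier is a permutation of the states.
print([image_size(x, 1, 7) for x in range(1, 7)])  # → [7, 7, 7, 7, 7, 7]

for x in range(1, 8):
    assert (image_size(x, 1, 8) == 8) == (gcd(x, 8) == 1)
```

Under composite moduli the state space shrinks along the rollout, so a late-suffix summary can suffice; under prime moduli no information is ever discarded, forcing the full-history computation CODI's latent budget cannot support.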

Ablation Studies and Comparison with Baselines

Varying the number of latent steps, model depth/width, and distillation loss confirms the qualitative stability of these mechanistic findings. Teacher-guided compression is identified as the main driver of partial, late-bottleneck strategies under composite moduli. Fully explicit CoT-trained transformers maintain sequential rollouts and outperform both CODI and standard (non-CoT) transformers in the prime modulus regime.

Implications and Future Directions

Practical Implications

Mechanistic findings clarify essential failure modes for CODI-style Latent-CoT models on sequential tasks requiring incompressible, full-history computation. They reveal that latent compute budget and teacher-guided compression can favor shortcut and bottleneck strategies, potentially undermining generalization and faithfulness when information flow cannot be collapsed. These insights stress the need for adaptive allocation of latent resources and for mechanistically robust architectures in domains requiring chain-like reasoning.

Theoretical Implications

The prime–composite modulus split introduces a principled mechanistic probe for reasoning faithfulness versus shortcut induction. Composite-ring tasks analogize naturalistic many-to-one mappings in language and learning, where compression may suffice; prime-modulus tasks demand genuine long-range sequential propagation.

Future Research Directions

  • Generalize mechanistic toolbox to more realistic, non-algorithmic sequential reasoning domains.
  • Explore latent-CoT objectives that adapt latent compute to the compressibility and effective length of underlying computation.
  • Study whether the prime–composite split and compression dependence generalize to alternative architectures and more sophisticated agents.

Conclusion

The paper provides a precise mechanistic account of how CODI-style Latent-CoT models compute on sequential multi-hop tasks. CODI achieves stepwise reasoning only for shallow tasks; deeper tasks induce a drastic collapse into compressed, late-stage bottleneck strategies, particularly when the structure of the problem admits many-to-one contractions. The prime–composite split highlights the limitation of Latent-CoT compression: when faithful full-history propagation is indispensable, CODI fails. These results lay a foundation for further systematic mechanistic audits of latent reasoning protocols and for the design of more adaptive, robust reasoning architectures (Figure 9).

Figure 9: Logit lens on three-hop polynomial task visualizes early formation and maintenance of bridge state s_2 in latent channel, but decodability of s_3 emerges only late, supporting sequential but depth-limited reasoning.
