Mechanism behind low chain-of-thought controllability

Identify and characterize the causal mechanisms in training and inference that lead contemporary reasoning models to exhibit low chain-of-thought controllability relative to output controllability.

Background

Across extensive evaluations, the paper finds that models are far less able to control their chain of thought (CoT) than their final outputs, and that CoT controllability decreases further with longer reasoning, more RL training, and greater task difficulty.

Despite these empirical patterns, the authors state that the mechanism underlying low CoT controllability is not well understood, which motivates investigating factors such as training objectives, process supervision, and model architecture.

References

However, the mechanism behind low controllability is not well understood.

Reasoning Models Struggle to Control their Chains of Thought (2603.05706 - Yueh-Han et al., 5 Mar 2026) in Abstract