Efficiency of token-space intermediate-step reasoning

Determine whether scaling reasoning computation by producing visible intermediate steps in token space (for example, chain-of-thought prompting, scratchpads, and related techniques) is the most efficient approach to solving reasoning tasks.

Background

The paper contrasts token-space compute-scaling methods—where models emit explicit intermediate reasoning tokens—with latent recursion approaches that iterate in hidden representation space without emitting tokens. While chain-of-thought and related methods have shown strong results, they also generate many tokens that may not contribute directly to reasoning (e.g., grammatical or stylistic tokens). The authors explicitly pose the efficiency of token-space intermediate-step generation as an open question.

This question is central to understanding how best to allocate inference-time computation: whether visible token-level reasoning is a fundamentally efficient pathway or whether alternative approaches (such as latent recursion) might offer superior efficiency by focusing computation on hidden-state refinement.
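The contrast between the two compute-allocation strategies can be illustrated with a toy sketch. This is not the paper's architecture, just a minimal, hypothetical illustration: a single shared update function plays the role of a transformer layer, and the only difference between the two paths is whether each extra forward pass externalizes a visible token or silently refines the hidden state.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden size

# Toy stand-in for a transformer layer: one linear map + nonlinearity.
W = rng.standard_normal((d, d)) / np.sqrt(d)

def step(h):
    return np.tanh(h @ W)

def token_space_reasoning(h, n_steps):
    """Each step emits a visible intermediate token (here: an argmax id).
    Compute is spent on every emitted token, including any that do not
    contribute directly to reasoning (the 'filler' tokens noted above)."""
    tokens = []
    for _ in range(n_steps):
        h = step(h)
        tokens.append(int(np.argmax(h)))  # externalized intermediate step
    return h, tokens

def latent_recursion(h, n_steps):
    """Same number of forward passes, but the state is refined silently
    in representation space; no tokens are emitted."""
    for _ in range(n_steps):
        h = step(h)
    return h

h0 = rng.standard_normal(d)
h_tok, visible = token_space_reasoning(h0, 4)
h_lat = latent_recursion(h0, 4)

# In this toy, both paths apply identical updates and reach the same
# final state; they differ only in whether steps are externalized.
assert np.allclose(h_tok, h_lat)
print(len(visible))  # 4 visible intermediate tokens vs. 0
```

The sketch makes the efficiency question concrete: if the final hidden state is what determines the answer, the visible tokens add decoding and context-length cost, and the open question is whether they buy anything beyond extra forward passes.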

References

Two key questions remain open: whether this is the most efficient approach, and whether non-reasoning tokens serve any purpose beyond providing additional forward passes for the model to “ponder” before producing an answer.

Tiny Recursive Reasoning with Mamba-2 Attention Hybrid (2602.12078 - Wang et al., 12 Feb 2026) in Section 1 (Introduction)