Optimal reasoning-trace length for the computational buffer effect

Determine the optimal length of the generated reasoning trace (i.e., the number of thinking tokens produced before the final answer) that maximizes the accuracy gains attributable to the computational buffer effect in reasoning-enabled large language models on closed-book factual question answering. Here, the computational buffer effect refers to the model's use of additional generated tokens to perform latent computation, independent of the tokens' semantic content.

Background

The paper investigates why enabling reasoning improves parametric knowledge recall for simple, single-hop factual questions. One identified mechanism is a content-independent computational buffer effect: models use additional generated tokens to perform latent computation before producing an answer, which improves recall even when the reasoning content is replaced with semantically meaningless filler text.

However, experiments show a non-monotonic relationship between the amount of such extra computation and performance: increasing the number of dummy reasoning tokens initially helps but then saturates or becomes counterproductive, and extra compute alone never fully matches the performance of full reasoning. Consequently, while additional computation can be beneficial, longer traces are not consistently better, leaving the optimal length unresolved for practical use.
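The length sweep implied above can be sketched as follows. This is a minimal illustration, not the paper's protocol: `evaluate_accuracy` is a hypothetical stand-in for querying the model and scoring its answers, and the choice of "..." as the semantically meaningless filler token is an assumption.

```python
def build_filler_trace(question: str, n_filler: int, filler_token: str = "...") -> str:
    """Build a prompt whose reasoning content is replaced by n_filler copies
    of a semantically meaningless filler token before the final answer.
    The <think> delimiters are illustrative, not from the paper."""
    filler = " ".join([filler_token] * n_filler)
    return f"{question}\n<think>\n{filler}\n</think>\nAnswer:"

def sweep_trace_lengths(question: str, lengths, evaluate_accuracy):
    """Score each candidate filler length with the (hypothetical) accuracy
    evaluator and return the best length plus the full length->score map.
    Under the non-monotonic relationship described above, the argmax is
    the empirically optimal buffer size for this question set."""
    results = {n: evaluate_accuracy(build_filler_trace(question, n)) for n in lengths}
    best = max(results, key=results.get)
    return best, results
```

In practice `evaluate_accuracy` would run the model on a held-out question set and return mean accuracy; because the curve saturates and can turn downward, a grid sweep like this (rather than "longer is better") is what the open question calls for.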

References

The computational buffer effect does not provide a reliable control signal, as longer traces are not consistently better and the optimal length is unknown.

Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs (2603.09906 - Gekhman et al., 10 Mar 2026) in Section "From Analysis to Practice: Improving Accuracy by Sampling High-Potential Traces" (\S 5.4)