Wait, Wait, Wait... Why Do Reasoning Models Loop?

Published 15 Dec 2025 in cs.LG | (2512.12895v1)

Abstract: Reasoning models (e.g., DeepSeek-R1) generate long chains of thought to solve harder problems, but they often loop, repeating the same text at low temperatures or with greedy decoding. We study why this happens and what role temperature plays. With open reasoning models, we find that looping is common at low temperature. Larger models tend to loop less, and distilled students loop significantly even when their teachers rarely do. This points to mismatches between the training distribution and the learned model, which we refer to as errors in learning, as a key cause. To understand how such errors cause loops, we introduce a synthetic graph reasoning task and demonstrate two mechanisms. First, risk aversion caused by hardness of learning: when the correct progress-making action is hard to learn but an easy cyclic action is available, the model puts relatively more probability on the cyclic action and gets stuck. Second, even when there is no hardness, Transformers show an inductive bias toward temporally correlated errors, so the same few actions keep being chosen and loops appear. Higher temperature reduces looping by promoting exploration, but it does not fix the errors in learning, so generations remain much longer than necessary at high temperature; in this sense, temperature is a stopgap rather than a holistic solution. We end with a discussion of training-time interventions aimed at directly reducing errors in learning.

Abstract PDF Upgrade to Chat

Summary

The paper identifies how model configurations such as temperature and size affect the prevalence of looping in reasoning tasks.
The study employs evaluations on AIME tasks and synthetic graph tasks, revealing that larger and non-distilled models exhibit reduced looping.
The authors propose future strategies including data-centric training alterations and advanced architectures to mitigate looping in AI reasoning systems.

Analysis of "Wait, Wait, Wait... Why Do Reasoning Models Loop?"

The paper "Wait, Wait, Wait... Why Do Reasoning Models Loop?" (2512.12895) presents an in-depth analysis of the phenomenon where reasoning models, such as those used in structured AI tasks, often exhibit repetitive behavior, known as "looping." The authors study this occurrence, its causes, and the role of model configurations, especially focusing on temperature and model size, in understanding and addressing looping.

Introduction to Looping in Reasoning Models

The paper begins by setting the context of reasoning models generating extensive sequences of logical steps to tackle challenging problems. Such models, however, frequently fall into loops, continuously repeating sections of their text outputs, especially under conditions of low temperature or greedy decoding methodologies. This characteristic is prevalent in several LLMs and is notably apparent in tasks requiring sustained logical reasoning such as competitive mathematics or coding challenges. The authors explore potential causes and solutions for such behavior by analyzing different model families and emphasizing the importance of learning dynamics within models and the mismatches that lead to these repetitive sequences.

Key Observations and Findings

Model Evaluation and Looping Trends

The authors evaluate a selection of open reasoning models, including DeepSeek-R1, Qwen, and OpenThinker, using tasks from the American Invitational Mathematics Examination (AIME). They draw significant insights regarding looping behavior:

Model Size and Looping: Generally, larger models loop less in comparison to their smaller counterparts (e.g., Qwen 1.5B > 7B > 32B), indicating a correlation between model capacity and looping.
Impact of Distillation: Distilled student models tend to loop significantly more than their undistilled parents, highlighting transfer inefficiencies during the distillation process (e.g., OpenThinker3 loops more than QwQ-32B).
Problem Difficulty and Looping: More complex problems tend to elicit increased looping, suggesting that problem difficulty is a key driver of repetitive model behavior (Figure 1).
Figure 1: Looping with greedy decoding. Bars show the looping fractions at temperature 0, averaged over AIME 2024 and 2025. All reasoning models exhibit looping, and within each family larger models loop less. Distilled students can loop significantly more than their teacher.

Mechanisms Behind Looping

The research introduces two primary mechanisms to explain looping within reasoning models:

Risk Aversion from Learning Hardness: Actions within reasoning tasks, which are difficult to learn, might cause the model to gravitate towards easier, repeatable actions. This, in turn, establishes a cycle leading to looping when such repetitive actions become favored due to their simplicity relative to the harder, less distinguishable actions.
Inductive Bias Toward Temporal Errors: Transformers exhibit a bias where errors in action prediction persist over multiple steps due to temporal correlations. This bias leads models into loops, especially in scenarios where each step builds upon previous errors, reinforcing the loop.

A synthetic graph reasoning task is employed to visualize and reinforce these claims, demonstrating how such mechanisms naturally result in looping.

The Role of Temperature

Temperature plays a dual role when addressing looping:

Increasing the temperature of sampling during model inference can reduce the likelihood of looping by promoting exploration over exploitation of known sequences (i.e., reducing reliance on high-probability actions which could be repetitive).
However, it does not directly address underlying model imperfections or biases, rendering higher temperatures an incomplete solution rather than a definitive fix.

Mitigations and Future Directions

The paper concludes by proposing several avenues to explore further mitigation strategies:

Data-Centric Training Alterations: Modifying training processes to enhance the representation of difficult tasks or injecting augmentations aimed at reducing cycle-causing mispredictions.
Advanced Architectures: Consideration of architectures that inherently counteract the accumulation of correlated errors, drawing from recent advances in network structures.

The authors also underscore the need for realistic solutions that incorporate holistic changes at both training and inference levels.

Conclusion

"Wait, Wait, Wait... Why Do Reasoning Models Loop?" delivers an insightful examination of loop mechanisms in reasoning models. While increased model temperatures can mitigate looping symptoms, the root causes often reside in fundamental training inefficiencies and network biases. Through this analysis, the paper opens the floor for continued exploration into more sophisticated training strategies and model architectures to further advance the field of AI reasoning.