Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

Published 22 May 2025 in cs.AI | (2505.17225v1)

Abstract: LLMs have demonstrated remarkable proficiency in long and complex reasoning tasks. However, they frequently exhibit a problematic reliance on familiar reasoning patterns, a phenomenon we term \textit{reasoning rigidity}. Despite explicit instructions from users, these models often override clearly stated conditions and default to habitual reasoning trajectories, leading to incorrect conclusions. This behavior presents significant challenges, particularly in domains such as mathematics and logic puzzle, where precise adherence to specified constraints is critical. To systematically investigate reasoning rigidity, a behavior largely unexplored in prior work, we introduce a expert-curated diagnostic set, \dataset{}. Our dataset includes specially modified variants of existing mathematical benchmarks, namely AIME and MATH500, as well as well-known puzzles deliberately redesigned to require deviation from familiar reasoning strategies. Using this dataset, we identify recurring contamination patterns that occur when models default to ingrained reasoning. Specifically, we categorize this contamination into three distinctive modes: (i) Interpretation Overload, (ii) Input Distrust, and (iii) Partial Instruction Attention, each causing models to ignore or distort provided instructions. We publicly release our diagnostic set to facilitate future research on mitigating reasoning rigidity in LLMs.

Abstract PDF Upgrade to Chat

Summary

The paper introduces novel diagnostic datasets, ConditionedMath and PuzzleTrivial, to assess and quantify reasoning rigidity in LLMs.
It proposes a contamination ratio metric and an early detection algorithm to measure deviations from intended reasoning paths.
Experimental results reveal that base models often exceed reasoning-optimized ones, underscoring challenges in adapting to uncommon instructions.

Diagnosing Instruction Overriding in Reasoning Models

The paper "Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models" explores the phenomenon of reasoning rigidity exhibited by LLMs specifically trained for complex reasoning tasks, such as mathematics and logic puzzles. Reasoning rigidity refers to the tendency of these models to default to familiar solution templates, thereby overriding specific user instructions even when the conditions are fully understood.

Introduction

LLMs have been applied to a variety of complex problem-solving tasks, including mathematical reasoning and puzzle-solving. However, despite their advanced capabilities, these models frequently exhibit a form of cognitive bias known as reasoning rigidity. This bias causes the models to ignore or override explicit conditions in favor of well-trodden reasoning paths, leading to erroneous conclusions.

The challenge posed by reasoning rigidity is distinct from hallucinations or prompt brittleness, as it relates to a model's failure to adhere to unfamiliar or atypical conditions. To study this behavior, the authors introduce a diagnostic set consisting of modified mathematical and puzzle-based benchmarks that require models to deviate from familiar reasoning strategies.

Diagnostic Dataset: ConditionedMath and PuzzleTrivial

The authors present two specialized datasets designed to evaluate reasoning rigidity:

ConditionedMath: This dataset is derived from existing mathematical benchmarks like AIME and MATH500. It consists of problems with carefully introduced variations that require unfamiliar solution trajectories (Figure 1). This challenges the models to detect and adapt to new constraints rather than relying on preconceived reasoning patterns.
PuzzleTrivial: This dataset includes logic puzzles that have been subtly altered to simplify the required reasoning or to introduce new constraints. The modifications are intended to evaluate the models' ability to follow straightforward logic rather than revert to familiar templates.
Figure 1: Dataset Construction Pipeline of ConditionedMath, highlighting two steps to create valid, meaningfully different, and solvable questions.

Analysis of Reasoning Patterns

Through systematic evaluation using the introduced datasets, the authors identify three modes of contamination in model reasoning:

Interpretation Overload: Models reinterpret straightforward instructions in multiple ways, leading to contradictions.
Input Distrust: Models assume errors in the input, complicating the reasoning process unnecessarily.
Partial Instruction Attention: Models selectively attend to parts of the instruction, often ignoring key elements.

These modes cause deviations from the intended reasoning paths, resulting in failures to comply with user-specified constraints.

Contamination Ratio and Detection

The authors propose a metric called the contamination ratio to quantify the extent of reasoning contamination. This ratio measures the proportion of reasoning steps in which the model defaults to familiar patterns over the provided instructions. Additionally, they introduce an early detection algorithm to facilitate the identification of contamination in cases where ground-truth labels are unavailable (Figure 2).

Figure 2: Patterns associated with contamination ratio, indicating how contamination affects reasoning accuracy.

Experiments and Observations

Experimental results indicate that base models often outperform their reasoning-optimized counterparts on modified datasets. This counterintuitive outcome suggests that reasoning models are more susceptible to rigidity. Base models, in contrast, are better at adhering to explicit instructions when the initial understanding of the problem is correct.

The authors also explore mitigation strategies such as budget forcing and prompt hinting but observe mixed effectiveness. While some adjustments yield performance improvements, others do not consistently overcome the rigidity issue.

Conclusion

The study underscores the need for improved understanding and mitigation of reasoning rigidity in LLMs. The diagnostic datasets and tools introduced provide a valuable foundation for future research aimed at enhancing model adherence to user instructions. The findings highlight the complexity of reinforcing reasoning capabilities in AI systems and emphasize the necessity for more robust training paradigms.

Developing models that can reliably deviate from familiar patterns when warranted remains a critical goal. Continued investigation into the cognitive biases manifest in model reasoning will be essential for advancing AI capabilities in domains requiring precise adherence to diverse and dynamic constraints.

Markdown Report Issue