
From System 1 to System 2: A Survey of Reasoning Large Language Models

Published 24 Feb 2025 in cs.AI | (2502.17419v6)

Abstract: Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical reasoning for more accurate judgments and reduced biases. Foundational LLMs excel at fast decision-making but lack the depth for complex reasoning, as they have not yet fully embraced the step-by-step analysis characteristic of true System 2 thinking. Recently, reasoning LLMs like OpenAI's o1/o3 and DeepSeek's R1 have demonstrated expert-level performance in fields such as mathematics and coding, closely mimicking the deliberate reasoning of System 2 and showcasing human-like cognitive abilities. This survey begins with a brief overview of the progress in foundational LLMs and the early development of System 2 technologies, exploring how their combination has paved the way for reasoning LLMs. Next, we discuss how to construct reasoning LLMs, analyzing their features, the core methods enabling advanced reasoning, and the evolution of various reasoning LLMs. Additionally, we provide an overview of reasoning benchmarks, offering an in-depth comparison of the performance of representative reasoning LLMs. Finally, we explore promising directions for advancing reasoning LLMs and maintain a real-time \href{https://github.com/zzli2022/Awesome-Slow-Reason-System}{GitHub Repository} to track the latest developments. We hope this survey will serve as a valuable resource to inspire innovation and drive progress in this rapidly evolving field.

Summary

  • The paper reveals that integrating structured search with reward modeling significantly enhances LLM reasoning, enabling deliberate System 2 processing.
  • It employs techniques like Monte Carlo Tree Search and symbolic logic to improve intermediate reasoning steps and overall solution accuracy.
  • The survey benchmarks reasoning across tasks such as mathematics and coding, highlighting both outcome efficiency and process clarity.

From System 1 to System 2: A Survey of Reasoning LLMs

Introduction to Reasoning LLMs

The progression from System 1 to System 2 models in AI marks the transition from fast, intuitive decision-making to complex, deliberate reasoning. LLMs that operate in a System 1 mode generate human-like text and handle straightforward tasks with minimal processing time, but they falter when a task requires the deeper logical analysis and step-by-step reasoning characteristic of System 2. Recent reasoning LLMs, exemplified by OpenAI's o1/o3 and DeepSeek's R1, mimic deliberate logical reasoning processes akin to System 2 thinking, tackling tasks in areas such as mathematics and programming with a high degree of precision.

Development of Reasoning LLMs

Reasoning LLMs build upon foundational models by integrating structured search mechanisms and reward feedback. The integration of symbolic logic systems, reinforcement learning (RL), and Monte Carlo Tree Search (MCTS) with LLMs has significantly accelerated their ability to reason. Symbolic systems provide a rule-based framework for precise reasoning which, when merged with LLMs, can simulate intricate cognitive pathways (Figure 1).

Figure 1: A comprehensive comparison of traditional reasoning models and reasoning LLMs. Reasoning LLMs offer significant advantages over traditional models in areas such as training approaches, adaptability and learning, problem-solving strategies, and generality and scalability.

MCTS, traditionally used for decision-making over large search trees, complements LLMs by iteratively simulating possible outcomes and refining strategies, thereby optimizing decision paths through higher-order reasoning. This integration is crucial for emulating human-like strategic planning and avoiding suboptimal reasoning outcomes.
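
To make the selection-expansion-simulation-backpropagation loop concrete, here is a minimal MCTS sketch on a toy task (reach a target value by applying "+1" or "*2" steps). The action set, reward, and UCB constant are illustrative stand-ins, not the paper's setup; in a reasoning LLM the actions would be candidate reasoning steps and the rollout reward would come from a verifier or reward model.

```python
import math
import random

# Toy task: starting from 1, apply "+1" or "*2" steps to reach TARGET.
ACTIONS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}
TARGET, MAX_DEPTH = 10, 5

class Node:
    def __init__(self, value, depth, parent=None, action=None):
        self.value, self.depth = value, depth
        self.parent, self.action = parent, action
        self.children, self.visits, self.total = [], 0, 0.0

    def expand(self):
        for name, fn in ACTIONS.items():
            self.children.append(Node(fn(self.value), self.depth + 1, self, name))

    def ucb(self, c=1.4):
        # Unvisited nodes are explored first; otherwise balance mean reward
        # (exploitation) against the UCB exploration bonus.
        if self.visits == 0:
            return float("inf")
        return self.total / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def rollout(node):
    """Simulate random steps to the depth limit; reward 1 if target is hit."""
    value, depth = node.value, node.depth
    while depth < MAX_DEPTH and value != TARGET:
        value = random.choice(list(ACTIONS.values()))(value)
        depth += 1
    return 1.0 if value == TARGET else 0.0

def mcts(iterations=500):
    root = Node(1, 0)
    for _ in range(iterations):
        node = root
        while node.children:                      # selection
            node = max(node.children, key=Node.ucb)
        if node.visits > 0 and node.depth < MAX_DEPTH:
            node.expand()                          # expansion
            node = node.children[0]
        reward = rollout(node)                     # simulation
        while node:                                # backpropagation
            node.visits += 1
            node.total += reward
            node = node.parent
    # Recommend the most-visited first action, as in standard MCTS.
    return max(root.children, key=lambda n: n.visits).action

random.seed(0)
print(mcts())  # best first step toward the target
```

The same skeleton applies when the tree's nodes are partial chains of thought: only `ACTIONS` (step proposal) and `rollout` (reward estimation) need to be swapped out.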

Core Methods for Enabling Reasoning LLMs

A well-defined set of core methods underpins advanced reasoning LLMs: structured search, reward modeling, self-improvement, and macro actions (Figure 2).

Figure 2: The core methods enabling reasoning LLMs.

In structured search, reasoning pathways are explored through methods like MCTS, which identify optimal reasoning trajectories by evaluating and selecting among many candidate intermediate states. This approach is particularly beneficial for complex problems that require multifaceted exploration.
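
Beam search is a simpler instance of the same idea: at each depth, keep only the top-k partial trajectories under a scoring function. The step generator and value function below are toy stand-ins (an incrementing counter and a distance-to-target score, both my assumptions) for an LLM's step proposals and a learned value model.

```python
def propose_steps(path):
    """Stand-in step generator: candidate next states given a partial path."""
    last = path[-1] if path else 0
    return [last + 1, last + 2, last + 3]

def score(path, target=10):
    """Stand-in value model: prefer paths ending near the target."""
    return -abs(target - path[-1])

def beam_search(depth=4, beam_width=2):
    beams = [[0]]                                  # one initial partial path
    for _ in range(depth):
        # Expand every surviving path with every candidate step...
        candidates = [path + [step] for path in beams
                      for step in propose_steps(path)]
        # ...then keep only the top-k by score.
        beams = sorted(candidates, key=score, reverse=True)[:beam_width]
    return beams[0]

print(beam_search())  # best trajectory found under the toy scorer
```

Compared with MCTS, beam search is cheaper (no simulations or backpropagation) but greedier: a step that scores poorly now but enables a strong continuation later can fall off the beam.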

Reward modeling is critical and comes in two major forms: the Outcome Reward Model (ORM) and the Process Reward Model (PRM). ORM provides a binary reward based on the correctness of the final answer, whereas PRM evaluates intermediate steps, promoting a more comprehensive understanding of the solution process (Figure 3).

Figure 3: The comparison between ORM and PRM for assessing a complete solution trajectory. ORM only provides a single reward based on the correctness of the final answer, while PRM evaluates the quality of each reasoning step throughout the process.
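
The contrast can be sketched in a few lines. Here a tiny arithmetic checker stands in for a learned reward model (both the checker and the example trajectory are illustrative assumptions): ORM returns one scalar for the whole trajectory, while PRM returns one score per step and therefore localizes where the reasoning went wrong.

```python
def orm_score(final_answer, correct_answer):
    """Outcome Reward Model: a single binary reward for the final answer."""
    return 1.0 if final_answer == correct_answer else 0.0

def prm_score(steps, check_step):
    """Process Reward Model: one reward per intermediate reasoning step."""
    return [1.0 if check_step(s) else 0.0 for s in steps]

def check_step(step):
    """Stand-in verifier: evaluate an arithmetic claim 'expr = value'."""
    expr, value = step.split(" = ")
    return eval(expr) == int(value)

# Toy trajectories for "compute 3 * (2 + 4)":
good_steps = ["2 + 4 = 6", "3 * 6 = 18"]
bad_steps = ["2 + 4 = 7", "3 * 7 = 21"]

print(orm_score("18", "18"))              # 1.0 -- final answer correct
print(orm_score("21", "18"))              # 0.0 -- only signals failure
print(prm_score(bad_steps, check_step))   # [0.0, 1.0] -- localizes the error
```

Note how the PRM output shows that the first step introduced the error while the second step was internally consistent, which is exactly the credit-assignment signal an ORM cannot provide.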

Current Benchmarks and Evaluation

Current benchmarks for reasoning LLMs span various domains, including mathematical reasoning, coding tasks, and multimodal understanding. They evaluate not only the accuracy of answers but also the processes involved in arriving at them. Metrics specific to reasoning LLMs focus on both outcome efficiency, how effectively the model arrives at correct solutions, and process efficiency, the diversity and coherence of the reasoning pathways explored (Figure 4).

Figure 4: Various evaluation metrics of reasoning LLMs divided by task types, technical proposals, and reasoning paradigms.

Challenges and Future Directions

Reasoning LLMs still face challenges, including the integration of flexible, adaptive reasoning structures that can efficiently switch between fast and slow cognitive processes. Advancing scalable, real-time reasoning while maintaining high performance remains a pressing concern.

A further opportunity lies in developing models that effectively combine symbolic reasoning with neural pattern recognition, allowing them to operate efficiently in low-resource settings and across languages.

Conclusion

The transition from System 1 to System 2 LLMs signifies a pivotal advancement toward AI capable of human-like reasoning processes. While reasoning LLMs have made significant strides in emulating deep, logical analysis akin to human cognition, challenges in efficiency, adaptability, and integration persist. Future research must focus on optimizing resource consumption and enhancing the ability to generalize across domain-specific tasks, pushing the boundaries of automated reasoning systems further.

Explain it Like I'm 14

What this paper is about (in simple terms)

This paper is a big, easy-to-read map of how today’s AI moves from “fast thinking” to “slow thinking.” Fast thinking (called System 1) is quick and intuitive—good for simple questions. Slow thinking (System 2) is careful, step-by-step reasoning—good for hard problems like tricky math, writing code, planning, and medical questions. The authors explain how new “reasoning” LLMs (like OpenAI’s o1/o3 and DeepSeek’s R1) are getting better at slow thinking, how they’re built, how we test them, what works, what doesn’t, and where the field is going next.

The main questions the paper answers

The paper organizes the field and answers a few simple questions:

  • What makes a “reasoning LLM” different from older, fast-thinking LLMs?
  • Which ideas and tools help models think step by step (like a human solving a puzzle)?
  • How do we actually build and train these models?
  • How do we measure if they’re good at reasoning?
  • What are the current limits, and what should researchers do next?

How the research is done and the key ideas (explained with everyday examples)

This is a survey, so the authors read many papers, compare them, and explain the big patterns. To make the topic clear, they start with the basics and then show how the parts fit together.

Here are the core ideas, with simple analogies:

  • System 1 vs System 2
    • System 1: like answering “What’s 2+2?”—instant and automatic.
    • System 2: like solving a long algebra problem—slow, step-by-step, checking your work.
  • Foundational LLMs vs Reasoning LLMs
    • Foundational LLMs are great chatters—fast and fluent—but they can struggle with long, logical problems.
    • Reasoning LLMs try to “think out loud,” exploring different paths, checking steps, and correcting mistakes.
  • Symbolic logic (old-school AI rules)
    • Think of this as if-then rules or a checklist: “If X, then do Y.” It’s strict and structured. Today’s models borrow the idea of making high-level plans (“macro actions”), like a checklist: “First define the variables, then factor, then solve.”
  • Monte Carlo Tree Search (MCTS)
    • Like playing chess in your head: try a move, imagine what happens next, keep the good paths, drop the bad ones. For AI reasoning, MCTS helps the model explore many solution paths and pick the best.
  • Reinforcement Learning (RL)
    • Learning by trial-and-error with points. Do something good? Get a reward. Do something bad? Learn to avoid it. Famous examples: AlphaGo and AlphaZero learned board games by practicing against themselves.
  • Structured search (finding a good path to the answer)
    • The model builds a “tree” of possible steps (like exploring all routes in a maze), then chooses the promising route. MCTS is a popular way to do this.
  • Reward modeling (how to judge what’s “good thinking”)
    • Two styles:
    • Outcome Reward Model (ORM): Only grades the final answer. Like a teacher who only checks if the last answer is right.
    • Process Reward Model (PRM): Grades each step. Like a teacher who checks your work line by line, so you learn where you went wrong. This helps models fix mistakes earlier.
  • Self-improvement
    • The model learns from its own attempts, refines its steps, and improves—similar to practicing a sport and reviewing your plays.
  • Macro actions (high-level moves)
    • Instead of typing every tiny step, the model uses smart shortcuts like “Summarize the problem,” “Try an example,” or “Check for mistakes.” This is like using a recipe’s main steps instead of describing every motion of your hands.
  • Reinforcement fine-tuning
    • Use RL to tune the model so it prefers helpful, accurate, step-by-step reasoning, not just fast guessing.
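
The "macro actions" idea above can be sketched as a small pipeline: instead of free-form token-by-token steps, the model executes a plan of named high-level moves. The macro names (`summarize`, `solve`, `check`) and the toy equation solver are illustrative assumptions, not the paper's implementation.

```python
def summarize(state):
    """Macro 1: restate the problem before attempting it."""
    state["summary"] = f"Solve for x in: {state['problem']}"
    return state

def solve(state):
    """Macro 2: toy solver for equations of the form 'x + a = b'."""
    lhs, rhs = state["problem"].split(" = ")
    a = int(lhs.split(" + ")[1])
    state["answer"] = int(rhs) - a
    return state

def check(state):
    """Macro 3: verify by substituting the answer back in."""
    lhs, rhs = state["problem"].split(" = ")
    a = int(lhs.split(" + ")[1])
    state["verified"] = state["answer"] + a == int(rhs)
    return state

# The "recipe's main steps": a fixed plan of high-level moves.
MACRO_PLAN = [summarize, solve, check]

state = {"problem": "x + 3 = 10"}
for macro in MACRO_PLAN:
    state = macro(state)
print(state["answer"], state["verified"])  # 7 True
```

In a real reasoning LLM each macro would be a prompt-level move (summarize, attempt, self-check) rather than a Python function, but the shape is the same: a short plan of coarse steps, each of which can expand into detailed reasoning.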

What the paper finds and why it matters

The authors summarize trends seen across many recent models:

  • Reasoning models explore more and double-check themselves
    • They often include mini-moves like “Wait,” “Hold on,” or “Let’s check,” which help catch mistakes.
    • They try multiple solution paths before deciding, like a careful student.
  • They write longer answers and take more time
    • For hard math and code, they may generate thousands of tokens (lots of text) to reason things out.
    • This helps on tough problems but can lead to “overthinking” on easy ones.
  • They can be oddly cautious on simple tasks
    • Even easy questions can trigger too much step-by-step thinking, which wastes time.
  • Training can be surprisingly data-efficient—if the data is chosen well
    • A small number of high-quality, hard, “think-aloud” examples can teach a lot.
    • Sparse or simple reward signals can still work if the learning process is well designed.
  • Bigger models benefit more
    • Larger models handle complex, multi-step reasoning better and gain more from these techniques.
  • MCTS helps—but it’s expensive
    • Searching many paths boosts accuracy, but it can be slow and require lots of compute power.
  • Grading steps (PRM) beats grading only the final answer (ORM) for complex problems
    • PRM helps the model learn where and why it went wrong, making its reasoning clearer and more reliable.
  • Benchmarks show real gains
    • New reasoning LLMs reach strong results in math, code, multimodal tasks (mixing text and images), medicine, and more, often matching or approaching expert level on tough tests.

What this could change in the real world

If these ideas keep improving, we can expect:

  • Better problem-solving tools
    • Tutors that explain every step, coding assistants that debug and justify fixes, and medical AIs that reason carefully with evidence.
  • More trustworthy AI
    • Models that “show their work” are easier to trust and easier to fix when wrong.
  • Smarter use of compute
    • We need to balance deep thinking (slow but accurate) with speed (fast but riskier), especially in real-time applications.
  • New research directions
    • Design better rewards that match human reasoning.
    • Make search (like MCTS) cheaper and faster.
    • Build stronger multimodal reasoners (text, images, charts).
    • Reduce overthinking on easy tasks.
    • Improve safety and reduce bias using step-by-step checks.

In short, this survey shows how AI is moving from quick guesses to careful thinking. By combining planning (MCTS), trial-and-error learning (RL), step-by-step grading (PRM), and smart high-level moves (macro actions), reasoning LLMs are getting closer to human-like problem solving. The authors also provide a live GitHub resource to track new advances, making this a helpful guide for anyone building or evaluating the next generation of reasoning AI.
