- The paper introduces Agentic Reasoning, a framework that integrates web-search, coding, and Mind-Map agents to enhance LLM reasoning in complex tasks.
- It demonstrates state-of-the-art performance on benchmarks like GPQA and GAIA, showcasing improved logical consistency and problem-solving effectiveness.
- Experimental evaluations reveal its applicability in domains such as medical decision-making and strategic reasoning, highlighting its practical impact.
Introduction
The paper introduces "Agentic Reasoning", a framework that enhances the reasoning capabilities of LLMs by integrating external tool-using agents. The approach combines web search, code execution, and structured memory to tackle complex questions that require extensive research and deep reasoning. A novel component is the Mind-Map agent, which constructs a structured knowledge graph that preserves and tracks logical relationships, improving coherence in long reasoning chains that involve multiple tools.
Methodology
Agentic Reasoning Pipeline
The pipeline for Agentic Reasoning integrates external agents into the LLM's reasoning process to enhance its problem-solving capabilities. The LLM can dynamically determine when to invoke external agents, such as the Web-Search and Code agents, as well as the Mind-Map agent, for structured memory storage.
When the reasoning process needs outside help, the LLM emits special tokens in its output to signal that a tool-based solution is required. For example, web-search tokens trigger external information retrieval, and coding tokens trigger computational tasks. Each token is accompanied by a generated query message for the respective agent.
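The token-based dispatch described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the tag names (`<web_search>`, `<code>`, `<mind_map>`), the `_result` wrapper, and the function `dispatch_tool_calls` are all hypothetical, since the source only states that special tokens carry a query message to the matching agent.

```python
import re
from typing import Callable, Dict

# Hypothetical tag names; the source only says "special tokens".
TOOL_PATTERN = re.compile(r"<(web_search|code|mind_map)>(.*?)</\1>", re.DOTALL)


def dispatch_tool_calls(
    reasoning_text: str, agents: Dict[str, Callable[[str], str]]
) -> str:
    """Replace each tool request in the text with the agent's answer."""

    def run_agent(match: "re.Match[str]") -> str:
        tool = match.group(1)            # which agent was requested
        query = match.group(2).strip()   # the generated query message
        result = agents[tool](query)
        # Assumed convention: wrap the answer so the LLM can read it back.
        return f"<{tool}_result>{result}</{tool}_result>"

    return TOOL_PATTERN.sub(run_agent, reasoning_text)


if __name__ == "__main__":
    # Stand-in agents; real ones would call a search API or a sandbox.
    stub_agents = {
        "web_search": lambda q: f"docs for {q}",
        "code": lambda q: "42",
        "mind_map": lambda q: "summary",
    }
    text = "Need data. <web_search> topic </web_search> Then <code>6*7</code>."
    print(dispatch_tool_calls(text, stub_agents))
```

In a real system the loop would pause generation at each token, call the agent, and resume decoding with the result appended; the single-pass substitution here only shows the routing step.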
Figure 1: The overall workflow of Agentic Reasoning. Given a question, the reasoning LLM can invoke the Web-Search agent to retrieve external information, the Coding agent to perform quantitative computations, and the Mind-Map agent to store the reasoning context in structured form, and then combine their outputs into a comprehensive solution.
Mind-Map Agent
The Mind-Map agent is designed to manage real-time reasoning contexts through the transformation of reasoning chains into structured knowledge graphs. It is distinguished by its ability to cluster context into groups and summarize them, using community clustering and LLM-based summarization techniques. This organized memory structure enables the model to maintain coherence over long reasoning sequences and serve as a queryable external memory.
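A rough sketch of such a structured memory is shown below. It makes several assumptions that are not in the source: facts are stored as (subject, relation, object) triples, connected components stand in for the community clustering the paper mentions (a much simpler grouping), and the `summarizer` callable is a placeholder for an LLM call. The class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


class MindMap:
    """Toy knowledge graph built from reasoning steps."""

    def __init__(self) -> None:
        self.triples: List[Triple] = []
        self.neighbors: Dict[str, Set[str]] = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.triples.append((subject, relation, obj))
        self.neighbors[subject].add(obj)
        self.neighbors[obj].add(subject)

    def clusters(self) -> List[Set[str]]:
        # Connected components: a simplification, not real community detection.
        seen: Set[str] = set()
        groups: List[Set[str]] = []
        for node in list(self.neighbors):
            if node in seen:
                continue
            group: Set[str] = set()
            stack = [node]
            while stack:
                current = stack.pop()
                if current in group:
                    continue
                group.add(current)
                stack.extend(self.neighbors[current] - group)
            seen |= group
            groups.append(group)
        return groups

    def query(self, entity: str) -> List[Triple]:
        # Return every stored fact that mentions the entity.
        return [t for t in self.triples if entity in (t[0], t[2])]

    def summarize(self, summarizer: Callable[[List[Triple]], str]) -> List[str]:
        # One summary per cluster; `summarizer` stands in for an LLM.
        return [
            summarizer([t for t in self.triples if t[0] in group])
            for group in self.clusters()
        ]
```

The `query` method illustrates the "queryable external memory" role: the reasoning model can ask for everything known about one entity instead of rereading the whole chain.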
Web-Search Agent
The Web-Search agent enhances the LLM by breaking down queries, retrieving relevant information, and ranking web pages. This agent iteratively refines queries and processes relevance feedback, allowing effective integration of external knowledge into the reasoning process.
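The decompose, retrieve, and rank steps could look roughly like the sketch below. Everything here is illustrative: `decompose` and `search` are injected callables standing in for an LLM and a search backend, the term-overlap score is a deliberately crude substitute for whatever ranking the paper uses, and the iterative query refinement is omitted for brevity.

```python
from typing import Callable, List


def overlap_score(query: str, page: str) -> int:
    """Count distinct query terms that also appear in the page text."""
    query_terms = set(query.lower().split())
    return len(query_terms & set(page.lower().split()))


def web_search_agent(
    question: str,
    decompose: Callable[[str], List[str]],
    search: Callable[[str], List[str]],
    top_k: int = 2,
) -> List[str]:
    # 1. Break the question into sub-queries (an LLM call in practice).
    candidates: List[str] = []
    for sub_query in decompose(question):
        # 2. Retrieve pages for each sub-query, skipping duplicates.
        for page in search(sub_query):
            if page not in candidates:
                candidates.append(page)
    # 3. Rank all candidates against the original question.
    ranked = sorted(
        candidates, key=lambda p: overlap_score(question, p), reverse=True
    )
    return ranked[:top_k]
```

Passing the backends in as arguments keeps the control flow testable without network access.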
Coding Agent
The Coding agent handles coding tasks by generating code, executing it, and integrating the outputs back into the reasoning task. This separation lets the reasoning LLM stay focused on reasoning instead of being interrupted by implementation details, which promotes efficiency and coherence.
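The generate, execute, and return loop might be sketched as below. This is an assumption-laden illustration: `generate_code` stands in for a coding LLM, the function name `coding_agent` is hypothetical, and running code in a plain subprocess provides no sandboxing, which a real deployment would need.

```python
import subprocess
import sys
from typing import Callable


def coding_agent(
    task: str,
    generate_code: Callable[[str], str],
    timeout_s: float = 10.0,
) -> str:
    """Generate code for a task, run it, and return a text result."""
    source = generate_code(task)  # an LLM call in practice
    completed = subprocess.run(
        [sys.executable, "-c", source],  # run with the current interpreter
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired on overrun
    )
    if completed.returncode != 0:
        # Surface the failure as text so the reasoning model can react.
        return f"error: {completed.stderr.strip()}"
    return completed.stdout.strip()
```

Returning a short string in both the success and failure cases means the reasoning model only sees the result, not the code that produced it.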
Experimental Evaluation
Solving Expert-Level Problems
Agentic Reasoning demonstrated superior performance in benchmark tasks, achieving new state-of-the-art results on the GPQA dataset by effectively utilizing integrated reasoning tools.
The method also excelled on the GAIA benchmark, outperforming many proprietary models. A case study in medical decision-making illustrated its capability in automating complex analytical tasks like determining optimal FiO₂ and PEEP for clinical decisions.
Figure 2: Case study on a complex medical decision-making problem.
Deep Research Tasks
In an extensive evaluation of article generation on the FreshWiki dataset, Agentic Reasoning outperformed established search-enhanced reasoning models, demonstrating the efficacy of structured tool integration in long-form content generation.
Analysis
Agentic Reasoning's success is largely attributed to its adaptive integration of web-search and memory tools, with the Mind-Map agent playing a crucial role in maintaining logical consistency over extended reasoning. The effectiveness was validated in strategic game assessments, such as Werewolf, showcasing enhanced deductive reasoning.
Figure 3: The ablation study examines the impact of different tools in reasoning. Green marks external toolboxes, and red marks combinations of the proposed tools. The blue line is the overall performance of the base reasoning model.
Conclusion
Agentic Reasoning provides a highly effective framework for enhancing the reasoning abilities of LLMs, especially in complex problem-solving scenarios. By successfully integrating structured memory and external tools, the approach not only achieves state-of-the-art results but also opens up new avenues for more nuanced task executions. Future work will explore additional task-specific tool integrations to further advance reasoning capabilities in complex, dynamic environments.