Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search

Published 3 Jul 2025 in cs.AI, cs.CL, and cs.IR | (2507.02652v1)

Abstract: Complex information needs in real-world search scenarios demand deep reasoning and knowledge synthesis across diverse sources, which traditional retrieval-augmented generation (RAG) pipelines struggle to address effectively. Current reasoning-based approaches suffer from a fundamental limitation: they use a single model to handle both high-level planning and detailed execution, leading to inefficient reasoning and limited scalability. In this paper, we introduce HiRA, a hierarchical framework that separates strategic planning from specialized execution. Our approach decomposes complex search tasks into focused subtasks, assigns each subtask to domain-specific agents equipped with external tools and reasoning capabilities, and coordinates the results through a structured integration mechanism. This separation prevents execution details from disrupting high-level reasoning while enabling the system to leverage specialized expertise for different types of information processing. Experiments on four complex, cross-modal deep search benchmarks demonstrate that HiRA significantly outperforms state-of-the-art RAG and agent-based systems. Our results show improvements in both answer quality and system efficiency, highlighting the effectiveness of decoupled planning and execution for multi-step information seeking tasks. Our code is available at https://github.com/ignorejjj/HiRA.


Summary

The paper introduces HiRA, a hierarchical framework that decouples high-level planning from low-level execution to overcome limitations in traditional deep search systems.
The framework uses a meta planner, adaptive coordinator, and domain-specialized executors to efficiently manage subtask delegation and reduce reasoning noise.
Experimental results demonstrate significant accuracy and efficiency improvements on complex multi-step tasks, validating the approach's scalability and robustness.

Decoupled Planning and Execution for Deep Search: The HiRA Hierarchical Reasoning Framework

The paper "Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search" (HiRA) (2507.02652) addresses the limitations of monolithic reasoning architectures in complex information-seeking tasks. The authors propose a hierarchical agentic framework that explicitly separates high-level planning from low-level execution, enabling more scalable, extensible, and efficient deep search systems. This approach is motivated by the observation that existing Retrieval-Augmented Generation (RAG) and agent-based pipelines are constrained by brittle prompt engineering, limited tool extensibility, and context contamination from execution details.

Motivation and Problem Formulation

Traditional RAG and agentic systems typically employ a single large reasoning model to handle both strategic planning (e.g., decomposing a complex query) and execution (e.g., retrieving, summarizing, or computing). This monolithic design leads to two key deficiencies:

Limited Capability Extensibility: Integrating new tools or capabilities requires prompt re-engineering or model retraining, which is brittle and non-scalable.
Reasoning Disruption: Injecting raw execution results directly into the reasoning chain introduces noise, which reduces logical coherence and consumes limited context-window space.

The paper formalizes the deep search problem as generating both an answer and a reasoning trace for a complex query, leveraging an environment composed of expert agents, each with specialized reasoning and tool-use capabilities.

HiRA Framework Architecture

HiRA introduces a three-tiered hierarchical architecture:

Meta Reasoning Planner: Decomposes the input query into high-level subtasks, focusing on strategic planning without being encumbered by execution details.
Adaptive Reasoning Coordinator: Assigns subtasks to the most suitable domain-specialized executor, manages bidirectional reasoning transfer, and maintains a dual-channel memory for fact and resource sharing.
Domain-Specialized Executors: Execute assigned subtasks using specialized models and tools (e.g., web search, code execution, multimodal understanding), returning distilled reasoning and results.

This decoupling allows each agent to operate at its optimal abstraction level, with the coordinator ensuring information flow and memory management across the hierarchy.
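The control flow across the three tiers can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Executor`, `Coordinator`, and `run_plan` are hypothetical, the keyword match stands in for the coordinator's classification step, the "keep the last line" rule stands in for LLM-based reasoning distillation, and the planner is replaced by a fixed list of subtasks.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Executor:
    """Hypothetical domain-specialized executor."""
    name: str
    keywords: Tuple[str, ...]      # toy stand-in for a capability description
    run: Callable[[str], str]      # subtask text -> raw execution trace


class Coordinator:
    """Hypothetical coordinator: routes subtasks and distills executor traces."""

    def __init__(self, executors: List[Executor]):
        if not executors:
            raise ValueError("at least one executor is required")
        self.executors = executors

    def select(self, subtask: str) -> Executor:
        # Assumption: keyword overlap replaces the paper's classification step.
        text = subtask.lower()
        # On ties, max() returns the first executor with the highest score.
        return max(self.executors,
                   key=lambda ex: sum(k in text for k in ex.keywords))

    def distill(self, trace: str, limit: int = 80) -> str:
        # Assumption: keeping the last non-empty line replaces LLM summarization.
        lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
        return lines[-1][:limit] if lines else ""


def run_plan(subtasks: List[str], coordinator: Coordinator) -> List[str]:
    """Return what the planner would see: distilled results, not raw traces."""
    context = []
    for subtask in subtasks:
        executor = coordinator.select(subtask)
        trace = executor.run(subtask)
        context.append(f"[{executor.name}] {coordinator.distill(trace)}")
    return context


# Stub executors that return canned traces instead of calling real tools.
search = Executor("search", ("search", "find", "web"),
                  lambda task: "opened 3 pages\nresult: Paris")
code = Executor("code", ("compute", "calculate", "code"),
                lambda task: "ran script\nresult: 42")

coordinator = Coordinator([search, code])
context = run_plan(["Find the capital of France on the web", "Compute 6 * 7"],
                   coordinator)
print(context)
```

The point of the sketch is that the planner-facing `context` only ever contains the distilled line from each executor, so intermediate execution steps stay out of the high-level reasoning chain.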

Key Implementation Details

Subtask Generation: The meta planner emits special tokens to demarcate subtasks, which are then routed by the coordinator.
Agent Selection: The coordinator uses a classification approach, considering required capabilities and task complexity, to select the most efficient executor.
Reasoning Distillation: The coordinator refines and summarizes executor reasoning before reintegrating it into the planner's context, reducing noise.
Dual-Channel Memory: Fact memory stores factual assertions together with their provenance, and resource memory tracks the resources encountered so far. Both channels support knowledge transfer between agents and reduce redundant exploration.
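Two of the mechanisms above, subtask demarcation and the dual-channel memory, lend themselves to a small sketch. Everything here is assumed for illustration: the `<subtask>` and `</subtask>` delimiters are hypothetical placeholders rather than the paper's actual special tokens, and `DualChannelMemory` is one plausible data layout, not the authors' implementation.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical delimiters; the paper's real special tokens are not reproduced.
SUBTASK_RE = re.compile(r"<subtask>(.*?)</subtask>", re.DOTALL)


def extract_subtasks(planner_output: str) -> List[str]:
    """Pull the demarcated subtasks out of the planner's generated text."""
    return [match.strip() for match in SUBTASK_RE.findall(planner_output)]


@dataclass
class DualChannelMemory:
    """Assumed layout: one channel for facts, one for resources."""
    facts: Dict[str, str] = field(default_factory=dict)      # assertion -> source
    resources: Dict[str, str] = field(default_factory=dict)  # resource id -> note

    def add_fact(self, assertion: str, source: str) -> None:
        # Keep the first recorded provenance for a repeated assertion.
        self.facts.setdefault(assertion, source)

    def add_resource(self, resource: str, note: str) -> None:
        self.resources.setdefault(resource, note)

    def seen(self, resource: str) -> bool:
        # Lets an executor skip a resource that was already explored.
        return resource in self.resources


planner_output = (
    "First I need the release year. <subtask> find the release year </subtask> "
    "Then <subtask>compute the age in 2025</subtask>"
)
subtasks = extract_subtasks(planner_output)

memory = DualChannelMemory()
memory.add_fact("The film was released in 1999", "https://example.org/film")
memory.add_fact("The film was released in 1999", "https://example.org/other")
memory.add_resource("https://example.org/film", "official page, has release date")
print(subtasks, memory.seen("https://example.org/film"))
```

Keeping facts and resources in separate maps means a later executor can check `seen()` before revisiting a page, which is one way the redundant exploration mentioned above could be avoided.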

Experimental Evaluation

HiRA is evaluated on four challenging, cross-modal deep search benchmarks: GAIA, WebWalkerQA, Humanity's Last Exam, and SimpleQA. Accuracy is assessed with an LLM-as-judge protocol, and HiRA is compared against three groups of baselines: direct reasoning, single-capability-enhanced, and multi-capability agentic systems.
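As a rough illustration of LLM-as-judge scoring, accuracy is the fraction of predictions that a judge marks as correct. The sketch below is an assumption-laden stand-in: `judged_accuracy` and `toy_judge` are hypothetical names, and the substring check replaces the call to a judge model, whose prompt and configuration are not described in this summary.

```python
from typing import Callable, Iterable, Tuple

# A judge takes (question, reference answer, predicted answer) and returns a verdict.
Judge = Callable[[str, str, str], bool]


def judged_accuracy(examples: Iterable[Tuple[str, str, str]], judge: Judge) -> float:
    """Fraction of examples that the judge accepts as correct."""
    verdicts = [judge(q, ref, pred) for q, ref, pred in examples]
    if not verdicts:
        raise ValueError("no examples to score")
    return sum(verdicts) / len(verdicts)


def toy_judge(question: str, reference: str, prediction: str) -> bool:
    # Assumption: a case-insensitive substring match stands in for an LLM verdict.
    return reference.strip().lower() in prediction.strip().lower()


examples = [
    ("Capital of France?", "Paris", "The answer is Paris."),
    ("2+2?", "4", "It equals 4"),
    ("Largest planet?", "Jupiter", "Saturn"),
    ("Author of Hamlet?", "Shakespeare", "william shakespeare"),
]
accuracy = judged_accuracy(examples, toy_judge)
print(accuracy)
```

In a real evaluation the `judge` argument would wrap a model call, which is why it is passed in as a callable rather than hard-coded.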

Main Results

Performance: HiRA achieves the highest average accuracy across all datasets, with particularly strong gains on complex, multi-step tasks (e.g., 42.5% on GAIA vs. 36.2% for the strongest baseline, WebThinker, a gain of 6.3 percentage points).
Efficiency: HiRA produces shorter reasoning chains and requires fewer environment interactions than monolithic agentic baselines, indicating improved inference efficiency.
Ablation Studies: Removing the reasoning transfer mechanism or individual executors (e.g., search, code, multimodal) leads to substantial performance degradation, confirming the necessity of each architectural component.

Notable Claims

Plug-and-Play Capability Integration: HiRA enables direct integration of new expert agents and tools without prompt engineering or retraining, a significant improvement over prior approaches.
Decoupled Planning Generalization: The meta planner's subtask generation is robust even without explicit knowledge of executor capabilities, supporting scalability to new agent types.

Implications and Future Directions

The hierarchical decoupling of planning and execution in HiRA has several important implications:

Scalability: The architecture supports modular addition of new capabilities, facilitating rapid adaptation to emerging tools and domains.
Robustness: By isolating execution details from high-level reasoning, HiRA reduces context contamination and improves logical coherence in multi-step reasoning.
Efficiency: The dual-channel memory and adaptive delegation mechanisms minimize redundant computation and optimize resource usage.

From a theoretical perspective, HiRA aligns with cognitive architectures that separate abstract planning from concrete action, and provides a practical instantiation for LLM-based agentic systems. The empirical results suggest that hierarchical reasoning is a promising direction for advancing deep search and complex information synthesis.

Future work may explore:

Learning-based Coordinator Optimization: Training the coordinator with reinforcement learning or meta-learning to further improve agent selection and reasoning distillation.
Dynamic Agent Pool Expansion: Automated discovery and integration of new expert agents based on task requirements.
Hierarchical Reasoning in Other Domains: Extending the framework to domains such as scientific discovery, legal reasoning, or multi-agent collaboration.

Conclusion

HiRA demonstrates that explicit separation of planning and execution, combined with adaptive coordination and memory mechanisms, yields substantial improvements in both effectiveness and efficiency for deep search tasks. The framework's modularity and extensibility position it as a strong foundation for future research in agentic LLM systems and complex information retrieval.