- The paper presents HALO, a three-layer hierarchical framework that improves multi-agent LLM collaboration and task decomposition.
- It employs adaptive prompt refinement and Monte Carlo Tree Search to dynamically optimize agent roles and reasoning workflows.
- Empirical evaluations report an average 14.4% improvement across benchmarks including HumanEval, MMLU, and MATH, demonstrating its effectiveness in complex scenarios.
HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems
The paper presents HALO, a framework designed to address the limitations of existing Multi-Agent Systems (MAS) powered by LLMs in complex environments and expert-level tasks. HALO is structured around a three-layer hierarchical reasoning architecture focused on adaptive role instantiation, dynamic workflows, and refined prompt engineering.
Introduction
HALO is introduced as a response to shortcomings observed in fixed role spaces and static communication structures of current MAS. These systems often underperform in complex and specialized tasks due to their rigidity and reliance on expert design insights. HALO seeks to improve adaptability and flexibility through dynamic role design and hierarchical orchestration.
Figure 1: Overview of the HALO framework. HALO consists of three modules: (1) Adaptive Prompt Refinement, (2) Hierarchical Reasoning Stack, and (3) Workflow Search Engine.
HALO Framework Components
Adaptive Prompt Refinement
The first stage focuses on refining user queries via adaptive prompt engineering to produce structured, LLM-comprehensible prompts. This module leverages four specialized agents—Task Parser, Prompt Template, Prompt Optimization, and Prompt Generator—to systematically convert raw input into actionable prompts that facilitate clear reasoning pathways.
Figure 2: System prompts used in the Adaptive Prompt Refinement module.
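The four-agent pipeline described above can be sketched as a simple chain in which each stage rewrites the output of the previous one. This is a minimal illustration, not the paper's implementation: the `LLM` callable, the one-line stage instructions, and `stub_llm` are all assumptions made so the example runs without a model, and the actual system prompts from Figure 2 are not reproduced.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical stand-in for a model call: (system_prompt, user_text) -> text.
LLM = Callable[[str, str], str]

# Stage names follow the four agents named in the text; the instructions
# are illustrative placeholders, not the prompts used in the paper.
STAGES = [
    ("task_parser", "Identify the task type and key details of the query."),
    ("prompt_template", "Arrange the parsed task into a structured template."),
    ("prompt_optimization", "Tighten the template's instructions and constraints."),
    ("prompt_generator", "Emit the final prompt for downstream agents."),
]


def refine_prompt(query: str, llm: LLM) -> tuple[str, list[tuple[str, str]]]:
    """Run the query through each stage in order.

    Returns the final prompt and a trace of (stage_name, stage_output).
    """
    text = query
    trace: list[tuple[str, str]] = []
    for name, instruction in STAGES:
        text = llm(instruction, text)
        trace.append((name, text))
    return text, trace


def stub_llm(system_prompt: str, user_text: str) -> str:
    """Deterministic stub: wraps the text in parentheses instead of calling a model."""
    return f"({user_text})"


final_prompt, trace = refine_prompt("What is 2+2?", stub_llm)
```

Swapping `stub_llm` for a real model client would leave `refine_prompt` unchanged, which is the main point of keeping the stages as data rather than as separate functions.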
Hierarchical Reasoning Stack
This component structures agent collaboration into high-level planning for task decomposition, mid-level role-design for agent instantiation, and low-level inference for subtask execution. Utilizing a hierarchical architecture helps distribute cognitive load among agents, enhancing task-specific reasoning and ensuring effective collaboration through dynamic role generation aligned to task demands.
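The three levels can be pictured as nested responsibilities: a planner yields subtasks, a role designer creates an agent for each subtask, and that agent executes it. The sketch below is a simplification under stated assumptions: it plans all subtasks up front, and `Agent`, `run_stack`, and the `toy_*` helpers are hypothetical names with rule-based stubs standing in for LLM-backed agents.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    """A role instantiated for one subtask (hypothetical structure)."""
    role: str
    system_prompt: str


def run_stack(
    task: str,
    plan: Callable[[str], list[str]],
    design_role: Callable[[str], Agent],
    execute: Callable[[Agent, str], str],
) -> list[tuple[str, str, str]]:
    """Return (subtask, role, answer) for every subtask of the task."""
    results = []
    for subtask in plan(task):            # high level: task decomposition
        agent = design_role(subtask)      # mid level: role instantiation
        answer = execute(agent, subtask)  # low level: subtask execution
        results.append((subtask, agent.role, answer))
    return results


# Rule-based stubs so the sketch runs without a model.
def toy_plan(task: str) -> list[str]:
    return [part.strip() for part in task.split(" then ")]


def toy_design_role(subtask: str) -> Agent:
    role = subtask.split()[0].capitalize() + " specialist"
    return Agent(role, f"You are a {role}. Complete only: {subtask}")


def toy_execute(agent: Agent, subtask: str) -> str:
    return f"{agent.role} finished '{subtask}'"


results = run_stack(
    "parse the input then compute the sum",
    toy_plan, toy_design_role, toy_execute,
)
```

Because roles are created per subtask rather than drawn from a fixed list, the set of agents varies with the task, which is the property the text attributes to dynamic role generation.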
Workflow Search Engine
HALO employs Monte Carlo Tree Search (MCTS) to systematically explore the agentic action space and search for optimal reasoning trajectories. The Workflow Search Engine module adapts the reasoning process based on real-time feedback: candidate actions are selected and expanded, evaluated through simulation, and their scores are backpropagated through the tree to guide later choices.
Figure 3: The illustration of how Monte Carlo Tree Search (MCTS) guides multi-agent reasoning through selection, expansion, simulation, and backpropagation stages.
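The four MCTS stages named in the caption can be illustrated with a generic, textbook-style implementation using the UCB1 selection rule. This is a sketch on a made-up toy problem, not HALO's search engine: the roles in `ACTIONS`, the `SCORES` table, the fixed `DEPTH`, and the averaged `reward` are all assumptions, whereas the paper's search runs over agent actions evaluated by LLMs.

```python
from __future__ import annotations

import math
import random

# Hypothetical toy problem: choose a 3-step workflow from these roles.
ACTIONS = ("solver", "checker", "guesser")
SCORES = {"solver": 1.0, "checker": 0.5, "guesser": 0.0}  # made-up step quality
DEPTH = 3


def reward(state: tuple[str, ...]) -> float:
    """Made-up score of a finished workflow, always in [0, 1]."""
    return sum(SCORES[a] for a in state) / DEPTH


class Node:
    def __init__(self, state: tuple[str, ...] = (), parent: Node | None = None):
        self.state = state
        self.parent = parent
        self.children: dict[str, Node] = {}
        self.visits = 0
        self.value = 0.0  # sum of rewards backed up through this node

    def is_terminal(self) -> bool:
        return len(self.state) == DEPTH

    def untried(self) -> list[str]:
        return [a for a in ACTIONS if a not in self.children]

    def ucb_child(self, c: float) -> Node:
        # UCB1: mean reward plus an exploration bonus for rarely visited children.
        return max(
            self.children.values(),
            key=lambda n: n.value / n.visits
            + c * math.sqrt(math.log(self.visits) / n.visits),
        )


def mcts(iterations: int, c: float = 1.4, seed: int = 0) -> Node:
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.is_terminal() and not node.untried():
            node = node.ucb_child(c)
        # 2. Expansion: add one untried action as a new child.
        if not node.is_terminal():
            action = rng.choice(node.untried())
            child = Node(node.state + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: finish the workflow with random actions.
        state = node.state
        while len(state) < DEPTH:
            state += (rng.choice(ACTIONS),)
        r = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return root


def best_action(root: Node) -> str:
    """Pick the most-visited first action, a common final-choice rule."""
    return max(root.children, key=lambda a: root.children[a].visits)


root = mcts(iterations=200)
first_step = best_action(root)
```

The exploration constant `c` trades off revisiting promising branches against trying neglected ones; 1.4 (roughly the square root of 2) is a conventional default for rewards in [0, 1], not a value taken from the paper.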
Experimental Evaluation
Empirical evaluations report an average improvement of 14.4% across established benchmarks: HumanEval (code generation), MMLU (general reasoning), and MATH (arithmetic reasoning). HALO showed especially large gains on challenging subdomains of the MMLU and MATH datasets, indicating its utility for specialized tasks.

Figure 4: Performance comparison on three computationally intensive subareas selected from the MATH dataset.
Ablation Study Results
An ablation study highlights the critical role of each HALO component. Removing the Adaptive Prompt Refinement led to significant performance degradation, underscoring the necessity of structured prompt engineering. Similarly, eliminating the high-level planning agent impaired reasoning coherence and overall task performance.
Figure 5: System prompts for the high-level planning agent and the Workflow Search Engine module.
Conclusion
HALO marks a significant advancement in multi-agent LLM systems by introducing a flexible, hierarchical framework that overcomes cognitive overload and rigid system architectures. The integration of adaptive prompt refinement and MCTS-based exploration aligns roles and workflows with dynamic task demands, facilitating superior performance in complex interaction environments. Future avenues may include enhancements with memory mechanisms and knowledge integration to further improve capabilities.
The HALO framework and its implementation have been made available by the authors.