HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems

Published 17 May 2025 in cs.MA and cs.AI | (2505.13516v1)

Abstract: Recent advancements in Multi-Agent Systems (MAS) powered by LLMs have demonstrated tremendous potential in diverse task scenarios. Nonetheless, existing agentic systems typically rely on predefined agent-role design spaces and static communication structures, limiting their adaptability as well as flexibility in complex interaction environments and leading to subpar performance on highly specialized and expert-level tasks. To address these issues, we introduce HALO, a multi-agent collaboration framework based on a hierarchical reasoning architecture. Specifically, we incorporate a high-level planning agent for task decomposition, mid-level role-design agents for subtask-specific agent instantiation, and low-level inference agents for subtask execution. Particularly, subtask execution is reformulated as a structured workflow search problem, where Monte Carlo Tree Search (MCTS) systematically explores the agentic action space to construct optimal reasoning trajectories. Additionally, as the majority of users lack expertise in prompt engineering, we leverage an Adaptive Prompt Refinement module to transform raw queries into task-specific prompts. Empirical evaluations on Code Generation (HumanEval), General Reasoning (MMLU), and Arithmetic Reasoning (MATH) benchmark datasets highlight the effectiveness of HALO, yielding a 14.4% average improvement over state-of-the-art baselines. Notably, HALO achieves up to 13.3% performance gain on the Moral Scenarios subject in the MMLU benchmark and up to 19.6% performance gain on the Algebra subarea in the MATH benchmark, indicating its advanced proficiency in tackling highly specialized and expert-level tasks. The code repository is available at https://github.com/23japhone/HALO.


Summary

The paper presents HALO, a three-layer hierarchical framework that improves multi-agent LLM collaboration and task decomposition.
It employs adaptive prompt refinement and Monte Carlo Tree Search to dynamically optimize agent roles and reasoning workflows.
Empirical evaluations on HumanEval, MMLU, and MATH show a 14.4% average improvement over state-of-the-art baselines, demonstrating its effectiveness in complex scenarios.


The paper presents HALO, a framework designed to address the limitations of existing Multi-Agent Systems (MAS) powered by LLMs in complex environments and expert-level tasks. HALO is structured around a three-layer hierarchical reasoning architecture focused on adaptive role instantiation, dynamic workflows, and refined prompt engineering.

Introduction

HALO is introduced as a response to shortcomings observed in fixed role spaces and static communication structures of current MAS. These systems often underperform in complex and specialized tasks due to their rigidity and reliance on expert design insights. HALO seeks to improve adaptability and flexibility through dynamic role design and hierarchical orchestration.

Figure 1: Overview of the HALO framework. HALO consists of three modules: (1) Adaptive Prompt Refinement, (2) Hierarchical Reasoning Stack, and (3) Workflow Search Engine.

HALO Framework Components

The first module, Adaptive Prompt Refinement, rewrites raw user queries into structured, task-specific prompts that an LLM can follow. It uses four specialized agents (Task Parser, Prompt Template, Prompt Optimization, and Prompt Generator) to convert raw input into prompts that lay out a clear reasoning path.
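As a rough illustration of how the four agents could be chained, the sketch below passes a query through them in order, with each agent seeing the previous agent's output. The sequential chaining, the `LLM` callable signature, the system prompt wording, and the `echo_llm` stand-in are all assumptions made for this example; the prompts actually used in the paper are the ones shown in its Figure 2.

```python
from typing import Callable, List, Tuple

# Assumed interface: (system_prompt, user_message) -> model reply.
LLM = Callable[[str, str], str]

# Agent names come from the text; the instructions are hypothetical placeholders.
STAGES: List[Tuple[str, str]] = [
    ("Task Parser", "Identify the task type and the key details of the query."),
    ("Prompt Template", "Draft a structured prompt template for the parsed task."),
    ("Prompt Optimization", "Improve the template, for example by adding constraints."),
    ("Prompt Generator", "Produce the final task-specific prompt."),
]


def refine_prompt(raw_query: str, llm: LLM) -> str:
    """Run the query through the four agents in order (assumed to be a simple chain)."""
    text = raw_query
    for name, instruction in STAGES:
        text = llm(f"You are the {name} agent. {instruction}", text)
    return text


def echo_llm(system_prompt: str, user_message: str) -> str:
    """Stand-in for a real model: tags the message with the agent's name."""
    agent = system_prompt[len("You are the "):].split(" agent.")[0]
    return f"[{agent}] {user_message}"


if __name__ == "__main__":
    print(refine_prompt("What is the sum of the first 10 primes?", echo_llm))
```

Swapping `echo_llm` for a function that calls a real model is the only change needed to try the chain on actual queries.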

Figure 2: System prompts used in the Adaptive Prompt Refinement module.

Hierarchical Reasoning Stack

This component organizes agent collaboration into three levels: a high-level planning agent decomposes the task, mid-level role-design agents instantiate an agent suited to each subtask, and low-level inference agents execute the subtasks. Splitting the work this way distributes cognitive load across agents, and generating roles on demand keeps each agent matched to the demands of its subtask.
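The three levels can be pictured as three functions composed in a loop. The sketch below is a minimal illustration under stated assumptions: the `Role` dataclass, the `run_stack` function, and the stub planner, role designer, and executor are hypothetical names, and the planner is assumed to return all subtasks up front, which may differ from how HALO schedules them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Role:
    """A dynamically generated agent role (hypothetical structure)."""
    name: str
    system_prompt: str


def run_stack(
    query: str,
    plan: Callable[[str], List[str]],      # high level: task -> subtasks
    design_role: Callable[[str], Role],    # mid level: subtask -> agent role
    execute: Callable[[Role, str], str],   # low level: (role, subtask) -> result
) -> List[str]:
    """Decompose the query, create a role per subtask, then execute each subtask."""
    results = []
    for subtask in plan(query):
        role = design_role(subtask)
        results.append(execute(role, subtask))
    return results


# Stub implementations standing in for LLM-backed agents.
def stub_plan(query: str) -> List[str]:
    return [part.strip() for part in query.split(";") if part.strip()]


def stub_design_role(subtask: str) -> Role:
    return Role(name=f"{subtask.split()[0]}-specialist",
                system_prompt=f"You solve: {subtask}")


def stub_execute(role: Role, subtask: str) -> str:
    return f"{role.name} handled '{subtask}'"


if __name__ == "__main__":
    for line in run_stack("parse input; compute answer",
                          stub_plan, stub_design_role, stub_execute):
        print(line)
```

Keeping the three levels as separate callables means each one can be replaced or ablated independently, which mirrors the separation of concerns the text describes.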

Workflow Search Engine

The Workflow Search Engine treats subtask execution as a structured workflow search problem. It uses Monte Carlo Tree Search (MCTS) to explore the agentic action space and construct optimal reasoning trajectories, iterating through selection, expansion, simulation, and backpropagation so that feedback from earlier steps steers which actions are explored next.

Figure 3: The illustration of how Monte Carlo Tree Search (MCTS) guides multi-agent reasoning through selection, expansion, simulation, and backpropagation stages.
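To make the four stages concrete, here is a generic MCTS loop with UCT selection over a toy action space. It is a textbook-style sketch, not HALO's implementation: the action names, the depth limit, the exploration constant, and the `reward` function (which scores a path against a fixed target sequence) are all assumptions chosen so the example runs without an LLM.

```python
import math
import random

ACTIONS = ["generate", "critique", "refine"]  # hypothetical agent actions
MAX_DEPTH = 3                                 # assumed workflow length


def reward(path):
    """Toy score in [0, 1]: fraction of steps matching a fixed target workflow."""
    target = ["generate", "critique", "refine"]
    return sum(a == b for a, b in zip(path, target)) / MAX_DEPTH


class Node:
    def __init__(self, path, parent=None):
        self.path, self.parent = path, parent
        self.children = []
        self.visits, self.value = 0, 0.0

    def is_terminal(self):
        return len(self.path) == MAX_DEPTH

    def untried_actions(self):
        tried = {child.path[-1] for child in self.children}
        return [a for a in ACTIONS if a not in tried]


def uct(child, c=1.4):
    """Upper confidence bound: average reward plus an exploration bonus."""
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore


def search(iterations=300, seed=0):
    rng = random.Random(seed)
    root = Node([])
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes using UCT.
        while not node.is_terminal() and not node.untried_actions():
            node = max(node.children, key=uct)
        # Expansion: add one child for an action not yet tried here.
        if not node.is_terminal():
            action = rng.choice(node.untried_actions())
            child = Node(node.path + [action], parent=node)
            node.children.append(child)
            node = child
        # Simulation: complete the workflow with random actions and score it.
        rollout = list(node.path)
        while len(rollout) < MAX_DEPTH:
            rollout.append(rng.choice(ACTIONS))
        score = reward(rollout)
        # Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.value += score
            node = node.parent
    # Read off the most-visited path as the chosen workflow.
    best, node = [], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        best = node.path
    return best


if __name__ == "__main__":
    print(search())
```

In a real system the `reward` call would be replaced by some evaluation of the agents' intermediate output, which is where most of the cost and most of the design decisions would lie.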

Experimental Evaluation

Empirical evaluations on HumanEval (Code Generation), MMLU (General Reasoning), and MATH (Arithmetic Reasoning) show an average improvement of 14.4% over state-of-the-art baselines. The largest gains appear on specialized subdomains: up to 13.3% on the Moral Scenarios subject in MMLU and up to 19.6% on the Algebra subarea in MATH.

Figure 4: Performance comparison on three computationally intensive subareas selected from the MATH dataset.

Ablation Study Results

An ablation study highlights the critical role of each HALO component. Removing the Adaptive Prompt Refinement led to significant performance degradation, underscoring the necessity of structured prompt engineering. Similarly, eliminating the high-level planning agent impaired reasoning coherence and overall task performance.

Figure 5: System prompts for the high-level planning agent and Workflow Search Engine module.

Conclusion

HALO marks a significant advancement in multi-agent LLM systems by introducing a flexible, hierarchical framework that overcomes cognitive overload and rigid system architectures. The integration of adaptive prompt refinement and MCTS-based exploration aligns roles and workflows with dynamic task demands, facilitating superior performance in complex interaction environments. Future avenues may include enhancements with memory mechanisms and knowledge integration to further improve capabilities.

The HALO code repository is available at https://github.com/23japhone/HALO.
