Multi-Agent System Search
- Multi-Agent System Search (MASS) is an algorithmic framework that integrates structured agent coordination, prompt and topology optimization, and learning-based inference to tackle heterogeneous problems.
- MASS employs methodologies like neural orchestration, beam search/MCTS, and grammar-based topology search to enhance decision-making accuracy, scalability, and computational efficiency.
- Practical applications of MASS span mathematical reasoning, web search, materials design, and robotics, demonstrating significant improvements in task performance and system interpretability.
Multi-agent system search (MASS) refers to the algorithmic frameworks, orchestration architectures, and learning-based optimization approaches designed for the synthesis, coordination, or search over multi-agent systems (MAS) in order to efficiently solve complex, heterogeneous, or large-scale problems. MASS combines structured agent interaction protocols, search algorithms over system architectures or behaviors, and supervision/evaluation mechanisms, targeting both the design-time and inference-time dimensions of multi-agent systems. This encompasses the optimization of agent prompts and topologies, neural orchestration for agent selection, reinforcement- and reward-guided inference search, and grammar-based or MapReduce-style structure searches. Contemporary research demonstrates that MASS is pivotal for boosting task performance, scalability, and interpretability in domains ranging from reasoning and QA to physical planning and materials discovery.
1. Formal Problem Definitions and Core Principles
MASS encapsulates several problem settings, including:
- Architecture/Workflow Search: Searching over the space of possible agent role assignments, topologies, and prompt configurations to maximize downstream metrics (e.g., accuracy, F1, cost-effectiveness). The search space $\mathcal{A} = \mathcal{P} \times \mathcal{T}$ combines prompts $\mathcal{P}$ and topologies $\mathcal{T}$, with an objective
$$a^{*} = \arg\max_{a \in \mathcal{A}} \; \mathbb{E}_{x \sim \mathcal{D}}\left[\mathrm{Eval}(a, x)\right],$$
where $a \in \mathcal{A}$ specifies the MAS workflow and $\mathcal{D}$ the task distribution (Zhou et al., 4 Feb 2025).
- Neural Orchestration/Agent Selection: Given a set of heterogeneous agents and dynamic tasks, the orchestrator synthesizes the context vector, agent histories, and predicted qualities to select the most suitable agent per task. Fuzzy evaluation modules generate soft supervision to train the selection network (Agrawal et al., 3 May 2025).
- Inference-Time Search: MASS formalizes the exploration of message-passing trajectories during MAS inference (e.g., via beam search or MCTS), extending beyond single-pass execution. A process reward model guides the search by scoring partial inter-agent transcripts, facilitating the pruning of unpromising branches and compute-efficient reasoning (Yazdani et al., 28 Oct 2025).
- Reinforcement Learning over Multi-Agent Workflows: MASS can be cast as a Markov Decision Process where each agent is a policy, and the global reward is typically propagated backward from task success metrics; recent advances include critic-free policy optimization using group-relative advantage estimation (Chen et al., 3 Jun 2025).
- Grammar or Template-Based Topology Search: The MAS search space is encoded as a formal grammar, with production rules generating valid MAS compositions. Search is executed over derivations in this grammar, optimizing fitness measures like solution accuracy (Singh et al., 16 Dec 2025).
MASS is thus a meta-algorithmic construct that abstracts the generation, orchestration, and optimization of MAS in both design-time and inference-time settings.
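The design-time search setting above can be sketched as a budgeted enumeration over a toy prompt × topology space. The candidate variants, the scoring function, and the budget below are illustrative stand-ins, not any specific framework's API:

```python
import itertools
import random

# Hypothetical building blocks; real MASS frameworks search far richer spaces.
PROMPTS = ["cot", "few_shot", "reflect"]        # block-level prompt variants
TOPOLOGIES = ["chain", "debate", "aggregate"]   # inter-agent wiring patterns

def evaluate(workflow, val_set):
    """Stand-in for running the candidate MAS on a validation set.
    Here: a deterministic toy score so the sketch is runnable."""
    prompt, topology = workflow
    return len(prompt) * 0.1 + len(topology) * 0.05

def search_workflow(val_set, budget=9):
    """Score candidate workflows a = (prompt, topology) under a sampling
    budget and return the argmax, mirroring the objective above."""
    candidates = list(itertools.product(PROMPTS, TOPOLOGIES))
    random.shuffle(candidates)
    best, best_score = None, float("-inf")
    for workflow in candidates[:budget]:
        score = evaluate(workflow, val_set)
        if score > best_score:
            best, best_score = workflow, score
    return best, best_score
```

In practice the budget is far smaller than the candidate space, which is why the staged, pruned, and grammar-guided strategies described below matter.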
2. Representative Methodologies
Recent instantiations of MASS include:
- Stages of Prompt and Topology Optimization: The three-stage framework in "Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies" comprises (i) block-level prompt tuning, (ii) topology search conditioned on prompt influence, and (iii) workflow-level prompt tuning for joint end-to-end adaptation (Zhou et al., 4 Feb 2025).
- Neural Orchestration with Soft Supervision: MetaOrch employs a modular architecture, where each agent's context, history, and predicted quality are processed by an MLP to generate selection logits and a confidence head. A fuzzy evaluation module scores agent outputs along completeness, relevance, and confidence, producing soft labels for cross-entropy training (Agrawal et al., 3 May 2025).
- Process Reward Model (MASPRM) for Guided Inference Search: MASPRM assigns per-agent, per-step values to partial trajectories by learning from MCTS rollouts and backpropagating reward signals. At inference, these values are integrated into beam search or MCTS to preferentially expand promising sub-dialogues (Yazdani et al., 28 Oct 2025).
- Reinforcement Learning without Critic: Heterogeneous group-based policy optimization dispenses with the critic by normalizing returns within homogeneous or heterogeneous rollout groups, promoting stability and computational efficiency when optimizing LLM-based MAS (Chen et al., 3 Jun 2025).
- Grammar Search with Forced Component Coverage: Grammar-based approaches represent the MAS design space via a context-free grammar and utilize forced-sampling strategies to maximize component diversity. Candidate MAS are constructed by sampling derivations and validated by empirical task accuracy (Singh et al., 16 Dec 2025).
- MapReduce-Inspired Execution for Wide Search: A-MapReduce reimagines wide search as schema-aligned, horizontal decomposition of tasks, enabling parallel execution of atomic lookup operations and deterministic aggregation to achieve both improved coverage and significant efficiency gains (Chen et al., 1 Feb 2026).
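Reward-guided inference search of the kind MASPRM performs can be illustrated with a generic beam search over partial transcripts; `expand` and `prm_score` below are placeholders for a real next-message generator and a learned process reward model, not MASPRM's implementation:

```python
import heapq

def prm_guided_beam_search(init_state, expand, prm_score, beam_width=3, depth=4):
    """Beam search over partial inter-agent transcripts.

    expand(state)    -> list of successor states (candidate next agent messages)
    prm_score(state) -> estimated value of the partial transcript
    """
    beam = [init_state]
    for _ in range(depth):
        candidates = []
        for state in beam:
            for nxt in expand(state):
                candidates.append((prm_score(nxt), nxt))
        if not candidates:
            break
        # Keep only the top-scoring partial trajectories; prune the rest.
        beam = [s for _, s in heapq.nlargest(beam_width, candidates,
                                             key=lambda t: t[0])]
    return max(beam, key=prm_score)
```

Replacing the loop's frontier expansion with tree-node selection and backup turns the same scoring interface into MCTS.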
3. Supervision, Evaluation, and Feedback
MASS frameworks incorporate diverse supervision and evaluation modules:
- Fuzzy Evaluation and Soft Supervision: Aggregating completeness, relevance, and confidence membership values into soft quality scores, which are then used as supervision labels in orchestrator training (Agrawal et al., 3 May 2025).
- Reward Models for Partial Trajectories: Outcome-based reward models and process reward heads are trained to approximate the value of partial inter-agent states, supporting compute-aware search and credit assignment in inference-time reasoning (Yazdani et al., 28 Oct 2025).
- Interleaved Local-to-Global Conditioning: By conditioning later optimization stages on the outputs of prior prompt or topology searches, MASS frameworks enforce a pipeline that leverages localized improvements before system-level adaptations (Zhou et al., 4 Feb 2025).
- Reinforcement/Policy Optimization with Groupwise Normalization: Instead of a critic, relative advantages are computed within sampled rollout groups, providing robust learning signals across heterogeneous agent specializations (Chen et al., 3 Jun 2025).
- Cost-Efficiency and Robustness Metrics: Cost modeling explicitly accounts for API calls, wall-clock time, and code length, while controlled benchmarks such as MASBENCH evaluate gains or regressions along dimensions of depth, breadth, parallelism, and robustness (Ke et al., 21 Jan 2026, Singh et al., 16 Dec 2025).
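The fuzzy-evaluation-to-soft-label pipeline can be sketched in a few lines; the membership weights and temperature below are illustrative assumptions, not MetaOrch's actual parameters:

```python
import math

def fuzzy_quality(completeness, relevance, confidence, weights=(0.4, 0.4, 0.2)):
    """Aggregate fuzzy membership values in [0, 1] into one quality score."""
    return sum(w * m for w, m in zip(weights, (completeness, relevance, confidence)))

def soft_labels(qualities, temperature=0.5):
    """Turn per-agent quality scores into a soft target distribution."""
    exps = [math.exp(q / temperature) for q in qualities]
    z = sum(exps)
    return [e / z for e in exps]

def soft_cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Loss for training the orchestrator's selection network on soft labels."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, pred_probs))
```

The soft targets preserve the margin between near-equally-good agents, which a hard argmax label would discard.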
4. Practical Applications and Domains
MASS methodologies have been deployed in:
- Mathematical Reasoning, Question Answering, and Code Generation: Automated MAS searches have achieved up to 78.8% average accuracy across complex reasoning and QA tasks, surpassing manual system designs and prior automated meta-agent frameworks (Zhou et al., 4 Feb 2025, Singh et al., 16 Dec 2025).
- Agentic Web Search: A-MapReduce attains state-of-the-art Item F1 (up to +17.5% over baselines) and nearly 2× runtime reduction for large-scale schema-aligned web retrieval (Chen et al., 1 Feb 2026).
- Multi-Domain Orchestration: MetaOrch demonstrated 86.3% agent-selection accuracy in heterogeneous environments, generalizing to new agents and tasks without retraining (Agrawal et al., 3 May 2025).
- Physical and Scientific System Design: Planner-driven MASS, as exemplified by S1-MatAgent, enables fully automated materials design workflows, from literature mining to in-silico optimization and experimental validation, delivering >400% improvements in candidate narrowing and a 27.7% gain in theoretical activity for catalyst design (Wang et al., 18 Sep 2025).
- Motion Planning in Robotics: Hybrid MASS frameworks, such as stationary state planners for differential-drive robots, resolve kinodynamic multi-agent motion planning with empirical throughput improvements up to 400% (Yan et al., 2024).
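The MapReduce-style wide-search pattern reduces, in skeleton form, to parallel atomic lookups plus a deterministic aggregation; the thread pool and toy lookup below are illustrative, not A-MapReduce's actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def map_reduce_search(queries, lookup, reduce_fn, max_workers=8):
    """Horizontal decomposition of a wide search task:
    map: run independent atomic lookups in parallel;
    reduce: deterministically aggregate the partial results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, keeping the reduce deterministic.
        partials = list(pool.map(lookup, queries))
    return reduce_fn(partials)

# Toy usage: each "lookup" returns a set of items; reduce = deduplicated union.
def toy_lookup(q):
    return {f"{q}-result-{i}" for i in range(2)}

merged = map_reduce_search(["a", "b"], toy_lookup, lambda parts: set().union(*parts))
```

Because each lookup is schema-aligned and independent, wall-clock time scales with the slowest lookup rather than with their sum.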
5. Analysis of Empirical Results and Design Insights
Empirical findings from recent MASS research include:
| Approach | Key Performance/Findings | Reference |
|---|---|---|
| MASS (prompt + topology search) | 78.8% avg accuracy, consistently outperforms debate/AFlow/ADAS | (Zhou et al., 4 Feb 2025) |
| MetaOrch (neural orchestration) | 86.3% selection accuracy, +61% over round-robin; 91.1% on validation | (Agrawal et al., 3 May 2025) |
| MASPRM-guided MCTS | +30.7 EM (GSM8K), +22.9 EM (MATH) over greedy baseline; 8.4 transfer | (Yazdani et al., 28 Oct 2025) |
| Grammar Search MAS | +0.8–3 accuracy points vs. code-level LLM search; 100% validity | (Singh et al., 16 Dec 2025) |
| A-MapReduce | 5.1–17.5 pt Item F1 gain; 47% runtime cut vs. OpenAI o3 / Gemini-2.5 | (Chen et al., 1 Feb 2026) |
| S1-MatAgent (materials design) | Reduced 20M→13 candidates; 27.7% performance gain; 97.5% 500h stability | (Wang et al., 18 Sep 2025) |
Ablations and studies highlight that local prompt optimization yields larger gains than topology manipulation alone, and that soft/fuzzy supervision and relative reward normalization are critical for accurate learning and stable convergence. Removal of these components results in accuracy drops of up to 8% (Agrawal et al., 3 May 2025) or unstable oscillations (Chen et al., 3 Jun 2025). Controlled benchmarks reveal that the benefit of MAS design is contingent on the task's structural requirements: greatest gains are observed in tasks with parallel or robust, verification-demanding structure, while "depth-only" (sequential) tasks may favor SAS or incur unnecessary orchestration overhead (Ke et al., 21 Jan 2026).
6. Interpretability, Modularity, and Scalability
Modern MASS frameworks emphasize modularity and interpretability:
- Component Modularization: Grammar-based approaches and orchestration frameworks restrict agent definitions to composable, transparent modules, allowing for systematic ablation, extension, or reordering. This yields MAS that are hundreds of lines shorter and measurably more interpretable than free-form code (Singh et al., 16 Dec 2025).
- Plug-and-Play Agents: MetaOrch and S1-MatAgent allow new agent registration or toolset extension without retraining or core logic modification; adaptation to new domains is achieved via context/histories or planner reconfiguration (Agrawal et al., 3 May 2025, Wang et al., 18 Sep 2025).
- Human-in-the-Loop and Confidence Signaling: Orchestrators expose confidence heads and soft labels, supporting human monitoring and overrides for critical applications (Agrawal et al., 3 May 2025).
- Scaling: Computation is managed via batching (MapReduce), asynchronous rolling window replanning (robotics MASS), and distributed message passing (multi-scale graph planning), enabling practical operation across hundreds/thousands of agents or tasks (Chen et al., 1 Feb 2026, Yan et al., 2024, Lim et al., 2019).
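Grammar-restricted composition of the kind described above can be sketched with a toy context-free grammar. The production rules and component names here are hypothetical, but the by-construction validity of every sampled derivation mirrors the property such approaches rely on:

```python
import random

# A toy context-free grammar over MAS structures (hypothetical rules;
# real grammar-based searches encode far richer component vocabularies).
GRAMMAR = {
    "MAS":   [["Stage"], ["Stage", "MAS"]],           # one or more stages
    "Stage": [["solver"], ["solver", "verifier"],     # terminal components
              ["debate"]],
}

def sample_derivation(symbol="MAS", rng=None, max_depth=6):
    """Expand non-terminals via randomly chosen productions until only
    terminal components remain; depth-limited to guarantee termination."""
    rng = rng or random.Random(0)
    if symbol not in GRAMMAR:
        return [symbol]                               # terminal component
    rules = GRAMMAR[symbol]
    rule = rules[0] if max_depth <= 0 else rng.choice(rules)
    out = []
    for sym in rule:
        out.extend(sample_derivation(sym, rng, max_depth - 1))
    return out
```

Any composition produced this way is a valid derivation, so the search never wastes budget evaluating malformed candidates.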
7. Limitations and Open Research Directions
Ongoing challenges include:
- Task-Dependence of MAS Benefit: Empirical results demonstrate that MAS outperforms SAS only where the task structure provides decomposability or verification needs that align with the MAS design (Ke et al., 21 Jan 2026). Otherwise, orchestration may introduce overhead.
- Privacy, Security, and Communication: Distributed search frameworks emphasize privacy preservation and minimal information sharing, but quantification of privacy loss and cryptographically secure planning remain open problems (Nissim et al., 2013).
- Efficient Design Space Exploration: The astronomical size of prompt/topology spaces necessitates advanced pruning, grammar encoding, or learning-based guidance; further reductions of search complexity and more intelligent sampling remain active topics (Singh et al., 16 Dec 2025).
- Real-World Robustness: Scientific MASS frameworks face LLM hallucination, feasibility checking, and tool-integration issues (Wang et al., 18 Sep 2025). Run-time MASS is in early stages with little long-term validation (Diller et al., 2022).
- Learning of Distributed Heuristics: Scalable distributed heuristics for exhaustively exploring large state spaces, especially in tightly coupled environments, have yet to be fully developed (Nissim et al., 2013).
- Theory of Turn-Level vs. Global Credit Assignment: Theoretical analysis of reward signals in long-horizon, multi-agent credit assignment is ongoing (Chen et al., 8 Jan 2026).
In summary, while MASS frameworks have delivered substantial advances in performance, efficiency, and interpretability, further research is required to address domain-specific design, robust credit assignment, and secure, open-ended system design in complex multi-agent environments.