Modularized LLM Agent Search (MoLAS)
- The paper introduces MoLAS, a framework that modularizes LLM agent functions into discrete components to enable systematic search and evaluation.
- It leverages module evolution and recombination with an LLM-based performance predictor, reducing rollout costs by up to 400× with high accuracy.
- Empirical benchmarks show MoLAS-based agents outperform traditional architectures by an average improvement of 17.2% across diverse tasks.
Modularized LLM Agent Search (MoLAS) is a research paradigm and practical framework for designing LLM agent systems via systematic decomposition into discrete, swappable modules. MoLAS emerges from the recognition that a wide range of effective LLM agent architectures—across search, reasoning, and tool-use tasks—can be understood as compositions of abstract modules such as Planning, Reasoning, Tool Use, and Memory, each with a standardized interface. This modularization enables the efficient search, evaluation, and automatic configuration of agent architectures to optimize performance for diverse applications, from web and legal search to molecular design, while preserving interpretability and extensibility (Shang et al., 2024).
1. Formal Definition and Modular Design Space
MoLAS is defined as the discrete agent design problem $A^{*} = \arg\max_{P \in \mathbb{P},\, R \in \mathbb{R},\, T \in \mathbb{T},\, M \in \mathbb{M}} f_d(P, R, T, M)$, where $\mathbb{P}$, $\mathbb{R}$, $\mathbb{T}$, and $\mathbb{M}$ denote the sets of Planning, Reasoning, Tool-Use, and Memory modules, respectively, and $f_d$ is a real-world task performance metric for task $d$ (Shang et al., 2024).
The modular design space is synthesized from an analysis of diverse agent systems (e.g., Chain-of-Thought, Tree-of-Thought, HuggingGPT, Generative Agents, Voyager), which abstract to combinations of these fundamental modules, each communicating via uniform text-based I/O. This abstraction supports:
- Plug-and-play swapping of modules for rapid adaptation to novel domains or goals.
- Algorithmic search (evolution, recombination) over a combinatorially large space of architectures (Shang et al., 2024).
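To make the combinatorial design space concrete, the following sketch enumerates candidate agent configurations as the Cartesian product of four module pools. The pool contents here are hypothetical stand-ins; real MoLAS pools are populated from published agent designs and evolved variants (Shang et al., 2024).

```python
from itertools import product

# Hypothetical module pools (names are illustrative, not from the paper).
PLANNING = ["none", "td_decompose", "voyager_plan"]
REASONING = ["io", "cot", "tot", "self_refine"]
TOOL_USE = ["none", "react_tools", "hugginggpt_dispatch"]
MEMORY = ["none", "episodic", "generative_agents_memory"]

def design_space():
    """Enumerate the full combinatorial space of agent configurations."""
    return [
        {"planning": p, "reasoning": r, "tooluse": t, "memory": m}
        for p, r, t, m in product(PLANNING, REASONING, TOOL_USE, MEMORY)
    ]

space = design_space()
print(len(space))  # 3 * 4 * 3 * 3 = 108 candidate architectures
```

Even these small pools yield 108 architectures; with evolved variants the space grows quickly, which is what motivates surrogate-guided search rather than exhaustive evaluation.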
2. Search and Optimization Algorithms for MoLAS
Effective MoLAS instantiations require search algorithms that can efficiently discover, evaluate, and optimize agent compositions. The AgentSquare framework introduces two core mechanisms:
- Module Evolution: Generation of new module variants (e.g., planners, reasoners) via LLM-driven code mutation. Each variant is empirically evaluated; the top performer seeds the next phase (Shang et al., 2024).
- Module Recombination: Selection and recombination of existing modules (drawn from the standardized module pools) under LLM guidance, rapidly exploring promising configurations without code regeneration.
A distinguishing feature is the incorporation of an in-context performance predictor (an LLM surrogate), which, conditioned on past agent-module performance data, predicts the likely performance of a candidate configuration. This surrogate reduces the need for expensive real-world rollouts by up to 400× while maintaining high correlation with empirical scores (Shang et al., 2024).
The typical search loop alternates:
- multi-candidate module evolution → real evaluation of each variant → best performer seeds recombination,
- multi-candidate recombination → predictor ranking → a single real evaluation of the top-ranked candidate,

and iterates until convergence or resource exhaustion (Shang et al., 2024).
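The alternation described above can be sketched as a search loop in which every callable (`evolve`, `recombine`, `predict`, `evaluate`) is a placeholder for an LLM-driven step; this is a minimal illustration in the spirit of AgentSquare, not its actual implementation.

```python
import random

def search(seed, pools, evolve, recombine, predict, evaluate,
           n_evo=5, n_rec=10, budget=20):
    """Alternating evolution/recombination loop (illustrative sketch)."""
    best, best_score = seed, evaluate(seed)
    for _ in range(budget):
        # Evolution phase: mutate modules, evaluate each variant for real.
        variants = [evolve(best) for _ in range(n_evo)]
        score, cand = max(((evaluate(v), v) for v in variants),
                          key=lambda sv: sv[0])
        if score > best_score:
            best, best_score = cand, score
        # Recombination phase: the surrogate predictor ranks candidates,
        # and only the top-ranked one gets a costly real evaluation.
        recs = [recombine(best, pools) for _ in range(n_rec)]
        top = max(recs, key=predict)
        score = evaluate(top)
        if score > best_score:
            best, best_score = top, score
    return best, best_score

# Toy demo: integer "configs" with peak quality at 7.
random.seed(0)
pools = list(range(10))
quality = lambda c: -abs(c - 7)
best, best_score = search(
    seed=0, pools=pools,
    evolve=lambda c: c + random.choice([-1, 1]),
    recombine=lambda c, p: random.choice(p),
    predict=quality, evaluate=quality, budget=5)
```

The key cost asymmetry is visible in the recombination phase: many candidates pass through the cheap predictor, but only one incurs a real rollout per iteration.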
3. Canonical Module Taxonomy and Interfaces
MoLAS modules are functionally defined as follows (Shang et al., 2024):
- Planning: $P(d, f) \to \{s_i\}$, decomposes the top-level task description $d$ (and optional feedback $f$) into discrete subtasks $s_i$.
- Reasoning: $R(s_i, f) \to a_i$, processes a subtask $s_i$ (with feedback $f$), outputs a natural-language solution or chain-of-thought $a_i$.
- Tool Use: $T(a_i) \to o_i$, invokes a relevant external tool as determined by a tool request parsed from the reasoning trace, returning an observation $o_i$.
- Memory: Implements memory write and retrieval primitives, supporting episodic, semantic, or long-range retrieval and storage.
The communication protocol is strictly text-based (commonly via XML or JSON tags/messages), allowing modules to be orchestrated sequentially, recursively, or in agential graphs. This design fully supports module recombination and independent evolution (Shang et al., 2024, Chauhan, 12 Nov 2025, Kim et al., 27 May 2025).
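A minimal sketch of the uniform text-in/text-out contract follows. The `Module` base class and the `EchoPlanner` toy are illustrative assumptions, not the paper's API; the point is that any module honoring the text interface can be swapped in without touching its neighbors.

```python
import json
from abc import ABC, abstractmethod

class Module(ABC):
    """Uniform text-in/text-out interface shared by Planning, Reasoning,
    Tool-Use, and Memory modules (illustrative contract)."""
    @abstractmethod
    def __call__(self, message: str) -> str: ...

class EchoPlanner(Module):
    """Toy Planning module: splits a task into numbered subtasks and
    returns them as a JSON message."""
    def __call__(self, message: str) -> str:
        task = json.loads(message)["task"]
        subtasks = [f"step {i+1}: {part.strip()}"
                    for i, part in enumerate(task.split(";"))]
        return json.dumps({"role": "planning", "subtasks": subtasks})

# Modules compose sequentially because every boundary is plain text.
planner = EchoPlanner()
out = planner(json.dumps({"task": "search flights; book hotel"}))
subtasks = json.loads(out)["subtasks"]
print(subtasks)  # ['step 1: search flights', 'step 2: book hotel']
```

Because every boundary is serialized text, the same modules can equally be chained recursively or arranged into agential graphs by an orchestrator.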
4. Empirical Benchmarks and Performance
MoLAS-based agent search algorithms (exemplified by AgentSquare) have been evaluated across six representative benchmarks, spanning web, embodied, tool-use, and game domains:
- WebShop (online shopping): Measures task score.
- ALFWorld, ScienceWorld (embodied navigation and manipulation): Measures success/progress.
- M3Tool, TravelPlanner (multi-tool and itinerary synthesis): Success or micro-pass rate.
- PDDL (game planning): Progress rate (Shang et al., 2024).
Results indicate that modular agent search exceeds the performance of the strongest existing human-designed agent architectures. For example, AgentSquare achieves:

| Method | WebShop | ALFWorld | SciWorld | M3Tool | TravelPlanner | PDDL |
|---|---|---|---|---|---|---|
| Best Human | 0.551 | 0.551 | 0.740 | 0.502 | 0.540 | 0.616 |
| AgentSquare | **0.607** | **0.695** | **0.781** | **0.524** | **0.583** | **0.669** |
| Relative Gain | +10.2% | +26.1% | +5.4% | +4.2% | +8.0% | +8.5% |

with an average +17.2% improvement (Shang et al., 2024). Ablation studies confirm both evolution and recombination are essential for optimal performance.
5. Instantiations in Domain-Specific Pipelines
MoLAS has been specialized and extended in several domain-specific agentic systems:
- QAgent introduces a modular RL-trained search agent for retrieval-augmented generation (RAG), decomposing agentic steps (planning, reflection, retrieval, answer generation) with plug-and-play modules and XML-based state-passing (Jiang et al., 9 Oct 2025).
- M-ASK decouples agentic search into Search Behavior Agents (planning, search, answer) and Knowledge Management Agents (summarization, update) operating on a shared context state, using PPO with turn-specific reward attribution to stabilize and optimize multi-agent coordination (Chen et al., 8 Jan 2026).
- L-MARS implements multi-agent, orchestrated reasoning in legal question answering by decomposing queries and leveraging source-specialized search agents, a Judge Agent for verification, and an overview agent for grounded answer construction (Wang et al., 31 Aug 2025).
- MT-Mol leverages modular agents for each analytical domain, proposal, verification, and review, using structured JSON protocols for communication and tool-based reasoning to iteratively optimize molecular structures (Kim et al., 27 May 2025).
- AI Founding Fathers demonstrates the use of recursive refinement and controlled incremental search (GIS) in multi-agent pipelines, leveraging modular validators, adversarial “red team” agents, and structured persona-driven reasoning for historical argumentation (Chauhan, 12 Nov 2025).
These instantiations validate the general principle that specialized modular agents, orchestrated through explicit interface logic, outperform monolithic or end-to-end LLM workflows in retrieval, reasoning complexity, generalization, and interpretability.
6. Interpretability, Extensibility, and Best Practices
Interpretability is enhanced in MoLAS by:
- Explicit module boundaries and tagged communication (XML/JSON blocks).
- Structured reasoning and feedback protocols (stepwise, tool-aligned rationales, consistency checks).
- Direct mapping from module output to explainable engineering artifacts (e.g., reasoning chains, verification summaries) (Kim et al., 27 May 2025, Jiang et al., 9 Oct 2025, Chauhan, 12 Nov 2025).
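Tagged communication makes intermediate artifacts machine-parseable. The sketch below extracts tool requests from an XML-tagged reasoning trace; the tag names (`<reasoning>`, `<tool_request>`) are illustrative assumptions, as the cited systems each define their own schemas.

```python
import re

TRACE = """<reasoning>
Need current statute text before answering.
<tool_request name="search">"statute of limitations" fraud</tool_request>
</reasoning>"""

def parse_tool_requests(trace: str):
    """Pull tagged tool calls out of a reasoning trace.
    Tag names are illustrative, not a fixed MoLAS standard."""
    return re.findall(r'<tool_request name="([^"]+)">(.*?)</tool_request>',
                      trace, flags=re.S)

requests = parse_tool_requests(TRACE)
print(requests)  # [('search', '"statute of limitations" fraud')]
```

The same parse that routes the tool call also yields an inspectable audit record, which is how explicit tagging serves both orchestration and interpretability.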
Modular agent design also enables:
- Rapid plug-in of third-party modules (new retrievers, reasoning engines, evaluation heads).
- Parallel and hierarchical orchestration (multiple agents at different abstraction layers).
- Dynamic agent composition, adaptive agent spawning, and global memory support (Chen et al., 8 Jan 2026).
- Continuous integration of new evaluators and arbiters for robust training and testing (Chauhan, 12 Nov 2025).
Empirical studies highlight that decoupling distinct agent roles, stabilizing reward attribution, and ensuring uniform interfaces are critical for achieving both high performance and transparent, maintainable systems (Shang et al., 2024, Chen et al., 8 Jan 2026).
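One common way to realize the plug-in and uniform-interface properties above is a module registry; the decorator pattern below is a minimal sketch of that idea, not AgentSquare's actual API, and all names are hypothetical.

```python
# Registry keyed by module slot; third-party code registers new entries
# without modifying the orchestrator (illustrative plug-in pattern).
REGISTRY = {"planning": {}, "reasoning": {}, "tooluse": {}, "memory": {}}

def register(kind, name):
    """Decorator that installs a module class under a uniform slot."""
    def deco(cls):
        REGISTRY[kind][name] = cls
        return cls
    return deco

@register("reasoning", "chain_of_thought")
class ChainOfThought:
    def __call__(self, subtask: str) -> str:
        return f"Let's think step by step about: {subtask}"

reasoner = REGISTRY["reasoning"]["chain_of_thought"]()
print(reasoner("pick a gift"))
```

A search procedure can then treat `REGISTRY` as its module pools, so newly registered third-party modules automatically enter the design space.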
7. Limitations, Open Challenges, and Future Directions
MoLAS frameworks incur a substantial increase in agent call volume and system orchestration overhead relative to monolithic baselines. Latency and LLM cost can become bottlenecks, especially for recursive or multi-hop compositions (Chen et al., 8 Jan 2026).
Potential research directions include:
- Asynchronous and parallelized execution frameworks.
- Role-specific fine-tuning and meta-controller strategies for dynamic agent composition.
- Incorporation of multi-modal inputs (vision, audio) and global memory sharing.
- Exploration of richer inter-agent protocol layers (message-passing, value-decomposition).
- Expansion of the modular taxonomy to encompass critique, explanation, and verification agents beyond canonical planning/reasoning/tool/memory archetypes (Chen et al., 8 Jan 2026, Shang et al., 2024).
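The first direction, asynchronous execution, addresses the latency bottleneck directly: independent module calls need not be serialized. A minimal sketch using `asyncio` (with a sleep standing in for LLM/network latency; all names are hypothetical):

```python
import asyncio

async def call_module(name: str, payload: str) -> str:
    """Stand-in for an LLM-backed module call; sleep simulates latency."""
    await asyncio.sleep(0.01)
    return f"{name}:{payload}"

async def fan_out(subtasks):
    """Run independent subtask agents concurrently instead of serially,
    one way to attack the orchestration-latency bottleneck."""
    return await asyncio.gather(
        *(call_module("reasoning", s) for s in subtasks))

results = asyncio.run(fan_out(["a", "b", "c"]))
print(results)  # ['reasoning:a', 'reasoning:b', 'reasoning:c']
```

Under this pattern, wall-clock latency for independent subtasks approaches the slowest single call rather than the sum of all calls, though dependent compositions still serialize along their critical path.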
A plausible implication is that the MoLAS paradigm, by placing module composition and explicit search at the core of agent design, will enable scaling LLM agents to more complex, open-world, and multi-domain tasks, with observable benefits in generalization, interpretability, and empirical performance.