Modular Agentic Planner (MAP)
- MAP is a formal architecture that decomposes multi-step reasoning and tool use into specialized, interacting modules, improving scalability and auditability.
- It features a modular design with distinct roles—Planner, Executor, Verifier, and Generator—that coordinate via shared memory and explicit inter-module protocols.
- MAP is applied in tool-augmented QA, robotics, manufacturing, and finance, demonstrating enhanced accuracy and efficiency compared to monolithic models.
A Modular Agentic Planner (MAP) is a formal architecture that decomposes multi-step reasoning and tool use into specialized, interacting modules—typically a Planner, Executor, Verifier or Monitor, and Generator or Summarizer—with coordination ensured through shared memory and explicit inter-module protocols. MAP architectures are prominent in advanced agentic systems for search planning, tool-augmented reasoning, workflow optimization, robotic skill orchestration, manufacturing, time-series modeling, and adaptive evaluation tasks, offering improved scalability, controllability, auditability, and cross-domain transfer relative to monolithic sequence models (Li et al., 7 Oct 2025, Webb et al., 2023, Mei et al., 28 Aug 2025, Farahani et al., 23 Nov 2025, Syarubany et al., 2 Jan 2026, Ang et al., 19 Aug 2025, Zhu et al., 30 Sep 2025). This article surveys the formal design principles, typical module roles and interfaces, representative learning algorithms, domain variants, empirical properties, and integration strategies of MAP systems.
1. Formal Foundations and Module Decomposition
MAP formalizes the planning process as an agentic loop composed of discrete functional components that separately handle high-level goal decomposition, action proposal, execution, feedback evaluation, and termination. The canonical loop instantiates at least four modules:
- Planner (P): Consumes the evolving state s_t (typically the query, toolset, and accumulated memory) and emits an action a_t, which comprises a sub-goal specification, tool choice, and invocation context (Li et al., 7 Oct 2025).
- Executor (E): Receives a_t, executes the chosen tool, and yields raw execution results o_t.
- Verifier (V) / Monitor: Reviews o_t and outputs a binary signal v_t denoting task sufficiency or rule compliance (Webb et al., 2023).
- Generator (G) / Summarizer: On termination, fuses the final memory state and input to produce the final output or score and explanation (Li et al., 7 Oct 2025, Zhu et al., 30 Sep 2025).
MAP architectures enforce modularity by constraining each module’s input/output contract, prompting logic, and memory read/write privileges. For instance, in multi-turn tool-augmented QA, the Planner selects a single tool per turn; the Executor executes strictly one call; the Verifier deterministically decides on termination; state and decision records are appended to memory; and only after termination does the Generator synthesize the final solution (Li et al., 7 Oct 2025).
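The single-tool-per-turn contract described above can be sketched as a minimal loop. All module names, signatures, and the memory schema here are illustrative assumptions, not from any cited implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Append-only shared memory visible to all modules."""
    records: list = field(default_factory=list)

    def append(self, record: dict) -> None:
        self.records.append(record)

def map_loop(query, planner, executor, verifier, generator, max_turns=8):
    """Planner -> Executor -> Verifier loop, terminated by the Verifier,
    then summarized once by the Generator from memory + input."""
    memory = Memory()
    for turn in range(max_turns):
        action = planner(query, memory)    # sub-goal + tool choice + context
        result = executor(action)          # exactly one tool call per turn
        memory.append({"turn": turn, "action": action, "result": result})
        if verifier(result, memory):       # binary sufficiency signal
            break
    return generator(query, memory)        # final synthesis after termination
```

Note that the Generator runs only once, after the Verifier terminates the loop, matching the contract that memory is written turn-by-turn but synthesized only at the end.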
Variants may add extra modules such as Task Decomposer, Actor, State Predictor, Monitor, and Orchestrator for fine-grained control (e.g., decomposing a plan into sub-goals, filtering invalid moves, simulating state transitions, or managing search recursion) (Webb et al., 2023, Zhu et al., 30 Sep 2025). Specialized MAP versions serve domains like manufacturing (layered pipeline overseen by an LLM Planner Agent) (Farahani et al., 23 Nov 2025), robotics (LLM-based decision node linked to motion primitives) (Syarubany et al., 2 Jan 2026), and finance (planner managing multi-stage time-series modeling workflows with audit logs) (Ang et al., 19 Aug 2025).
2. Learning Algorithms and Optimization Protocols
Trainable MAP architectures address long-horizon sparse-reward credit assignment via trajectory-level reinforcement learning, typically with adaptations for multi-agent or modular structure.
- Flow-GRPO (Flow-based Group Refined Policy Optimization): Broadcasts a single trajectory-level reward to all turns and normalizes per-turn advantages by group statistics, stabilizing on-policy updates. The surrogate objective is a PPO-style clipped function with KL regularization (Li et al., 7 Oct 2025).
- Pareto-Optimal Multi-Objective RL: For agentic search, MAP employs dual scalar rewards for outcome utility and planning cost. Policies are optimized to trace the Pareto frontier, balancing an accuracy reward against an efficiency (cost) penalty under a scalarized combination of the two, updated via PPO (Mei et al., 28 Aug 2025).
- POMDP-Based Coordination: In manufacturing and analytic workflows, the Planner Agent solves a structured POMDP across pipeline layers; rewards are assigned for step success/failure, with context maintained in sliding window memory. Tool calls, fallback logic, and human-in-the-loop event logging ensure end-to-end transparency (Farahani et al., 23 Nov 2025).
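A schematic form of the broadcast-reward, group-normalized objective described above is the following; the notation is illustrative and the exact formulation is in Li et al. (7 Oct 2025):

```latex
% Trajectory-level reward R(\tau) broadcast to every turn t; the advantage
% is normalized over a group of G sampled trajectories:
\hat{A}_t^{(i)} = \frac{R(\tau^{(i)}) - \operatorname{mean}_j R(\tau^{(j)})}
                       {\operatorname{std}_j R(\tau^{(j)})}

% PPO-style clipped surrogate with KL regularization toward a reference policy:
\mathcal{J}(\theta) = \mathbb{E}\Big[\min\big(r_t(\theta)\,\hat{A}_t,\;
    \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\Big]
  - \beta\, D_{\mathrm{KL}}\big(\pi_\theta \,\|\, \pi_{\mathrm{ref}}\big),
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Because the same trajectory reward reaches every turn, credit assignment reduces to the group normalization, which is what stabilizes updates under sparse, delayed rewards.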
Supervised fine-tuning is employed in domains such as image quality assessment, with cross-entropy minimization over next-token prediction of structured planner outputs (Zhu et al., 30 Sep 2025).
3. Domain-Specific Instantiations and Architectural Variants
MAP has been adapted for a spectrum of real-world and synthetic tasks, each with domain-specific agent specialization:
| Domain | Planner Role / Integration | Key Modules / Features |
|---|---|---|
| Tool-augmented QA, Math | Multi-turn sub-goal + tool selection | Planner, Executor, Verifier, Generator, Memory (Li et al., 7 Oct 2025) |
| Robotic Navigation | Discrete action at junctions, skill gating | Decision Module (LMM), FSM Controller, Motion primitives (Syarubany et al., 2 Jan 2026) |
| Smart Manufacturing | Orchestration of analytics pipeline | LLM Planner Agent, Schema/Feature/Model/Optimization agents (Farahani et al., 23 Nov 2025) |
| Financial Time-Series | Multi-stage code/model/hyperparameter refinement | Planner agent, Knowledge banks, feedback loop (Ang et al., 19 Aug 2025) |
| Image Quality Assessment | Structured plan for detection, analysis, scoring | Planner, Executor, Summarizer, VLM backbone, JSON plans (Zhu et al., 30 Sep 2025) |
In advanced variants, modular registration and dynamic tool discovery support extensibility; planner outputs and module logs are serialized to audit trails (JSON or SQL), ensuring reproducibility and interpretability (Farahani et al., 23 Nov 2025, Ang et al., 19 Aug 2025).
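A JSON-lines audit trail of the kind described above can be maintained with a few lines; the record schema (module name, timestamp, payload) is an illustrative assumption, not the format used by any cited system:

```python
import json
import time

def log_decision(path: str, module: str, payload: dict) -> dict:
    """Append one module decision to a JSON-lines audit trail.
    One record per line keeps the log append-only and easy to replay."""
    record = {"module": module, "ts": time.time(), "payload": payload}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

An append-only line-oriented format lets each module write independently while post-hoc tooling reconstructs the full decision sequence in order.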
4. Empirical Properties and Benchmark Results
MAP systems consistently improve accuracy, tool-selection reliability, and auditability over monolithic or direct end-to-end LLM baselines. Notable findings include:
- AgentFlow (7B backbone, Flow-GRPO): +14.9% accuracy on search QA, +14.0% on agentic tasks, +14.5% on mathematical reasoning, +4.1% on scientific tasks versus top-performing baselines; performance scales positively with model size and reasoning turns; surpasses GPT-4o despite much smaller parameter count (Li et al., 7 Oct 2025).
- Graph Traversal, Planning: MAP achieves near-optimal solutions and eliminates invalid moves; modules such as Monitor are critical for hallucination avoidance (Webb et al., 2023).
- Financial Modeling (TS-Agent): MAP-based planning yields superior metrics (RMSE, MAE, Success rate) across forecasting and synthetic data generation, with full audit trails for all decisions (Ang et al., 19 Aug 2025).
- Image Quality Assessment: Planner-level accuracy of 76.8% (AgenticIQA, Qwen2.5-VL) on MCQ benchmark (AgenticIQA-Eval), outperforming open-source VLM baselines; improved SRCC on standard IQA datasets (Zhu et al., 30 Sep 2025).
- Search QA with Pareto Optimization: MAP achieves +10.8% accuracy over strong non-modular baselines, with explicit cost control and cross-generator/domain generalization (Mei et al., 28 Aug 2025).
Training dynamics indicate that modular decomposition leads to stable learning, concise actions, and efficient sample usage (Li et al., 7 Oct 2025).
5. Integration, Runtime Coordination, and Extensibility
MAP architectures demand extensive runtime protocols to enable robust multi-agent coordination. Practical choices include:
- Shared Memory / Context: Deterministic or sliding-window memory stores all decisions, tool calls, and module outputs for live context and post-hoc auditing (Li et al., 7 Oct 2025, Farahani et al., 23 Nov 2025).
- Inter-Agent Communication: Structured JSON or RESTful API protocols for invoking, returning, and registering agents/tools, supporting rapid module extension and replacement (Farahani et al., 23 Nov 2025).
- Finite-State Controllers and Skill Gating: In robotics and control, planners interface with low-level FSM controllers that latch gates for motion primitives, triggered by discrete planner actions and events (Syarubany et al., 2 Jan 2026).
- Human-in-the-Loop: Audit logs and explicit interfaces allow for transparent validation, modification, or rejection of planner recommendations, crucial for compliance-critical domains (Farahani et al., 23 Nov 2025, Ang et al., 19 Aug 2025).
- Logging and Visualization: Semantic maps and decision history logs are visualized (e.g., RViz in ROS, Prometheus/Grafana for metrics), facilitating debugging and traceability (Syarubany et al., 2 Jan 2026, Farahani et al., 23 Nov 2025).
MAP’s decomposition into reasoning, tool selection, execution, verification, and summarization cleanly separates high-level LLM-driven logic from low-level, fast tactical steps (often handled by specialized SLMs, microservices, or containers for scalability).
6. Comparisons, Transfer, and Limitations
MAP architectures outperform classical approaches such as monolithic RL models, zero-shot prompting, chain-of-thought, multi-agent debate, and tree-of-thought on both accuracy and invalid action reduction (Webb et al., 2023, Li et al., 7 Oct 2025, Ang et al., 19 Aug 2025). Modular decomposition is especially crucial for settings with long reasoning horizons, sparse rewards, diverse toolsets, or heterogeneous skill modules. The separation of planning and execution modules enables fine-tuned planners to generalize across frozen generators, domains, and environments without retraining downstream models (Mei et al., 28 Aug 2025, Farahani et al., 23 Nov 2025).
A plausible implication is that computational cost and latency may increase with full modularization, as exemplified by the hundreds of LLM calls required on certain planning benchmarks (Webb et al., 2023). However, this is offset by robustness, scalability, and the capacity for structured audit and dynamic module swapping.
MAP’s integration of explicit credit assignment (broadcasted trajectory-level reward), group-normalized advantage, and modular update loops addresses long-standing limitations in deep RL for multi-turn reasoning, tool-augmented workflows, and cross-domain agentic system design (Li et al., 7 Oct 2025, Mei et al., 28 Aug 2025).