PlanGEN: Adaptive Multi-Agent Planning
- PlanGEN is a framework that uses multi-agent orchestration for constraint extraction, plan verification, and adaptive algorithm selection to tackle complex planning scenarios.
- It integrates Best-of-N, Tree-of-Thought, and REBASE algorithms with a modified UCB strategy to dynamically choose the best inference method based on instance complexity.
- Empirical results and ablation studies demonstrate state-of-the-art performance, with absolute gains of up to ~13 points over traditional LLM-based planning methods across diverse benchmarks.
PlanGEN is a multi-agent, model-agnostic planning and reasoning framework designed to address complex problem-solving scenarios that require instance-specific constraint management, robust verification, and adaptive, algorithm-level inference control. Developed to overcome common failure modes in existing LLM-based planning systems—namely, failure to systematically enforce constraints and inability to adapt inference strategies to varying instance complexity—PlanGEN achieves state-of-the-art performance across diverse planning and reasoning benchmarks by integrating three dedicated agent components: constraint extraction, plan verification, and adaptive algorithm selection (Parmar et al., 22 Feb 2025).
1. Motivation and Limitations of Prior Approaches
Prior LLM-based planning methods perform adequately on routine, template-driven tasks but consistently underperform when faced with richly constrained, heterogeneous “real-world” planning and reasoning domains, such as multi-party scheduling, multi-leg itinerary planning, or scientific deduction. Two core challenges motivate the PlanGEN framework:
- Verification: Existing approaches generally lack explicit, instance-level verification. This leads to generated plans that violate nontrivial task constraints—budgets, time-windows, approval conditions, domain-specific admissibility, etc.
- Varying Instance Complexity: A fixed-choice inference algorithm (e.g., simple Best-of-N sampling) is insufficiently flexible, failing to efficiently allocate computational resources across easy, moderate, and hard task instances.
PlanGEN was designed to overcome these deficiencies by introducing constraint-guided iterative plan verification and adaptive algorithm selection according to real-time complexity assessment (Parmar et al., 22 Feb 2025).
2. Formal Definition of Framework Components
PlanGEN decomposes the planning-and-reasoning process into three agentic modules, each instantiated as an LLM (or similarly capable model) with its own prompting scheme:
| Agent | Input | Output |
|---|---|---|
| Constraint Agent | Problem text | Set of instance-specific constraints |
| Verification Agent | Candidate plan $p$, constraint set $C$ | Feedback string $f$; numerical reward $r(p, C)$ evaluating plan validity against $C$ |
| Selection Agent | Statistics of results for each algorithm | Algorithm choice for the next inference round |
The Constraint Agent queries the input instance for explicit constraints, which are supplied as a structured set to subsequent agents. The Verification Agent receives a candidate plan and constraint set, producing both natural-language feedback and a quantitative score reflecting the plan's degree of compliance. The Selection Agent orchestrates algorithm choice among Best-of-N (BoN), Tree-of-Thought (ToT), and REBASE inference methods by maintaining cumulatively updated statistics and applying a bandit-style Upper Confidence Bound (UCB) criterion.
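This decomposition can be sketched as a minimal pipeline. In the sketch below, `call_llm` is a stub standing in for any model backend, and every prompt, helper name, and scoring scale is illustrative rather than taken from the paper:

```python
# Minimal sketch of the constraint and verification agents.
# `call_llm` is a placeholder for a model-agnostic LLM call; all prompts,
# names, and the [-100, 100] scoring scale are assumptions for illustration.
from dataclasses import dataclass


def call_llm(prompt: str) -> str:
    """Stub standing in for any LLM backend."""
    return "0"


@dataclass
class Verification:
    feedback: str   # natural-language critique of the candidate plan
    reward: float   # numeric constraint-compliance score


def extract_constraints(problem: str) -> list[str]:
    # Constraint Agent: elicit instance-specific constraints from the problem text.
    raw = call_llm(f"List every explicit constraint in this problem:\n{problem}")
    return [line.strip() for line in raw.splitlines() if line.strip()]


def verify(plan: str, constraints: list[str]) -> Verification:
    # Verification Agent: produce feedback plus a quantitative reward.
    feedback = call_llm(
        f"Critique this plan against the constraints:\n{constraints}\n{plan}"
    )
    score = call_llm(f"Rate constraint compliance from -100 to 100:\n{feedback}")
    try:
        reward = float(score)
    except ValueError:
        reward = -100.0  # unparsable score treated as maximal violation
    return Verification(feedback=feedback, reward=reward)
```

The selection agent (sketched in a later section) then consumes these rewards as its per-algorithm statistics.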
3. Integrated Inference-Time Algorithms and Multi-Agent Orchestration
The PlanGEN framework encapsulates and extends three standard planning algorithms under its constraint–verification–selection regime:
- PlanGEN (Best-of-N, BoN): Generates N candidate plans in parallel from the base LLM, verifies each via the verification agent, and selects the plan with the maximal verification reward.
- PlanGEN (Tree-of-Thought, ToT): Constructs a tree of partial “thought” nodes, scoring feasible expansions using the verification agent. The method selects top-performing partial trajectories at each depth, returning the first complete plan above a defined reward threshold.
- PlanGEN (REBASE): Employs a reward-balanced tree search paradigm, pruning branches whose expansion yields low verification reward and preserving top-performing partial solutions. The search returns upon finding any complete plan whose reward meets the defined threshold.
- PlanGEN (Mixture): The selection agent adaptively determines which of BoN, ToT, or REBASE to invoke per problem instance, based on dynamic instance complexity and historical performance.
Pseudo-code for each sub-algorithm formalizes the agent interactions, with explicit invocation of the verification module for all sampled or expanded plans (Parmar et al., 22 Feb 2025).
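As an illustration of the simplest variant, the BoN sample-verify-select loop can be sketched as follows; `generate_plans` and `verify_reward` are placeholder stand-ins (in PlanGEN proper, both roles are played by LLM agents):

```python
# Hedged sketch of the Best-of-N loop: sample n candidates, score each with
# the verifier, return the argmax. Generator and verifier are placeholders.


def generate_plans(problem: str, n: int) -> list[str]:
    # Placeholder: PlanGEN draws n independent samples from the base LLM.
    return [f"{problem}-candidate-{i}" for i in range(n)]


def verify_reward(plan: str, constraints: list[str]) -> float:
    # Placeholder verifier: PlanGEN scores constraint compliance via an LLM agent.
    return float(plan.rsplit("-", 1)[-1])


def best_of_n(problem: str, constraints: list[str], n: int = 5) -> str:
    candidates = generate_plans(problem, n)
    rewards = [verify_reward(p, constraints) for p in candidates]
    return candidates[rewards.index(max(rewards))]
```

The ToT and REBASE variants reuse the same verifier but apply it to partial plans, keeping only top-scoring expansions at each depth.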
4. Algorithm Selection via Modified Upper Confidence Bound (UCB) Strategy
The selection agent's choice of inference method is governed by a multi-factor UCB score:

$$\mathrm{UCB}(a) = \frac{R_a}{n_a} + \beta(T)\,P_a\sqrt{\frac{\ln T}{n_a}} + \lambda_1\,D(a) + \lambda_2\,V(a)$$

where:
- $R_a$ is the cumulative reward for algorithm $a$
- $n_a$ is its usage count
- $T$ is the total number of trials
- $P_a$ is an LLM-derived prior suitability estimate for $a$
- $\beta(T)$ is an exploration weight that decays with trial count
- $\lambda_1, \lambda_2$ are hyperparameters for the diversity and recovery bonuses
- $D(a)$ is the diversity bonus and $V(a)$ is the recovery score after failures
This mechanism balances exploitation of historically successful inference methods and exploration of alternative algorithms, while integrating model-based priors and heuristic bonuses for diversity and recovery (Parmar et al., 22 Feb 2025).
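A hedged sketch of such a selector, built from the quantities described above (cumulative reward, usage count, LLM prior, decaying exploration weight, diversity and recovery bonuses); the exact decay schedule, bonus definitions, and hyperparameter values here are assumptions, not the paper's:

```python
# Sketch of a modified-UCB algorithm selector. The 1/sqrt(T) decay and the
# default lambda weights are illustrative assumptions.
import math


def ucb_score(R_a: float, n_a: int, T: int, P_a: float,
              diversity: float, recovery: float,
              lam1: float = 0.1, lam2: float = 0.1) -> float:
    if n_a == 0:
        return float("inf")        # force at least one trial of each algorithm
    beta = 1.0 / math.sqrt(T)      # exploration weight decaying with trial count
    exploit = R_a / n_a            # mean reward of algorithm a so far
    explore = beta * P_a * math.sqrt(math.log(T) / n_a)
    return exploit + explore + lam1 * diversity + lam2 * recovery


def select_algorithm(stats: dict) -> str:
    # stats: {name: {"R": cumulative reward, "n": usage count,
    #                "P": LLM prior, "D": diversity bonus, "V": recovery score}}
    T = max(1, sum(s["n"] for s in stats.values()))
    return max(stats, key=lambda a: ucb_score(
        stats[a]["R"], stats[a]["n"], T,
        stats[a]["P"], stats[a]["D"], stats[a]["V"]))
```

Untried algorithms score infinity and are sampled first; thereafter, high mean reward dominates while the decaying exploration term and bonuses keep alternatives alive.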
5. Empirical Performance and Ablations
PlanGEN was evaluated on four challenging planning and reasoning benchmarks using Gemini-1.5-Pro and compared against strong baselines, including zero-shot Chain-of-Thought (CoT) and vanilla multi-agent self-refinement. Key results are summarized below:
| Benchmark | Baseline | PlanGEN | Gain (points) |
|---|---|---|---|
| NATURAL PLAN (avg EM) | 52.0 | 60.0 | +8 |
| OlympiadBench (MATH) | 50.7 | 55.9 | +5 |
| OlympiadBench (PHY) | 28.3 | 31.8 | +4 |
| DocFinQA (Acc/F1) | 24.0/22.5 | 31.1/29.4 | +7 |
| GPQA (Acc) | 46.2 | 59.6 | +13 (Mixture); +1 over vanilla multi-agent |
Ablation studies show a 3–5% decrease in accuracy if constraint-guided verification is removed and a ~4% drop in mixture performance when UCB selection is replaced by round-robin scheduling. In the NATURAL PLAN–Calendar subset, PlanGEN-ToT performs best on easy instances, BoN on medium, and the Mixture approach on hard cases.
6. Key Findings and Implications
Empirical and ablation studies support several overarching conclusions:
- Constraint-guided iterative verification systematically enforces complex, instance-level requirements—transforming base inference algorithms into robust, high-fidelity planners.
- Adaptive, per-instance algorithm selection via the selection agent yields statistically significant gains by allocating compute aligned to problem hardness: BoN for moderate, ToT for simple, and REBASE for deeply entangled cases.
- Model-agnostic design demonstrates robustness across multiple LLM backbones (Gemini-1.5, Gemini-2.0, GPT-4o).
These findings establish that multi-agent orchestration—grounded in explicit constraint extraction, continuous plan verification, and reward-balanced adaptive selection—enables SOTA performance on a broad suite of complex, high-precision planning and reasoning tasks (Parmar et al., 22 Feb 2025).
7. Significance in Context
PlanGEN's modular, agent-driven blueprint offers a general paradigm for reasoning-intensive AI tasks that demand both local verification and global adaptability. Its decoupling of constraint extraction, plan evaluation, and instance-adaptive inference provides a robust foundation for further research in scalable, trustworthy autonomous planning systems. This framework marks a transition point toward deployable LLM-based solutions that operate successfully in open-world, constraint-rich environments.