
PlanGEN: Adaptive Multi-Agent Planning

Updated 29 December 2025
  • PlanGEN is a framework that uses multi-agent orchestration for constraint extraction, plan verification, and adaptive algorithm selection to tackle complex planning scenarios.
  • It integrates Best-of-N, Tree-of-Thought, and REBASE algorithms with a modified UCB strategy to dynamically choose the best inference method based on instance complexity.
  • Empirical results and ablation studies demonstrate state-of-the-art performance, with gains of up to 13% over strong LLM-based planning baselines across diverse benchmarks.

PlanGEN is a multi-agent, model-agnostic planning and reasoning framework designed to address complex problem-solving scenarios that require instance-specific constraint management, robust verification, and adaptive, algorithm-level inference control. Developed to overcome common failure modes in existing LLM-based planning systems—namely, failure to systematically enforce constraints and inability to adapt inference strategies to varying instance complexity—PlanGEN achieves state-of-the-art performance across diverse planning and reasoning benchmarks by integrating three dedicated agent components: constraint extraction, plan verification, and adaptive algorithm selection (Parmar et al., 22 Feb 2025).

1. Motivation and Limitations of Prior Approaches

Prior LLM-based planning methods perform adequately on routine, template-driven tasks but consistently underperform when faced with richly constrained, heterogeneous “real-world” planning and reasoning domains, such as multi-party scheduling, multi-leg itinerary planning, or scientific deduction. Two core challenges motivate the PlanGEN framework:

  • Verification: Existing approaches generally lack explicit, instance-level verification. This leads to generated plans that violate nontrivial task constraints—budgets, time-windows, approval conditions, domain-specific admissibility, etc.
  • Varying Instance Complexity: A fixed-choice inference algorithm (e.g., simple Best-of-N sampling) is insufficiently flexible, failing to efficiently allocate computational resources across easy, moderate, and hard task instances.

PlanGEN was designed to overcome these deficiencies by introducing constraint-guided iterative plan verification and adaptive algorithm selection according to real-time complexity assessment (Parmar et al., 22 Feb 2025).

2. Formal Definition of Framework Components

PlanGEN decomposes the planning-and-reasoning process into three agentic modules, each instantiated as LLMs or similarly capable models with specific prompting schemes:

| Agent | Input | Output |
|---|---|---|
| Constraint Agent | Problem text $x$ | Instance-specific constraint set $\mathcal{C} = \{c_1, \dots, c_m\}$ |
| Verification Agent | Candidate plan $\pi$; constraint set $\mathcal{C}$ | Feedback string $f(\pi)$; numerical reward $R(\pi) \in [-100, 100]$ evaluating plan validity against $\mathcal{C}$ |
| Selection Agent | Per-algorithm result statistics for each algorithm $a$ | Algorithm choice $a^* = \arg\max_a \mathrm{UCB}(a)$ for the next inference round |

The Constraint Agent queries the input instance for explicit constraints, which are supplied as a structured set to subsequent agents. The Verification Agent receives a candidate plan and constraint set, producing both natural-language feedback and a quantitative score reflecting the plan's degree of compliance. The Selection Agent orchestrates algorithm choice among Best-of-N (BoN), Tree-of-Thought (ToT), and REBASE inference methods by maintaining cumulatively updated statistics and applying a bandit-style Upper Confidence Bound (UCB) criterion.
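The agent interplay described above can be sketched as a single extraction–generation–verification pass. This is a minimal illustration, not code from the paper: the function and class names are invented, and the three agents are represented as plain callables that would wrap LLM calls in a real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Illustrative type aliases for the agent roles (each wraps an LLM call in practice).
ConstraintAgent = Callable[[str], List[str]]                       # problem text -> constraint set C
Generator = Callable[[str, List[str]], str]                        # (problem, C) -> candidate plan
VerificationAgent = Callable[[str, List[str]], Tuple[str, float]]  # (plan, C) -> (feedback, reward in [-100, 100])

@dataclass
class PlanResult:
    plan: str
    feedback: str
    reward: float

def run_pipeline(problem: str,
                 extract: ConstraintAgent,
                 generate: Generator,
                 verify: VerificationAgent) -> PlanResult:
    """One constraint-extraction -> plan-generation -> verification pass."""
    constraints = extract(problem)             # Constraint Agent
    plan = generate(problem, constraints)      # base inference algorithm
    feedback, reward = verify(plan, constraints)  # Verification Agent
    return PlanResult(plan, feedback, reward)
```

In the full framework this pass sits inside a loop: the Selection Agent picks the inference algorithm, and low-reward plans are regenerated using the verifier's feedback.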

3. Integrated Inference-Time Algorithms and Multi-Agent Orchestration

The PlanGEN framework encapsulates and extends three standard planning algorithms under its constraint–verification–selection regime:

  • PlanGEN (Best-of-N, BoN): Generates $N$ plans in parallel from the base LLM, verifies each via the verification agent, and selects the plan with maximal reward $R(\pi)$.
  • PlanGEN (Tree-of-Thought, ToT): Constructs a tree of partial “thought” nodes, scoring feasible expansions using the verification agent. The method selects top-performing partial trajectories at each depth, returning the first complete plan above a defined reward threshold.
  • PlanGEN (REBASE): Employs a reward-balanced tree-search paradigm, pruning branches whose expansion yields low $R(\pi)$ and preserving top-performing partial solutions. The search returns upon finding any complete plan with $R(\pi) \geq T_h$.
  • PlanGEN (Mixture): The selection agent adaptively determines which of BoN, ToT, or REBASE to invoke per problem instance, based on dynamic instance complexity and historical performance.

Pseudo-code for each sub-algorithm formalizes the agent interactions, with explicit invocation of the verification module for all sampled or expanded plans (Parmar et al., 22 Feb 2025).
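As a concrete illustration, the simplest variant, Best-of-N with verification, can be sketched as follows. This is a hedged reconstruction under the definitions above, not the paper's own code: `generate` and `verify` are assumed callables wrapping the base LLM and the verification agent.

```python
from typing import Callable, List, Tuple

def best_of_n(problem: str,
              constraints: List[str],
              generate: Callable[[str, List[str]], str],
              verify: Callable[[str, List[str]], Tuple[str, float]],
              n: int = 8) -> Tuple[str, float, str]:
    """PlanGEN-style Best-of-N: sample n candidate plans, verify each
    against the extracted constraints, return the highest-reward plan."""
    best_plan, best_reward, best_feedback = "", float("-inf"), ""
    for _ in range(n):
        plan = generate(problem, constraints)          # one LLM sample
        feedback, reward = verify(plan, constraints)   # verification agent scores it
        if reward > best_reward:
            best_plan, best_reward, best_feedback = plan, reward, feedback
    return best_plan, best_reward, best_feedback
```

The ToT and REBASE variants follow the same pattern but apply the verifier to partial trajectories, keeping or pruning tree branches by their intermediate rewards.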

4. Algorithm Selection via Modified Upper Confidence Bound (UCB) Strategy

The selection agent's choice of inference method is governed by a multi-factor UCB formula:

$$\mathrm{UCB}(a) = \frac{R(a)}{N(a)} + \sqrt{\frac{2\ln(T+1)}{N(a)}} + \lambda_{\mathrm{prior}}\,\mathrm{Prior}(a) + \frac{\alpha_{\mathrm{div}}}{N(a)+1} + \alpha_{\mathrm{rec}}\,S_{\mathrm{rec}}(a)$$

where:

  • $R(a)$ is the cumulative reward for algorithm $a$
  • $N(a)$ is the usage count of $a$
  • $T$ is the total number of trials
  • $\mathrm{Prior}(a)$ is an LLM-derived prior suitability estimate for $a$
  • $\lambda_{\mathrm{prior}}$ is a prior weight that decays with trial count
  • $\alpha_{\mathrm{div}}, \alpha_{\mathrm{rec}}$ are hyperparameters for the diversity and recovery bonuses
  • $S_{\mathrm{rec}}(a)$ is the recovery score of $a$ after failures

This mechanism balances exploitation of historically successful inference methods and exploration of alternative algorithms, while integrating model-based priors and heuristic bonuses for diversity and recovery (Parmar et al., 22 Feb 2025).
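The formula transcribes directly into code. The function below is a sketch under the definitions above; the argument names and the `select_algorithm` helper are mine, since the paper does not prescribe an API.

```python
import math
from typing import Dict, Tuple

def modified_ucb(total_reward: float, count: int, total_trials: int,
                 prior: float, lam_prior: float,
                 alpha_div: float, alpha_rec: float,
                 recovery_score: float) -> float:
    """Score one inference algorithm under the modified UCB criterion."""
    exploit = total_reward / count                               # R(a) / N(a)
    explore = math.sqrt(2.0 * math.log(total_trials + 1) / count)  # sqrt(2 ln(T+1) / N(a))
    prior_term = lam_prior * prior                               # decaying LLM-derived prior
    diversity = alpha_div / (count + 1)                          # bonus for rarely used algorithms
    recovery = alpha_rec * recovery_score                        # bonus for recovering after failures
    return exploit + explore + prior_term + diversity + recovery

def select_algorithm(stats: Dict[str, Tuple[float, int, float, float]],
                     total_trials: int,
                     lam_prior: float, alpha_div: float, alpha_rec: float) -> str:
    """Pick argmax_a UCB(a); stats maps algorithm name ->
    (total_reward, count, prior, recovery_score)."""
    return max(stats, key=lambda a: modified_ucb(
        stats[a][0], stats[a][1], total_trials,
        stats[a][2], lam_prior, alpha_div, alpha_rec, stats[a][3]))
```

Note that an algorithm with a low usage count gets boosted by both the exploration and diversity terms, which is what keeps the mixture from collapsing onto a single inference method early on.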

5. Empirical Performance and Ablations

PlanGEN was evaluated on four challenging planning and reasoning benchmarks using Gemini-1.5-Pro and compared against strong baselines, including zero-shot Chain-of-Thought (CoT) and vanilla multi-agent self-refinement. Key results are summarized below:

| Benchmark | Baseline | PlanGEN | Relative Gain |
|---|---|---|---|
| NATURAL PLAN (avg. EM) | 52.0 | 60.0 | +8% |
| OlympiadBench (MATH) | 50.7 | 55.9 | +5% |
| OlympiadBench (PHY) | 28.3 | 31.8 | +4% |
| DocFinQA (Acc/F1) | 24.0/22.5 | 31.1/29.4 | +7% |
| GPQA (Acc) | 46.2 | 59.6 | +13% (Mixture); +1% vs. vanilla multi-agent |

Ablation studies show a 3–5% decrease in accuracy if constraint-guided verification is removed and a ~4% drop in mixture performance when UCB selection is replaced by round-robin scheduling. In the NATURAL PLAN–Calendar subset, PlanGEN-ToT performs best on easy instances, BoN on medium, and the Mixture approach on hard cases.

6. Key Findings and Implications

Empirical and ablation studies support several overarching conclusions:

  • Constraint-guided iterative verification systematically enforces complex, instance-level requirements—transforming base inference algorithms into robust, high-fidelity planners.
  • Adaptive, per-instance algorithm selection via the selection agent yields statistically significant gains by matching compute to problem hardness: ToT for simple instances, BoN for moderate ones, and REBASE for deeply entangled cases.
  • Model-agnostic design demonstrates robustness across multiple LLM backbones (Gemini-1.5, Gemini-2.0, GPT-4o).

These findings establish that multi-agent orchestration—grounded in explicit constraint extraction, continuous plan verification, and reward-balanced adaptive selection—enables SOTA performance on a broad suite of complex, high-precision planning and reasoning tasks (Parmar et al., 22 Feb 2025).

7. Significance in Context

PlanGEN's modular, agent-driven blueprint offers a general paradigm for reasoning-intensive AI tasks that demand both local verification and global adaptability. Its decoupling of constraint extraction, plan evaluation, and instance-adaptive inference provides a robust foundation for further research in scalable, trustworthy autonomous planning systems. This framework marks a transition point toward deployable LLM-based solutions that operate successfully in open-world, constraint-rich environments.
