Agentic Scene Planning Framework
- Agentic scene planning is a paradigm that integrates closed-loop reasoning and modular operations (planner, executor, critic) to iteratively refine complex scene representations.
- It employs formal scenario structures—using maps, actor trajectories, and time horizons in JSON-like formats—to ensure physical plausibility and strict intent alignment.
- The approach leverages prompt engineering and multi-turn feedback for scalable, controlled scenario synthesis, enhancing safety and efficacy in autonomous systems.
Agentic scene planning is a paradigm that introduces closed-loop reasoning, tool use, and natural language intent interpretation into the synthesis, control, and evaluation of complex scenes, with applications ranging from 3D environment creation and traffic scenario generation to robotics and autonomous driving. The agentic approach replaces monolithic or purely data-driven pipelines with multi-module agents, typically orchestrated by LLMs or vision-language models (VLMs), that iterate between planning, acting, auditing, and refining the evolving scene representation. This design enables fine-grained intent alignment, ensures physical and semantic plausibility, and scales to diverse, challenging, and controllably modifiable scene configurations (Yao et al., 18 Jul 2025).
1. Formal Scenario Representations in Agentic Planning
Agentic scene planning begins by defining structured scene representations compatible with both symbolic and machine-learning reasoning. In traffic scenario augmentation, a scene $S$ is represented as a triple consisting of a static map (lanes, crosswalks, traffic lights), a set of actor trajectories, and a time horizon. These representations are encapsulated in JSON-like data structures containing the map, actor states indexed over time, and relevant traffic rules (Yao et al., 18 Jul 2025).
The agentic scene planning operator is a parameterized function that maps a base scene $S_0$ and a high-level intent $d$ (expressed in natural language) to a generated scene, which is expected to obey realism constraints and to maximize an intent-alignment score.
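As a rough illustration of what such a JSON-like scene structure might look like, the sketch below uses Python dataclasses. The field names (`map`, `actors`, `horizon_s`, `rules`) and the `ActorState` layout are assumptions made for this example, not the schema used by Yao et al.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ActorState:
    """One sample of an actor trajectory (hypothetical layout)."""
    t: float      # timestamp in seconds
    x: float      # position in metres
    y: float
    speed: float  # metres per second


@dataclass
class Scene:
    """Static map, time-indexed actor states, and a time horizon."""
    map: dict                             # lanes, crosswalks, traffic lights
    actors: Dict[str, List[ActorState]]   # actor id -> trajectory
    horizon_s: float                      # time horizon in seconds
    rules: List[str] = field(default_factory=list)  # relevant traffic rules

    def to_json(self) -> str:
        # asdict recurses into nested dataclasses, lists, and dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "Scene":
        raw = json.loads(text)
        actors = {
            name: [ActorState(**state) for state in trajectory]
            for name, trajectory in raw["actors"].items()
        }
        return Scene(raw["map"], actors, raw["horizon_s"], raw.get("rules", []))


scene = Scene(
    map={"lanes": 2, "crosswalks": 1},
    actors={"bike": [ActorState(t=0.0, x=-10.0, y=0.0, speed=5.0)]},
    horizon_s=8.0,
)
restored = Scene.from_json(scene.to_json())
```

Keeping the scene serializable in both directions is what lets an LLM-based module read the current state from a prompt and return an edited state that can be parsed and checked.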
2. Agentic Architectures: Planner–Executor–Critic Loops
A typical agentic framework features a modular, looped architecture comprising:
- Planner: Decomposes the intent into a sequence of symbolic, high-level editing operations on the current scene (e.g., “advance pedestrian by 1 m”, “reduce lead car’s speed”).
- Executor: Realizes each editing step, producing a candidate scene in the prescribed data schema.
- Critic: Audits the candidate for both realism (using learned discriminators or rule-based checks) and intent alignment (by embedding-based similarity or regression). Accepts or rejects the modification and provides feedback for refinement.
This process iterates for up to $K$ steps, until the critic's composite score reaches or exceeds a threshold $\tau$, at which point the final scene is returned. Prompt engineering, in the form of phase- and role-separated templated exchanges, supports fine-grained control and deterministic, interpretable operation (Yao et al., 18 Jul 2025).
High-level pseudocode:
```
θ(S) ← θ(S₀)
for k in 1…K:
    P ← Planner(θ(S), d)
    θ̃ ← Executor(θ(S), P)
    score ← Critic(θ̃, d)
    if score ≥ τ:
        θ(S) ← θ̃
    else:
        # Feedback-refinement loop
        pass
return θ(S)
```
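One possible runnable reading of this loop is sketched below. It assumes the planner, executor, and critic are supplied as plain callables, that the critic returns a score together with a textual note, and that the loop stops at the first accepted candidate. The function name `refine_scene`, its signature, and the default values of `max_steps` and `threshold` are illustrative choices, not details taken from the source.

```python
from typing import Callable, List, Tuple

Scene = dict  # stand-in for the serialized scene representation

Planner = Callable[[Scene, str, List[str]], List[str]]
Executor = Callable[[Scene, List[str]], Scene]
Critic = Callable[[Scene, str], Tuple[float, str]]


def refine_scene(
    base_scene: Scene,
    intent: str,
    planner: Planner,
    executor: Executor,
    critic: Critic,
    max_steps: int = 5,       # plays the role of K
    threshold: float = 0.8,   # plays the role of tau
) -> Tuple[Scene, bool]:
    """Return (scene, accepted) after at most max_steps refinement rounds."""
    feedback: List[str] = []
    for _ in range(max_steps):
        plan = planner(base_scene, intent, feedback)   # symbolic edit steps
        candidate = executor(base_scene, plan)         # candidate scene
        score, note = critic(candidate, intent)        # audit the candidate
        if score >= threshold:
            return candidate, True
        # Rejected: keep the critic's note so the next plan can react to it.
        feedback.append(note)
    # No candidate passed the critic; fall back to the unmodified base scene.
    return base_scene, False
```

Passing the accumulated critic notes back into the planner is one simple way to realize the feedback-refinement branch that the pseudocode leaves as a placeholder.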
The critic evaluates each candidate using a composite score that combines a realism term, computed by a discriminator, with an intent-alignment term, computed by embedding similarity or regression (Yao et al., 18 Jul 2025).
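The exact form of the composite score is not given here, so the following is only a sketch under stated assumptions: realism is taken to be a discriminator output in [0, 1], alignment is the cosine similarity between an intent embedding and a scene embedding rescaled to [0, 1], and the two are mixed by a convex weight. The weighting scheme, the `weight` parameter, and both function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def composite_score(
    realism: float,
    intent_embedding: Sequence[float],
    scene_embedding: Sequence[float],
    weight: float = 0.5,  # assumed mixing weight between realism and alignment
) -> float:
    """Convex combination of a realism score and an alignment score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    # Rescale cosine similarity from [-1, 1] to [0, 1].
    alignment = (cosine_similarity(intent_embedding, scene_embedding) + 1.0) / 2.0
    return weight * realism + (1.0 - weight) * alignment
```

With both terms in [0, 1], the result is also in [0, 1], which makes a fixed acceptance threshold straightforward to interpret.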
3. Prompt Engineering and Multi-Turn Control
Agentic scene planning frameworks rely on prompt engineering strategies that break scenario synthesis into explicit phases (PLANNING, EXECUTION, CRITIC) and roles (Planner, Executor, Critic). This separation enables iterative, interpretable scene modification, allows mid-course correction via critic feedback, and supports multi-turn conversations that drive generation toward both realism and high-fidelity intent alignment. Prompts are templated and enforce data and formatting constraints, such as demanding outputs in strict JSON schemas (Yao et al., 18 Jul 2025).
Example:
- User prompt: base scene and intent + “PLANNING” phase
- Assistant (Planner): outputs plan in bullet steps
- User: passes plan + “EXECUTION” phase
- Assistant (Executor): produces updated scene JSON
- User: passes to Critic with “CRITIC” phase
- Assistant (Critic): returns realism/alignment score or feedback
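The exchange above could be driven by templates along the following lines. This is a minimal sketch: the template wording, the `PHASE_TEMPLATES` dictionary, and the required keys checked by `parse_scene_reply` are assumptions for illustration and do not reproduce the prompts used in the cited work.

```python
import json

# Hypothetical phase- and role-separated prompt templates.
PHASE_TEMPLATES = {
    "PLANNING": (
        "PHASE: PLANNING\nROLE: Planner\nIntent: {intent}\n"
        "Scene JSON:\n{scene}\nList the edit steps as bullets."
    ),
    "EXECUTION": (
        "PHASE: EXECUTION\nROLE: Executor\nPlan:\n{plan}\n"
        "Scene JSON:\n{scene}\nReturn only the updated scene as JSON."
    ),
    "CRITIC": (
        "PHASE: CRITIC\nROLE: Critic\nIntent: {intent}\n"
        "Candidate scene JSON:\n{scene}\nReturn a score and feedback."
    ),
}


def build_prompt(phase: str, **fields: str) -> str:
    """Fill the template for one phase; unknown phases raise KeyError."""
    return PHASE_TEMPLATES[phase].format(**fields)


def parse_scene_reply(reply: str, required_keys=("map", "actors", "horizon_s")) -> dict:
    """Reject executor replies that are not JSON objects with the expected keys."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError("executor reply is not valid JSON") from exc
    if not isinstance(data, dict):
        raise ValueError("executor reply must be a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"executor reply is missing keys: {missing}")
    return data


prompt = build_prompt("PLANNING", intent="Cause a near-hit at the crosswalk.", scene="{}")
```

Validating the executor's reply before it reaches the critic keeps malformed output from being scored as if it were a scene.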
4. Evaluation Methodologies and Empirical Findings
Empirical validation adopts both expert-in-the-loop and automated metrics. For traffic scenario augmentation, evaluation is conducted via:
- Pairwise voting by human autonomous-driving experts on realism and intent alignment
- Elo rating aggregation from head-to-head judgments (e.g., Llama-2–13B: Elo ≈ 1520, GPT-4: Elo ≈ 1550, direct prompt Llama-2–13B: Elo ≈ 1300)
- Planner stress-tests, quantifying increases in risk events (e.g., 30% more risk events for agentically generated scenarios versus manual baselines)
- Hard constraint enforcement, such as collision checks and rule compliance (Yao et al., 18 Jul 2025)
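To make the Elo aggregation step concrete, the sketch below applies the standard Elo update to a list of pairwise votes. The K-factor of 32, the initial rating of 1500, and the sequential processing order are assumptions; the source does not state how its ratings were computed.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_from_votes(
    votes: Iterable[Tuple[str, str]],  # (winner, loser) pairs from expert votes
    k: float = 32.0,                   # assumed K-factor
    initial: float = 1500.0,           # assumed starting rating
) -> Dict[str, float]:
    """Aggregate head-to-head judgments into per-system Elo ratings."""
    ratings: Dict[str, float] = {}
    for winner, loser in votes:
        r_w = ratings.setdefault(winner, initial)
        r_l = ratings.setdefault(loser, initial)
        delta = k * (1.0 - expected_score(r_w, r_l))
        ratings[winner] = r_w + delta  # winner gains what the loser gives up
        ratings[loser] = r_l - delta
    return ratings


ratings = elo_from_votes([("agentic", "direct"), ("agentic", "direct")])
```

Because sequential Elo depends on the order of the votes, a practical evaluation might shuffle the votes several times and average the resulting ratings.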
Results demonstrate that the agentic approach with smaller models can match or exceed direct-prompted performance with much larger LLMs, preserving both realism and controllability.
5. Detailed Case Studies: Fine-Grained Scenario Synthesis
Agentic scene planning enables synthesis of challenging, intent-specific scenarios with precise spatiotemporal and behavioral modifications.
Case Study: Unsignalized Bike–Pedestrian Conflict
- Base: Pedestrian waits, bike approaches at 5 m/s.
- Intent: “Cause a near-hit at the crosswalk.”
- Agent plan: Slow bike to 3 m/s at t=0.8s, shift pedestrian start +0.5 m, advance light off 0.2 s earlier.
- Result: Near-miss with bike forced to hard-brake, minimum distance 0.2 m.
Case Study: Sudden Cut-in Maneuver
- Base: Two-lane road, no oncoming vehicles.
- Intent: “Introduce a rapid cut-in engagement.”
- Plan: Insert new vehicle, accelerate lateral motion from 0 to 4 m/s between t=1.2–1.4s, adjust yaw.
- Outcome: Ego-vehicle planner must emergency-brake due to cut-in (Yao et al., 18 Jul 2025).
These examples illustrate agentic control over interaction timing, actor insertion, and intent-specific risk elevation, enabling scenario augmentation beyond rote randomization or static templates.
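Outcomes such as a stated minimum distance can be checked mechanically from the generated trajectories. The sketch below assumes both actors are sampled at identical timestamps as `(t, x, y)` tuples and treats actors as points; the `collision_radius` and `near_miss_radius` thresholds are hypothetical values chosen for the example.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (t in seconds, x in metres, y in metres)


def min_separation(traj_a: List[Sample], traj_b: List[Sample]) -> Tuple[float, float]:
    """Return (minimum distance, time at which it occurs)."""
    if not traj_a or len(traj_a) != len(traj_b):
        raise ValueError("trajectories must be non-empty and equally long")
    best_distance, best_time = math.inf, traj_a[0][0]
    for (t_a, x_a, y_a), (t_b, x_b, y_b) in zip(traj_a, traj_b):
        if abs(t_a - t_b) > 1e-9:
            raise ValueError("trajectories must share timestamps")
        distance = math.hypot(x_a - x_b, y_a - y_b)
        if distance < best_distance:
            best_distance, best_time = distance, t_a
    return best_distance, best_time


def is_near_miss(
    traj_a: List[Sample],
    traj_b: List[Sample],
    collision_radius: float = 0.1,  # assumed: closer than this counts as a collision
    near_miss_radius: float = 0.5,  # assumed: upper bound for a near-miss
) -> bool:
    distance, _ = min_separation(traj_a, traj_b)
    return collision_radius < distance <= near_miss_radius


# Bike crossing at 3 m/s past a pedestrian standing 0.2 m off its path.
times = [0.0, 0.5, 1.0, 1.5, 2.0]
bike = [(t, 3.0 * t - 3.0, 0.0) for t in times]
pedestrian = [(t, 0.0, 0.2) for t in times]
distance, when = min_separation(bike, pedestrian)
```

A check of this kind could serve as one of the rule-based components of the critic, separating near-misses from actual collisions before any learned scoring is applied.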
6. Implications, Limitations, and Future Directions
Agentic scene planning enables high-throughput, ethically compliant generation of rare and safety-critical scenarios for autonomous vehicle evaluation, reducing reliance on large-scale real-world data collection and manual expert engineering. Key advantages are:
- Fine-grained, interpretable control over scenario properties
- Multi-turn, feedback-driven refinement
- Comparable or superior evaluation results with smaller, cost-effective LLMs
Identified limitations include:
- The need for robust rule-based or learned critics to prevent distributional or realism drift
- The requirement to formalize and enforce complex, higher-order constraints (e.g., traffic law compliance, multi-agent anticipation)
- Scalability of intent interpretation and critic evaluation as scene complexity grows
Ongoing research extends agentic planning to multi-object, 3D, and interactive domains—leveraging agentic VLM tool-chains for physical scene synthesis, composite diffusion-based layout planning, and scenario customization in embodied and robotic applications (Fan et al., 24 Sep 2025, Ling et al., 5 May 2025).
7. Relation to Broader Agentic Scene Reasoning Approaches
The agentic scene planning loop is analogous to frameworks in vision-language-driven 3D scene synthesis (Ling et al., 5 May 2025), compositional layout generation (Fan et al., 24 Sep 2025), and simulation-based scenario orchestration with natural language inputs (Jeong et al., 10 Nov 2025). Across domains, the essential characteristics are:
- Explicit modular decomposition (planner, actor, critic/judge)
- Structured, serializable scene states with high semantic and geometric fidelity
- Closed-loop, natural language-driven refinement for intent alignment
- Empirical demonstration of improved controllability and diversity versus end-to-end or monolithic generative models
Agentic scene planning establishes a generalizable framework for controllable, efficient, and intent-aligned scenario synthesis, with ongoing integration into evaluation pipelines for autonomous systems, simulation scenario design, and embodied reasoning tasks (Yao et al., 18 Jul 2025, Fan et al., 24 Sep 2025, Ling et al., 5 May 2025).