Agentic Scene Planning Framework

Updated 22 January 2026

Agentic scene planning is a paradigm that integrates closed-loop reasoning and modular operations (planner, executor, critic) to iteratively refine complex scene representations.
It employs formal scenario structures—using maps, actor trajectories, and time horizons in JSON-like formats—to ensure physical plausibility and strict intent alignment.
The approach leverages prompt engineering and multi-turn feedback for scalable, controlled scenario synthesis, enhancing safety and efficacy in autonomous systems.

Agentic scene planning is a paradigm that introduces closed-loop reasoning, tool use, and natural language intent interpretation into the synthesis, control, and evaluation of complex scenes, with applications ranging from 3D environment creation and traffic scenario generation to robotics and autonomous driving. The agentic approach replaces monolithic or purely data-driven pipelines with multi-module agents, typically orchestrated by LLMs or vision-language models (VLMs), that iterate between planning, acting, auditing, and refining the evolving scene representation. This design enables fine-grained intent alignment, ensures physical and semantic plausibility, and scales to generating diverse, challenging, and controllably modifiable scene configurations (Yao et al., 18 Jul 2025).

1. Formal Scenario Representations in Agentic Planning

Agentic scene planning begins by defining structured scene representations compatible with both symbolic and machine learning reasoning. In the context of traffic scenario augmentation, a scene is represented as $S = (M, A, T)$, where $M$ describes the static map (lanes, crosswalks, traffic lights), $A = \{ a_1, \ldots, a_n \}$ is the set of actors, each actor $a_i$ carrying a trajectory $x_i(t) = (p_i(t), v_i(t), \theta_i(t))$ of position, velocity, and heading, and $T$ is the time horizon. These representations are encapsulated in JSON-like data structures containing the map, actor states indexed over time, and relevant traffic rules (Yao et al., 18 Jul 2025).
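One way to picture the JSON-like structure described above is the following minimal Python sketch. The class names, field names, and map keys are assumptions made for illustration, and $v_i(t)$ is treated as a scalar speed for simplicity; none of this is the schema used by the cited work.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ActorState:
    """One sample of x_i(t) = (p_i(t), v_i(t), theta_i(t))."""
    t: float                 # seconds since scene start
    p: tuple[float, float]   # position (x, y) in metres
    v: float                 # speed in m/s (scalar speed is an assumption)
    theta: float             # heading in radians


@dataclass
class Actor:
    actor_id: str
    kind: str                                   # e.g. "vehicle", "pedestrian", "bike"
    states: list[ActorState] = field(default_factory=list)


@dataclass
class Scene:
    """S = (M, A, T): static map, actors, time horizon."""
    map: dict            # M: lanes, crosswalks, traffic lights (keys are hypothetical)
    actors: list[Actor]  # A
    horizon: float       # T in seconds

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses; tuples serialize as JSON arrays.
        return json.dumps(asdict(self), indent=2)


scene = Scene(
    map={"lanes": ["lane_0"], "crosswalks": ["cw_0"], "traffic_lights": []},
    actors=[Actor("bike_1", "bike", [ActorState(0.0, (0.0, 0.0), 5.0, 0.0)])],
    horizon=5.0,
)
print(scene.to_json())
```

Keeping the scene serializable in this way is what lets the same object be passed to a language model as text and to rule-based checks as structured data.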

The agentic scene planning operator is a parameterized function $f_\phi(S_0, d) \rightarrow S_a$, where $S_0$ is a base scene, $d$ is a high-level intent (expressed in natural language), and $S_a$ is the generated scene, expected to satisfy $\text{RealismConstraint}(S_a) \approx 1$ while maximizing an intent-alignment score with respect to $d$.

2. Agentic Architectures: Planner–Executor–Critic Loops

A typical agentic framework features a modular, looped architecture comprising:

Planner: Decomposes the intent $d$ into a sequence of symbolic, high-level editing operations on the current scene (e.g., “advance pedestrian by 1 m”, “reduce lead car’s speed”).
Executor: Realizes each editing step, producing a candidate scene in the prescribed data schema.
Critic: Audits the candidate for both realism (using learned discriminators or rule-based checks) and intent alignment (by embedding-based similarity or regression). Accepts or rejects the modification and provides feedback for refinement.

This process iterates, typically for a bounded number of steps, until the critic’s composite score $s$ exceeds a threshold $\tau$, at which point the current candidate scene is returned as the final scene $S_a$. Prompt engineering, in the form of phase- and role-separated, templated exchanges, ensures fine-grained control and deterministic, interpretable operation (Yao et al., 18 Jul 2025).

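At a high level, each round plans edits from the intent and the current scene, executes them to obtain a candidate scene, and scores the candidate with the critic. The candidate is returned once its score exceeds the threshold; otherwise the critic’s feedback drives the next round.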
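The planner, executor, and critic loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation from the cited work: the function name `refine_scene`, the default threshold and step budget, the accept-if-improved rule, and the three toy components are all hypothetical stand-ins for LLM-backed modules.

```python
from typing import Callable

Scene = dict  # stand-in for the JSON-like scene structure


def refine_scene(
    base_scene: Scene,
    intent: str,
    planner: Callable[[Scene, str, str], list[str]],
    executor: Callable[[Scene, list[str]], Scene],
    critic: Callable[[Scene, str], tuple[float, str]],
    threshold: float = 0.8,  # assumed acceptance threshold
    max_steps: int = 5,      # assumed iteration budget
) -> tuple[Scene, float]:
    """Iterate plan -> execute -> critique until the score clears the threshold."""
    scene, feedback = base_scene, ""
    best_score = float("-inf")
    for _ in range(max_steps):
        plan = planner(scene, intent, feedback)
        candidate = executor(scene, plan)
        score, feedback = critic(candidate, intent)
        if score > best_score:
            # Accept: the candidate becomes the working scene.
            scene, best_score = candidate, score
        # Otherwise reject: keep the last accepted scene and re-plan with feedback.
        if best_score > threshold:
            break
    return scene, best_score


# Toy components, purely illustrative.
def toy_planner(scene: Scene, intent: str, feedback: str) -> list[str]:
    return ["reduce bike speed by 1.0 m/s"]


def toy_executor(scene: Scene, plan: list[str]) -> Scene:
    updated = dict(scene)
    updated["bike_speed"] = scene["bike_speed"] - 1.0 * len(plan)
    return updated


def toy_critic(scene: Scene, intent: str) -> tuple[float, str]:
    score = 1.0 - abs(scene["bike_speed"] - 3.0) / 5.0
    return score, f"bike speed is {scene['bike_speed']} m/s"


final_scene, final_score = refine_scene(
    {"bike_speed": 5.0}, "slow the bike to 3 m/s",
    toy_planner, toy_executor, toy_critic, threshold=0.9,
)
print(final_scene, final_score)
```

Passing the three roles in as callables keeps the loop independent of any particular model or prompt format, so each role can be swapped or tested in isolation.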

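The critic evaluates each candidate with a compositional score that combines a realism term $s_{\text{real}}$ and an intent-alignment term $s_{\text{align}}$, with $s_{\text{real}}$ computed by a discriminator and $s_{\text{align}}$ by embedding similarity or regression (Yao et al., 18 Jul 2025).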
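As a rough illustration of how a realism term and an alignment term might be combined, the sketch below uses a weighted sum with cosine similarity as the alignment measure. The weighted-sum form, the `weight` parameter, and the function names are assumptions; the source does not state how the two terms are combined, and a real system would obtain the vectors from an embedding model and the realism value from a discriminator.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def composite_score(
    realism: float,            # assumed to lie in [0, 1], e.g. a discriminator output
    intent_vec: list[float],   # embedding of the intent text (hypothetical input)
    scene_vec: list[float],    # embedding of a scene description (hypothetical input)
    weight: float = 0.5,       # assumed trade-off between realism and alignment
) -> float:
    """Weighted sum of realism and alignment; the weighting is an assumption."""
    alignment = cosine_similarity(intent_vec, scene_vec)
    return weight * realism + (1.0 - weight) * alignment


aligned = composite_score(0.9, [1.0, 0.0], [1.0, 0.0])
unaligned = composite_score(0.9, [1.0, 0.0], [0.0, 1.0])
print(aligned, unaligned)
```

A scene that is realistic but unrelated to the intent scores lower than one that is both realistic and aligned, which is the behavior a composite critic is meant to capture.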

3. Prompt Engineering and Multi-Turn Control

Agentic scene planning frameworks rely on prompt engineering strategies that break scenario synthesis into explicit phases (PLANNING, EXECUTION, CRITIC) and roles (Planner, Executor, Critic). This separation enables iterative, interpretable scene modification, allows mid-course correction via critic feedback, and supports multi-turn conversations that drive generation toward both realism and high-fidelity intent alignment. Prompts are templated and enforce data and formatting constraints, such as demanding outputs in strict JSON schemas (Yao et al., 18 Jul 2025).

Example:

User prompt: base scene and intent + “PLANNING” phase
Assistant (Planner): outputs plan in bullet steps
User: passes plan + “EXECUTION” phase
Assistant (Executor): produces updated scene JSON
User: passes to Critic with “CRITIC” phase
Assistant (Critic): returns realism/alignment score or feedback
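The phase-separated exchange above could be templated along the following lines. This is a sketch only: the template wording, the `build_prompt` and `parse_scene_reply` helpers, and the critic's reply fields are hypothetical and not taken from the source.

```python
import json

# Hypothetical templates; the wording is illustrative, not taken from the source.
TEMPLATES = {
    "PLANNING": (
        "PHASE: PLANNING\nROLE: Planner\n"
        "Base scene (JSON):\n{scene}\nIntent: {intent}\n"
        "List the editing steps as bullets."
    ),
    "EXECUTION": (
        "PHASE: EXECUTION\nROLE: Executor\n"
        "Current scene (JSON):\n{scene}\nPlan:\n{plan}\n"
        "Return only the updated scene as valid JSON."
    ),
    "CRITIC": (
        "PHASE: CRITIC\nROLE: Critic\n"
        "Candidate scene (JSON):\n{scene}\nIntent: {intent}\n"
        'Return JSON of the form {{"realism": 0.0, "alignment": 0.0, "feedback": ""}}.'
    ),
}


def build_prompt(phase: str, scene: dict, intent: str = "", plan: str = "") -> str:
    """Fill the template for one phase; unused fields are simply ignored."""
    return TEMPLATES[phase].format(scene=json.dumps(scene), intent=intent, plan=plan)


def parse_scene_reply(reply: str) -> dict:
    """Enforce the strict-JSON constraint on an Executor reply."""
    scene = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(scene, dict):
        raise ValueError("expected a JSON object for the scene")
    return scene


prompt = build_prompt("PLANNING", {"bike_speed": 5.0}, intent="Cause a near-hit at the crosswalk.")
print(prompt)
print(parse_scene_reply('{"bike_speed": 3.0}'))
```

Rejecting any reply that fails to parse gives the outer loop a clear signal to re-prompt rather than silently accept a malformed scene.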

4. Evaluation Methodologies and Empirical Findings

Empirical validation adopts both expert-in-the-loop and automated metrics. For traffic scenario augmentation, evaluation is conducted via:

Pairwise voting by human autonomous-driving experts on realism and intent alignment
Elo rating aggregation from head-to-head judgments (e.g., Llama-2-13B: Elo ≈ 1520, GPT-4: Elo ≈ 1550, direct-prompted Llama-2-13B: Elo ≈ 1300)
Planner stress-tests, quantifying increases in risk events (e.g., 30% more risk events for agentically generated scenarios versus manual baselines)
Hard constraint enforcement, such as collision checks and rule compliance (Yao et al., 18 Jul 2025)
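For readers unfamiliar with Elo aggregation, the sketch below shows the standard Elo expected-score and update rules applied to pairwise votes. The K-factor of 32, the starting rating of 1500, and the system labels are assumptions; the source does not report which settings were used.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expectation that A is preferred over B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_elo(ratings: dict[str, float], winner: str, loser: str, k: float = 32.0) -> None:
    """Update ratings in place after one head-to-head judgment (k is assumed)."""
    gain = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += gain
    ratings[loser] -= gain


# Hypothetical labels and votes; every system starts at an assumed 1500.
ratings = {"agentic": 1500.0, "direct": 1500.0}
votes = [("agentic", "direct"), ("agentic", "direct"), ("direct", "agentic")]
for winner, loser in votes:
    update_elo(ratings, winner, loser)
print(ratings)

# Under this model, a 1520 vs 1300 gap implies a win probability of about 0.78.
print(expected_score(1520.0, 1300.0))
```

Because each update is zero-sum, the mean rating stays fixed and only the gaps between systems carry information.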

Results demonstrate that the agentic approach with smaller models can match or exceed direct-prompted performance with much larger LLMs, preserving both realism and controllability.

5. Detailed Case Studies: Fine-Grained Scenario Synthesis

Agentic scene planning enables synthesis of challenging, intent-specific scenarios with precise spatiotemporal and behavioral modifications.

Case Study: Unsignalized Bike–Pedestrian Conflict

Base: Pedestrian waits, bike approaches at 5 m/s.
Intent: “Cause a near-hit at the crosswalk.”
Agent plan: Slow bike to 3 m/s at t = 0.8 s, shift pedestrian start by +0.5 m, advance the light-off time by 0.2 s.
Result: Near-miss with bike forced to hard-brake, minimum distance 0.2 m.

Case Study: Sudden Cut-in Maneuver

Base: Two-lane road, no oncoming vehicles.
Intent: “Introduce a rapid cut-in engagement.”
Plan: Insert new vehicle, accelerate lateral motion from 0 to 4 m/s between t = 1.2 s and t = 1.4 s, adjust yaw.
Outcome: Ego-vehicle planner must emergency-brake due to cut-in (Yao et al., 18 Jul 2025).

These examples illustrate agentic control over interaction timing, actor insertion, and intent-specific risk elevation, enabling scenario augmentation beyond rote randomization or static templates.
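A rule-based check for outcomes like the near-miss in the first case study could look like the sketch below, which measures the minimum separation between two actors sampled at shared timestamps. The trajectories, the radii, and the function names are made up for illustration and do not reproduce the data from the cited scenarios.

```python
import math

Point = tuple[float, float]


def min_separation(traj_a: list[Point], traj_b: list[Point]) -> float:
    """Smallest distance between two actors sampled at the same timestamps."""
    return min(math.dist(pa, pb) for pa, pb in zip(traj_a, traj_b))


def is_near_miss(
    traj_a: list[Point],
    traj_b: list[Point],
    collision_radius: float = 0.1,  # assumed: at or below this counts as a collision
    near_radius: float = 0.5,       # assumed: at or below this counts as a near-miss
) -> bool:
    d = min_separation(traj_a, traj_b)
    return collision_radius < d <= near_radius


# Illustrative samples every 0.5 s: bike moving along x, pedestrian crossing in y.
bike = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (4.5, 0.0), (6.0, 0.0)]
pedestrian = [(3.0, 1.2), (3.0, 0.7), (3.0, 0.2), (3.0, -0.3), (3.0, -0.8)]

print(min_separation(bike, pedestrian))
print(is_near_miss(bike, pedestrian))
```

A sampled check like this can miss a closer approach between timestamps, so a finer sampling interval or a continuous closest-approach computation would be needed for a hard collision guarantee.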

6. Implications, Limitations, and Future Directions

Agentic scene planning enables high-throughput, ethics-compliant generation of rare and safety-critical scenarios for autonomous vehicle evaluation, reducing reliance on large-scale real-world data collection and manual expert engineering. Key advantages are:

Fine-grained, interpretable control over scenario properties
Multi-turn, feedback-driven refinement
Comparable (or superior) evaluation results from smaller, more cost-effective LLMs

Identified limitations include:

The need for robust rule-based or learned critics to prevent distributional or realism drift
The requirement to formalize and enforce complex, higher-order constraints (e.g., traffic law compliance, multi-agent anticipation)
Scalability of intent interpretation and critic evaluation as scene complexity grows

Ongoing research extends agentic planning to multi-object, 3D, and interactive domains, leveraging agentic VLM tool-chains for physical scene synthesis, composite diffusion-based layout planning, and scenario customization in embodied and robotic applications (Fan et al., 24 Sep 2025, Ling et al., 5 May 2025).

7. Relation to Broader Agentic Scene Reasoning Approaches

The agentic scene planning loop is analogous to frameworks in vision-language-driven 3D scene synthesis (Ling et al., 5 May 2025), compositional layout generation (Fan et al., 24 Sep 2025), and simulation-based scenario orchestration with natural language inputs (Jeong et al., 10 Nov 2025). Across domains, the essential characteristics are:

Explicit modular decomposition (planner, executor/actor, critic/judge)
Structured, serializable scene states with high semantic and geometric fidelity
Closed-loop, natural language-driven refinement for intent alignment
Empirical demonstration of improved controllability and diversity versus end-to-end or monolithic generative models

Agentic scene planning establishes a generalizable framework for controllable, efficient, and intent-aligned scenario synthesis, with ongoing integration into evaluation pipelines for autonomous systems, simulation scenario design, and embodied reasoning tasks (Yao et al., 18 Jul 2025, Fan et al., 24 Sep 2025, Ling et al., 5 May 2025).