"Just in Time" World Modeling Supports Human Planning and Reasoning

Published 20 Jan 2026 in cs.AI and q-bio.NC | (2601.14514v1)

Abstract: Probabilistic mental simulation is thought to play a key role in human reasoning, planning, and prediction, yet the demands of simulation in complex environments exceed realistic human capacity limits. A theory with growing evidence is that people simulate using simplified representations of the environment that abstract away from irrelevant details, but it is unclear how people determine these simplifications efficiently. Here, we present a "Just-in-Time" framework for simulation-based reasoning that demonstrates how such representations can be constructed online with minimal added computation. The model uses a tight interleaving of simulation, visual search, and representation modification, with the current simulation guiding where to look and visual search flagging objects that should be encoded for subsequent simulation. Despite only ever encoding a small subset of objects, the model makes high-utility predictions. We find strong empirical support for this account over alternative models in a grid-world planning task and a physical reasoning task across a range of behavioral measures. Together, these results offer a concrete algorithmic account of how people construct reduced representations to support efficient mental simulation.

Abstract PDF Upgrade to Chat

Summary

The paper demonstrates a dynamic framework for online world modeling that incrementally constructs and updates scene representations during simulation.
It shows that the model achieves higher correlation and lower RMSE than pre-computation methods in both planning and physical reasoning tasks.
The approach reduces the number of encoded scene objects by around 50%, highlighting efficiency and behavioral plausibility under cognitive constraints.

"Just-in-Time" World Modeling: A Formal Account of Online Scene Representation in Human Planning and Reasoning

Introduction

The paper "Just in Time" World Modeling Supports Human Planning and Reasoning (2601.14514) articulates a formal framework for understanding how humans efficiently construct and update mental representations when simulating, planning, and reasoning in complex environments. The central thesis posits that people do not precompute globally optimal abstractions of the environment before simulating but instead engage in online, incremental construction of representations as required by the ongoing simulation process—a paradigm termed "Just-in-Time" (JIT) world modeling. This framework reconciles cognitive capacity limitations with high-fidelity predictive performance in tasks spanning both spatial planning and physical reasoning.

Theoretical Motivation and Model Architecture

Simulation-based reasoning has long been implicated as a core mechanism in human cognition, yet the computational intractability of simulating all environmental details has motivated theories of selective abstraction. Prior models frequently rely on pre-planning optimization to select scene construals, with representational choices driven by anticipated utility gains and cognitive costs. However, such strategies create a paradox: evaluating construal utility typically demands full-environment simulations, offsetting the intended computational savings and complicating generalization to novel scenarios.

The JIT model bypasses this paradox by tightly interleaving simulation, visual attention (lookahead), and representation modification. Its architecture consists of three primary modules:

Representational Sketchpad: Stores the set of objects currently encoded for simulation.
Stochastic Simulator: Propagates environmental dynamics with uncertainty, either via a noisy A* search (planning domain) or probabilistic Newtonian physics (physical reasoning).
Perceptual Lookahead: Guides visual attention based on the simulated trajectory, flagging imminent interactions and prompting the inclusion of additional relevant objects.

Crucially, objects are only encoded when their relevance is flagged by impending simulation steps, and memory decay is modeled stochastically to reflect fallible working memory. Representational complexity is thus managed dynamically, conditioned on the simulated trajectory rather than static scene analysis.

Empirical Evaluation: Planning and Physical Reasoning Domains

Planning Tasks

The JIT model was evaluated on classical grid-world navigation tasks, with representations initialized to minimally necessary elements (agent, goal, invariant obstacles). Simulation proceeds via a noisy A* search, with each chosen plan dictating which obstacles are encoded as encountered. The model was compared against the Value Guided Construal (VGC) framework, which performs pre-computation to maximize expected utility under the constraints of memory cost.

Results consistently indicated superior fit of the JIT model to human recall and attention behavior across all planning experiments. For instance, JIT achieved higher correlation coefficients, lower RMSE, and better log-likelihood scores with fewer free parameters compared to VGC (e.g., Experiment 1C: JIT r = 0.95, RMSE = 0.08 vs VGC r = 0.93, RMSE = 0.10; Experiments 1D/E: JIT r = 0.88/0.95, VGC r = 0.65/0.74). Notably, JIT required encoding significantly fewer scene objects (~50% or less, depending on the scenario) without compromising planning performance—a strong quantitative result in representational efficiency.

Physical Reasoning Tasks

In a physical prediction domain, participants predicted the landing position of a ball dropped through arrays of obstacles, and their object memory was probed post hoc. JIT dynamically encoded obstacles as their relevance emerged in simulated noisy trajectories. Model traces predicted both object recall probability and confidence in a manner highly correlated with human data (Experiment 2A: recall r = 0.87, confidence r = 0.96), outperforming VGC (recall r = 0.82, confidence r = 0.90).

A critical dissociation experiment (2B) implemented stimuli where JIT and VGC made opposing predictions—objects contacted frequently by noise-induced trajectories but lacking impact on final outcomes versus objects infrequently contacted but determinant of outcome. Human recall aligned with JIT, signaling the behavioral relevance of frequency-driven inclusion, not mere counterfactual utility.

Mechanistic Implications and Algorithmic Efficiency

The work formally demonstrates that interleaved encoding and simulation yields construals that maximize the utility-complexity tradeoff under realistic cognitive constraints. Algorithmic analyses showed that JIT dominated other models across a broad range of computational and memory cost parameter regimes, excelling whenever planning and representation both incur nontrivial costs.

Unlike VGC, which may encode objects that are only marginally relevant due to their influence on optimality across all possible plans, JIT restricts memory to the currently simulated trajectory, dynamically updating as needed. This produces "trajectory-conditional" representations that are both sparser and more behaviorally plausible, suggesting that offline optimization frameworks may mischaracterize real-world cognition where online uncertainty is primary.

Extensions, Limitations, and Broader Relevance

The demonstrations in this paper focus on visually grounded, static domains with spatial locality and single-target focus. As such, the JIT model abstracts away complexities arising in continuously changing environments, multi-object reasoning, and settings with ambiguous goals. Explicit capacity limits and further mechanisms for initialization in unconstrained scenes remain open for future refinement.

However, the principle of incremental, need-driven representation is extendable to abstract cognitive domains (e.g., logical reasoning, semantic search), and can be leveraged in artificial planning agents. Deferred representation construction offers practical benefits for robotics and AI systems struggling with cluttered, high-dimensional environments. Integration with learned or metacognitive heuristics may enable hybrid models capable of blending JIT construction with strategic prior knowledge, facilitating scalable simulation-based reasoning.

Conclusion

This paper advances a formal, empirically validated account of how humans construct scene representations "just in time" during simulation-based planning and reasoning. The JIT model explains human memory, attention, and prediction more effectively and more efficiently than pre-computation or optimization-based construal selection. Its implications extend to both theories of cognition and the design of resource-efficient artificial agents. Future work should explore the broad applicability and integration of incremental representation building in dynamic, multi-faceted real-world environments and AI systems (2601.14514).

Markdown Report Issue

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

off on

Knowledge Gaps

off on

Practical Applications

off on

Glossary

off on

Conceptual Simplification

off on

Explain it Like I'm 14

A simple guide to “Just-in-Time World Modeling”

Overview: What is this paper about?

This paper asks a big question: how do people plan and reason in busy, detailed worlds without overloading their brains? The authors propose a simple idea called “Just-in-Time” (JIT) world modeling. Instead of trying to remember everything at once, JIT says we add details to our mental picture only when we need them—right before they matter. The paper shows that this “add details as you go” strategy helps people plan paths in mazes and predict where a falling ball will go, and it matches how people actually look at and remember things.

Key goals and questions

To make the main ideas clear, here are the simple questions the paper tries to answer:

How do people decide which parts of a scene to pay attention to when planning or predicting?
Can a “just add what’s needed, just in time” approach explain human behavior better than models that try to plan everything up front?
Does this approach work in both maze navigation (planning) and physics (predicting a ball’s path)?
Is this approach efficient—using less memory and computation while still making good predictions?

Methods: How does JIT work?

Think of JIT like playing a game where you plan as you go:

A small “sketchpad” memory: You start with only the essentials (for example, your character and the goal) in a simple note-taking space in your mind. You don’t write down everything in the scene.
A “simulator” that imagines what happens next: You mentally try out the next steps (like the next move in a maze, or the ball’s next bounce).
A “lookahead” that guides your eyes: As you imagine the next steps, you look ahead along the path—like tracing the route with your eyes—to spot upcoming obstacles. If something might interact soon (like a wall blocking your planned step, or a bumper near the ball’s path), you add that object to your sketchpad.
Forget what’s no longer needed: If an object hasn’t mattered for a while, you gradually stop keeping it in mind. This avoids clutter.

The process repeats: simulate a bit, look ahead, add relevant objects, and drop old ones. That’s why it’s called “just in time”—you only add details when they’re about to matter.

The team tested JIT in two domains:

Planning in a grid-world maze: Start at a blue circle, reach a green square, with walls blocking you. The model plans paths with a slightly “noisy” version of a standard pathfinding algorithm (think: it usually prefers shorter, straighter paths, but tries variations).
Physical prediction (“Plinko”): A ball falls through pegs and bumpers to the ground. The model predicts where the ball lands using a physics simulator with some randomness (because humans don’t calculate physics perfectly).

They compared JIT to another model called Value-Guided Construal (VGC). VGC tries to pick the “best” set of objects to represent before planning, balancing usefulness against complexity. JIT builds that set step-by-step during the simulation instead of deciding everything up front.

Main findings: What did they discover?

In both tasks, JIT matched human behavior very well—and often better than the VGC model.

Planning (maze) results:
- When participants had to reveal hidden walls by hovering a mouse over them, JIT predicted which walls people would reveal more accurately than VGC.
- JIT also matched how well people remembered specific walls after planning.
- JIT typically used fewer objects in memory than VGC, yet still made good plans. That means it was efficient.
Physical prediction (ball falling) results:
- When people predicted where the ball would land and later were tested on object memory, JIT closely matched which objects people remembered and how confident they were.
- In a special test, the authors created two types of objects:
- “Counterfactually relevant” objects: These sometimes change the final outcome, but the ball hits them only about half the time.
- “Counterfactually irrelevant” objects: The ball often hits these, but they don’t change the final bucket the ball lands in.
- VGC predicted people would remember the “relevant” objects more. JIT predicted people would remember the “irrelevant but frequently contacted” objects more.
- People matched JIT: they remembered frequently contacted objects—even when those didn’t change the final outcome. This supports the idea that we represent things we expect to need soon, not just things that change the end result.

Why is this important? It shows that people don’t build a full, perfect plan up front. Instead, we plan and predict by adding pieces only when we need them. This keeps our mental workload small while staying accurate enough to get good results.

Implications: What does this mean for the future?

The JIT approach suggests a practical way minds (and machines) can handle complex worlds:

For human cognition: It offers a concrete, step-by-step model of how we think—interleaving imagining, looking, and updating memory. It explains why our attention and memory focus on what’s coming next along the imagined path.
For AI and robots: JIT can help build smarter, more efficient systems that plan and reason without needing to process every single detail. Robots could plan routes or actions by only representing nearby obstacles and adding information as needed, saving time and memory.
For broader reasoning: The same idea—build a small model just in time—could apply beyond vision and physics, like solving multi-step problems or searching memory for related ideas.

A note on limits: The tasks here were simpler than real life (static scenes, one main moving object). In everyday environments, we’ll need to combine JIT with prior knowledge and better ways to initialize what to represent. Still, this paper provides a strong, testable foundation: a simple strategy that helps people think efficiently in complex situations.

View Paper Prompt View All Prompts

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Below is a consolidated list of what remains uncertain or unexplored, framed to guide concrete follow-up research.

Static, visually persistent scenes: JIT was only tested where the full scene is continuously visible and static; its performance under partial observability, occlusions, or dynamically changing environments (moving obstacles, transient cues) is unknown.
Single-target scope: All tasks involve tracking a single focal object; it is unclear how JIT scales to multiple concurrently relevant objects, multi-ball physics, or multi-agent planning where interactions are interdependent.
Assumption of spatial locality: JIT relies on local interactions near a simulated path; tasks with non-local dependencies (e.g., long-range forces, global constraints, procedural rules) may break this assumption—how to extend JIT to such cases is open.
One-step lookahead: Planning used only a one-step collision check; the benefits/costs of deeper lookahead horizons, adaptive horizon selection, or learned horizon control are not studied.
External scene as “free” memory: The computational and cognitive costs of perceptual lookahead (visual search, saccades, verification) are not included in the cost function; quantifying and incorporating these costs could change the efficiency ranking.
Fixed foveal radius: Physical lookahead uses a fixed radius; sensitivity analyses and individualized or context-dependent fovea/attention models (eccentricity, crowding, peripheral vision) are missing.
No explicit capacity limits: Forgetting is modeled by a power-law decay, but there are no capacity/slot constraints or interference mechanisms; testing JIT under explicit memory load manipulations or high-clutter scenes is needed.
Memory dynamics beyond decay: Effects of similarity-based interference, proactive/retroactive interference, rehearsal, chunking, or priority-based retention are not modeled; do these factors change which objects are retained/dropped?
Initialization in arbitrary tasks: Representations are hand-initialized with target/goal and invariant features; how agents discover relevant targets and seed construals in novel, ambiguous, or instruction-free scenes remains open.
Eye-movement validation: The assumption that gaze follows the simulated trajectory is not empirically validated here; direct eye-tracking during prediction/planning would test the lookahead mechanism and its timing.
Straight-line bias and motor assumptions: The A* straight-line tie-break bias is hard-coded; assessing whether results depend critically on this choice or generalize to other motor cost models is needed.
Outcome vs. memory fits: In the physics tasks, parameters are mainly fit to memory probes; fits to full predictive distributions (where people think the ball will land) and trial-level prediction accuracy/RTs are not reported.
Cost function completeness: Compute cost uses “nodes expanded” and representation cost uses “#objects encoded”; the cost of lookahead scans, encoding/decoding operations, and forgetting updates is omitted and may alter conclusions.
Sample efficiency and time pressure: JIT aggregates many trajectories, but the sample budget and speed–accuracy trade-offs are unspecified; behavior under strict time constraints or limited sampling is unknown.
Robustness to adversarial worlds: JIT may be misled by frequent but outcome-irrelevant events (“tunnel vision”); systematic stress tests where frequent intermediate contacts conflict with rare high-impact obstacles are needed.
Safety/risk sensitivity: JIT optimizes expected utility; how to adapt it for risk-sensitive settings where low-probability, high-cost events must be represented early is an open design question.
Hybridization with precomputed construals: The paper argues for combining JIT with learned/prior construals in familiar environments; concrete algorithms for such hybrids and criteria for switching are not developed.
Generalization beyond 2D rigid-body physics: Extensions to 3D scenes, non-rigid bodies, frictional/rotational dynamics, or fluid interactions are untested; does the simple lookahead scale?
Multi-agent and social reasoning: It is unknown how to apply JIT to settings where other agents’ actions depend on inferred beliefs/goals (e.g., game-like planning, social physics).
Moving, partially observable obstacles: How JIT re-plans with moving obstacles under limited sensing and delayed updates (sensor latencies, occlusions) is not evaluated.
Parameter identifiability and individual differences: Parameters are fit at the experiment level; fitting per participant and relating parameters to working memory/attention capacity could test psychological realism and identifiability.
Fairness of VGC comparison: The VGC baseline was adapted (heuristic search, cost definitions) and compared under specific assumptions; re-running with VGC’s native compute metric and additional baselines (e.g., amortized/learned construals, meta-RL) would strengthen claims.
Learning the lookahead policy: The lookahead is hand-specified (follow trajectory, radial spotlight); learning attention/scan policies from data (supervised or reinforcement learning) could improve realism and test optimality.
Need probability vs. utility: The central JIT claim (encode by expected “need”) contradicts utility-based selection in some regimes; broader task batteries are needed to map the boundary conditions where each principle wins.
Estimating human cost trade-offs: The compute and representation cost weights (α, β) are treated as free scenario parameters; empirically estimating these from human behavior would ground the efficiency analysis.
Segmentation and perceptual uncertainty: JIT assumes reliable object-centric segmentation and perfect knowledge of static layout; performance under noisy detection, ambiguous boundaries, or clutter-induced segmentation errors is unknown.
Memory probe methodology: The recall metric (“any evidence” of correctness) may be coarse; adopting signal-detection analyses, confidence calibration, and probe timing manipulations could better dissociate encoding strength vs. decision noise.
Process measures beyond hover: Hover data capture a subset of attention dynamics; integrating latency, dwell time, scanpath structure, and micro-saccades could more precisely test JIT’s interleaving of simulation and encoding.
Scaling with clutter: The model’s behavior with hundreds/thousands of potential objects is unclear; benchmarking under systematically increasing clutter and measuring failure points would inform scalability.
Interaction with information-gathering actions: JIT currently assumes passive lookahead; extending to active information acquisition (e.g., peeking behind occluders, moving to probe) would test JIT in POMDP-like tasks.
Real-world robotics validation: Claims about robotics applicability are speculative; implementing JIT in robot motion planning/manipulation and measuring compute, latency, and success in clutter would test external validity.
Cross-domain transfer: Parameters fitted in one physics experiment transfer to a specific follow-up; broader cross-task/domain transfer (e.g., from planning to physics or vice versa) is untested.
Neural predictions: The framework references distinct brain areas but makes no explicit neural predictions; linking JIT stages (simulate, lookahead, encode/forget) to neural dynamics/oscillations would enable neuroscientific tests.
Exploration–exploitation trade-offs: How JIT balances committing to an initial plan vs. exploring alternatives is not formalized; integrating principled exploration (e.g., Thompson sampling over plans) may reduce myopic failures.
Calibration to outcome granularity: Experiment 2B bucketized outcomes to dissociate JIT vs. VGC; systematic variation of outcome granularity could map when counterfactual relevance dominates frequency and vice versa.
Formal guarantees: There are no performance or regret bounds for JIT’s online representation updates; developing theoretical guarantees (or impossibility results) under task assumptions would clarify limits.

View Paper Prompt View All Prompts

Practical Applications

Immediate Applications

Below are applications that can be deployed now, building directly on the paper’s findings and methods, especially in visually grounded, mostly static environments with local interactions and a single focal object.

Robotics and automation (warehouse, service robots)
- Use case: JIT motion planning to reduce compute and memory by interleaving path simulation, local sensor-based lookahead, and incremental obstacle encoding (e.g., forklifts or mobile robots navigating cluttered aisles).
- Tools/products/workflows: A ROS/MoveIt plugin implementing “JIT Representation Planning” (softmax A* with straight-line bias, one-step collision lookahead, and probabilistic memory culling).
- Assumptions/dependencies: Scenes are mostly static; interactions are spatially local; perception can reliably flag imminent collisions; safety layers (e.g., conservative collision margins) are in place.
Game development and simulation (software)
- Use case: NPC pathfinding and physics prediction that culls irrelevant scene elements until needed, reducing CPU/GPU load in grid/pathfinding and 2D physics-heavy scenes.
- Tools/products/workflows: Unity/Unreal packages that wrap existing A*/physics engines (e.g., Pymunk, PhysX) with JIT simulation, local lookahead, and memory decay policies.
- Assumptions/dependencies: Engine support for probabilistic planners and noisy physics parameters; developer control over attention/lookahead radius; primarily static props or controlled dynamics.
Human–computer interaction and user research
- Use case: Process-tracing tools that mirror the paper’s hover/attention heatmaps to diagnose cognitive effort and guide interface simplification (e.g., planners, dashboards).
- Tools/products/workflows: Web UX kits that instrument “progressive reveal” planning tasks, capturing hover probabilities and correlating with JIT-based representational weights.
- Assumptions/dependencies: Tasks involve visually persistent displays; users can offload memory by revisiting the scene; attention measures are valid proxies for representation.
Cognitive science and AI research instrumentation (academia)
- Use case: Experimental paradigms to measure and model how people build construals on the fly in planning and physics tasks; benchmarking resource–accuracy trade-offs.
- Tools/products/workflows: Open-source Python library implementing JIT world modeling with Monte Carlo trajectories, lookahead queries, and fit procedures (noise parameters, decay).
- Assumptions/dependencies: Experimental tasks maintain the paper’s conditions (local interactions, single target, static scenes); model fit quality depends on calibrated simulator noise.
Data labeling and scene annotation (software/ML ops)
- Use case: Active annotation systems that flag and encode only near-term relevant objects along simulated paths (e.g., obstacle datasets, physics contact maps).
- Tools/products/workflows: Annotation UIs that run fast JIT trajectories to prioritize bounding boxes or contact points likely to affect near-term simulation.
- Assumptions/dependencies: Access to scene geometry and approximate dynamics; annotators accept incremental reveal workflows.
Training and skill acquisition (daily life, sports)
- Use case: Instructional techniques that teach people to “trace imagined trajectories” and visually scan just ahead (e.g., billiards, cycling through clutter).
- Tools/products/workflows: Coaching aids or AR overlays that prompt local lookahead and incremental encoding of obstacles along the imagined path.
- Assumptions/dependencies: Environments are visible and slow-changing; learners can execute gaze-aligned lookahead; safety-supervised practice.

Long-Term Applications

Below are applications that require further research, scaling, or integration with dynamic perception, multi-agent reasoning, and safety guarantees.

Autonomous driving and mobile autonomy (transportation)
- Use case: JIT representation building for low-latency planning in complex urban scenes, representing only imminent hazards and relevant road elements as the vehicle simulates near-term trajectories.
- Tools/products/workflows: Hybrid planners that combine learned heuristics with JIT lookahead and memory culling; safety-wrapped with formal verification of collision avoidance.
- Assumptions/dependencies: Robust perception in dynamic multi-agent environments; high-confidence lookahead under motion; regulatory certification and extensive validation.
General-purpose household and industrial robots (robotics)
- Use case: Robots that construct task-specific simplified world models while acting, blending JIT with prior knowledge in familiar spaces (hybrid JIT–VGC).
- Tools/products/workflows: Cognitive planning stacks that interleave subgoal simulation, local attention, and representation culling across manipulation and navigation.
- Assumptions/dependencies: Reliable discovery of task-relevant objects; memory management under larger object sets; behavior generalization beyond single focal targets.
Surgical planning and technician guidance (healthcare, maintenance)
- Use case: AR co-planning assistants that guide local lookahead (e.g., instrument paths or component access), encoding only imminent obstructions or critical contacts.
- Tools/products/workflows: Head-mounted displays showing just-in-time obstacle flags and subgoal steps along planned trajectories.
- Assumptions/dependencies: High-precision scene understanding; liability and safety constraints; user acceptance and regulatory approval.
XR/AR training ecosystems (education)
- Use case: Tutors that teach and assess just-in-time reasoning strategies across physics, logic puzzles, and route planning, measuring attention and memory traces as learning signals.
- Tools/products/workflows: Adaptive curricula that scaffold local simulation and representation building; analytics linking recall/attention to mastery.
- Assumptions/dependencies: Demonstrated transfer to complex, dynamic tasks; content alignment with educational standards; privacy-compliant data collection.
AI cognitive architectures and software agents (software/AI)
- Use case: JIT memory management for planning/Reasoning in large agents (e.g., LLM planners), keeping only “need-probable” facts and culling stale context to reduce compute.
- Tools/products/workflows: Agent frameworks with incremental subproblem construction, local search-based lookahead, and probabilistic memory decay tuned to task utility.
- Assumptions/dependencies: Effective mapping of visual JIT concepts to abstract domains; benchmarks demonstrating accuracy–compute trade-off improvements; robust retrieval to re-encode forgotten but needed facts.
Semantic retrieval and knowledge foraging (information systems)
- Use case: Search systems that follow local trajectories in concept space, encoding only immediately useful items and pruning stale context to maintain focus.
- Tools/products/workflows: Incremental retrieval pipelines that interleave query expansion with attention-guided inclusion of results.
- Assumptions/dependencies: Reliable “local lookahead” in semantic embeddings; user-tolerable latency; success metrics aligned to utility beyond relevance.
Sustainability and policy for compute-aware planning (policy, energy)
- Use case: Guidelines encouraging resource-rational planning (compute and memory budgets) in public sector robotics or AI deployments.
- Tools/products/workflows: Procurement standards and evaluation rubrics that measure representation and compute costs alongside task performance.
- Assumptions/dependencies: Clear measurement frameworks; stakeholder consensus on trade-offs; case studies showing energy savings without safety loss.
Hybrid JIT–VGC planners for familiar environments (cross-sector)
- Use case: Systems that pre-compute efficient construals in known contexts and switch to JIT in novel or cluttered scenarios (e.g., hometown routing vs. new city navigation).
- Tools/products/workflows: Meta-planning modules that learn when to use prior construals vs. incremental construction, informed by task familiarity and uncertainty.
- Assumptions/dependencies: Reliable familiarity detection; learning signals from past tasks; methods to reconcile discrepancies between precomputed and JIT representations.

Notes on key assumptions across applications:

The strongest empirical support is in visually persistent, mostly static scenes with local interactions and a single focal object; moving to dynamic, multi-object, multi-agent contexts requires additional research and engineering.
JIT depends on accurate perceptual lookahead and the ability to query the external scene on demand; failures in perception or occlusion can degrade performance.
Memory decay was modeled simply; real deployments may need explicit capacity limits and more sophisticated forgetting policies.
Safety-critical deployments (e.g., autonomous driving, surgery) will require conservative lookahead, formal safety guarantees, and extensive testing.

View Paper Prompt View All Prompts

Glossary

A* search algorithm: A heuristic graph search algorithm that finds paths efficiently by combining actual and estimated costs. "In the planning domain, simulation is implemented as a stochastic variant of the A* search algorithm (Zhi-Xuan et al., 2020)"
algorithmic utility: A quantitative measure that trades off plan quality with computational and representational costs. "we compute an algorithmic utility V that, given an algorithm A, trades off the utility of planning, the complexity of representation, and the computational cost of planning"
branching factor: The number of possible next states or actions considered at each decision point, affecting search complexity. "therefore reducing the branching factor of planning."
collision restitution: A parameter controlling how bouncy a collision is, affecting post-collision velocities. "randomly perturb the collision restitution, by sampling from a truncated normal distribution with variance s2."
counterfactually irrelevant objects: Objects that are contacted but do not change the final outcome under noisy simulation. "“Counterfactually Irrelevant” objects are contacted by the ball but made no difference to the bucket the ball would land in (Fig. 6B)."
counterfactually relevant objects: Objects that can change the final prediction but are only contacted in some simulated trajectories. "“Counterfactually Relevant” objects make a large difference to the final prediction, but are contacted only half of the time under noisy simulation (Fig. 6A)"
construal: A reduced representation of the environment that includes only the objects relevant for the simulation or plan. "The outcome of this iterative process is a construal of the scene: the final contents of the representational sketchpad including the set of objects whose effects are modeled in simulation"
decay parameter y: A parameter controlling the rate at which previously encoded objects are forgotten over simulation steps. "controlled by a decay parameter y (p(forget object o) x t-Y, where t is the number of steps since object o has been encoded)."
external memory: The view that the environment can serve as an accessible memory store via attention. "accounts of visual attention that describe the environment as a kind of external memory (O'Regan, 1992)"
Gaussian distribution: A normal distribution used to model noise in initial conditions. "We first apply a random shift to the ball's initial x position, based on a Gaussian distribution centered at the ball with variance o2."
grid world: A discrete, grid-like environment used for navigation and planning tasks. "An agent is placed in a grid world and must move their avatar to a green goal."
grid-search procedure: An exhaustive parameter search over predefined ranges to optimize model fit. "We choose these parameters through a grid-search procedure to maximize the Pearson correlation coefficient r"
heuristic search: Search guided by heuristics rather than exhaustive enumeration to improve efficiency. "a modified VGC model adapted to use heuristic search instead of policy iteration"
log-likelihood: A statistical measure of model fit that sums the logarithms of likelihoods of observed data under the model. "In experiment 1C, the JIT model has a higher correlation, lower RMSE, and higher log-likelihood than VGC"
lookahead: The process of anticipating imminent interactions by scanning the scene along the simulated trajectory. "This allows for efficient lookahead, which only needs to consider a limited set of prospective objects that are close to the simulation trajectory."
lookahead function l(s): A function that flags objects likely to interact with the next step of the simulation or plan. "the lookahead function l flags objects only if they intersect the next proposed step of the plan:"
Manhattan distance heuristic: An L1-distance heuristic used in grid-based path planning. "The A* planner uses a Manhattan distance heuristic and a straight line tiebreak bias"
mixed effects model: A statistical model including both fixed and random effects to account for group and individual variability. "a mixed effects model regressing object type on memory with random intercepts by participants showed a statistically significant effect"
Monte Carlo simulation: A method that uses repeated random sampling to estimate outcomes or expected values. "we use the average construal as computed using Monte Carlo simulation"
need probability: The probability that an object will be required by the simulation, guiding representational strength. "JIT predicts that objects should be represented with a strength proportional to “need probability” (Anderson & Milson, 1989)"
norming experiment: A preliminary study used to calibrate parameters or validate stimuli before main experiments. "We fit the noise parameters to a separate norming experiment (see Supplementary Section S1.3)"
perceptual lookahead module: A component that directs visual search based on the current simulated state to find relevant objects. "we assume a perceptual lookahead module that uses the current state of the simulation to guide a local visual search"
policy iteration: A dynamic programming method for improving policies in reinforcement learning. "a modified VGC model adapted to use heuristic search instead of policy iteration"
probabilistic mental simulation: Simulating future states while explicitly modeling uncertainty. "Probabilistic mental simulation is thought to play a key role in human reasoning, planning, and prediction"
probabilistic physics simulation engine: A physics simulator that incorporates stochasticity to capture uncertainty in physical interactions. "simulation is implemented as a probabilistic physics simulation engine (Battaglia et al., 2013)"
Pymunk 2d physics engine: A specific 2D physics library used to compute physical dynamics. "as calculated by the Pymunk 2d physics engine (Blomqvist, 2007)."
representational sketchpad: A working-memory structure that stores object-centric information for simulation. "First is a representational sketchpad that contains object-centric information to support simulation."
representation culling: The process of removing previously encoded but currently irrelevant objects from the representation. "Then, the representation is culled, probabilistically forgetting objects that have not been relevant for some time."
representational weight: The strength or probability with which an object is included in the working representation. "representational weight JIT and VGC assign to each object."
resource efficiency measure: A metric assessing models by combining costs of computation and representation with plan utility. "We calculate a resource efficiency measure for four models"
resource-rational principles: The idea that cognition optimizes performance under constraints on time, memory, and computation. "both models approach the construction of a construal from resource-rational principles"
RMSE: Root Mean Square Error, a measure of prediction error magnitude. "the JIT model has a higher correlation, lower RMSE, and higher log-likelihood than VGC (JIT: r = 0.95, RMSE = 0.08, LL: - 1, 763; VGC: r = 0.93, RMSE = 0.10, LL: - 1,809)"
softmax choice rule: A probabilistic selection rule that chooses among options with probability proportional to exponentiated utilities. "we instead expand nodes using the softmax choice rule:"
state space: The set of all possible states the system can be in during simulation or planning. "we must specify the state space S"
stochastic variant: A version of an algorithm or simulator that includes randomness or noise. "simulation is implemented as a stochastic variant of the A* search algorithm (Zhi-Xuan et al., 2020)"
straight line tiebreak bias: A preference to continue planning in a straight line when multiple options are equally good. "and a straight line tiebreak bias, such that when reconstructing a plan from the sequence of visited states, the planner will favor plans that travel in a straight line."
transition function: A function that defines how the system evolves from one state to the next, possibly given an action. "the dynamics of the environment, encoded in a transition function f"
truncated normal distribution: A normal distribution restricted to a range, used to model bounded noise. "by sampling from a truncated normal distribution with variance s2."
value-guided construal (VGC): A model that optimizes the utility of planning while minimizing representational complexity. "We compare our JIT model to the Value Guided Construal (VGC) model"
visual foveation: Directing gaze to the fovea region to extract high-acuity information for lookahead. "cheap and effective implementations of lookahead through visual foveation."
Von Mises distribution: A probability distribution on angles used to model directional noise. "drawing a rotation angle from a Von Mises distribution centered at 0 and with variance K"

"Just in Time" World Modeling Supports Human Planning and Reasoning

Summary

"Just-in-Time" World Modeling: A Formal Account of Online Scene Representation in Human Planning and Reasoning

Introduction

Theoretical Motivation and Model Architecture

Empirical Evaluation: Planning and Physical Reasoning Domains

Planning Tasks

Physical Reasoning Tasks

Mechanistic Implications and Algorithmic Efficiency

Extensions, Limitations, and Broader Relevance

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

A simple guide to “Just-in-Time World Modeling”

Overview: What is this paper about?

Key goals and questions

Methods: How does JIT work?

Main findings: What did they discover?

Implications: What does this mean for the future?

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Immediate Applications

Long-Term Applications

Glossary

Open Problems

Continue Learning

Authors (5)

Collections

Tweets

"Just in Time" World Modeling Supports Human Planning and Reasoning

Summary

"Just-in-Time" World Modeling: A Formal Account of Online Scene Representation in Human Planning and Reasoning

Introduction

Theoretical Motivation and Model Architecture

Empirical Evaluation: Planning and Physical Reasoning Domains

Planning Tasks

Physical Reasoning Tasks

Mechanistic Implications and Algorithmic Efficiency

Extensions, Limitations, and Broader Relevance

Conclusion

Paper to Video (Beta)

Whiteboard

Paper Prompts

Top Community Prompts

Explain it Like I'm 14

A simple guide to “Just-in-Time World Modeling”

Overview: What is this paper about?

Key goals and questions

Methods: How does JIT work?

Main findings: What did they discover?

Implications: What does this mean for the future?

Knowledge Gaps

Knowledge gaps, limitations, and open questions

Practical Applications

Immediate Applications

Long-Term Applications

Glossary

Open Problems

Continue Learning

Related Papers

Authors (5)

Collections

Tweets