Joint Training and Evaluation of World Models in Non-Stationary Environments
Investigate how to jointly train, continually update, and rigorously evaluate world models used by large language model (LLM)-based agents in non-stationary environments, and determine the causal impact of these world models on the reliability of downstream planning.
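One way to make the causal question concrete is to treat the world-model update rule as an intervention and compare planning reliability with and without it. The following is a minimal, hypothetical sketch of such a protocol; it is not taken from Wei et al. It assumes a toy one-dimensional environment whose dynamics parameter `theta` flips sign halfway through a run, a world model that either keeps updating via exponentially weighted least squares (forgetting factor `lam`) or stays frozen, and a greedy one-step planner that trusts the model's predictions. All names (`DriftingEnv`, `OnlineWorldModel`, `run`, `post_shift_effect`) and parameter values are illustrative assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class DriftingEnv:
    """Hypothetical toy dynamics x' = x + theta * a + noise; theta is the non-stationary part."""
    theta: float = 1.0
    noise: float = 0.1

    def step(self, x: float, a: int, rng: random.Random) -> float:
        return x + self.theta * a + rng.gauss(0.0, self.noise)


class OnlineWorldModel:
    """Hypothetical world model: estimates theta by exponentially weighted least squares.

    With update=False the model stays frozen at its initial estimate.
    """

    def __init__(self, theta0: float = 1.0, lam: float = 0.8, update: bool = True):
        self.theta_hat = theta0
        self.lam = lam          # forgetting factor: older transitions are down-weighted
        self.update = update
        self.sxy = 0.0          # discounted sum of a * (x' - x)
        self.sxx = 0.0          # discounted sum of a * a

    def predict(self, x: float, a: int) -> float:
        return x + self.theta_hat * a

    def observe(self, x: float, a: int, x_next: float) -> None:
        if not self.update or a == 0:  # a == 0 carries no information about theta
            return
        self.sxy = self.lam * self.sxy + a * (x_next - x)
        self.sxx = self.lam * self.sxx + a * a
        self.theta_hat = self.sxy / self.sxx


def run(update: bool, seed: int, n_episodes: int = 40, horizon: int = 30, goal: float = 5.0):
    """Return a list of per-episode (success, mean squared one-step prediction error)."""
    rng = random.Random(seed)
    env = DriftingEnv()
    model = OnlineWorldModel(update=update)  # persists across episodes (continual setting)
    results = []
    for ep in range(n_episodes):
        env.theta = 1.0 if ep < n_episodes // 2 else -1.0  # abrupt regime change halfway
        x, reached, errs = 0.0, False, []
        for _ in range(horizon):
            # Greedy one-step planner: pick the action whose predicted next state is closest to the goal.
            a = min((1, -1, 0), key=lambda act: abs(model.predict(x, act) - goal))
            x_next = env.step(x, a, rng)
            errs.append((model.predict(x, a) - x_next) ** 2)  # prediction made before updating
            model.observe(x, a, x_next)
            x = x_next
            if abs(x - goal) < 0.5:
                reached = True
                break
        results.append((reached, sum(errs) / len(errs)))
    return results


def success_rate(results) -> float:
    return sum(ok for ok, _ in results) / len(results)


def post_shift_effect(seeds, n_episodes: int = 40):
    """Paired estimate of do(update=True) vs do(update=False) on post-shift planning success."""
    half = n_episodes // 2
    diffs = []
    for s in seeds:
        updated = run(True, s, n_episodes)[half:]
        frozen = run(False, s, n_episodes)[half:]
        diffs.append(success_rate(updated) - success_rate(frozen))
    return sum(diffs) / len(diffs), diffs


effect, _ = post_shift_effect(range(20))
print(f"estimated effect of continual updating on post-shift success: {effect:.2f}")
```

Because the two arms share seeds and differ only in the update rule, the success gap can be read as the effect of that rule on planning reliability within this toy setting. Returning per-episode prediction error alongside success lets model-level accuracy and planning-level reliability be compared side by side, which is the distinction the open problem draws.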
References
An open problem is how to jointly train, update, and evaluate world models in non-stationary environments, and how to assess their causal impact on downstream planning reliability.
— Agentic Reasoning for Large Language Models
(2601.12538 - Wei et al., 18 Jan 2026) in Section 7.3