Improve transfer of RWML world-model knowledge in weaker base LLMs
Develop training strategies within Reinforcement World Model Learning (RWML) that enable weaker base large language models, such as Qwen2.5-7B, to effectively transfer action-conditioned world-model knowledge, learned via sim-to-real gap rewards, to downstream decision-making in long-horizon, tool-use environments such as τ2-bench. The goal is to mitigate the observed dependence on base-model capability by strengthening knowledge transfer in weaker models.
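The sim-to-real gap reward mentioned above can be illustrated with a minimal sketch: the policy's predicted next observation is scored against the observation the real environment actually returns, so smaller gaps earn larger rewards. The function name and the token-level F1 gap measure here are illustrative assumptions, not the paper's exact formulation.

```python
from collections import Counter

def sim_to_real_gap_reward(predicted_obs: str, real_obs: str) -> float:
    """Reward a predicted next observation by its closeness to the real one.

    Gap is measured as token-level F1 (an assumed stand-in for the
    paper's reward); 1.0 means no sim-to-real gap, 0.0 means no overlap.
    """
    pred_tokens = predicted_obs.split()
    real_tokens = real_obs.split()
    if not pred_tokens or not real_tokens:
        return 0.0
    # Multiset intersection counts shared tokens with multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(real_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(real_tokens)
    return 2 * precision * recall / (precision + recall)
```

A perfectly predicted observation yields a reward of 1.0, while a fully disjoint prediction yields 0.0; in an RWML-style loop this scalar would serve as the reinforcement signal for world-model rollouts.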
References
On the challenging τ2-bench, we find the ability to learn and transfer world model knowledge from RWML to decision-making is dependent on the capability of the base model. We leave improving transfer abilities for weaker models to future work.
— Reinforcement World Model Learning for LLM-based Agents
(arXiv:2602.05842, Yu et al., 5 Feb 2026), Section 4.3 (Impact of Base Model Capability)