Monte Carlo Planning with Large Language Model for Text-Based Game Agents

Published 23 Apr 2025 in cs.CL | (2504.16855v1)

Abstract: Text-based games provide valuable environments for language-based autonomous agents. However, planning-then-learning paradigms, such as those combining Monte Carlo Tree Search (MCTS) and reinforcement learning (RL), are notably time-consuming due to extensive iterations. Additionally, these algorithms perform uncertainty-driven exploration but lack language understanding and reasoning abilities. In this paper, we introduce the Monte Carlo planning with Dynamic Memory-guided LLM (MC-DML) algorithm. MC-DML leverages the language understanding and reasoning capabilities of LLMs alongside the exploratory advantages of tree search algorithms. Specifically, we enhance LLMs with in-trial and cross-trial memory mechanisms, enabling them to learn from past experiences and dynamically adjust action evaluations during planning. We conduct experiments on a series of text-based games from the Jericho benchmark. Our results demonstrate that the MC-DML algorithm significantly enhances performance across various games at the initial planning phase, outperforming strong contemporary methods that require multiple iterations. This demonstrates the effectiveness of our algorithm, paving the way for more efficient language-grounded planning in complex environments.

Abstract PDF Upgrade to Chat

Summary

The paper presents the MC-DML algorithm, which integrates LLMs with Monte Carlo planning to enhance exploration in text-based game environments.
It employs dual memory mechanisms—in-trial and cross-trial—to prioritize actions based on current trajectories and historical experiences.
Empirical results show MC-DML outperforms traditional MCTS and RL methods on the Jericho benchmark, notably in games like Zork1 and Deephome.

Monte Carlo Planning with LLM for Text-Based Game Agents

Introduction

The paper presents the Monte Carlo planning with Dynamic Memory-guided LLM (MC-DML) algorithm, designed to tackle the challenges of text-based games on the Jericho benchmark. These games demand language understanding and reasoning capabilities due to their dynamic state spaces and sparse rewards. Traditional approaches, such as Monte Carlo Tree Search (MCTS) and reinforcement learning (RL), face limitations in this context due to extensive iterations and insufficient language capabilities. MC-DML aims to merge the language prowess of LLMs with the exploratory strengths of tree search algorithms, introducing in-trial and cross-trial memory mechanisms that empower the LLMs to learn from past experiences.

Methodology

MC-DML integrates the capabilities of LLMs into traditional MCTS by employing LLMs as prior policies. Throughout the expanded phases of selection, expansion, simulation, and backpropagation, the LLM assigns non-uniform search priorities to candidate actions based on its in-trial memory (the agent's current trajectory) and cross-trial memory (historical experiences from previous failures). This dual-memory setup allows the LLM to dynamically adapt its action evaluations during the simulation.

The MC-DML algorithm operates by aligning more closely with human reasoning, where in-trial memory reflects recent experiences and cross-trial memory leverages insights learned from prior mistakes. This design enables the LLM to refine its action-value estimates continually, aiming to improve efficiency in sparse or complex game environments.

Figure 1: An example bottleneck state from the game Zork1.

Results

Empirical results from experiments conducted on text-based games in the Jericho benchmark show that MC-DML substantially enhances initial planning performance compared to other methods that require multiple iterations for policy optimization. Its innovative memory mechanism significantly outperforms contemporary RL-based and MCTS-based approaches, performing particularly well in complex environments like Zork1 and Deephome.

Comparison with Prior Algorithms

MC-DML was compared against multiple baselines. Traditional MCTS methods like PUCT and approaches such as MC-LAVE-RL, which incorporate value sharing among actions, were outperformed by MC-DML across 8 out of 9 games. By prioritizing exploration and leveraging memory mechanisms, MC-DML addresses the shortcomings of static policy learning in MCTS, providing a more robust solution for language-driven decision-making challenges.

Figure 2: A comparison of the PUCT and MC-DML algorithms. PUCT trains its policy through imitation learning from self-play data. In contrast, MC-DML uses a LLM as the initial policy. During planning, the LLM learns from past failure trajectories and adjusts the action value estimates. This approach more closely aligns with the human thought process.

Technical Implementation and Challenges

The algorithm was implemented using the gpt-3.5-turbo LLM, meticulously tuned to achieve optimal balance between exploration and exploitation using estimated Q-values and memory-guided adjustments. However, the cross-trial memory relies on curated reflections from failed trajectories, emphasizing the pivotal role of effective feedback mechanisms.

The flexibility of MC-DML to dynamically adjust the state-action evaluations marks an advancement in its ability to mimic human-like strategic depth and planning adaptability. Nevertheless, practical challenges exist, such as managing the LLM's input constraints and ensuring relevance in the stored memory, particularly in games where decisions in early states have significant subsequent ramifications.

Conclusion

MC-DML demonstrates a significant step forward in integrating LLMs into Monte Carlo planning for enhanced action exploration in text-based games. Its dual-memory approach allows for dynamic learning and illuminated strategies that align closely with natural human decision processes. Future work could explore optimizing memory management techniques and expanding to other complex decision-making tasks requiring intricate language understanding combined with strategic planning capabilities.

Markdown Report Issue