Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

Published 12 Mar 2025 in cs.CL | (2503.09572v3)

Abstract: LLMs have shown remarkable advancements in enabling language agents to tackle simple tasks. However, applying them for complex, multi-step, long-horizon tasks remains a challenge. Recent work has found success by separating high-level planning from low-level execution, which enables the model to effectively balance high-level planning objectives and low-level execution details. However, generating accurate plans remains difficult since LLMs are not inherently trained for this task. To address this, we propose Plan-and-Act, a novel framework that incorporates explicit planning into LLM-based agents and introduces a scalable method to enhance plan generation through a novel synthetic data generation method. Plan-and-Act consists of a Planner model which generates structured, high-level plans to achieve user goals, and an Executor model that translates these plans into environment-specific actions. To train the Planner effectively, we introduce a synthetic data generation method that annotates ground-truth trajectories with feasible plans, augmented with diverse and extensive examples to enhance generalization. We evaluate Plan-and-Act using web navigation as a representative long-horizon planning environment, demonstrating a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.

Summary

The paper presents a Plan-and-Act framework that leverages synthetic data and dynamic replanning to improve planning in LLM-based agents.
It employs a dual-module system where a Planner generates high-level strategies and an Executor translates these plans into adaptive actions.
On the WebArena-Lite benchmark, the abstract reports a state-of-the-art 57.58% success rate, along with a text-only state-of-the-art 81.36% success rate on WebVoyager.

Introduction

The "Plan-and-Act" paper introduces a framework aimed at enhancing the planning capabilities of LLM-based agents for complex tasks. The framework separates high-level planning and low-level execution into two distinct components, the Planner and the Executor, designed to address the challenges of multi-step, long-horizon tasks. Because LLMs are not inherently trained for explicit planning and therefore struggle to generate accurate plans, the authors propose a synthetic data generation method that supplies extensive examples for training the Planner.

Figure 1: Plan-and-Act System Diagram. The Planner processes the user query to generate a high-level plan for the Executor to implement.

System Architecture

The Plan-and-Act system segregates the responsibilities of task planning and execution into two modules:

Planner: This module generates structured, high-level plans for achieving the user's goal. It is trained on synthetic data in which ground-truth trajectories are annotated with feasible plans, which improves plan generation.
Executor: This component translates the Planner's plans into environment-specific actions, using feedback from the environment to adapt as the state changes.

The framework also supports dynamic replanning, in which the plan is regenerated as the environment changes, so the agent can adapt to unforeseen shifts and recover from failures in the initial execution.
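
The interaction between the two modules can be sketched as a simple control loop. The following is a minimal illustration, not the authors' implementation: the names AgentState, run_episode, planner, executor, and env_step are hypothetical, and the sketch assumes the plan is regenerated after every action, which is only one possible replanning policy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """What the Planner and Executor can see (hypothetical structure)."""
    query: str
    observation: str
    history: List[str] = field(default_factory=list)


# Hypothetical interfaces; in practice each would wrap an LLM call.
Planner = Callable[[AgentState], List[str]]   # returns the remaining plan steps
Executor = Callable[[str, AgentState], str]   # turns one plan step into an action
EnvStep = Callable[[str], str]                # applies an action, returns a new observation


def run_episode(
    query: str,
    planner: Planner,
    executor: Executor,
    env_step: EnvStep,
    initial_observation: str,
    max_steps: int = 10,
) -> List[str]:
    """Run a plan-then-act loop and return the list of actions taken."""
    state = AgentState(query=query, observation=initial_observation)
    for _ in range(max_steps):
        # Assumed replanning policy: rebuild the plan from the latest state.
        plan = planner(state)
        if not plan:  # an empty plan signals that the task is finished
            break
        action = executor(plan[0], state)  # act on the first remaining step only
        state.history.append(action)
        state.observation = env_step(action)
    return state.history
```

Passing the planner, executor, and environment as plain callables keeps the loop independent of any particular model or browser backend, so each can be stubbed out for testing.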

Synthetic Data Generation

Action Trajectory Generation

Real-world action trajectory data is limited, so the authors use a scalable synthetic data pipeline. The pipeline generates candidate user queries, collects trajectories for them, and rates each trajectory with an outcome-supervised reward model. Only trajectories judged successful are kept for training.
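
The filtering step can be illustrated with a short sketch. It assumes the reward model returns a score in [0, 1] and that a fixed threshold decides success; the Trajectory type, the filter_successful function, and the threshold value are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Trajectory:
    """One collected rollout (hypothetical structure)."""
    query: str
    actions: List[str]


# Hypothetical reward model interface: higher score means more likely successful.
RewardModel = Callable[[Trajectory], float]


def filter_successful(
    trajectories: List[Trajectory],
    reward_model: RewardModel,
    threshold: float = 0.5,  # assumed cutoff, not a value from the paper
) -> List[Trajectory]:
    """Keep only trajectories whose outcome score reaches the threshold."""
    return [t for t in trajectories if reward_model(t) >= threshold]
```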

Figure 2: Synthetic Data Generation Pipeline. Shows the stages involved in generating and annotating data for training.

Grounded Plan Generation

This step reverse-engineers structured plans from executed trajectories, so that each plan is grounded in actions that were actually taken in the task environment. Grounding keeps the plans executable and relevant to the execution context.
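
One way to picture a grounded plan is as a list of steps, each linked to the trajectory actions it accounts for. The sketch below is an assumption about how such an annotation could be represented and checked; PlanStep, is_grounded, and to_training_example are hypothetical names, and the coverage check is an illustrative sanity test rather than a procedure described in the summary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PlanStep:
    """A high-level step tied to the actions that carried it out (hypothetical)."""
    description: str
    action_indices: List[int]


def is_grounded(plan: List[PlanStep], num_actions: int) -> bool:
    """True if the steps cover every action exactly once, in trajectory order."""
    covered = [i for step in plan for i in step.action_indices]
    return covered == list(range(num_actions))


def to_training_example(query: str, plan: List[PlanStep]) -> Dict[str, str]:
    """Format a (query, plan) pair as a text example for training a planner."""
    lines = [f"{n}. {step.description}" for n, step in enumerate(plan, start=1)]
    return {"input": query, "target": "\n".join(lines)}
```

A plan that fails the coverage check refers to actions that never happened or leaves some actions unexplained, so it could be discarded or regenerated before training.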

Plan Expansion and Augmentation

The framework extends the Planner's training set through synthetic augmentation, using the context-specific patterns identified during data creation to generate additional samples. This expansion supplements the original data and addresses data scarcity by increasing both volume and diversity.
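
The expansion step can be sketched as a loop that samples a few existing examples and asks a generator for new ones in the same style. This is a rough illustration under stated assumptions: expand_plans, the generate callable (standing in for an LLM prompted with the sampled examples), and the parameters rounds, k, and rng_seed are all hypothetical.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (user query, plan text)

# Hypothetical generator: given a few seed examples, returns new synthetic ones.
Generator = Callable[[List[Example]], List[Example]]


def expand_plans(
    seeds: List[Example],
    generate: Generator,
    rounds: int,
    k: int = 2,
    rng_seed: int = 0,
) -> List[Example]:
    """Return the seed examples plus synthetic examples from `rounds` generator calls."""
    rng = random.Random(rng_seed)  # fixed seed keeps the sampling reproducible
    synthetic: List[Example] = []
    for _ in range(rounds):
        # Sample up to k seeds to act as in-context demonstrations.
        shots = rng.sample(seeds, min(k, len(seeds)))
        synthetic.extend(generate(shots))
    return seeds + synthetic
```

Sampling different demonstrations on each round is one simple way to push the generator toward varied outputs, which is the diversity goal the text describes.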

Results and Evaluation

Plan-and-Act was evaluated on the WebArena-Lite benchmark, where it improved task success rates over existing methods such as WebRL; the abstract reports a state-of-the-art 57.58% success rate. The results indicate that synthetic data generation and dynamic replanning improve agent performance on complex, long-horizon tasks.

Conclusion

Plan-and-Act separates planning from execution and uses synthetic data to train the Planner, improving LLM agent performance on dynamic, long-horizon tasks. Its modular architecture highlights the potential of scalable data generation for overcoming the limitations LLMs face in detailed plan generation. Future work aims to integrate memory-enhanced reasoning and multi-modal inputs to extend these capabilities to more diverse digital environments.