System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Published 19 Jul 2024 in cs.AI, cs.CL, and cs.LG | (2407.14414v2)

Abstract: LLMs can be used to solve long-horizon planning problems in two distinct modes: a fast 'System-1' mode, directly generating plans without any explicit search or backtracking, and a slow 'System-2' mode, planning step-by-step by explicitly searching over possible actions. While System-2 is typically more effective, it is also more computationally expensive, making it infeasible for long plans or large action spaces. Moreover, isolated System-1 or 2 ignores the user's end goals, failing to provide ways to control the model's behavior. To this end, we propose the System-1.x Planner, a controllable planning framework with LLMs that is capable of generating hybrid plans and balancing between the two planning modes based on the difficulty of the problem at hand. System-1.x consists of (i) a controller, (ii) a System-1 Planner, and (iii) a System-2 Planner. Based on a user-specified hybridization factor (x) governing the mixture between System-1 and 2, the controller decomposes a problem into sub-goals, and classifies them as easy or hard to be solved by either System-1 or 2, respectively. We fine-tune all three components on top of a single base LLM, requiring only search traces as supervision. Experiments with two diverse planning tasks -- Maze Navigation and Blocksworld -- show that our System-1.x Planner outperforms a System-1 Planner, a System-2 Planner trained to approximate A* search, and also a symbolic planner (A*). We demonstrate the following key properties of our planner: (1) controllability: increasing the hybridization factor (e.g., System-1.75 vs 1.5) performs more search, improving performance, (2) flexibility: by building a neuro-symbolic variant with a neural System-1 and a symbolic System-2, we can use existing symbolic methods, and (3) generalizability: by being able to learn from different search algorithms, our method is robust to the choice of search algorithm.

Summary

The paper introduces System-1.x, a hybrid planning framework that balances fast, search-free planning (System-1) with slower, search-based planning (System-2) according to a tunable hybridization factor.
A controller decomposes each task into sub-goals and routes each one to either the System-1 or the System-2 planner; all three components are fine-tuned from a single base LLM using only search traces as supervision.
In experiments, System-1.x outperforms pure System-1, pure System-2, and A* baselines, reaching 70.4% accuracy on Maze Navigation with about 13.6 states explored.

The paper "System-1.x: Learning to Balance Fast and Slow Planning with Language Models" introduces a hybrid planning framework that combines rapid, search-free plan generation with deliberate, step-by-step search. It addresses the limitations of LLMs in long-horizon planning by letting users trade speed against accuracy according to how difficult each part of a problem is.

Overview

LLM-based planning can operate in two distinct modes, System-1 and System-2. A System-1 planner generates a plan directly, without explicit search or backtracking; it is fast but often inaccurate on complex problems. A System-2 planner searches explicitly over possible actions, step by step; it is typically more accurate but far more computationally expensive, which can make it infeasible for long plans or large action spaces.

The System-1.x Planner, proposed in this paper, strikes a balance between these two paradigms. It consists of three primary components:

Controller: Decomposes a planning problem into sub-goals and classifies them as either "easy" or "hard."
System-1 Planner: Solves the easy sub-goals quickly by generating the plan directly, without search.
System-2 Planner: Solves the hard sub-goals by explicitly searching over possible actions, step by step.
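The division of labor among the three components can be illustrated with a toy sketch. Everything in it is hypothetical: the 5×5 grid and wall layout, the hand-written sub-goal list standing in for the controller's output, a greedy function that walks straight toward the goal (standing in for the learned System-1 Planner, which plans without search), and breadth-first search standing in for the System-2 Planner. In the paper, System-2 is an LLM trained to approximate A*, or A* itself in the neuro-symbolic variant.

```python
from collections import deque
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Pos = Tuple[int, int]
SubGoal = Tuple[Pos, Pos, str]  # (start, goal, "easy" | "hard"), as a controller might emit

SIZE = 5                          # hypothetical 5x5 grid
WALLS = {(1, 1), (1, 2), (1, 3)}  # hypothetical obstacle layout


def neighbors(p: Pos) -> Iterator[Pos]:
    """Yield in-bounds, non-wall cells adjacent to p."""
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        q = (x + dx, y + dy)
        if 0 <= q[0] < SIZE and 0 <= q[1] < SIZE and q not in WALLS:
            yield q


def system1_greedy(start: Pos, goal: Pos) -> List[Pos]:
    """Stand-in for System-1: move straight toward the goal (x first, then y), no search."""
    path = [start]
    x, y = start
    while x != goal[0]:
        x += 1 if goal[0] > x else -1
        path.append((x, y))
    while y != goal[1]:
        y += 1 if goal[1] > y else -1
        path.append((x, y))
    return path  # walls are never checked, so this can produce an invalid path


def system2_bfs(start: Pos, goal: Pos) -> List[Pos]:
    """Stand-in for System-2: breadth-first search returns a shortest wall-free path."""
    parent: Dict[Pos, Optional[Pos]] = {start: None}
    frontier = deque([start])
    while frontier:
        cur = frontier.popleft()
        if cur == goal:
            break
        for nxt in neighbors(cur):
            if nxt not in parent:
                parent[nxt] = cur
                frontier.append(nxt)
    if goal not in parent:
        return []  # goal unreachable
    path: List[Pos] = []
    node: Optional[Pos] = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def hybrid_plan(subgoals: List[SubGoal],
                system1: Callable[[Pos, Pos], List[Pos]],
                system2: Callable[[Pos, Pos], List[Pos]]) -> List[Pos]:
    """Solve each sub-goal with the planner its label selects and stitch the segments."""
    plan: List[Pos] = []
    for start, goal, label in subgoals:
        segment = (system2 if label == "hard" else system1)(start, goal)
        plan.extend(segment if not plan else segment[1:])  # skip the shared junction cell
    return plan


subgoals = [((0, 0), (0, 2), "easy"), ((0, 2), (2, 2), "hard"), ((2, 2), (4, 2), "easy")]
plan = hybrid_plan(subgoals, system1_greedy, system2_bfs)
print(plan)
```

Routing the middle sub-goal to the greedy stand-in instead would step onto the wall at (1, 2); labeling it hard sends it to search, while the two open stretches stay cheap.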

Methodology

The key feature of System-1.x is its controllability, governed by a user-specified hybridization factor $x$. This factor sets the mixture between System-1 and System-2 planning: a larger $x$ hands more of the problem to System-2 and therefore performs more search (System-1.75 searches more than System-1.5). Within that budget, the controller decides which sub-goals are hard enough to need System-2.
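One way to picture how $x$ could turn into an easy/hard split is a sliding-window labeling of the kind used to build the controller's training data. This is a minimal sketch under stated assumptions: $x$ is taken to be the fraction of plan steps handed to System-2, a single contiguous window is marked hard, and per-step difficulty scores are already available. The scores below are made up, the paper's heuristic difficulty function is not reproduced, and `label_steps` is a hypothetical name.

```python
from typing import List


def label_steps(difficulty: List[float], x: float) -> List[str]:
    """Mark the contiguous window of round(x * n) steps with the largest summed
    difficulty as System-2; every other step goes to System-1."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    n = len(difficulty)
    k = round(x * n)  # number of steps handed to System-2
    labels = ["system1"] * n
    if k == 0:
        return labels
    window = sum(difficulty[:k])
    best_sum, best_start = window, 0
    for start in range(1, n - k + 1):
        # Slide the window one step right: add the entering step, drop the leaving one.
        window += difficulty[start + k - 1] - difficulty[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    for i in range(best_start, best_start + k):
        labels[i] = "system2"
    return labels


scores = [0, 1, 5, 6, 2, 0, 0, 1]  # made-up per-step difficulty for an 8-step plan
print(label_steps(scores, 0.25))   # System-1.25
print(label_steps(scores, 0.5))    # System-1.5
```

Raising $x$ from 0.25 to 0.5 widens the hard window from 2 to 4 of the 8 steps, so more of the plan is solved by search, in line with the controllability property described in the abstract.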

Training

All three components are fine-tuned on top of a single base LLM, with search traces as the only supervision. The training data are derived as follows:

System-1 Data: Plans alone (the final action sequences), without the search steps that produced them.
System-2 Data: Search trajectories generated by algorithms such as A*.
Controller Data: Derived by decomposing plans into sub-goals with a sliding window and labeling each sub-goal easy or hard according to a heuristic measure of difficulty.
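A single search run can supply both planner datasets. The sketch below assumes a grid maze and A* with a Manhattan-distance heuristic: the sequence of expanded states serves as the System-2 search trace, and the final path alone serves as the System-1 target. The function names and the `input`/`target` record format are hypothetical, not the paper's serialization.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Pos = Tuple[int, int]


def astar_with_trace(start: Pos, goal: Pos, walls: Set[Pos], size: int) -> Tuple[List[Pos], List[Pos]]:
    """A* on a 4-connected grid with a Manhattan-distance heuristic.
    Returns (states in expansion order, shortest path from start to goal)."""
    def h(p: Pos) -> int:
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start)]  # heap of (f = g + h, g, state)
    best_g: Dict[Pos, int] = {start: 0}
    parent: Dict[Pos, Optional[Pos]] = {start: None}
    closed: Set[Pos] = set()
    trace: List[Pos] = []
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur in closed:
            continue  # stale heap entry
        closed.add(cur)
        trace.append(cur)  # every expansion is recorded: this is the search trace
        if cur == goal:
            break
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size) or nxt in walls:
                continue
            if g + 1 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g + 1
                parent[nxt] = cur
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    if goal not in closed:
        return trace, []
    path: List[Pos] = []
    node: Optional[Pos] = goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return trace, path[::-1]


def make_training_pair(start: Pos, goal: Pos, walls: Set[Pos], size: int):
    """Hypothetical records: System-1 targets only the plan; System-2 targets trace plus plan."""
    trace, plan = astar_with_trace(start, goal, walls, size)
    system1_example = {"input": (start, goal), "target": plan}
    system2_example = {"input": (start, goal), "target": {"explored": trace, "plan": plan}}
    return system1_example, system2_example


s1, s2 = make_training_pair((0, 2), (2, 2), walls={(1, 1), (1, 2), (1, 3)}, size=5)
print(len(s2["target"]["explored"]), "states explored;", len(s1["target"]) - 1, "moves in plan")
```

Because the System-1 record is just the tail of the System-2 record, no extra supervision beyond the search trace is needed for either planner.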

Evaluation

The paper evaluates System-1.x through experiments on two diverse planning tasks:

Maze Navigation: Involves navigating a 5x5 maze with obstacles.
Blocksworld: Requires reconfiguring blocks on a table to a predefined goal state.
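For readers unfamiliar with Blocksworld, a minimal model of the domain may help. It assumes the standard formulation (only a block with nothing on top of it may move, either to the table or onto another such block); the dictionary representation and function names are hypothetical and are not the paper's encoding.

```python
from typing import Dict, List, Tuple

Move = Tuple[str, str]  # (block, destination); destination is "table" or another block


def clear_blocks(on: Dict[str, str]) -> List[str]:
    """Blocks with nothing on top of them; `on` maps each block to what it rests on."""
    supporting = set(on.values())
    return sorted(b for b in on if b not in supporting)


def legal_moves(on: Dict[str, str]) -> List[Move]:
    """A clear block may move to the table (if not already there) or onto another clear block."""
    clear = clear_blocks(on)
    moves: List[Move] = []
    for block in clear:
        if on[block] != "table":
            moves.append((block, "table"))
        for dest in clear:
            if dest != block:
                moves.append((block, dest))
    return moves


def apply_move(on: Dict[str, str], move: Move) -> Dict[str, str]:
    """Return the successor state, rejecting illegal moves."""
    if move not in legal_moves(on):
        raise ValueError(f"illegal move: {move}")
    block, dest = move
    new_on = dict(on)
    new_on[block] = dest
    return new_on


start = {"A": "B", "B": "table", "C": "table"}  # A sits on B; B and C are on the table
goal = {"A": "table", "B": "C", "C": "table"}
state = start
for move in [("A", "table"), ("B", "C")]:
    state = apply_move(state, move)
print(state == goal)
```

A plan in this domain is a sequence of such moves; longer plans (as in the out-of-distribution evaluation) simply require more of them.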

Key Results and Analysis

Performance and Efficiency

System-1.x outperforms pure System-1 and System-2 planners across a range of search budgets, measured by the number of states explored (#States-Explored). For instance, in the Maze Navigation task:

System-1.x achieves 70.4% accuracy with about 13.6 states explored, whereas the System-1 and System-2 planners reach 48.7% and 37.2%, respectively, at a comparable number of states explored.

In the Blocksworld task, which tests out-of-distribution generalization to longer plan lengths:

System-1.x maintains higher accuracy at lower #States-Explored, with significant improvements over System-2, particularly where decomposing the problem into sub-goals simplifies planning.

Controllability

A notable feature of System-1.x is its controllability at both training and inference time:

Training time: setting the hybridization factor $x$ lets users adjust the balance between speed and accuracy.
Inference time: the controller can be biased toward more System-2 planning when higher accuracy is required, moving the System-1.x Planner toward a full System-2 Planner without retraining.
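The summary does not say how the controller is biased at inference time. One simple possibility, sketched under the assumption that the controller exposes a probability that each sub-goal is hard, is to shift that score before thresholding. `route_subgoals`, `bias`, the 0.5 threshold, and the scores are all hypothetical.

```python
from typing import List


def route_subgoals(p_hard: List[float], bias: float = 0.0, threshold: float = 0.5) -> List[str]:
    """Send a sub-goal to System-2 when its predicted hardness, shifted by a user-chosen
    bias, reaches the threshold. A positive bias routes more sub-goals to System-2
    without changing (or retraining) the controller that produced p_hard."""
    return ["system2" if p + bias >= threshold else "system1" for p in p_hard]


p_hard = [0.1, 0.4, 0.7]  # hypothetical controller scores for three sub-goals
for bias in (0.0, 0.2, 0.5):
    print(bias, route_subgoals(p_hard, bias))
```

With a large enough bias every sub-goal is sent to System-2, so on that input the hybrid planner behaves like a pure System-2 Planner.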

Neuro-Symbolic Integration

The potential for combining neural and symbolic methods is also explored:

Neuro-symbolic System-1.x, which uses A* as its System-2 component, outperforms the purely symbolic A* planner at matched #States-Explored. For example, at about 11.6 states explored, it achieves 70.5% accuracy compared to 31.0% for A*.

Implications and Future Directions

The work has both practical and theoretical implications:

Practically: System-1.x offers a flexible planning approach whose computational cost can be tuned, through $x$ or the controller's inference-time bias, to fit varying resource constraints.
Theoretically: It underscores the potential for hybrid models that leverage the best of heuristic-based and search-based planning, aligning with concepts from dual-process theories in cognitive science.

Future developments could explore:

Scalability: Extending System-1.x to handle larger-scale and more dynamic planning environments.
Adaptation to Uncertainty: Enhancing the controller to better manage partially observable and non-deterministic environments.
Further Integration: Seamlessly blending neural and symbolic methods to enhance the adaptability and generality of the planner.

In summary, the System-1.x Planner marks a significant advancement in the application of LLMs to planning tasks, providing a compelling blend of efficiency and accuracy through its hybrid, controllable approach. This sets a promising precedent for the development of more sophisticated, adaptive planning systems in the future.