Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools

Published 18 Apr 2024 in cs.AI, cs.CL, and cs.HC | (2404.11891v3)

Abstract: LLMs struggle to directly generate correct plans for complex multi-constraint planning problems, even with self-verification and self-critique. For example, a U.S. domestic travel planning benchmark TravelPlanner was proposed in Xie et al. (2024), where the best LLM OpenAI o1-preview can only find viable travel plans with a 10% success rate given all needed information. In this work, we tackle this by proposing an LLM-based planning framework that formalizes and solves complex multi-constraint planning problems as constrained satisfiability problems, which are further consumed by sound and complete satisfiability solvers. We start with TravelPlanner as the primary use case and show that our framework achieves a success rate of 93.9% and is effective with diverse paraphrased prompts. More importantly, our framework has strong zero-shot generalizability, successfully handling unseen constraints in our newly created unseen international travel dataset and generalizing well to new fundamentally different domains. Moreover, when user input queries are infeasible, our framework can identify the unsatisfiable core, provide failure reasons, and offer personalized modification suggestions. We show that our framework can modify and solve for an average of 81.6% and 91.7% unsatisfiable queries from two datasets and prove with ablations that all key components of our framework are effective and necessary. Project page: https://sites.google.com/view/LLM-rwplanning.

Citations (4)

Summary

  • The paper demonstrates a hybrid framework that pairs LLMs with SMT solvers and reaches a 93.9% success rate on the U.S. TravelPlanner benchmark, where the best LLM alone (OpenAI o1-preview) reaches only 10%.
  • The methodology transforms travel planning into a constraint satisfaction problem for formal verification of multi-constraint itineraries.
  • The framework includes an interactive plan repair feature that collaborates with users to adapt to both domestic and international travel constraints.

Formal Verification in Travel Planning with LLMs

LLMs have recently emerged as powerful tools for a wide range of tasks, owing to their extensive world knowledge and reasoning abilities. Even so, they struggle to directly solve complex combinatorial planning problems, such as travel planning, in which many constraints must be satisfied at once. The paper "Large Language Models Can Solve Real-World Planning Rigorously with Formal Verification Tools" presents a framework that integrates LLMs with formal verification tools to solve such problems, with travel planning as the primary use case.

The authors propose a framework that leverages satisfiability modulo theories (SMT) solvers to address the shortcomings of LLMs in handling multi-constraint optimization. The framework transforms the travel planning challenge into a constraint satisfaction problem, enabling rigorous formulation and solution through SMT. By doing this, the framework ensures that all constraints are formally verified, guaranteeing a valid solution if one exists within the specified criteria.
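To make the idea of "planning as constraint satisfaction" concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and does not use an SMT solver: exhaustive search over a tiny finite domain stands in for the solver, which is sound and complete only because the domain is small enough to enumerate. All names, cities, and prices (`DOMAINS`, `TRANSPORT_COST`, `total_cost`, `solve`, and so on) are hypothetical and chosen for illustration.

```python
from itertools import product

# Hypothetical toy data; not taken from TravelPlanner.
DOMAINS = {
    "city": ["Austin", "Denver", "Seattle"],
    "transport": ["flight", "self-driving"],
    "room": ["shared", "private"],
}
TRANSPORT_COST = {"flight": 300, "self-driving": 120}      # USD, round trip
ROOM_COST = {"shared": 50, "private": 110}                 # USD per night
CITY_SURCHARGE = {"Austin": 0, "Denver": 40, "Seattle": 90}  # USD, flat
NIGHTS = 3


def total_cost(plan):
    """Total trip cost for one complete assignment of the plan variables."""
    return (TRANSPORT_COST[plan["transport"]]
            + CITY_SURCHARGE[plan["city"]]
            + NIGHTS * ROOM_COST[plan["room"]])


def solve(constraints):
    """Return the first plan satisfying every constraint, or None if none exists.

    Exhaustive enumeration is used here as a stand-in for an SMT solver.
    """
    names = list(DOMAINS)
    for values in product(*(DOMAINS[name] for name in names)):
        plan = dict(zip(names, values))
        if all(constraint(plan) for constraint in constraints):
            return plan
    return None


# Each user requirement becomes a predicate over a candidate plan.
constraints = [
    lambda p: total_cost(p) <= 500,   # budget
    lambda p: p["room"] == "private",  # room type
    lambda p: p["city"] != "Austin",   # excluded destination
]

print(solve(constraints))
```

Because every candidate is checked against every constraint, any plan returned is valid by construction, and a result of `None` means no valid plan exists in the enumerated space. An SMT solver provides the same guarantee over far larger spaces without enumerating them.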

The evaluation uses TravelPlanner, a benchmark for U.S. domestic travel planning on which the best LLM, OpenAI o1-preview, finds viable plans only 10% of the time even when given all needed information. In contrast, the proposed framework achieves a success rate of 93.9% on TravelPlanner and remains effective under diverse paraphrased prompts. This indicates the effectiveness of combining LLMs with formal verification tools for multi-constraint planning tasks.

Furthermore, the authors extend the evaluation to a newly created international travel dataset containing constraints unseen in TravelPlanner, as well as to fundamentally different domains, and report strong zero-shot generalization. For infeasible queries, the framework modifies and solves an average of 81.6% and 91.7% of the unsatisfiable queries from two datasets.

A key component of the framework is its interactive plan repair capability. When confronted with unsatisfiable travel plans, the LLM component collaborates with the user by providing suggestions to modify constraints. This feature exemplifies the utility of LLMs in interacting with humans and adapting plans according to diverse preferences and dynamically changing requirements.
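The repair step depends on knowing which constraints conflict. The sketch below shows one generic way to find a minimal conflicting subset, using deletion-based minimization over a toy domain. It is an assumption-laden illustration rather than the paper's method: the abstract says the framework identifies the unsatisfiable core, whereas here a brute-force satisfiability check replaces the solver, and every name and number (`unsat_core`, `repair_suggestions`, the prices) is hypothetical.

```python
from itertools import product

# Hypothetical toy data; not taken from the paper or its datasets.
DOMAINS = {
    "city": ["Austin", "Denver", "Seattle"],
    "transport": ["flight", "self-driving"],
    "room": ["shared", "private"],
}
TRANSPORT_COST = {"flight": 300, "self-driving": 120}
ROOM_COST = {"shared": 50, "private": 110}
CITY_SURCHARGE = {"Austin": 0, "Denver": 40, "Seattle": 90}
NIGHTS = 3


def total_cost(plan):
    return (TRANSPORT_COST[plan["transport"]]
            + CITY_SURCHARGE[plan["city"]]
            + NIGHTS * ROOM_COST[plan["room"]])


def satisfiable(constraints):
    """Brute-force check (stand-in for a solver call) over the toy domain."""
    names = list(DOMAINS)
    checks = list(constraints)
    for values in product(*(DOMAINS[name] for name in names)):
        plan = dict(zip(names, values))
        if all(check(plan) for check in checks):
            return True
    return False


def unsat_core(named):
    """Return a minimal unsatisfiable subset of constraint names.

    Deletion-based minimization: drop each constraint in turn and keep it
    out whenever the remainder is still unsatisfiable. Returns [] if the
    full set is satisfiable.
    """
    if satisfiable(named.values()):
        return []
    core = dict(named)
    for name in list(core):
        trial = {k: v for k, v in core.items() if k != name}
        if not satisfiable(trial.values()):
            core = trial  # this constraint is not needed for the conflict
    return sorted(core)


def repair_suggestions(named):
    """Core constraints whose removal alone makes the whole query feasible."""
    return [name for name in unsat_core(named)
            if satisfiable(v for k, v in named.items() if k != name)]


query = {
    "budget<=500": lambda p: total_cost(p) <= 500,
    "room=private": lambda p: p["room"] == "private",
    "transport=flight": lambda p: p["transport"] == "flight",
    "city!=Austin": lambda p: p["city"] != "Austin",
}

print(unsat_core(query))
print(repair_suggestions(query))
```

In this toy query the destination exclusion is not part of the conflict, so a suggestion to change the destination would not help. Narrowing the conflict to the budget, room, and transport constraints is what lets a repair dialogue offer the user targeted options.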

The research presents several implications for AI development. Practically, this framework can assist in efficiently planning complex travel itineraries, facilitating both individual and commercial applications. Theoretically, it offers a pathway to enhance LLM capabilities by integrating them with formal methods, potentially expanding their utility in other domains requiring strict constraint satisfaction.

Looking to the future, the integration of LLMs with formal solvers could see broader applications beyond travel planning. Fields such as logistics, supply chain management, and automated scheduling may benefit from such a hybrid approach, offering solutions that balance flexibility with formal correctness. Further research may explore extending this framework to encompass machine learning techniques within the reasoning process itself, enhancing the adaptive capabilities of LLMs in real-world applications.

In summary, this paper provides noteworthy insights into overcoming the inherent limitations of LLMs in complex planning scenarios through the use of formal verification tools, paving the way for future advancements in AI-driven planning and optimization tasks.
