Thought of Search: Planning with Language Models Through The Lens of Efficiency

This presentation examines a fundamental shift in how we use large language models for planning tasks. Instead of repeatedly querying language models during search, the authors demonstrate that generating symbolic search components upfront, such as successor functions and goal tests, lets classic algorithms solve planning problems with far fewer language model calls while maintaining accuracy. In experiments on the 24 game, they report that this approach outperforms existing methods, opening a path toward economically viable language model planning.
Script
Every time a language model plans a move, it burns through computational resources like a data center on overdrive. The researchers behind this work asked a provocative question: what if we could get language models to do the heavy lifting once, then step aside and let classic algorithms take over?
Existing methods like Tree of Thoughts and Graph of Thoughts treat language models as oracles, querying them at every decision point. This creates a computational bottleneck that makes real-world deployment economically infeasible.
The authors propose a radical inversion of this paradigm.
Instead of asking the language model to evaluate every potential move, they prompt it once to produce the rules of the game: a successor state function that generates next moves and a goal test that recognizes success. Then Breadth-First Search takes over, navigating the problem space without touching the language model again.
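To make this concrete, here is a minimal Python sketch of what such generated components could look like for the 24 game, followed by a plain Breadth-First Search that uses them. It is an illustration, not the authors' code: the names `successors`, `is_goal`, and `bfs`, the tuple-of-fractions state representation, and the choice to return only whether a solution exists are all assumptions made for this example.

```python
from collections import deque
from fractions import Fraction
from itertools import combinations


def successors(state):
    """Yield every state reachable by combining two numbers with one operation.

    Assumed representation: a state is a sorted tuple of Fractions.
    """
    nums = list(state)
    for i, j in combinations(range(len(nums)), 2):
        a, b = nums[i], nums[j]
        rest = [nums[k] for k in range(len(nums)) if k != i and k != j]
        results = {a + b, a * b, a - b, b - a}
        if b != 0:
            results.add(a / b)
        if a != 0:
            results.add(b / a)
        for r in results:
            # Sorting makes equivalent states compare equal, which helps deduplication.
            yield tuple(sorted(rest + [r]))


def is_goal(state):
    """A state is a goal when a single number remains and it equals 24."""
    return len(state) == 1 and state[0] == 24


def bfs(start):
    """Return True if 24 is reachable from the starting numbers, else False.

    No language model is called anywhere in this loop.
    """
    start = tuple(sorted(Fraction(n) for n in start))
    frontier = deque([start])
    visited = {start}
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            return True
        for nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(nxt)
    return False
```

Calling `bfs([1, 3, 4, 6])` explores the space exhaustively and finds a solution such as 6 / (1 - 3/4), while `bfs([1, 1, 1, 1])` terminates with no solution. Exact fractions are used here so that division does not introduce floating-point error.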
Tested on 1362 instances of the 24 game, the method used far fewer language model calls than the baselines. Where other methods burned through thousands of language model queries, this approach generated the search logic once and solved problems through classical search, showing that efficiency and correctness need not be enemies.
This work challenges the assumption that language models must be in the loop at every step. By generating the machinery of search rather than executing it, we gain soundness, completeness, and economic viability. The results are preliminary but point toward a future where planning with language models is not just powerful, but practical.
The next time you see a language model struggling through a search tree one expensive query at a time, remember: sometimes the smartest move is to ask once, then let the algorithm do what it does best.