
Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning

Published 7 Apr 2025 in cs.AI, cs.LG, and cs.NE | (2504.05108v3)

Abstract: Discovering efficient algorithms for solving complex problems has been an outstanding challenge in mathematics and computer science, requiring substantial human expertise over the years. Recent advancements in evolutionary search with LLMs have shown promise in accelerating the discovery of algorithms across various domains, particularly in mathematics and optimization. However, existing approaches treat the LLM as a static generator, missing the opportunity to update the model with the signal obtained from evolutionary exploration. In this work, we propose to augment LLM-based evolutionary search by continuously refining the search operator - the LLM - through reinforcement learning (RL) fine-tuning. Our method leverages evolutionary search as an exploration strategy to discover improved algorithms, while RL optimizes the LLM policy based on these discoveries. Our experiments on three combinatorial optimization tasks - bin packing, traveling salesman, and the flatpack problem - show that combining RL and evolutionary search improves discovery efficiency of improved algorithms, showcasing the potential of RL-enhanced evolutionary strategies to assist computer scientists and mathematicians for more efficient algorithm design.

Summary

Insights into "Algorithm Discovery With LLMs: Evolutionary Search Meets Reinforcement Learning"

The paper under review proposes a method called EvoTune, which combines evolutionary search with reinforcement learning (RL) to optimize the discovery of algorithms using large language models (LLMs). The authors aim to improve the efficiency of exploring algorithmic spaces by addressing a limitation of prior approaches, in which the LLM was treated as a static generator. By integrating RL fine-tuning, EvoTune updates the LLM using feedback obtained from evolutionary exploration, so that the model acts not as a fixed generation tool but as an evolving search operator whose outputs improve across iterations.

Methodology

EvoTune consists primarily of two phases: evolutionary search and RL training. In the evolutionary search phase, the method explores the space of possible programs, maintaining a program database organized into islands, which evolve independently. This is followed by an RL training phase where the LLM's policy is fine-tuned using feedback from the evolutionary search. The authors employ the Direct Preference Optimization (DPO) algorithm for fine-tuning, leveraging collected preference data to optimize the LLM.
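The alternating structure described above can be sketched as follows. This is an illustrative toy, not the authors' implementation: `mutate` and `dpo_update` are hypothetical stand-ins for the LLM sampler and the DPO fine-tuning step, and the preference-pair construction is a simplification of whatever scheme the paper actually uses.

```python
import random

def evotune(n_rounds, samples_per_round, seed_programs, evaluate):
    """Sketch of the alternating search/train loop (not the authors' code).

    evaluate: maps a program (here, a string) to a scalar score, higher is better.
    """
    database = [(p, evaluate(p)) for p in seed_programs]
    policy = {"version": 0}  # placeholder for the LLM's weights

    for _ in range(n_rounds):
        # --- Evolutionary search phase: sample variants from the LLM ---
        new = []
        for _ in range(samples_per_round):
            # Tournament-style selection: best of a small random sample.
            parent, _ = max(random.sample(database, k=min(2, len(database))),
                            key=lambda ps: ps[1])
            child = mutate(parent, policy)          # LLM proposes a variant
            new.append((child, evaluate(child)))
        database.extend(new)

        # --- RL phase: build (preferred, rejected) pairs and fine-tune ---
        ranked = sorted(new, key=lambda ps: ps[1], reverse=True)
        pairs = [(ranked[i][0], ranked[-1 - i][0])
                 for i in range(len(ranked) // 2)]
        policy = dpo_update(policy, pairs)

    return max(database, key=lambda ps: ps[1])

# Toy stand-ins so the sketch runs end to end.
def mutate(program, policy):
    return program + "+1"                           # pretend "program edit"

def dpo_update(policy, pairs):
    return {"version": policy["version"] + 1}       # pretend fine-tuning step
```

In a real run, `evaluate` would execute the candidate program on task instances (e.g. bin-packing inputs) and return its performance, and `dpo_update` would run a DPO gradient step on the preference pairs.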

To aid the search, EvoTune employs standard evolutionary mechanisms inspired by natural genetic principles: selection, variation, and diversity maintenance. The approach also introduces a continuous update mechanism, in which the LLM's policy is periodically refined using the best-performing solutions discovered through evolutionary search.
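A minimal sketch of an island-structured program database is given below. The routing-by-signature and elitist-truncation rules are assumptions for illustration; the paper's exact island, migration, and diversity rules are not reproduced here.

```python
from collections import defaultdict

class IslandDatabase:
    """Toy island-structured program store (illustrative, not the authors' code)."""

    def __init__(self, num_islands, capacity_per_island):
        self.num_islands = num_islands
        self.capacity = capacity_per_island
        self.islands = defaultdict(list)  # island id -> [(program, score)]

    def add(self, program, score, signature):
        """Route a program to an island by a cheap behavioral signature."""
        island = hash(signature) % self.num_islands
        bucket = self.islands[island]
        # Diversity maintenance: skip exact duplicates within an island.
        if any(p == program for p, _ in bucket):
            return
        bucket.append((program, score))
        # Elitist truncation: keep only the best programs per island.
        bucket.sort(key=lambda ps: ps[1], reverse=True)
        del bucket[self.capacity :]

    def select_parent(self, island):
        """Greedy selection: best program on the island (tournament in practice)."""
        bucket = self.islands[island]
        return max(bucket, key=lambda ps: ps[1])[0] if bucket else None
```

Because islands evolve independently, a weak lineage on one island cannot be crowded out by a strong lineage on another, which preserves diversity in the search.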

Numerical Results and Observations

Experiments were conducted on three combinatorial optimization tasks: bin packing, traveling salesman, and the flatpack problem, using several mainstream LLMs such as Llama 3.2 1B Instruct, Phi 3.5 Mini Instruct, and Granite 3.1 2B Instruct. The results showed that EvoTune consistently discovered better solutions more efficiently than baseline methods that do not employ RL. Specifically, EvoTune outperformed the baselines in both average reward scores and the diversity of unique solutions generated across these benchmarks. These results indicate the promise of integrating RL fine-tuning into LLM-driven algorithm discovery for combinatorial optimization.

Implications and Future Directions

The proposed method suggests a promising path for enhancing the capabilities of LLMs in algorithmic exploration and discovery. Practically, EvoTune could facilitate the development of more efficient algorithms across various domains, accelerating progress in fields that depend on sophisticated mathematical computation or optimization. Theoretically, the approach aligns with the "Bitter Lesson" in AI, which holds that general methods leveraging computation, such as search and learning, ultimately outperform hand-engineered solutions.

Future work could explore scaling the approach with larger models and more exhaustive sampling budgets, as the authors note that most experiments were constrained by computational resources. Furthermore, exploring the application of EvoTune across other complex optimization and machine learning tasks, such as those involving continuous decision processes or dynamic environments, could yield valuable insights and push the boundaries of current LLM applicability. Another intriguing avenue could be investigating hybrid models which combine additional learning paradigms, like transfer learning, to assimilate knowledge from related tasks and potentially further enhance performance and generalization capabilities.

Overall, the paper contributes significantly to the ongoing exploration of augmenting traditional AI and ML strategies with biologically inspired optimization techniques, creating a more nuanced understanding of how algorithmic efficiencies can be unearthed in increasingly complex problem spaces.
