Synergizing Reinforcement Learning and Genetic Algorithms for Neural Combinatorial Optimization

Published 11 Jun 2025 in cs.LG and cs.NE | (2506.09404v1)

Abstract: Combinatorial optimization problems are notoriously challenging due to their discrete structure and exponentially large solution space. Recent advances in deep reinforcement learning (DRL) have enabled learning heuristics directly from data. However, DRL methods often suffer from limited exploration and susceptibility to local optima. On the other hand, evolutionary algorithms such as Genetic Algorithms (GAs) exhibit strong global exploration capabilities but are typically sample inefficient and computationally intensive. In this work, we propose the Evolutionary Augmentation Mechanism (EAM), a general and plug-and-play framework that synergizes the learning efficiency of DRL with the global search power of GAs. EAM operates by generating solutions from a learned policy and refining them through domain-specific genetic operations such as crossover and mutation. These evolved solutions are then selectively reinjected into the policy training loop, thereby enhancing exploration and accelerating convergence. We further provide a theoretical analysis that establishes an upper bound on the KL divergence between the evolved solution distribution and the policy distribution, ensuring stable and effective policy updates. EAM is model-agnostic and can be seamlessly integrated with state-of-the-art DRL solvers such as the Attention Model, POMO, and SymNCO. Extensive results on benchmark problems (e.g., TSP, CVRP, PCTSP, and OP) demonstrate that EAM significantly improves both solution quality and training efficiency over competitive baselines.

Summary

The paper introduces EAM, merging DRL with GA to enhance exploration and solution quality in Neural Combinatorial Optimization.
It employs genetic operations like crossover and mutation alongside an autoregressive policy to overcome local optima and improve search efficiency.
Experiments on TSP, CVRP, and related benchmarks demonstrate accelerated convergence and robust solution improvements.

Introduction

This paper proposes the Evolutionary Augmentation Mechanism (EAM), which combines Deep Reinforcement Learning (DRL) with Genetic Algorithms (GAs) to address challenges in Neural Combinatorial Optimization (NCO). It targets two limitations of DRL, limited exploration and susceptibility to local optima, by adding the global search capabilities of GAs. EAM is positioned as a model-agnostic module that can be integrated with state-of-the-art DRL solvers to enhance exploration and accelerate convergence. The approach is evaluated on benchmark combinatorial optimization problems such as TSP, CVRP, PCTSP, and OP, where it improves both solution quality and training efficiency over competitive baselines.

Evolutionary Augmentation Mechanism

Overview

EAM integrates the learning efficiency of DRL with the exploratory capabilities of GAs. As illustrated in the framework (Figure 1), the mechanism involves generating initial solutions from the DRL policy, refining them through genetic operations like crossover and mutation, and reinjecting evolved solutions into the policy training loop.

Figure 1: An overview of the proposed Evolutionary Augmentation Mechanism (EAM). Policy-sampled solutions are used to initialize the population of the Genetic Algorithm. The evolved solutions are then merged with the original samples and jointly used to train the policy network, forming a closed-loop learning and evolutionary framework.
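
The closed loop in Figure 1 can be sketched as a single batch-building step. This is a minimal illustration and not the paper's implementation: the function name build_training_batch, the elite_frac parameter, and the rule of keeping only evolved solutions that beat the best policy sample are all assumptions made here, since the summary only says that evolved solutions are "selectively" reinjected.

```python
from typing import Callable, List

Solution = List[int]


def build_training_batch(
    sample_policy: Callable[[], Solution],
    evolve: Callable[[List[Solution]], List[Solution]],
    cost: Callable[[Solution], float],
    n_samples: int,
    elite_frac: float = 0.5,
) -> List[Solution]:
    """Return policy samples merged with the evolved solutions that pass a filter.

    All arguments are hypothetical stand-ins: `sample_policy` draws one solution
    from the policy, `evolve` runs the genetic operators on a population, and
    `cost` scores a solution (lower is better).
    """
    # Step 1: sample solutions from the current policy.
    samples = [sample_policy() for _ in range(n_samples)]

    # Step 2: initialize the GA population from the best policy samples.
    n_elite = max(1, int(elite_frac * n_samples))
    population = sorted(samples, key=cost)[:n_elite]

    # Step 3: refine the population with genetic operations.
    evolved = evolve(population)

    # Step 4 (assumed filter): keep only evolved solutions that improve on
    # the best policy sample, then merge them with the original samples.
    best_cost = cost(population[0])
    kept = [s for s in evolved if cost(s) < best_cost]
    return samples + kept
```

The returned batch would then be passed to whatever policy-gradient update the backbone solver already uses; that update is outside the scope of this sketch.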

Key Components and Process

Solution Generation: The policy network, based on an autoregressive model, generates initial solutions. This initial population is fundamental for further GA-based refinement.
Genetic Operations:
- Selection: An elitist strategy ensures that only the top-performing individuals undergo genetic operations, safeguarding computational efficiency while focusing on high-quality solutions.
- Crossover and Mutation: Genetic diversity is injected primarily via Order Crossover and task-specific mutation operations, which adapt to the structural characteristics of the problem space.
Closed-loop Integration: EAM creates a feedback loop where the GA evolves solutions that improve the policy's exploration capability, and the policy accelerates GA by providing well-structured initial solutions.
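
The genetic operations listed above can be illustrated for TSP tours with a short sketch. Order Crossover is named in the summary; the swap mutation, the function names, and the default values of n_elite and mutation_rate are assumptions chosen for illustration, since the summary only describes the mutations as task-specific.

```python
import math
import random
from typing import List, Sequence, Tuple

Tour = List[int]
Point = Tuple[float, float]


def tour_length(tour: Sequence[int], coords: Sequence[Point]) -> float:
    """Length of the closed tour that visits the cities in the given order."""
    n = len(tour)
    return sum(math.dist(coords[tour[k]], coords[tour[(k + 1) % n]]) for k in range(n))


def order_crossover(p1: Sequence[int], p2: Sequence[int], i: int, j: int) -> Tour:
    """Order Crossover: copy p1[i..j] (inclusive), then fill the remaining
    positions, starting after j and wrapping around, in the order the
    missing cities appear in p2."""
    n = len(p1)
    child = [-1] * n
    child[i:j + 1] = p1[i:j + 1]
    taken = set(p1[i:j + 1])
    fill = [p2[(j + 1 + k) % n] for k in range(n) if p2[(j + 1 + k) % n] not in taken]
    slots = [(j + 1 + k) % n for k in range(n - (j - i + 1))]
    for slot, city in zip(slots, fill):
        child[slot] = city
    return child


def swap_mutation(tour: Sequence[int], rng: random.Random, rate: float) -> Tour:
    """Assumed mutation: with probability `rate`, swap two random cities."""
    mutated = list(tour)
    if rng.random() < rate:
        a, b = rng.sample(range(len(mutated)), 2)
        mutated[a], mutated[b] = mutated[b], mutated[a]
    return mutated


def evolve_once(population: List[Tour], coords: Sequence[Point], rng: random.Random,
                n_elite: int = 4, mutation_rate: float = 0.2) -> List[Tour]:
    """One generation: elitist selection, then crossover and mutation.
    Requires at least two elites and tours with at least two cities."""
    elites = sorted(population, key=lambda t: tour_length(t, coords))[:n_elite]
    children = []
    for _ in range(len(population)):
        p1, p2 = rng.sample(elites, 2)
        i, j = sorted(rng.sample(range(len(p1)), 2))
        children.append(swap_mutation(order_crossover(p1, p2, i, j), rng, mutation_rate))
    return children
```

Both operators always return a valid permutation, so every child is a feasible TSP tour. Problems with extra constraints, such as CVRP capacity limits, would need operators or repair steps that this sketch does not cover.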

Theoretical Considerations

Injecting GA-evolved solutions into training can bias policy updates, because the evolved solutions are no longer samples from the policy itself. The paper quantifies this effect with a KL-divergence analysis: it establishes an upper bound on the KL divergence between the evolved solution distribution and the policy distribution, so that evolved solutions stay close to the policy distribution and training updates remain stable.
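
To make the quantity being bounded concrete, the following sketch computes a KL divergence between two small discrete distributions. It does not reproduce the paper's bound, which is not stated in this summary. Instead it checks a generic, well-known fact under an assumption made here: if only a fraction eps of the probability mass is moved from the policy distribution p toward some other distribution r, convexity of the KL divergence gives KL((1 - eps) p + eps r || p) <= eps KL(r || p). The distributions and the value of eps are invented for illustration.

```python
import math
from typing import Sequence


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) for discrete distributions on the same support.
    Assumes p[k] > 0 wherever q[k] > 0."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)


# Hypothetical policy distribution over four candidate solutions.
p = [0.4, 0.3, 0.2, 0.1]
# Hypothetical distribution favoured by the genetic operators.
r = [0.1, 0.2, 0.3, 0.4]
# Assumed fraction of training samples replaced by evolved solutions.
eps = 0.2

# Distribution of the merged training data under this toy model.
mixed = [(1 - eps) * pk + eps * rk for pk, rk in zip(p, r)]

kl_mixed = kl_divergence(mixed, p)
kl_bound = eps * kl_divergence(r, p)
print(f"KL(mixed || p) = {kl_mixed:.4f}, convexity bound = {kl_bound:.4f}")
```

In this toy model, a smaller eps keeps the merged distribution closer to the policy, which matches the intuition that limited, selective reinjection keeps updates stable.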

Experimental Evaluation

Experiments across multiple COPs show that EAM consistently improves solution quality and accelerates convergence in all tested settings, including TSP and CVRP instances at different scales.

Figure 2: Training curves of AM, POMO, and Sym-NCO with and without EAM on TSP-100 and CVRP-100. EAM consistently accelerates convergence across different backbones and problem settings, highlighting its ability to improve training efficiency in addition to final solution quality.

The results indicate that EAM improves both final solution quality and training efficiency, and that these gains hold across the AM, POMO, and Sym-NCO backbones.

Implications and Future Directions

EAM's ability to seamlessly augment DRL solvers, enhancing both exploration and training efficiency, positions it as a versatile tool for solving NP-hard combinatorial problems. Future research could extend EAM to various other COPs outside the routing domain, each with unique structural constraints. Further exploration into dynamic evolutionary hyperparameter adjustments based on task-specific feedback could also enhance the framework's adaptability and performance.

Conclusion

The Evolutionary Augmentation Mechanism presents an innovative pathway for addressing the intrinsic limitations of DRL in solving NCO problems. By leveraging the complementary strengths of DRL and GAs, EAM achieves marked improvements in both solution quality and training efficiency, underscoring its potential as a transformative tool in combinatorial optimization research and applications.
