- The paper introduces a GPU-accelerated fusion framework that enhances MIP primal heuristics by integrating bound propagation with an extended feasibility pump.
- It introduces techniques such as a probing cache and bulk rounding, achieving lower primal gaps and more feasible solutions on the MIPLIB2017 benchmark.
- The approach demonstrates scalable parallelism on CUDA-enabled GPUs, with significant improvements over traditional CPU-based heuristics.
GPU-Accelerated Primal Heuristics for Mixed Integer Programming
Problem Context and Motivation
Mixed Integer Programming (MIP), particularly Mixed Integer Linear Programming (MILP), is a foundational paradigm in combinatorial optimization: NP-hard in general, yet ubiquitous in industrial applications. State-of-the-art solution methods such as branch-and-cut combine rigorous enumeration with cutting-plane generation, but in practice, obtaining high-quality feasible solutions within stringent time budgets hinges on effective primal heuristics. The paper "GPU-Accelerated Primal Heuristics for Mixed Integer Programming" (2510.20499) introduces a fusion framework that leverages CUDA-enabled GPUs to accelerate and enhance leading MIP primal heuristics, substantially expanding the set of feasible solutions found and improving objective values on benchmark datasets.
GPU-Accelerated Heuristic Design
Bound Propagation
The authors extensively parallelize bound propagation (BP), a critical preprocessing step for tightening variable bounds based on constraint activities. Constraints and variables are binned via Logarithmic Radix Binning to address load balancing given the varying density of nonzero matrix elements. Sparse matrix operations are managed in Compressed Sparse Row (CSR) form, with activities and bounds updated by several GPU strategies—sub-warp computation, block approaches for medium-sized constraints, and heavy element strategies for large constraints. Data locality optimizations, such as grouping lower and upper bounds adjacently, are exploited to maximize memory bandwidth usage. All kernel launches are managed via CUDA graphs to minimize latency and concurrency overhead, facilitating nearly full device utilization even when propagating implications for single variables.
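The core of activity-based bound propagation can be illustrated with a small sketch. This is a sequential CPU version of the idea, assuming only `<=` constraints and continuous bounds; the paper's GPU implementation instead parallelizes over constraints with the binning strategies described above.

```python
import numpy as np

def propagate_bounds(indptr, indices, data, rhs, lb, ub, max_rounds=10):
    """Simplified, sequential sketch of activity-based bound propagation for
    constraints A x <= rhs with A stored in CSR form (indptr/indices/data).
    For each constraint, the minimum activity of the other terms bounds how
    large (or small) each variable can be."""
    lb, ub = lb.copy(), ub.copy()
    for _ in range(max_rounds):
        changed = False
        for row in range(len(rhs)):
            cols = indices[indptr[row]:indptr[row + 1]]
            coefs = data[indptr[row]:indptr[row + 1]]
            # Minimum activity: each term takes its lowest achievable value.
            min_terms = np.where(coefs > 0, coefs * lb[cols], coefs * ub[cols])
            min_act = min_terms.sum()
            for k, (j, a) in enumerate(zip(cols, coefs)):
                residual = rhs[row] - (min_act - min_terms[k])
                if a > 0 and residual / a < ub[j]:
                    ub[j] = residual / a        # tighten upper bound of x_j
                    changed = True
                elif a < 0 and residual / a > lb[j]:
                    lb[j] = residual / a        # tighten lower bound of x_j
                    changed = True
        if not changed:
            break
    return lb, ub
```

For example, the single constraint x + y <= 4 with both variables in [0, 10] tightens both upper bounds to 4 in one round.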
Probing Cache and Bulk Rounding
To enable rapid feasibility checks and early infeasibility detection, the paper introduces a probing cache. For each binary or boxed integer variable, the cache precomputes implication sets, partitioning the variable's domain into intervals and storing the bound consequences of each. The probing cache enables fast bulk rounding heuristics: multiple integer variables are rounded concurrently (subject to conflict-avoidance criteria derived from the cache contents) and their fixings are propagated en masse. Variables are prioritized for probing lexicographically: by violated constraints, then maximum violation, then minimum unit slack consumption per constraint.
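The interplay of cache construction and conflict-aware batch rounding can be sketched as follows. This is my own minimal construction, not the paper's implementation: `propagate` is an assumed callback standing in for a propagation pass, and only binary variables are handled.

```python
def build_probing_cache(binary_vars, propagate):
    """Hypothetical probing-cache sketch: for each binary variable and each
    of its two values, run propagation once up front and store the implied
    fixings. `propagate` is an assumed callback that returns a dict of
    implied {var: value}, or None if the trial value is infeasible."""
    return {(v, val): propagate({v: val})
            for v in binary_vars for val in (0, 1)}

def bulk_round(candidates, cache):
    """Round many variables in one batch, consulting only the cache: a
    candidate is skipped if probing proved its value infeasible or if its
    cached implications conflict with fixings already accepted in the batch."""
    fixed = {}
    for v, val in candidates:
        implied = cache.get((v, val))
        if implied is None:
            continue  # probing showed this value leads to infeasibility
        if any(fixed.get(u, w) != w for u, w in implied.items()):
            continue  # would contradict an earlier fixing in this batch
        fixed.update(implied)
        fixed[v] = val
    return fixed
```

Because each candidate only reads precomputed implications, the per-candidate conflict check is cheap, which is what makes rounding whole batches at once attractive on a GPU.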
Bulk rounding, accelerated by probing cache, is executed in parallel, with backtracking and repair procedures as fallback mechanisms in case of infeasibility. Repair procedures, implemented as parallel neighborhood searches, shift variable bounds dynamically and are iteratively applied until propagation yields feasible bounds or the time budget is exhausted.
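The repair loop described above can be sketched in a few lines. This is a hedged simplification under stated assumptions: `propagate` is a hypothetical callback reporting feasibility and the conflicting variables, and the "neighborhood search" is reduced to uniformly widening the bounds of those variables.

```python
def repair(lb, ub, propagate, shift=1.0, max_rounds=5):
    """Sketch of the repair fallback: when propagation proves the current
    fixings infeasible, shift the bounds of the conflicting variables
    outward and retry, until propagation succeeds or the round budget is
    spent. `propagate` is an assumed callback returning
    (feasible, conflict_variable_indices)."""
    lb, ub = list(lb), list(ub)
    for _ in range(max_rounds):
        feasible, conflict = propagate(lb, ub)
        if feasible:
            return lb, ub
        for j in conflict:
            lb[j] -= shift  # widen the box around each conflicting variable
            ub[j] += shift
    return None  # budget exhausted without a feasible box
```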
Local Search and Extended Feasibility Pump
Building on recent advances in primal heuristics, specifically Feasibility Pump (FP), Feasibility Jump (FJ), and Fix-and-Propagate, the paper fuses these methods and further accelerates them through GPU parallelism. The FP heuristic alternates LP projections with integer roundings; cycles are broken using Local-MIP, a Lagrangian-based local search operator with aggressive objective-improving moves (lift and breakthrough) and hierarchical scoring. Each candidate move's score is fully recomputed in parallel on the GPU, using precomputed variable neighborhoods and prefix-sum-based load balancing.
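The idea of recomputing every candidate move's score in one pass can be illustrated with a small sketch. This is my own construction, not the paper's kernel: constraints are A x <= b, moves are binary flips, and each score is a hierarchical pair compared lexicographically (violation reduction first, objective improvement second).

```python
import numpy as np

def flip_scores(A, b, x, obj, binary_idx):
    """Score every candidate binary flip against constraints A x <= b.
    A GPU version would evaluate all candidates in parallel; here the loop
    over candidates is sequential but each evaluation is vectorized."""
    act = A @ x
    viol = np.maximum(act - b, 0.0).sum()   # total violation before any move
    scores = []
    for j in binary_idx:
        delta = 1.0 - 2.0 * x[j]            # flipping x_j changes it by +/-1
        new_viol = np.maximum(act + A[:, j] * delta - b, 0.0).sum()
        scores.append((viol - new_viol, -obj[j] * delta))
    return scores  # lexicographic max over these pairs picks the best move
```

Note that each candidate reuses the cached activities `act` and only applies its own column update, which is what keeps per-move recomputation cheap.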
FP is extended to use GPU-accelerated PDLP as the underlying LP solver for projections, warm-started from cached primal and dual solutions. Rounding-order prioritization and bulk rounding leverage implied-slack sorting and per-variable activity-based impact analysis, yielding more effective convergence toward feasibility. The fused framework runs iterative rounds of FP projection, bulk rounding, local search, and cycle detection/escape (via Local-MIP), switching to improvement heuristics with objective cuts once feasible points are found.
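The underlying FP alternation that the framework extends can be sketched as follows. This is a minimal classic-FP skeleton, not the paper's fused pipeline: `lp_project` is an assumed callback standing in for the GPU-accelerated PDLP projection, the perturbation assumes binary integer variables, and cycle escape here is a random flip rather than Local-MIP.

```python
import numpy as np

def feasibility_pump(x_lp, lp_project, int_idx, max_iter=50, rng=None):
    """Alternate rounding and LP projection until the LP point is integral
    on int_idx, perturbing the rounding whenever a cycle is detected."""
    if rng is None:
        rng = np.random.default_rng(0)
    seen = set()
    x = np.asarray(x_lp, dtype=float)
    for _ in range(max_iter):
        x_round = x.copy()
        x_round[int_idx] = np.round(x[int_idx])
        if np.allclose(x[int_idx], x_round[int_idx]):
            return x_round                      # integer-feasible point found
        if tuple(x_round[int_idx]) in seen:     # cycle: flip some binaries
            flips = rng.choice(int_idx, size=max(1, len(int_idx) // 4),
                               replace=False)
            x_round[flips] = 1.0 - x_round[flips]
        seen.add(tuple(x_round[int_idx]))
        x = lp_project(x_round)  # LP-feasible point nearest to the rounding
    return None                  # no integer point within the iteration cap
```

In the paper's framework, the projection step is where PDLP warm starts pay off, since consecutive projections solve closely related LPs.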
Numerical Results
Using the presolved MIPLIB2017 benchmark (240 instances), the framework demonstrated significant improvements over prior work:
- GPU Local-MIP outperformed CPU-based Local-MIP in both count of feasible solutions and average primal gap.
- GPU Extended FP (with nearest rounding) produced 220 feasible solutions and achieved a 23% average primal gap.
- GPU Extended FP combined with Fix-and-Propagate achieved 220.67 feasible solutions (averaged across seeds) and a 22% primal gap, outperforming the previous state of the art (default Fix-and-Propagate: 193.8 feasible, 66% gap; CPU Local-MIP: 188.67 feasible, 46% gap).
These results were obtained under standardized presolve with the open-source HiGHS solver, a strict time limit of 10 minutes per instance, and an NVIDIA H100 GPU. Control experiments were run with multiple seeds and averaged for reproducibility.
Implications and Future Directions
The fusion of multiple advanced primal heuristics, each deeply parallelized and optimized for GPU execution, yields strong empirical improvements in MILP solution quality. From a practical standpoint, this approach enables rapid, scalable exploration of large MILP instances previously constrained by CPU-bound execution and memory bandwidth. The probing cache, bulk rounding, and hierarchical local search moves can be generalized to other combinatorial optimization settings, especially those amenable to sparse matrix parallelism and constraint activity-based methods.
Theoretically, the observed improvements underscore the viability of approximate LP solvers (PDLP) in primal heuristics and the effectiveness of combining projection-based and neighborhood-based search strategies. The transition to GPU computation opens avenues for further research: adaptive hybrid frameworks merging primal and dual heuristics, dynamic load balancing for massive-scale sparse matrices, and integration with emerging "matrix-free" projection strategies. The availability of the open-source implementation in NVIDIA cuOpt facilitates future benchmarking and method composition.
Conclusion
This paper presents a comprehensive GPU-driven fusion framework for primal heuristics in MILP, integrating accelerated BP, probing cache, bulk rounding, advanced local search, and extended feasibility pump. The empirical results on MIPLIB2017 indicate substantial advances in both feasible solution rate and optimality gap compared to prior CPU-based heuristics. The work directly demonstrates the practical and algorithmic benefits of leveraging parallel architectures in combinatorial optimization and provides an extensible foundation for future algorithmic developments in large-scale MIP.