Solver Training: Techniques & Applications
- Solver training is a set of methods that adjust solver parameters using learning-based techniques to effectively solve optimization, PDE, and combinatorial problems.
- It employs diverse methodologies like backpropagation-based learning, deep unfolding, and adversarial training, integrating conventional algorithms with data-driven objectives.
- Practical applications demonstrate performance gains through parallelism, enhanced solution fidelity, and efficient on-chip training even within hardware constraints.
Solver training refers to all methodologies in which the parameters of a solver—an algorithm designed to compute solutions for mathematical optimization problems, dynamical systems, partial differential equations, or discrete combinatorial constructions—are adjusted through learning-based or data-driven techniques rather than (or in addition to) conventional algorithm engineering. Modern solver training spans photonic, quantum, neural, and symbolic domains, and includes supervised, unsupervised, reinforcement, and hybrid approaches. The following sections provide a systematic overview of solver training methods, their algorithmic core, representative applications, performance characteristics, and current research frontiers, drawing upon diverse experimental and theoretical advancements.
1. Solver Training Algorithms Across Modalities
Solver training strategies are highly problem-dependent and modality-specific, but common paradigms include:
- Backpropagation-based Learning: Neural solvers for PDEs (e.g., physics-informed neural networks, PINNs) are commonly trained by minimizing a physics-consistent loss via gradient-based optimization, possibly leveraging specialized hardware for acceleration or physical realization (Zhao et al., 1 Jan 2025).
- Zeroth-order or Backpropagation-free Training: In hardware-limited regimes or non-differentiable architectures (optical, analog), derivatives are approximated stochastically (finite-difference) using in situ loss perturbations, as in the on-chip training of optical neural PINNs (Zhao et al., 1 Jan 2025).
- Deep Unfolding: Classical combinatorial or convex optimization solvers (e.g., the Ohzeki method for COPs) can be temporally unrolled and reparameterized as deep networks. The step sizes, penalty weights, or update rules become trainable parameters, optimized via gradient descent on data-driven objectives (Hagiwara et al., 7 Jan 2025).
- Adversarial and Distillation-based Solver Learning: In generative modeling, solver coefficients and discretization schedules for integrating ODEs (e.g., diffusion, flow-matching) are learned by minimizing global discrepancy with high-fidelity teacher solvers, with further adversarial regularization to enhance sample quality (Oganov et al., 20 Oct 2025, Frankel et al., 24 Feb 2025, Chen et al., 29 Jan 2026).
- Solver-free Losses for Decision-focused Learning: For tractable optimization problems (linear, ILP, TSP), solver training can bypass inner optimization loops by exploiting geometric or combinatorial structure—e.g., through comparison of predicted solutions with precomputed neighbors or unsupervised ILP relaxations (Berden et al., 28 May 2025, Nandwani et al., 2022, Gaile et al., 2022).
- Hybrid Optimization: Alternating between classical optimization (e.g., gradient descent) and exact combinatorial search within local neighborhoods (e.g., last-layer MILP over DNN weights) can yield superior accuracy and data/resource efficiency (Ashok et al., 2022).
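To make the backpropagation-free paradigm concrete, the following minimal sketch estimates gradients from paired loss evaluations along a random perturbation (SPSA-style two-sided finite differences). The quadratic objective and all hyperparameters are illustrative stand-ins for an in-situ hardware loss measurement, not taken from the cited systems:

```python
import random

def zo_step(params, loss_fn, rng, eps=1e-3, lr=0.1):
    """One backpropagation-free update: estimate the directional
    derivative with a two-sided finite difference along a random
    Rademacher perturbation (SPSA-style) and take a descent step."""
    delta = [rng.choice((-1.0, 1.0)) for _ in params]
    plus = [p + eps * d for p, d in zip(params, delta)]
    minus = [p - eps * d for p, d in zip(params, delta)]
    # Two loss evaluations stand in for in-situ hardware measurements.
    g = (loss_fn(plus) - loss_fn(minus)) / (2.0 * eps)
    return [p - lr * g * d for p, d in zip(params, delta)]

# Toy objective with minimum at (1, -2); in an optical PINN this role is
# played by the physics-informed loss measured on the chip.
loss = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
rng = random.Random(0)
theta = [0.0, 0.0]
for _ in range(200):
    theta = zo_step(theta, loss, rng)
print(theta)
```

Because only forward evaluations are needed, device nonidealities enter the loss directly and are compensated without an explicit analytic model of the hardware.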
2. Loss Functions and Optimization Targets
Observational data, recurrence relations, or governing equations define the training objectives for solver learning. Prominent categories include:
- Physics-informed losses: For PDE solvers, the loss penalizes deviations from the governing differential operator, initial/boundary conditions, and (optionally) data fit. For example, PINNs for the 1D heat equation are trained with $\mathcal{L} = \mathcal{L}_{\text{PDE}} + \mathcal{L}_{\text{IC}} + \mathcal{L}_{\text{BC}}$, where $\mathcal{L}_{\text{PDE}}$ encodes residuals of the PDE operator, $\mathcal{L}_{\text{IC}}$ enforces initial conditions, and $\mathcal{L}_{\text{BC}}$ enforces boundary conditions (Zhao et al., 1 Jan 2025).
- Penalty or relaxation-based unsupervised losses: In unsupervised TSP solver training, the network output is penalized by soft constraints reflecting LP relaxations of the TSP ILP: tour length, degree constraints, and subtour elimination cut constraints with adjustable coefficients (Gaile et al., 2022).
- Margin-based surrogate losses: Neural ILP architectures can be trained using soft-margin losses designed to separate feasible from infeasible configurations, exploiting hyperplane representations of the constraint polytope and negative sample mining (Nandwani et al., 2022).
- Regret or optimality-gap criteria: In "predict-then-optimize" settings for LPs, regret under the true cost is minimized. Solver-free training uses a convex surrogate (the "lava" loss), defined via the optimal solution's objective difference with its set of adjacent polytope vertices under the predicted cost (Berden et al., 28 May 2025).
- Data-free distillation to teacher trajectories: Generative ODE solvers (for diffusion/flow matching models) are trained to approximate the endpoint of a high-NFE teacher by minimizing global perceptual distance (e.g., LPIPS) between trajectories (Frankel et al., 24 Feb 2025, Oganov et al., 20 Oct 2025, Chen et al., 29 Jan 2026).
- Energy-based and contrastive-divergence objectives: Conditional energy solvers can be trained by maximizing the difference between the model score on real data and refined (MCMC-sampled) configurations (Xie et al., 2019).
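To make the composite physics-informed objective concrete, here is a minimal numerical sketch for the 1D heat equation $u_t = u_{xx}$, using central finite differences in place of the automatic differentiation a PINN would employ; the collocation grid and step sizes are illustrative:

```python
import math

def heat_residual(u, x, t, h=1e-4):
    """Residual of u_t = u_xx via central finite differences (a
    stand-in for the automatic differentiation a PINN would use)."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_t - u_xx

def pinn_loss(u, n=11):
    """Composite loss L = L_PDE + L_IC + L_BC on a coarse grid of
    collocation points over x in [0, 1] and t in [0, 0.1]."""
    xs = [i / (n - 1) for i in range(n)]
    ts = [0.1 * j / (n - 1) for j in range(n)]
    l_pde = sum(heat_residual(u, x, t) ** 2 for x in xs[1:-1] for t in ts[1:])
    l_ic = sum((u(x, 0.0) - math.sin(math.pi * x)) ** 2 for x in xs)
    l_bc = sum(u(0.0, t) ** 2 + u(1.0, t) ** 2 for t in ts)
    return l_pde + l_ic + l_bc

# Exact solution of u_t = u_xx with u(x, 0) = sin(pi x) and zero
# boundaries; its composite loss is (numerically) zero.
exact = lambda x, t: math.exp(-math.pi ** 2 * t) * math.sin(math.pi * x)
print(pinn_loss(exact))
```

Training drives a parametrized candidate `u` toward the minimizer of this loss; the exact solution above simply verifies that the objective vanishes where it should.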
3. Hardware and Physical Realization of Solver Training
Several recent systems perform solver training directly in unconventional hardware modalities:
- Photonic/Optical Neural Solvers: Optical architectures implement matrix-vector multiplications via wavelength-division multiplexing, microring resonator banks, and photodetectors; nonlinearities are integrated in post-photodetection electrical domains. Training is performed on-chip via backpropagation-free stochastic finite differences, inherently compensating for device nonidealities and requiring no calibration (Zhao et al., 1 Jan 2025).
- Quantum Annealing as a Trainable Sampler: Step-size and update parameters of deep-unfolded combinatorial solvers are trained classically, then transferred unmodified to hybrid systems utilizing D-Wave quantum annealing for sampler calls, providing significant speedups and improved convergence over purely classical samplers (Hagiwara et al., 7 Jan 2025).
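The deep-unfolding paradigm referenced above — treating per-iteration step sizes as trainable parameters and optimizing them classically — can be sketched on a toy quadratic problem; the curvatures, unrolling depth, and training loop below are illustrative, not taken from the cited systems:

```python
import random

CURV = (1.0, 10.0)  # curvatures (eigenvalues) of a toy quadratic objective

def unfolded(gammas, x0):
    """K = len(gammas) unrolled gradient-descent layers on
    f(x) = 0.5 * sum_i CURV[i] * x[i]**2, one step size per layer."""
    x = list(x0)
    for g in gammas:
        x = [xi - g * c * xi for xi, c in zip(x, CURV)]
    return x

def train_loss(gammas, trials=20):
    """Mean squared distance to the optimum (the origin) after the
    unrolled iterations, over a fixed set of random start points."""
    rng = random.Random(0)
    total = 0.0
    for _ in range(trials):
        x0 = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        total += sum(v * v for v in unfolded(gammas, x0))
    return total / trials

# Train the per-layer step sizes by finite-difference gradient descent
# on the data-driven objective.
gammas, lr, eps = [0.1, 0.1, 0.1], 0.01, 1e-4
for _ in range(400):
    grad = []
    for i in range(len(gammas)):
        up, dn = gammas[:], gammas[:]
        up[i] += eps
        dn[i] -= eps
        grad.append((train_loss(up) - train_loss(dn)) / (2 * eps))
    gammas = [g - lr * d for g, d in zip(gammas, grad)]
print(gammas, train_loss(gammas))
```

The learned schedule outperforms the uniform initialization on the training distribution; in the quantum-annealing setting the same classically trained parameters are then transferred to the hybrid sampler unchanged.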
4. Efficiency, Scalability, and Empirical Performance
Solver training delivers performance gains in both computational efficiency and solution fidelity:
- Speedups through parallelism: Multigrid-in-time solvers for recurrent networks (MGRIT-GRU) achieve substantial acceleration on long sequences by hierarchically decomposing sequential dependencies and parallelizing both forward and backward passes (Moon et al., 2022).
- Data-free or solver-free training efficiency: Unsupervised or solver-free losses eliminate the need for expensive ground-truth optimization in each training iteration, scaling polynomially with problem size (e.g., ILP polyhedron separation or enumeration of adjacent LP vertices) and enabling training on problems where solver-based approaches are prohibitive (Nandwani et al., 2022, Berden et al., 28 May 2025, Sojitra et al., 17 Nov 2025).
- Empirical results in generative modeling: State-of-the-art FID scores in image synthesis have been achieved for low-NFE solvers by learning the coefficients and schedules of high-order ODE solvers, often by adversarial or alternating optimization schemes (Oganov et al., 20 Oct 2025, Frankel et al., 24 Feb 2025, Chen et al., 29 Jan 2026).
- Hybrid regime advantages: Alternating global GD with local MILP tunneling results in lower MSE and halved training times relative to pure-GD for regression/classification tasks (Ashok et al., 2022).
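A minimal sketch of the hybrid regime: alternating gradient steps on inner parameters with an exact solve of the output layer. For self-containment, the exact subproblem here is a closed-form least-squares solve rather than the MILP of the cited work, and the one-parameter hidden layer and synthetic data are illustrative:

```python
import math

xs = [i / 10.0 - 1.0 for i in range(21)]        # inputs in [-1, 1]
ys = [math.tanh(2.0 * x) + 0.5 for x in xs]     # synthetic regression targets

def solve_last_layer(a):
    """Exact least-squares solve for the output weights of the model
    y ~ w0 * tanh(a*x) + w1, given the hidden parameter a (normal
    equations; the cited work uses MILP for this exact subproblem)."""
    s11 = s12 = s22 = b1 = b2 = 0.0
    for x, y in zip(xs, ys):
        f1, f2 = math.tanh(a * x), 1.0
        s11 += f1 * f1; s12 += f1 * f2; s22 += f2 * f2
        b1 += f1 * y; b2 += f2 * y
    det = s11 * s22 - s12 * s12
    return [(b1 * s22 - b2 * s12) / det, (b2 * s11 - b1 * s12) / det]

def mse(a, w):
    return sum((w[0] * math.tanh(a * x) + w[1] - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

# Alternate: exact last-layer solve, then a gradient step on the hidden
# parameter (finite-difference gradient for brevity).
a, eps, lr = 0.5, 1e-5, 1.0
for _ in range(3000):
    w = solve_last_layer(a)
    g = (mse(a + eps, w) - mse(a - eps, w)) / (2 * eps)
    a -= lr * g
w = solve_last_layer(a)
print(a, w, mse(a, w))
```

The alternation recovers the generating parameters (a near 2, weights near [1, 0.5]); the exact inner solve removes the output layer from the gradient loop entirely, which is the source of the data/resource efficiency claimed above.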
5. Limitations, Theoretical Guarantees, and Open Challenges
Despite their practical impact, current solver training methodologies exhibit several limitations:
- Hardware and analog limits: Optical PINN networks are currently restricted by analog precision (8–10 bits), and on-chip nonlinearities are not fully integrated. Scaling to large network sizes requires tensorization (Zhao et al., 1 Jan 2025).
- Generalization and extrapolation: MGCNN and other neural solvers may degrade when tested outside the coefficient or solution space seen in training. Mixture-model or hierarchical coefficient sampling can mitigate these effects, but extrapolation to shock-forming or discontinuous dynamics remains open (Sojitra et al., 17 Nov 2025, Xie et al., 2023).
- Sample/instance specificity: Learned solver coefficients often depend on NFE budget and problem size; per-instance optimization is rarely feasible at scale (Frankel et al., 24 Feb 2025).
- No global optimality guarantees: Hybrid optimization (GD+solver) offers local tunneling only; global leaps are not proven, though box constraints mitigate overfitting (Ashok et al., 2022).
- Modeling accuracy for adaptive solvers: Without tailored step-size logic in training (e.g., in neural ODEs), black-box adaptive solvers are ineffective. Co-training of the solver and dynamics is required (Allauzen et al., 2022).
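The budget-dependence of learned solver coefficients noted above can be seen in a toy distillation setting: a coarse two-step solver for $\dot{x} = -x$ with one learnable per-step coefficient is fitted to a high-accuracy teacher endpoint. All quantities here are illustrative:

```python
import math

def student(c, x0=1.0, steps=2, T=1.0):
    """Coarse two-step solver for dx/dt = -x whose per-step update
    x <- x * (1 - h + c*h^2) has a learnable correction coefficient c."""
    h = T / steps
    x = x0
    for _ in range(steps):
        x *= 1.0 - h + c * h * h
    return x

teacher = math.exp(-1.0)  # high-accuracy reference ("teacher") endpoint

# Distill: fit c by minimizing the endpoint discrepancy with the
# teacher, using a finite-difference gradient for simplicity.
c, lr, eps = 0.0, 0.5, 1e-5
for _ in range(500):
    g = ((student(c + eps) - teacher) ** 2
         - (student(c - eps) - teacher) ** 2) / (2 * eps)
    c -= lr * g
print(c, student(c))
```

The learned coefficient (about 0.426) deliberately deviates from the Taylor-optimal 0.5: it compensates for the coarse two-step discretization, which is precisely why such coefficients transfer poorly across NFE budgets.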
6. Representative Experimental Results
| Domain | Method | Key Metric/Result | Reference |
|---|---|---|---|
| PDE Optical Solver | On-chip PINN, ZO training | Low test error reached fully on-chip at low per-MAC energy | (Zhao et al., 1 Jan 2025) |
| Neural TSP | GNN, unsupervised LP loss | Euclidean n=20: small optimality gap; $2.85$ s per 1280 inferences | (Gaile et al., 2022) |
| Neural ILP | Soft-margin separation, solver-free | 9×9 Sudoku: high test accuracy on symbolic and visual variants within hours of training | (Nandwani et al., 2022) |
| Decision-focused LP | Lava loss (solver-free) | Substantially reduced training time at comparable normalized regret | (Berden et al., 28 May 2025) |
| Generative ODE | S4S, S4S-Alt, BA, GAS | Low FID at 5–10 NFE; matches teacher quality at $0.5\times$ the steps or fewer | (Frankel et al., 24 Feb 2025, Oganov et al., 20 Oct 2025, Chen et al., 29 Jan 2026) |
| Multi-task COP | Bandit-matrix-driven task selection | Optimality-gap reductions of $30$% and more under constrained budgets | (Wang et al., 2023) |
7. Future Directions and Open Questions
Research continues to leverage solver training for increasingly general, scalable, and robust solvers:
- Modular and curriculum-based solver design: "POWERPLAY" proposes continual solver training by inventing, at each iteration, the simplest problem the current solver cannot yet solve, guaranteeing expanding generalization without catastrophic forgetting (Schmidhuber, 2011).
- Solver-free operator learning: Manufacturing analytic data for entire function spaces (e.g., FNO+MML synthesis) presents a scalable, physically consistent alternative to numerical simulation, with open challenges in accommodating irregular boundary conditions and non-smooth phenomena (Sojitra et al., 17 Nov 2025).
- Hybrid quantum-classical, photonic, and distributed paradigms: Transfer learning between tractable classical simulators (e.g., SQA) and quantum hardware, and physical photonic acceleration, are at the frontier of solver-enabled optimization (Hagiwara et al., 7 Jan 2025, Zhao et al., 1 Jan 2025).
- Theoretical models of training dynamics: Differential-equation based frameworks permit rigorous quantification and prediction of self-improving solver capabilities, as in the solver-verifier gap model for LLMs (Sun et al., 29 Jun 2025).
Solver training is an evolving area, tightly coupling advances in machine learning, optimization, hardware, and computational mathematics, with applications spanning scientific computing, operations research, and modern generative modeling.