Surrogate for Regularized Inversion
- A surrogate for regularized inversion is a computational approach that replaces expensive forward models with efficient emulators such as neural networks or polynomial expansions.
- It integrates Bayesian inference, variational regularization, and convex optimization to deliver rapid uncertainty quantification and scalable solutions in high-dimensional problems.
- Adaptive refinement techniques and rigorous error analysis ensure accurate regularization while significantly reducing computational costs.
A surrogate for regularized inversion is a computationally efficient representation—such as a statistical emulator, neural network, polynomial expansion, or low-rank operator—used to replace expensive forward models when solving inverse problems with regularization. This paradigm appears ubiquitously across Bayesian inference, variational regularization, convex optimization, and measure-transport approaches, enabling scalable uncertainty quantification and rapid solution of high-dimensional or complex inverse problems. Surrogates are intimately linked to the design of regularization functionals, error quantification, adaptive training schemes, and posterior sampling workflows.
1. Mathematical Foundations of Surrogate-Based Regularized Inversion
Regularized inversion seeks parameters $m$ (or functions) that explain observed data $d$, subject to prior structure or a penalty:

$$\min_{m} \; \tfrac{1}{2}\,\|F(m) - d\|_{\Sigma^{-1}}^{2} \;+\; \lambda R(m),$$

where $F$ denotes the forward model, $\Sigma$ the observational noise covariance, $R$ a regularization functional, and $\lambda > 0$ its weight.
Bayesian formulations cast this as

$$\pi(m \mid d) \;\propto\; \exp\!\Big(-\tfrac{1}{2}\,\|F(m) - d\|_{\Sigma^{-1}}^{2}\Big)\,\pi_0(m),$$

where the likelihood encodes the forward model $F$ with observational noise of covariance $\Sigma$, and the prior $\pi_0(m) \propto \exp(-\lambda R(m))$ induces regularization, e.g. Tikhonov ($R(m) = \|m\|_2^2$), sparsity ($R(m) = \|m\|_1$), or implicit priors.
The surrogate approach replaces $F$ by a computationally tractable emulator $\tilde F$, permitting efficient evaluation of the misfit and the likelihood. Surrogate construction rigorously accounts for approximation bias and uncertainty in both the cost and likelihood, e.g. inflating the noise covariance $\Sigma$ to $\Sigma + \Sigma_{\mathrm{surr}}$, where $\Sigma_{\mathrm{surr}}$ quantifies surrogate error (Zhang et al., 2018, Wringer et al., 19 Dec 2025). This substitution is exact in the limit $\tilde F \to F$ within the relevant regions (posterior support), and approximates the solution up to quantifiable error as shown by convergence statements (Li et al., 2013, Meles et al., 6 May 2025).
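To make the inflated-covariance substitution concrete, the following toy sketch evaluates a surrogate-based Gaussian misfit with the noise covariance enlarged by an estimated surrogate-error term. The forward model, its Taylor-series surrogate, and all numbers are invented for illustration only:

```python
import numpy as np

# Hypothetical 1-D toy problem: "expensive" forward model F and a cheap surrogate.
def F(m):          # stand-in for an expensive forward solve
    return np.array([np.sin(m[0]) + 0.5 * m[0] ** 2])

def F_tilde(m):    # cheap emulator with small, known approximation error
    return np.array([m[0] - m[0] ** 3 / 6 + 0.5 * m[0] ** 2])  # Taylor emulator of sin

Sigma = np.array([[0.01]])        # observational noise covariance
Sigma_surr = np.array([[0.002]])  # estimated surrogate-error covariance (e.g. held-out RMSE**2)

def neg_log_likelihood(m, d):
    """Surrogate misfit with inflated covariance Sigma + Sigma_surr."""
    r = F_tilde(m) - d
    C = Sigma + Sigma_surr
    return 0.5 * r @ np.linalg.solve(C, r)

d = F(np.array([0.3]))            # noiseless synthetic datum for illustration
print(neg_log_likelihood(np.array([0.3]), d))
```

Because the surrogate error enters only through the enlarged covariance, the same sampler or optimizer can be reused unchanged, with the inflation preventing overconfident posteriors.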
2. Surrogate Model Architectures and Construction
Surrogates take varied forms depending on problem structure:
- Polynomial Chaos–Kriging (PCK): Orthogonal polynomials under the prior, capturing global trend, with Gaussian-process (Kriging) correction for local residuals. Fitted via sparse regression and kernel hyperparameter optimization (Wringer et al., 19 Dec 2025, Zhang et al., 2018).
- Neural Networks: Deep surrogates (autoencoders, U-Nets, operator networks) can emulate complex PDEs or parameter-to-data maps $m \mapsto F(m)$. Training minimizes a data-fidelity (MSE) loss over a set of high-fidelity evaluations; derivative-informed variants further regularize the emulator via Jacobian or Sobolev norms (Cao et al., 2024, Zhou et al., 15 Jan 2026, Deveney et al., 2019, Liu et al., 2022).
- Gaussian Processes: Flexible for low-dimensional or smooth problems; enables uncertainty quantification and direct surrogate-based likelihood construction (Wang et al., 23 Jul 2025, Zhang et al., 2018).
- Low-rank/RSVD Operators: Randomized singular value decomposition approximates the action of the forward operator for linear regularized problems, yielding surrogates that enforce correct range structures (e.g. Tikhonov, basis pursuit) and permit efficient inversion (Ito et al., 2019).
- Plug-and-Play Denoisers: Replace explicit proximal operators in regularization by deep denoisers; in consensus-equilibrium frameworks, multiple 2D denoisers are combined for 3D inversion (Luiken et al., 2024).
The general principle is to capture the essential variability and predictive capacity of the forward model within the region relevant to the data/prior. Adaptive and sequential training strategies focus computational effort on the high-probability region discovered iteratively by posterior sampling or design (Meles et al., 6 May 2025, Wang et al., 23 Jul 2025, Li et al., 2013).
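The PCK idea above can be sketched in a few lines: fit a global polynomial trend first, then interpolate its residuals with a kernel-based (Kriging-style) correction. The 1-D forward model, training design, and kernel length scale below are toy assumptions, not any cited implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(m):                       # stand-in "expensive" forward model
    return np.sin(3 * m) + m ** 2

# Training set drawn from the prior (uniform on [-1, 1] here)
M = rng.uniform(-1, 1, size=40)
Y = forward(M)

# 1) Global polynomial trend (degree-4 least squares, PCE-like)
coef = np.polyfit(M, Y, deg=4)
trend = np.polyval(coef, M)

# 2) Kriging-style correction of the residuals via kernel ridge regression
def kernel(a, b, ell=0.2):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

K = kernel(M, M) + 1e-6 * np.eye(M.size)   # small nugget for conditioning
alpha = np.linalg.solve(K, Y - trend)

def surrogate(m):
    m = np.atleast_1d(m)
    return np.polyval(coef, m) + kernel(m, M) @ alpha

# Held-out check of surrogate fidelity
m_test = rng.uniform(-1, 1, 200)
rmse = np.sqrt(np.mean((surrogate(m_test) - forward(m_test)) ** 2))
print(f"held-out RMSE: {rmse:.2e}")
```

The split mirrors the PCK design: the polynomial captures the global trend under the prior, while the kernel term corrects local residuals near the training points.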
3. Adaptive and Sequential Surrogate Refinement
Global surrogates often fail to be accurate on the posterior support. To address this, refinement strategies include:
- Posterior-guided active learning: Surrogates are retrained as MCMC samples concentrate in new regions; new high-fidelity runs are added where surrogate error or RMSE exceeds thresholds (Zhang et al., 2018, Wringer et al., 19 Dec 2025). The main loop alternates MCMC on the current surrogate and expansion of the training set using acquisition functions (variance reduction, expected improvement) (Wang et al., 23 Jul 2025).
- Progressive bandwidth/data fidelity expansion: In waveform inversion, sequential surrogate refinement over increasing data fidelity (e.g. frequency bands), with training sets augmented by posterior samples at each stage, yields higher accuracy and less bias than prior-only surrogates (Meles et al., 6 May 2025).
- Adaptive polynomial basis/reweighting: Surrogates are constructed via weighted least-squares on reference measures progressively concentrated on the posterior, focusing polynomial accuracy within the relevant region (Li et al., 2013).
These schemes deliver potentially order-of-magnitude speed-ups over naïve global surrogates and brute-force inversion (Wringer et al., 19 Dec 2025, Meles et al., 6 May 2025), while maintaining robust uncertainty quantification and correcting for surrogate errors.
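A minimal sketch of the posterior-guided active-learning loop: alternate short MCMC runs on the current surrogate with retraining at the posterior sample where the surrogate errs most. The 1-D forward model, polynomial surrogate, proposal scale, and round counts are all toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(m):                      # stand-in high-fidelity model
    return np.sin(3 * m) + m ** 2

d_obs, sigma = forward(0.4), 0.05    # synthetic datum and noise level

def fit_surrogate(M, Y, deg=5):      # simple polynomial surrogate, refit each round
    return np.poly1d(np.polyfit(M, Y, deg))

M = rng.uniform(-1, 1, 8)            # small prior-based initial design
Y = forward(M)

for round_ in range(4):
    surr = fit_surrogate(M, Y)
    # Short random-walk Metropolis chain on the surrogate posterior
    m, samples = 0.0, []
    for _ in range(2000):
        prop = m + 0.2 * rng.standard_normal()
        if abs(prop) <= 1:           # flat prior on [-1, 1]
            log_a = ((d_obs - surr(m)) ** 2 - (d_obs - surr(prop)) ** 2) / (2 * sigma ** 2)
            if np.log(rng.uniform()) < log_a:
                m = prop
        samples.append(m)
    samples = np.array(samples[500:])
    # Active learning: add the posterior sample where the surrogate errs most
    err = np.abs(surr(samples) - forward(samples))
    M = np.append(M, samples[np.argmax(err)])
    Y = np.append(Y, forward(M[-1]))
    print(f"round {round_}: max surrogate error on posterior = {err.max():.2e}")
```

Each round spends new high-fidelity evaluations only where the sampler actually visits, which is the essential economy of the adaptive schemes cited above.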
4. Surrogate Integration in Regularized Inversion Algorithms
Surrogates are embedded in various numerical schemes:
- Bayesian MCMC: The surrogate $\tilde F$ replaces $F$ in likelihood evaluations. Correction for approximation bias/uncertainty is achieved by enlarging the error covariance (e.g. $\Sigma \to \Sigma + \Sigma_{\mathrm{surr}}$) (Zhang et al., 2018, Wringer et al., 19 Dec 2025), or by augmenting the likelihood score in score-based samplers (Feng et al., 16 Sep 2025).
- Gradient-based solvers: Analytic surrogates (neural nets, polynomials) admit automatic differentiation, allowing efficient gradient and Hessian computations for optimization or Hamiltonian Monte Carlo (Deveney et al., 2019, Zhou et al., 15 Jan 2026, Liu et al., 2022).
- Proximal and ADMM methods: Surrogates can act as surrogate regularizers (Plug-and-Play denoising) or surrogate forward operators; inversion proceeds via iterative updates with closed-form soft-thresholding for sparse priors (Wang et al., 2024, Luiken et al., 2024, Zhou et al., 15 Jan 2026).
- Transport map variational inference: Surrogate models furnish fast, offline-computed reduced posterior representations (e.g. LazyDINO), where amortization allows rapid reuse for new data (Cao et al., 2024).
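As one concrete instance of a surrogate forward operator inside a proximal scheme, the sketch below plugs a randomized-SVD surrogate into ISTA with soft-thresholding for a sparse prior. The operator, its spectrum, rank, and sparsity pattern are invented for illustration and assume the spectrum decays fast enough for a low-rank surrogate to be accurate:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear forward operator with fast spectral decay
n_out, n_in, r = 200, 100, 30
Uo, _ = np.linalg.qr(rng.standard_normal((n_out, n_in)))
Vo, _ = np.linalg.qr(rng.standard_normal((n_in, n_in)))
sv = 1.0 / (1.0 + np.arange(n_in)) ** 2
A = (Uo * sv) @ Vo.T

# Randomized SVD surrogate of rank r (range sketch + small SVD)
Omega = rng.standard_normal((n_in, r + 10))
Q, _ = np.linalg.qr(A @ Omega)
Uh, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
U, s, Vt = (Q @ Uh)[:, :r], s[:r], Vt[:r]

def A_surr(x):   # surrogate forward action
    return U @ (s * (Vt @ x))

def At_surr(y):  # surrogate adjoint action
    return Vt.T @ (s * (U.T @ y))

# Sparse ground truth and synthetic data
x_true = np.zeros(n_in)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
d = A @ x_true + 0.01 * rng.standard_normal(n_out)

# ISTA: gradient step on the surrogate misfit, then soft-thresholding
lam, step = 1e-3, 1.0 / s[0] ** 2
x = np.zeros(n_in)
for _ in range(500):
    z = x - step * At_surr(A_surr(x) - d)
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

print("nonzeros in estimate:", int(np.count_nonzero(np.abs(x) > 1e-3)))
```

Every iteration touches only the rank-$r$ factors, so the per-step cost drops from a dense matrix-vector product to two thin ones, while the soft-thresholding step enforces the sparse prior in closed form.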
Algorithmic pseudocode and implementation details are provided in the cited works for sampling, optimization, and consensus-equilibrium iterations, together with rigorous analysis of computational cost and convergence factors.
5. Error Analysis, Regularization, and Bayesian Uncertainty Quantification
Surrogate error directly impacts inversion reliability. Methods for analysis and control include:
- Posterior error inflation: Likelihood variances are augmented by surrogate RMSE to avoid overconfidence (Zhang et al., 2018, Wringer et al., 19 Dec 2025).
- Bias correction: Secondary surrogates model and correct mean surrogate errors; hybrid PCE+GP strategies deliver nearly unbiased posteriors (Zhang et al., 2018).
- Theoretical error bounds: Quantitative statements link surrogate error in posterior-weighted $L^2$ norms to the KL divergence between true and surrogate-driven posteriors (Li et al., 2013, Cao et al., 2024). Under canonical source conditions, RSVD-based regularized inverses provably converge to the true solution at rates depending on noise and surrogate rank (Ito et al., 2019).
- Data-driven regularization: Score-based generative priors learned from realistic data enforce plausible solution structures, functioning as nonparametric regularizers in Bayesian posterior sampling (Feng et al., 16 Sep 2025).
- Credible interval and coverage analysis: Empirical studies confirm statistical reliability of surrogate-driven intervals, e.g. 65–72% empirical coverage for nominal 68% intervals (Wringer et al., 19 Dec 2025), and 95% credible interval coverage in GP-surrogate Bayesian inversion (Wang et al., 23 Jul 2025).
Uncertainty sources include both aleatoric (measurement noise) and epistemic (surrogate approximation error) components. Adaptive error estimation and diagnostic metrics (e.g. held-out RMSE, the $\hat R$ statistic, autoencoder consistency checks) are used to monitor and control surrogate fidelity (Hart et al., 24 Jan 2025).
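The coverage-analysis idea can be checked in a minimal synthetic experiment: repeatedly draw a truth, invert with a slightly biased surrogate whose likelihood variance is inflated by the surrogate-error term, and count how often the nominal 68% credible interval contains the truth. The 1-D linear forward map, bias, and variances below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy linear inverse problem: d = a*m + noise, conjugate Gaussian prior on m
a, sigma, tau = 2.0, 0.1, 1.0      # forward coefficient, noise std, prior std
sigma_surr = 0.05                  # assumed surrogate-error std (inflates the likelihood)
a_surr = 1.95                      # slightly biased surrogate of the forward map

def posterior(d, noise_var):
    """Closed-form Gaussian posterior mean/std for m given datum d."""
    prec = a_surr ** 2 / noise_var + 1 / tau ** 2
    mean = (a_surr * d / noise_var) / prec
    return mean, np.sqrt(1 / prec)

hits, trials = 0, 5000
for _ in range(trials):
    m_true = rng.normal(0, tau)
    d = a * m_true + rng.normal(0, sigma)
    mean, std = posterior(d, sigma ** 2 + sigma_surr ** 2)  # inflated covariance
    if abs(m_true - mean) < std:                            # nominal 68% interval
        hits += 1
print(f"empirical coverage of 68% intervals: {hits / trials:.2%}")
```

With the inflated variance the empirical coverage lands near the nominal level despite the surrogate bias, which is exactly the behavior the cited coverage studies report.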
6. Representative Applications and Performance Metrics
Broad dissemination of surrogate-based regularized inversion is documented across scientific domains:
| Application | Surrogate Model | Speed-up/Accuracy |
|---|---|---|
| Exoplanet Interiors | PCK (PCE+GP) | faster; R (Wringer et al., 19 Dec 2025) |
| Waveform/GPR Inversion | PCE+sequential refinement | – speed-up (Meles et al., 6 May 2025, Zhou et al., 15 Jan 2026) |
| Hydrological Systems | PCE+GP; adaptive loop | Bias eliminated; O() fewer calls (Zhang et al., 2018) |
| Linear Problems | RSVD-regularized surrogate | 20– faster; provable bounds (Ito et al., 2019) |
| Landscape Evolution | NN surrogate; PT sampling | 1.7– faster; RMSE within (Chandra et al., 2018) |
Performance is measured by solution accuracy (RMSE, SNR, PSNR, SSIM), computational time, posterior coverage, and convergence metrics. Surrogate-enabled frameworks make possible large-scale studies, posterior population analyses, and robust uncertainty quantification that are infeasible with traditional forward solvers.
7. Limitations, Extensions, and Guidance
Surrogate regularized inversion is effective where forward model fidelity is critical and model evaluation is computationally expensive. Key limitations and best practices include:
- Dimensionality constraints: GP and PCE surrogates scale poorly with parameter dimension; extensions such as additive kernels, low-rank decompositions, and deep architectures can mitigate this (Wringer et al., 19 Dec 2025).
- Training set design: Sampling strategies (Latin Hypercube, targeted posterior sampling, adaptive acquisition) focus training on posterior-relevant regions (Wang et al., 23 Jul 2025).
- Surrogate generalization: Regularly monitored held-out error statistics, retraining in shifting posterior regions, and hybrid residual correction help preserve reliability and avoid overfitting or bias (Zhang et al., 2018).
- Regularization selection: Implicit regularizers (score-based priors, denoisers) can outperform hand-tuned penalties; explicit tuning (L-curve, cross-validation) remains necessary for classical choices (Liu et al., 2022, Luiken et al., 2024).
- Amortization and transferability: Structure-exploiting surrogates (e.g. LazyDINO) can deliver cost reductions for repeated inversions, provided offline samples cover the relevant parameter space (Cao et al., 2024).
- Numerical convergence: Convergence guarantees require non-expansive surrogate mappings, proper variance estimation, and sufficient capacity/training coverage (Wang et al., 23 Jul 2025, Ito et al., 2019, Luiken et al., 2024).
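To make the regularization-selection point concrete, the toy sketch below picks the Tikhonov weight by the discrepancy principle (a classical explicit criterion alongside the L-curve and cross-validation), assuming the noise level is known and working through SVD factors as a low-rank-surrogate setting would. All sizes and spectra are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# Ill-posed linear problem with a fast-decaying spectrum (SVD-surrogate setting)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 10.0 ** (-np.arange(n) / 8)                 # singular values
A = (U * s) @ V.T
c = 1.0 / (1.0 + np.arange(n))                  # smooth true coefficients
x_true = V @ c
noise = 1e-3 * rng.standard_normal(n)
d = A @ x_true + noise

def tikhonov(lam):
    """Closed-form Tikhonov solution via SVD filter factors s/(s^2 + lam)."""
    return V @ ((s / (s ** 2 + lam)) * (U.T @ d))

# Discrepancy principle: largest lambda whose residual stays within tau*delta
lams = 10.0 ** np.linspace(-8, 0, 30)
delta = np.linalg.norm(noise)                   # noise level, assumed known
res = np.array([np.linalg.norm(A @ tikhonov(l) - d) for l in lams])
lam_star = lams[res <= 1.1 * delta].max()
print(f"selected lambda = {lam_star:.1e}")
```

Because the residual norm is monotone in the regularization weight, the rule is a simple sweep; the selected weight sits between the under-regularized regime (amplified noise) and the over-regularized one (lost signal).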
Surrogates represent a foundational technology in modern inverse problem regularization, harmonizing statistical rigor, computational efficiency, and robust performance across diverse scientific and engineering domains.