Bayesian Full Waveform Inversion
- Bayesian Full Waveform Inversion is a probabilistic framework that models subsurface parameters, noise, and data as random variables to provide robust uncertainty estimates.
- It integrates implicit neural and generative priors, such as deep networks, autoencoders, and diffusion models, to reduce dimensionality and ensure geologically plausible results.
- Scalable inference methods like variational inference, ensemble Kalman inversion, and MCMC enable practical application in complex 3D, time-lapse, and multi-physics scenarios.
Bayesian Full Waveform Inversion (FWI) constitutes a rigorously probabilistic framework for recovering subsurface physical parameters (e.g., wave velocity) from seismic waveform data. By modeling all quantities—model, noise, data—as random variables, Bayesian FWI yields not only an image of the Earth’s interior but also a quantified measure of uncertainty, directly addressing non-uniqueness, nonlinearity, and ill-posedness that limit deterministic inversion approaches. Recent advances have enabled the practical use of sophisticated prior models (deep neural networks, diffusion models, autoencoders), scalable approximate inference (variational, ensemble, particle-based), and deployment to challenging regimes such as 3D, time-lapse/monitoring, or multi-physics scenarios.
1. Bayesian Formulation of Full Waveform Inversion
Let $m$ denote the subsurface physical model (e.g., velocity field), $d_{\mathrm{obs}}$ the observed seismic traces, and $F$ the forward operator solving the acoustic or elastic wave equation. The Bayesian inverse problem seeks the posterior

$$p(m \mid d_{\mathrm{obs}}) \propto p(d_{\mathrm{obs}} \mid m)\, p(m).$$

The standard likelihood, under additive uncorrelated Gaussian noise with variance $\sigma^2$, is

$$p(d_{\mathrm{obs}} \mid m) \propto \exp\!\left(-\frac{1}{2\sigma^2}\,\lVert F(m) - d_{\mathrm{obs}} \rVert_2^2\right).$$

The prior $p(m)$ can be a simple Gaussian or uniform distribution, but cutting-edge approaches instead utilize physically and geologically informed, high-capacity priors parameterized by neural networks, generative models, or stochastic processes (Sun et al., 2022, Xie et al., 2024, Taufik et al., 14 Dec 2025, Li et al., 6 May 2025, Hu et al., 4 Nov 2025).
The inference objective is to characterize $p(m \mid d_{\mathrm{obs}})$: recovering its mean, uncertainty (variance, credible intervals), and possibly samples representing the range of plausible subsurface configurations.
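A linear-Gaussian toy makes this posterior concrete: when the forward operator is linear and both noise and prior are Gaussian, the posterior is available in closed form. Here a random matrix `G` stands in for a linearized forward operator; all names and sizes are illustrative, not from any cited work.

```python
import numpy as np

# Toy linear-Gaussian analogue of the FWI posterior: d = G m + noise,
# prior m ~ N(0, prior_var * I), noise ~ N(0, sigma^2 * I).
rng = np.random.default_rng(0)
n_model, n_data = 5, 20
G = rng.normal(size=(n_data, n_model))        # stand-in for a linearized forward operator
m_true = rng.normal(size=n_model)
sigma = 0.1                                   # noise standard deviation
d_obs = G @ m_true + sigma * rng.normal(size=n_data)

prior_var = 1.0
# Conjugate posterior covariance: C_post = (G^T G / sigma^2 + I / prior_var)^{-1}
C_post = np.linalg.inv(G.T @ G / sigma**2 + np.eye(n_model) / prior_var)
m_post = C_post @ (G.T @ d_obs) / sigma**2    # posterior mean
std_post = np.sqrt(np.diag(C_post))           # per-parameter uncertainty
```

Nonlinear FWI has no such closed form, which is precisely why the approximate inference machinery of Section 3 is needed.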
2. Implicit Neural and Generative Priors
A key advance in Bayesian FWI is parameterizing subsurface models not on a fixed grid but as outputs of implicit neural representations $m(\mathbf{x}) = N_\Theta(\mathbf{x})$, where $N_\Theta$ is a deep network and $\Theta$ its parameters (Sun et al., 2022, Yan et al., 2023). The Bayesian formulation then treats $\Theta$ as the high-dimensional variable of inference, with a prior (typically a zero-mean Gaussian on the weights, equivalent to weight decay) enforcing parsimony.
Other generative approaches use pretrained GANs or diffusion models for the prior $p(m)$, learning a map $m = g(z)$ from low-dimensional latent variables $z$ to physically plausible subsurface fields (Xie et al., 2024, Taufik et al., 14 Dec 2025, Li et al., 6 May 2025, Hu et al., 4 Nov 2025). In the differentiable structured paradigm, a convolutional autoencoder is trained to encode $m$ into a latent code $z$, with a Gaussian prior on $z$ (Hu et al., 4 Nov 2025). Diffusion models are trained on realistic geological structures, allowing the prior to support complex, multi-modal distributions consistent with prior geological knowledge (Taufik et al., 14 Dec 2025, Li et al., 6 May 2025).
This implicit representation dramatically reduces the effective dimension of the inverse problem and regularizes the inversion toward geologically plausible solutions.
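A minimal sketch of such a coordinate-based representation, with an untrained two-layer network whose sizes and velocity range are illustrative assumptions:

```python
import numpy as np

# Implicit (coordinate-based) model sketch: velocity m(x, z) = N_Theta([x, z]).
rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.5, size=(2, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.5, size=(64, 1)); b2 = np.zeros(1)

def velocity(coords):
    """Evaluate N_Theta on an (N, 2) array of (x, z) coordinates."""
    h = np.tanh(coords @ W1 + b1)
    s = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid squashing
    return 1500.0 + 2000.0 * s.ravel()          # bounded to (1500, 3500) m/s

# The same network can be queried on any grid; resolution is not baked in.
x, z = np.meshgrid(np.linspace(0, 1, 32), np.linspace(0, 1, 32))
m = velocity(np.stack([x.ravel(), z.ravel()], axis=1)).reshape(32, 32)
```

The bounded output nonlinearity illustrates how physical range constraints can be built directly into the parameterization.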
3. Variational and Particle-based Bayesian Inference
Direct computation with the posterior $p(m \mid d_{\mathrm{obs}})$ (or its counterparts over $\Theta$ or $z$) is intractable due to the dimensionality and the computational expense of evaluating $F$. Several approximate inference methodologies are now in widespread use:
- Variational Inference (VI): The posterior is approximated by a tractable parametric family $q_\phi(m)$ (Gaussian, normalizing flow, dropout), optimized to minimize the Kullback–Leibler divergence $\mathrm{KL}\big(q_\phi(m)\,\|\,p(m \mid d_{\mathrm{obs}})\big)$, equivalently, maximizing the Evidence Lower Bound (ELBO): $\mathrm{ELBO}(\phi) = \mathbb{E}_{q_\phi}\big[\log p(d_{\mathrm{obs}} \mid m)\big] - \mathrm{KL}\big(q_\phi(m)\,\|\,p(m)\big)$.
For implicit representations, dropout can be used as a variational posterior on $\Theta$, with stochastic network realizations corresponding to samples from the approximate posterior (Sun et al., 2022, Yan et al., 2023).
- Ensemble Kalman Inversion (EKI): Utilizes an ensemble of models, updated using a Kalman-like scheme, naturally incorporating uncertainty and spatial correlation via Gaussian random field priors (Li et al., 13 May 2025).
- Markov Chain Monte Carlo (MCMC): HMC and MALA are implemented in latent representations (autoencoder or generator variables), benefiting from reduced dimensionality and efficient automatic differentiation (Hu et al., 4 Nov 2025, Xie et al., 2024, Lima et al., 2023). Transdimensional approaches using Reversible Jump HMC allow adaptive model complexity (Biswas et al., 2022).
- Stein Variational Gradient Descent (SVGD) and Stochastic SVGD (sSVGD): Particle-based, gradient-driven sampling applicable in high-dimensional spaces, with added stochasticity (sSVGD) to ensure full exploration and asymptotic exactness (Zhang et al., 2022, Zhang et al., 2023, Zhang et al., 2021).
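The SVGD update can be illustrated on a one-dimensional Gaussian "posterior"; in FWI, `grad_log_p` would combine adjoint-state data gradients with the prior score. Particle count, bandwidth, and step size here are illustrative assumptions.

```python
import numpy as np

# SVGD sketch on a 1-D Gaussian target N(mean=2, var=1).
rng = np.random.default_rng(2)

def grad_log_p(x):
    return -(x - 2.0)                         # score of N(2, 1)

x = rng.normal(size=50)                       # particles, initialized near 0
h, eps = 0.5, 0.05                            # RBF bandwidth and step size
for _ in range(2000):
    diff = x[:, None] - x[None, :]            # diff[j, i] = x_j - x_i
    k = np.exp(-diff**2 / (2 * h**2))         # RBF kernel matrix k(x_j, x_i)
    grad_k = -diff / h**2 * k                 # d k(x_j, x_i) / d x_j (repulsion term)
    # phi_i = mean_j [ k(x_j, x_i) * score(x_j) + d k / d x_j ]
    phi = (k * grad_log_p(x)[:, None] + grad_k).mean(axis=0)
    x = x + eps * phi
```

The kernel-weighted score term drives particles toward high posterior density, while the kernel-gradient term repels them from each other, preserving spread.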
Posterior ensemble sampling enables explicit probabilistic percentile bands, variance maps, and credible intervals at each spatial location.
4. Uncertainty Quantification and Interpretation
The Bayesian posterior yields a spatial map of epistemic and, when applicable, aleatoric uncertainties. In the neural representation setting, uncertainty is estimated using Monte Carlo dropout: at test time, multiple random dropout realizations are propagated forward, yielding mean and variance fields at each spatial location (Sun et al., 2022, Yan et al., 2023). For generative priors, an ensemble of latent-space samples (GAN, autoencoder, diffusion) is decoded, then propagated through the forward wave solver, with the empirical mean and variance quantifying uncertainty (Xie et al., 2024, Taufik et al., 14 Dec 2025, Li et al., 6 May 2025, Hu et al., 4 Nov 2025).
High predictive variance correlates with regions of low data sensitivity (e.g., below reflectors, outside ray coverage) or poor illumination. Diffusion-based posteriors and sample-based mapping provide calibrated uncertainty in both synthetic and field experiments (Taufik et al., 14 Dec 2025, Li et al., 6 May 2025). Uncertainty maps inform data acquisition strategies and highlight areas requiring tighter prior/model constraints.
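Turning an ensemble into the summary maps described above is mechanical; here synthetic Gaussian fields stand in for decoded posterior samples (in practice they would come from dropout, a latent chain, or a diffusion sampler).

```python
import numpy as np

# Pointwise uncertainty maps from a posterior ensemble of velocity models.
rng = np.random.default_rng(3)
n_samples, nz, nx = 200, 16, 16
ensemble = 2500.0 + 100.0 * rng.normal(size=(n_samples, nz, nx))  # samples in m/s

mean_map = ensemble.mean(axis=0)                   # posterior mean model
std_map = ensemble.std(axis=0)                     # pointwise standard deviation
lo, hi = np.percentile(ensemble, [5, 95], axis=0)  # 90% credible band per grid cell
```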
5. Algorithmic Frameworks and Implementation
The core steps in Bayesian FWI are:
- Representation: Choose and/or pretrain an implicit or generative representation: neural network $N_\Theta$, GAN generator $g(z)$, autoencoder-decoded latent variable $z$, or diffusion model.
- Prior: Assign a Bayesian prior (zero-mean Gaussian, or learned geological) in the relevant parameterization.
- Forward Model: Implement the physical wave equation solver (time or frequency domain, acoustic or elastic), supporting adjoint-state gradient computation for efficient backpropagation.
- Inference: Implement variational updates (ELBO optimization with reparameterization gradients; dropout-based stochastic optimization), MCMC schemes (with latent-space or model-space gradients), or particle-based updates (SVGD/sSVGD).
- Prediction and UQ: At test time, sample from the variational posterior or from the MCMC chain; propagate samples through $N_\Theta$, $g(z)$, or the autoencoder decoder and compute mean/variance to yield uncertainty maps.
A high-level pseudocode schema for Bayesian IFWI (dropout-based) is:
```
Initialize network parameters Θ randomly
Set dropout probability p, weight decay λ
for epoch in range(N_epochs):
    for data_batch in training_set:
        Sample random dropout mask M
        Compute forward pass N_Θ^drop with masked weights
        Simulate wavefield F(N_Θ^drop) -> d_syn
        Compute misfit loss ||R d_syn - d_obs||² + λ||Θ||²
        Backpropagate and update Θ
for t in range(T_samples):
    Sample new dropout mask M_t
    Compute predicted model m^t(x) = N_{Θ^t}(x)
Compute predictive mean and variance maps
```
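A runnable counterpart to the test-time sampling stage, using a hypothetical untrained two-layer network (weights, sizes, and dropout rate are illustrative):

```python
import numpy as np

# Monte Carlo dropout: repeated random masks at test time yield an ensemble.
rng = np.random.default_rng(7)
W1 = rng.normal(scale=0.5, size=(2, 32))
W2 = rng.normal(scale=0.5, size=(32, 1))
p_drop = 0.2
x, z = np.meshgrid(np.linspace(0, 1, 8), np.linspace(0, 1, 8))
coords = np.stack([x.ravel(), z.ravel()], axis=1)

def forward_with_dropout(c):
    h = np.tanh(c @ W1)
    mask = rng.random(h.shape[1]) > p_drop       # drop hidden units at test time
    h = h * mask / (1.0 - p_drop)                # inverted-dropout rescaling
    return (h @ W2).ravel()

preds = np.stack([forward_with_dropout(coords) for _ in range(100)])
mean_m = preds.mean(axis=0)                      # predictive mean field
var_m = preds.var(axis=0)                        # predictive (epistemic) variance
```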
For latent-domain MCMC with autoencoder parameterization:
```
for i in range(N_MCMC):
    Propose z' via adaptive Langevin
    Decode m = g(z')
    Simulate F(m)
    Compute log-likelihood and prior
    Accept/reject (Metropolis step)
Compute mean and credible intervals over decoded samples {g(z_i)}
```
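The latent-chain loop can be made concrete in a linear-Gaussian toy, where a matrix `A` stands in for the decoder $g$ and a matrix `G` for the wave solver; all matrices, sizes, and step sizes are illustrative assumptions.

```python
import numpy as np

# Latent-space MALA sketch with a linear "decoder" m = A z.
rng = np.random.default_rng(4)
k, n_model, n_data = 3, 8, 30
A = rng.normal(size=(n_model, k))               # decoder stand-in
G = rng.normal(size=(n_data, n_model))          # linearized "forward solver"
sigma = 0.5
z_true = rng.normal(size=k)
d_obs = G @ A @ z_true + sigma * rng.normal(size=n_data)

def log_post(z):
    r = G @ A @ z - d_obs
    return -0.5 * (r @ r) / sigma**2 - 0.5 * (z @ z)   # Gaussian likelihood + N(0, I) prior

def grad_log_post(z):
    r = G @ A @ z - d_obs
    return -(A.T @ (G.T @ r)) / sigma**2 - z

tau = 2e-4                                       # Langevin step size (hand-tuned)
z = np.zeros(k)
samples = []
for i in range(5000):
    mu_fwd = z + 0.5 * tau * grad_log_post(z)
    z_prop = mu_fwd + np.sqrt(tau) * rng.normal(size=k)
    mu_back = z_prop + 0.5 * tau * grad_log_post(z_prop)
    # Metropolis-Hastings correction for the asymmetric Langevin proposal
    log_q_fwd = -((z_prop - mu_fwd) @ (z_prop - mu_fwd)) / (2 * tau)
    log_q_back = -((z - mu_back) @ (z - mu_back)) / (2 * tau)
    if np.log(rng.uniform()) < log_post(z_prop) - log_post(z) + log_q_back - log_q_fwd:
        z = z_prop
    if i >= 1000:                                # discard burn-in
        samples.append(A @ z)                    # decoded model sample g(z)
m_mean = np.mean(samples, axis=0)                # posterior mean model
```

Working in the 3-dimensional latent space rather than the 8-dimensional model space is the point: gradients and proposals are cheap, and decoded samples inherit the structure of `A`.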
6. Applications: Time-Lapse, 3D, and Monitoring
Time-lapse FWI quantifies temporal changes in subsurface parameters, requiring careful joint or sequential Bayesian analysis of baseline and monitor surveys. Strategies include independent (parallel), sequential (monitor prior informed by baseline posterior), and joint inversion; Bayesian inference captures uncertainty propagation, cross-survey correlation, and mitigates acquisition non-repeatability artifacts (Lima et al., 2023, Zhang et al., 2023, Silva et al., 2024).
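The sequential strategy (monitor prior informed by baseline posterior) can be sketched in a linear-Gaussian toy; `G`, `sigma`, the covariance inflation, and the imposed 0.1 change are illustrative assumptions.

```python
import numpy as np

# Sequential time-lapse toy: baseline posterior becomes the monitor prior.
rng = np.random.default_rng(5)
n_model, n_data = 4, 25
G = rng.normal(size=(n_data, n_model))
sigma = 0.2

def gaussian_update(m_prior, C_prior, d):
    """Conjugate posterior for d = G m + N(0, sigma^2 I), prior N(m_prior, C_prior)."""
    P = np.linalg.inv(C_prior)                     # prior precision
    C_post = np.linalg.inv(P + G.T @ G / sigma**2)
    m_post = C_post @ (P @ m_prior + G.T @ d / sigma**2)
    return m_post, C_post

m_base_true = rng.normal(size=n_model)
m_mon_true = m_base_true + 0.1                     # small time-lapse change
d_base = G @ m_base_true + sigma * rng.normal(size=n_data)
d_mon = G @ m_mon_true + sigma * rng.normal(size=n_data)

m_b, C_b = gaussian_update(np.zeros(n_model), np.eye(n_model), d_base)
# Inflate the baseline posterior covariance so the monitor step can track change.
m_m, C_m = gaussian_update(m_b, C_b + 0.05 * np.eye(n_model), d_mon)
dm = m_m - m_b                                     # estimated time-lapse change
```

The covariance inflation step is one simple way to model temporal drift; without it, a tight baseline posterior would suppress genuine monitor-survey changes.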
Bayesian FWI is being applied at increasing scale, including fully 3D problems using structured variational approximations (e.g., sparse Cholesky-covariance Gaussians), enabling computation of full posterior statistics at moderate cost (Zhang et al., 2022, Zhao et al., 2024).
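The Cholesky-parameterized Gaussian family can be illustrated in dense miniature: take $q(m) = \mathcal{N}(\mu, L L^{\top})$ with lower-triangular $L$ and sample by reparameterization. In 3D FWI, $L$ would be kept sparse; sizes here are illustrative.

```python
import numpy as np

# Structured Gaussian variational family: q(m) = N(mu, L L^T), m = mu + L z.
rng = np.random.default_rng(6)
n = 6
mu = rng.normal(size=n)
L = np.tril(0.1 * rng.normal(size=(n, n))) + np.eye(n)  # well-conditioned factor

z = rng.normal(size=(10000, n))                  # standard normal draws
samples = mu + z @ L.T                           # reparameterized samples of m
emp_cov = np.cov(samples, rowvar=False)          # should approximate L @ L.T
```

Because samples are differentiable in $\mu$ and $L$, ELBO gradients flow through them, which is what makes this family usable inside variational FWI.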
Quantitative results demonstrate that Bayesian FWI with high-capacity priors and scalable inference recovers accurate velocity models with interpretable uncertainties, increases robustness to local minima and data gaps, and enables principled evaluation of alternative prior hypotheses (Sun et al., 2022, Li et al., 6 May 2025, Taufik et al., 14 Dec 2025, Hu et al., 4 Nov 2025, Zhao et al., 2024).
7. Computational Scaling, Limitations, and Future Directions
Bayesian FWI remains computationally demanding due to the need for repeated forward and adjoint PDE solves per posterior sample or particle. Model reduction via implicit neural parameterizations, latent autoencoder or GAN variables, or low-rank/sparse variational families is critical to tractability (Hu et al., 4 Nov 2025, Taufik et al., 14 Dec 2025, Xie et al., 2024, Li et al., 13 May 2025).
Future directions include deployment of Bayesian FWI to large field-scale 3D and 4D (time-lapse) datasets, further advances in scalable Monte Carlo and variational schemes, integration of advanced priors (e.g., conditional diffusion, geological simulation), and the development of automated tools for prior selection and sensitivity analysis (Zhao et al., 2024, Taufik et al., 14 Dec 2025, Li et al., 6 May 2025).
The field continues to move toward practical, uncertainty-resolving, and physically faithful imaging workflows combining physical inversion, deep probabilistic priors, and high-performance scalable inference (Sun et al., 2022, Taufik et al., 14 Dec 2025, Li et al., 6 May 2025, Hu et al., 4 Nov 2025).