Bayesian Quadrature: Probabilistic Integration
- Bayesian quadrature is a probabilistic numerical integration framework that uses Gaussian process priors for both point estimation and uncertainty quantification.
- It actively selects sample points to minimize the posterior variance, outperforming traditional Monte Carlo methods in smooth, low-dimensional settings.
- Extensions include kernel herding, multi-output integration, and the assimilation of derivative information to accelerate convergence rates.
Bayesian quadrature is a probabilistic numerical integration method grounded in Gaussian process (GP) modeling, which provides both point estimates and uncertainty quantification for integrals of computationally expensive or black-box functions. The framework interprets integration as an inference problem, leveraging the machinery of GP regression to actively select sample points and propagate numerical error quantitatively. Bayesian quadrature has rigorous connections to kernel methods, information-based complexity, and optimal design, and it is increasingly utilized in scientific computing, machine learning, model selection, and uncertainty quantification.
1. Mathematical Foundations
Let $f:\mathcal{X}\to\mathbb{R}$ be the integrand and $\pi$ a known probability density on $\mathcal{X}\subseteq\mathbb{R}^d$. The goal is to approximate the integral
$$\Pi[f] = \int_{\mathcal{X}} f(x)\,\pi(x)\,\mathrm{d}x.$$
Bayesian quadrature places a Gaussian process prior,
$$f \sim \mathcal{GP}(m, k),$$
over $f$, with mean function $m$ and covariance kernel $k$. Given $n$ (typically noiseless) evaluations $\{(x_i, f(x_i))\}_{i=1}^n$, the GP posterior yields an analytically tractable Gaussian law over the integral; for a zero-mean prior,
$$\mathbb{E}[\Pi[f] \mid \mathcal{D}] = z^\top K^{-1} f_X,$$
where $z_i = \int k(x, x_i)\,\pi(x)\,\mathrm{d}x$ and $K_{ij} = k(x_i, x_j)$. The posterior variance is
$$\mathbb{V}[\Pi[f] \mid \mathcal{D}] = \iint k(x, x')\,\pi(x)\,\pi(x')\,\mathrm{d}x\,\mathrm{d}x' - z^\top K^{-1} z.$$
Hence, Bayesian quadrature recasts numerical integration as the estimation of a linear functional under a GP prior (Briol et al., 2015).
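As a concrete illustration, the posterior mean and variance above can be computed in a few lines whenever the kernel mean $z_i$ and the initial error $\iint k$ are available in closed form. The sketch below assumes a squared-exponential kernel and the uniform measure on $[0, 1]$ (for which both reduce to error functions); the lengthscale and node set are arbitrary illustrative choices.

```python
import numpy as np
from math import erf, sqrt, pi

ELL = 0.3  # kernel lengthscale (illustrative)

def k(a, b):
    # Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 ell^2))
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ELL ** 2))

def kernel_mean(x):
    # z_i = \int_0^1 k(t, x_i) dt, closed form for the SE kernel via erf
    c = ELL * sqrt(pi / 2)
    return np.array([c * (erf((1 - t) / (sqrt(2) * ELL)) + erf(t / (sqrt(2) * ELL)))
                     for t in x])

def bq(f_vals, x):
    # Posterior mean z^T K^{-1} f and variance \iint k - z^T K^{-1} z
    K = k(x, x) + 1e-10 * np.eye(len(x))   # jitter for numerical stability
    z = kernel_mean(x)
    w = np.linalg.solve(K, z)              # BQ weights
    # \iint_{[0,1]^2} k(x, x') dx dx' in closed form for the SE kernel
    kk = (2 * ELL * sqrt(pi / 2) * erf(1 / (sqrt(2) * ELL))
          - 2 * ELL ** 2 * (1 - np.exp(-1 / (2 * ELL ** 2))))
    return w @ f_vals, kk - w @ z

x = np.linspace(0.0, 1.0, 8)
mean, var = bq(x ** 2, x)   # target: \int_0^1 x^2 dx = 1/3
```

The weights depend only on the kernel, measure, and node locations, not on the integrand, so the same `bq` call reuses one factorization for any function evaluated at `x`.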
2. Optimal Node Placement and Kernel Herding
The placement of sample locations is critical: classic Monte Carlo and quasi-Monte Carlo use random or low-discrepancy sequences, with convergence rates $O(n^{-1/2})$ and $O(n^{-1}(\log n)^d)$, respectively. In contrast, Bayesian quadrature actively selects points to minimize the posterior variance of $\Pi[f]$ at each step; sequential BQ can be viewed as a form of optimally-weighted kernel herding (Huszár et al., 2012). Kernel herding iteratively constructs a quadrature rule that greedily minimizes the RKHS worst-case integration error—equivalent to the BQ posterior variance; sequential BQ (SBQ) further optimizes the quadrature weights non-uniformly to strictly outperform unweighted herding.
The convergence rate of BQ is typically faster than the $O(n^{-1/2})$ Monte Carlo rate in smooth, low-dimensional settings (e.g., $O(n^{-1})$ for herding in a finite-dimensional RKHS), and can be faster still, sometimes exponential, if the kernel is analytic and the integrand lies in a finite-dimensional subspace (Briol et al., 2015, Huszár et al., 2012). For fixed nodes, the BQ weights minimize the maximum mean discrepancy (MMD) between the target measure and the weighted empirical measure.
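Because the GP is observed without noise, the posterior variance of the integral depends only on the node locations, not on function values, so the greedy SBQ-style design can be sketched without evaluating the integrand at all. The kernel, measure, lengthscale, and candidate grid below are illustrative choices.

```python
import numpy as np
from math import erf, sqrt, pi

ELL = 0.25  # kernel lengthscale (illustrative)

def k(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ELL ** 2))

def kernel_mean(x):
    # z_i = \int_0^1 k(t, x_i) dt for the SE kernel and uniform measure
    c = ELL * sqrt(pi / 2)
    return np.array([c * (erf((1 - t) / (sqrt(2) * ELL)) + erf(t / (sqrt(2) * ELL)))
                     for t in x])

# Prior variance of the integral: \iint_{[0,1]^2} k(x, x') dx dx' in closed form
KK = (2 * ELL * sqrt(pi / 2) * erf(1 / (sqrt(2) * ELL))
      - 2 * ELL ** 2 * (1 - np.exp(-1 / (2 * ELL ** 2))))

def integral_variance(x):
    # Posterior variance of the integral for node set x; independent of f values
    if len(x) == 0:
        return KK
    K = k(x, x) + 1e-10 * np.eye(len(x))
    z = kernel_mean(x)
    return KK - z @ np.linalg.solve(K, z)

# Greedy SBQ design: each step adds the candidate that most reduces the variance
candidates = np.linspace(0, 1, 101)
nodes = []
for _ in range(5):
    nodes.append(min(candidates, key=lambda c: integral_variance(nodes + [c])))
```

The selected nodes spread out over the domain, mirroring the space-filling behavior of kernel herding; the variance decreases monotonically as points are added.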
3. Extensions: Domains, Information Sources, and Surrogates
Discrete and Mixed Domains
Bayesian quadrature naturally generalizes to discrete or mixed domains through appropriate kernel and measure choices. BayesSum extends BQ to arbitrary discrete spaces, with closed-form expressions for typical kernel (e.g., Hamming or Brownian-motion) and measure pairs, and empirically achieves significantly faster convergence than Monte Carlo or stratified sampling whenever the kernel mean and initial kernel error are tractable (Kang et al., 18 Dec 2025, Adachi et al., 2023).
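On a finite domain the kernel mean and initial error become finite sums, hence always tractable. The sketch below uses an illustrative exponential kernel and uniform measure on a 20-state space (not the specific kernel–measure pairs of the cited works) to estimate an expectation from five evaluations.

```python
import numpy as np

ELL = 20.0                            # kernel lengthscale (illustrative)
domain = np.arange(20)                # finite state space {0, ..., 19}
pi_w = np.full(20, 1 / 20)            # uniform probability mass

def k(a, b):
    # Exponential kernel on the integers (illustrative choice)
    return np.exp(-np.abs(np.asarray(a)[:, None] - np.asarray(b)[None, :]) / ELL)

f = (domain / 19.0) ** 2              # integrand; target is sum_x pi(x) f(x)
truth = pi_w @ f

X = np.array([0, 5, 10, 15, 19])      # the five evaluated states
z = k(domain, X).T @ pi_w             # discrete kernel mean: z_i = sum_x pi(x) k(x, X_i)
K = k(X, X) + 1e-10 * np.eye(len(X))
w = np.linalg.solve(K, z)             # BQ weights
estimate = w @ f[X]
kk = pi_w @ k(domain, domain) @ pi_w  # initial error: sum_{x,x'} pi(x) pi(x') k(x, x')
variance = kk - w @ z
```

All quantities here are exact sums over the 20 states, so no integral approximations are needed; only the five evaluations of `f` are treated as expensive.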
Multiple Related Integrals
Multi-output Gaussian processes enable information transfer among several related integrals, such as multi-fidelity engineering models or computer graphics estimators. Multi-output BQ, given block-structured kernels, produces quadrature weights and covariances that simultaneously integrate all outputs, reducing sample complexity by leveraging cross-correlation (Xi et al., 2018, Gessner et al., 2019).
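One simple block-structured choice is the intrinsic coregionalization model (ICM), in which $\mathrm{cov}(f_t(x), f_s(x')) = B_{ts}\,k(x, x')$ for an output covariance matrix $B$, so the joint Gram matrix is the Kronecker product $B \otimes K_x$ and the integral's cross-covariances are $B_{ts} z_i$. The sketch below, with an assumed $B$ and SE kernel on $[0,1]$ (illustrative choices, not those of the cited papers), integrates two outputs jointly.

```python
import numpy as np
from math import erf, sqrt, pi

ELL = 0.3
B = np.array([[1.0, 0.6], [0.6, 1.0]])   # output covariance matrix (assumed)

def k(a, b):
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ELL ** 2))

def kernel_mean(x):
    # z_i = \int_0^1 k(t, x_i) dt for the SE kernel and uniform measure
    c = ELL * sqrt(pi / 2)
    return np.array([c * (erf((1 - t) / (sqrt(2) * ELL)) + erf(t / (sqrt(2) * ELL)))
                     for t in x])

# Both outputs observed on the same grid; the ICM Gram matrix is B kron K_x
x = np.linspace(0, 1, 8)
y = np.concatenate([x ** 2, x ** 3])      # f_1(x) = x^2, f_2(x) = x^3
K = np.kron(B, k(x, x)) + 1e-10 * np.eye(16)
z = kernel_mean(x)

# cov(\int f_t, f_s(x_i)) = B[t, s] * z_i, stacked over s, yields each estimate
est = [np.kron(B[t], z) @ np.linalg.solve(K, y) for t in range(2)]
```

When the outputs share the same design, the estimates reduce to single-output BQ; the cross-correlation in $B$ pays off when one output is observed at far fewer locations than the other, since the better-sampled output then informs the sparser integral.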
Incorporating Derivatives
If gradients or Hessians of the integrand are available, they can be assimilated into the GP prior via the derivative cross-covariances of the kernel, greatly accelerating the shrinkage of the posterior variance on $\Pi[f]$. Gradient-enhanced BQ achieves markedly faster convergence rates in one dimension, with each derivative datum contributing information comparable to several additional function evaluations (Wu et al., 2017).
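Concretely, derivative observations enter through differentiated kernels, $\mathrm{cov}(f(x), f'(y)) = \partial k/\partial y$ and $\mathrm{cov}(f'(x), f'(y)) = \partial^2 k/\partial x\,\partial y$, and the integral's cross-covariance with a derivative observation is the derivative of the kernel mean. The sketch below works this out for the SE kernel and uniform measure on $[0, 1]$; all choices are illustrative.

```python
import numpy as np
from math import erf, sqrt, pi

ELL = 0.4
x = np.linspace(0, 1, 4)                 # only 4 function + derivative evaluations
d = x[:, None] - x[None, :]              # pairwise differences x_i - x_j
K0 = np.exp(-d ** 2 / (2 * ELL ** 2))    # cov(f(x_i), f(x_j))
Kfd = K0 * d / ELL ** 2                  # cov(f(x_i), f'(x_j)) = dk/dy
Kdd = K0 * (1 / ELL ** 2 - d ** 2 / ELL ** 4)  # cov(f'(x_i), f'(x_j))
K = np.block([[K0, Kfd], [Kfd.T, Kdd]]) + 1e-10 * np.eye(8)

# Kernel mean z and its derivative dz/dx_i (closed forms for SE on [0, 1])
c = ELL * sqrt(pi / 2)
z = np.array([c * (erf((1 - t) / (sqrt(2) * ELL)) + erf(t / (sqrt(2) * ELL)))
              for t in x])
zd = np.exp(-x ** 2 / (2 * ELL ** 2)) - np.exp(-(1 - x) ** 2 / (2 * ELL ** 2))

y = np.concatenate([x ** 3, 3 * x ** 2])  # f(x) = x^3 and its derivative
estimate = np.concatenate([z, zd]) @ np.linalg.solve(K, y)  # target: 1/4
```

The stacked system treats each derivative value as one extra row of the Gram matrix, so four function values plus four gradients behave like a considerably larger function-only design.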
Non-Gaussian Priors and BART-Int
While most BQ approaches assume a GP prior, other surrogates such as Bayesian Additive Regression Trees (BART) have been proposed for high-dimensional, non-smooth, or discontinuous integrands. BART-based quadrature (BART-Int) estimates the integral and its uncertainty via MCMC over trees, which can outperform GP-BQ on non-smooth or high-dimensional problems (Zhu et al., 2020).
4. Acquisition Functions and Adaptive Sampling
The performance of BQ is fundamentally governed by the acquisition function used to select new points. Standard strategies include minimizing the posterior variance of $\Pi[f]$, entropy minimization, and maximizing expected variance reduction. Modern approaches design acquisitions that directly target the reduction of uncertainty in model evidence, posterior modes, or expected prediction error (e.g., the PUQ, PVC, PLUR, and PEUR acquisition criteria) (Song et al., 10 Oct 2025). Transitional Bayesian quadrature (TBQ) applies these ideas to sequential tempering and bridge sampling for sharply peaked posteriors.
Batch acquisition, critical for parallel or high-throughput settings, is implemented by heuristics like Kriging Believer, local penalization (WSABI-L batch BQ), or, more generally, kernel recombination algorithms which ensure diversified, near-independent batch selections. Kernel recombination methods solve convex programs to match moments of the empirical and prior measures in the RKHS, optimizing for "worst-case error" or MMD (Wagstaff et al., 2018, Adachi et al., 2022, Adachi et al., 2023).
5. Theoretical Guarantees and Stability
Bayesian quadrature rules with weights derived from totally positive kernels and nodes that minimize posterior variance guarantee positivity and stability of the weights in one dimension; at least half the weights are positive for arbitrary node sets, and all are strictly positive at (local) optima. In Sobolev (e.g., Matérn) RKHSs and for quasi-uniform nodes, individual BQ weights decay in magnitude as $n$ grows, while the stability constant (the sum of absolute weights) grows at most polynomially in $n$, mitigating numerical instability as $n$ increases (Karvonen et al., 2018).
Convergence rates are determined by the smoothness of the kernel and the fill-distance of the point set. For analytic kernels and uniformly spaced nodes, exponential rates in $n$ are possible; for Sobolev kernels, polynomial rates hold. The minimax-optimality of Bayesian quadrature in RKHSs, under the Poincaré inequality, has been established, with the midpoint rule recovering minimax rates in the uniform case and generalizations available via finite elements and linear programming (2207.14564).
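These rates are easy to observe empirically. The sketch below (SE kernel, uniform measure on $[0, 1]$, equispaced nodes; all illustrative choices) measures the BQ error on a smooth integrand as the node count grows; it decays far faster than the $O(n^{-1/2})$ Monte Carlo rate.

```python
import numpy as np
from math import erf, sqrt, pi

ELL = 0.3  # kernel lengthscale (illustrative)

def kernel_mean(x):
    # z_i = \int_0^1 k(t, x_i) dt for the SE kernel and uniform measure
    c = ELL * sqrt(pi / 2)
    return np.array([c * (erf((1 - t) / (sqrt(2) * ELL)) + erf(t / (sqrt(2) * ELL)))
                     for t in x])

def bq_mean(fx, x):
    # BQ posterior mean z^T K^{-1} f with a small jitter for conditioning
    K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ELL ** 2)) + 1e-9 * np.eye(len(x))
    return np.linalg.solve(K, kernel_mean(x)) @ fx

truth = (1 - np.cos(2.0)) / 2          # \int_0^1 sin(2x) dx
errs = [abs(bq_mean(np.sin(2 * np.linspace(0, 1, n)), np.linspace(0, 1, n)) - truth)
        for n in (3, 6, 12)]
```

For rough (e.g., Matérn) kernels the same experiment exhibits the slower polynomial decay predicted by the Sobolev theory.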
6. Applications: Model Evidence, Manifolds, and Robust Optimization
Bayesian quadrature is particularly well-suited for Bayesian model selection via marginal likelihood (model evidence) computation, outperforming MCMC and nested sampling in sample efficiency, especially for expensive or multimodal posteriors (Chai et al., 2019, Adachi et al., 2022).
On non-Euclidean domains, e.g., Riemannian data manifolds, Bayesian quadrature utilizes geodesic kernels and volume elements, greatly reducing the number of necessary expensive ODE solves for normalization constants or posterior expectations (Fröhlich et al., 2021).
Distributionally robust Bayesian quadrature optimization extends BQ to handle distributional uncertainty in the data-generating process, efficiently balancing exploitation and robustness against misspecification of the underlying probability law (Nguyen et al., 2020).
7. Advanced Variants and Open Directions
Recent advances include:
- Invariant priors for BQ, constructing covariance kernels invariant under symmetries (rotations, reflections) to enforce known problem structures and increase sample efficiency (Naslidnyk et al., 2021).
- Marginalizing over kernels in GP models with Bayesian quadrature, using MMD hyper-kernels to capture invariances in kernel families and propagating uncertainty over kernel choice into both marginal likelihood and predictive distributions (Hamid et al., 2021).
- Highly parallel BQ algorithms (e.g., BASQ, SOBER) that exploit kernel recombination, moment-matching, and convex programming to enable scalable inference for discrete, structured, and high-dimensional decision spaces (Adachi et al., 2022, Adachi et al., 2023).
Despite significant progress, open problems remain in scalable kernel mean computation, optimal kernel and point set design, convergence in high-dimensional or misspecified settings, and extensions beyond GPs to nonparametric, non-Gaussian, or hierarchical models. Bayesian quadrature remains an active research area with deep connections across machine learning, numerical analysis, and applied statistics.