Bayesian Quadrature
- Bayesian quadrature is a probabilistic numerical integration method that uses Gaussian processes to model functions and quantify uncertainty in integrals.
- It generalizes classical quadrature by leveraging kernel-based methods and active sampling strategies, thereby enhancing sample efficiency for expensive evaluations.
- Recent extensions include batch acquisition, multi-output models, and high-dimensional applications, making the approach versatile for complex integration tasks.
Bayesian quadrature (BQ) is a probabilistic numerical integration method that treats integration as a statistical inference problem. By leveraging surrogate models—primarily Gaussian processes (GPs)—BQ constructs a posterior distribution over the value of an integral given pointwise function evaluations, allowing for explicit uncertainty quantification. BQ generalizes classical quadrature and connects to kernel-based numerical analysis, herding, optimal experimental design, and probabilistic numerics. BQ achieves high sample efficiency, especially when the integrand is expensive to evaluate, and supports varied extensions including active learning, batch acquisition, multi-output models, and non-GP priors.
1. Bayesian Quadrature: Core Framework
The foundational BQ formalism is as follows. Let $f: \mathcal{X} \to \mathbb{R}$ be a measurable function and $\pi$ a known probability density on $\mathcal{X}$. The goal is to approximate the integral

$$I[f] = \int_{\mathcal{X}} f(x)\,\pi(x)\,\mathrm{d}x.$$

In BQ, $f$ is modeled as a sample from a GP,

$$f \sim \mathcal{GP}(m, k),$$

where $k$ is a positive-definite kernel and $m$ is a prior mean function. After observing function values $\mathbf{y} = (f(x_1), \dots, f(x_n))$ at nodes $x_1, \dots, x_n$, the induced posterior on $I[f]$ is Gaussian with explicit mean and variance:

$$\mathbb{E}\big[I[f] \mid \mathbf{y}\big] = \int_{\mathcal{X}} m_n(x)\,\pi(x)\,\mathrm{d}x, \qquad \mathbb{V}\big[I[f] \mid \mathbf{y}\big] = \int_{\mathcal{X}}\int_{\mathcal{X}} k_n(x, x')\,\pi(x)\,\pi(x')\,\mathrm{d}x\,\mathrm{d}x',$$

where $m_n$ and $k_n$ are the posterior mean and covariance of $f$ (conditional on the nodes and $\mathbf{y}$) (Huszár et al., 2012). Crucially, the BQ estimator is a weighted sum:

$$\hat{I}_{\mathrm{BQ}} = \sum_{i=1}^n w_i\, f(x_i), \qquad w = K^{-1} z,$$

where $K$ is the $n \times n$ kernel matrix with $K_{ij} = k(x_i, x_j)$ and $z_i = \int_{\mathcal{X}} k(x, x_i)\,\pi(x)\,\mathrm{d}x$ is the kernel mean evaluated at the nodes (Karvonen et al., 2018).
The posterior variance $\mathbb{V}[I[f] \mid \mathbf{y}]$ reflects the uncertainty in the integral estimate and corresponds exactly to the squared worst-case error in the reproducing kernel Hilbert space (RKHS) associated with $k$. This property aligns BQ with kernel quadrature (KQ).
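As a concrete illustration, the weights $w = K^{-1} z$ can be computed directly whenever the kernel mean $z$ and the prior integral variance $\iint k\,\mathrm{d}\pi\,\mathrm{d}\pi$ are analytic, e.g. for a Gaussian kernel paired with a standard normal $\pi$ on $\mathbb{R}$. The NumPy sketch below uses that pairing (the lengthscale and node grid are illustrative choices, not prescribed by the literature cited here) to estimate $\int x^2\,\mathcal{N}(x; 0, 1)\,\mathrm{d}x = 1$:

```python
import numpy as np

def rbf(x, y, ell=1.0):
    # Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def kernel_mean(x, ell=1.0):
    # z_i = ∫ k(x_i, t) N(t; 0, 1) dt, closed form for this kernel/measure pair
    return ell / np.sqrt(ell ** 2 + 1.0) * np.exp(-0.5 * x ** 2 / (ell ** 2 + 1.0))

def bq_estimate(f, nodes, ell=1.0, jitter=1e-10):
    K = rbf(nodes, nodes, ell) + jitter * np.eye(len(nodes))
    z = kernel_mean(nodes, ell)
    w = np.linalg.solve(K, z)                  # BQ weights w = K^{-1} z
    mean = w @ f(nodes)                        # posterior mean of the integral
    prior_var = ell / np.sqrt(ell ** 2 + 2.0)  # ∫∫ k(x, x') dπ(x) dπ(x')
    var = prior_var - z @ w                    # posterior variance of the integral
    return mean, var

nodes = np.linspace(-3.0, 3.0, 15)
est, var = bq_estimate(lambda x: x ** 2, nodes)  # true value: 1
```

Note that the posterior variance depends only on the node locations, not on the observed values — a property exploited by the sequential and batch designs discussed in the following sections.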
2. Optimal Point Selection and Connection to Kernel Herding
Sequential BQ (SBQ) actively selects the next evaluation site $x_{n+1}$ at each step to minimize the posterior variance of the integral, which is equivalent to minimizing the maximum mean discrepancy (MMD) between $\pi$ and the weighted empirical measure $\sum_{i} w_i\,\delta_{x_i}$ in the RKHS of $k$ (Huszár et al., 2012). Explicitly,

$$\mathrm{MMD}^2\Big(\pi, \textstyle\sum_{i=1}^n w_i\,\delta_{x_i}\Big) = \Big\| \int_{\mathcal{X}} k(\cdot, x)\,\pi(x)\,\mathrm{d}x - \sum_{i=1}^n w_i\, k(\cdot, x_i) \Big\|_{\mathcal{H}_k}^2.$$

For optimally chosen weights $w = K^{-1} z$, the BQ posterior variance equals this squared MMD.
Kernel herding is a related deterministic sampling scheme, where nodes are selected (typically with uniform weights $w_i = 1/n$) to greedily minimize this same MMD. SBQ generalizes kernel herding by (i) optimizing sample sites and (ii) allowing nonuniform, possibly signed weights derived from the GP posterior (Huszár et al., 2012). Empirically, SBQ converges faster than the $O(n^{-1})$ rate of kernel herding, outperforming both uniform-weight herding and standard Monte Carlo (Huszár et al., 2012).
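Because the posterior variance of the integral is a function of node locations alone, the SBQ loop can be sketched without any function evaluations. The code below is a minimal greedy implementation over a candidate grid, assuming an illustrative Gaussian kernel with standard normal measure (for which the kernel mean is analytic); the starting node and grid are arbitrary choices for demonstration:

```python
import numpy as np

def rbf(x, y, ell=1.0):
    # Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def kernel_mean(x, ell=1.0):
    # ∫ k(x, t) N(t; 0, 1) dt for the Gaussian kernel (closed form)
    return ell / np.sqrt(ell ** 2 + 1.0) * np.exp(-0.5 * x ** 2 / (ell ** 2 + 1.0))

def integral_variance(nodes, ell=1.0, jitter=1e-10):
    # Posterior variance of I[f] with optimal weights; equals the squared MMD.
    K = rbf(nodes, nodes, ell) + jitter * np.eye(len(nodes))
    z = kernel_mean(nodes, ell)
    return ell / np.sqrt(ell ** 2 + 2.0) - z @ np.linalg.solve(K, z)

def sbq_nodes(candidates, n, ell=1.0):
    # Greedy SBQ: at each step add the candidate that most reduces the
    # posterior variance of the integral.
    nodes = np.array([0.0])  # illustrative start at the mode of π
    for _ in range(n - 1):
        scores = [integral_variance(np.append(nodes, c), ell) for c in candidates]
        nodes = np.append(nodes, candidates[int(np.argmin(scores))])
    return nodes

cands = np.linspace(-4.0, 4.0, 81)
sbq = sbq_nodes(cands, 8)
v_sbq = integral_variance(sbq)
v_iid = integral_variance(np.random.default_rng(0).normal(size=8))
```

The greedily chosen design typically attains a far smaller posterior variance than i.i.d. sampling from $\pi$ at the same budget, mirroring the empirical advantage of SBQ over herding and Monte Carlo reported by Huszár et al.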
3. Properties of Bayesian Quadrature Weights
The BQ weights 2 are determined by the kernel, integration measure, and design nodes. Key properties include:
- Positivity: In univariate BQ, all weights are positive if the kernel is totally positive (e.g., Gaussian and Hardy kernels) and the node configuration is a local minimizer of the posterior variance; under weaker total-positivity assumptions on the kernel, only a subset of the weights is guaranteed to be positive (Karvonen et al., 2018).
- Magnitude: With shift-invariant kernels whose RKHS is norm-equivalent to a Sobolev space $H^s$ ($s > d/2$), the maximal weight magnitude is controlled by powers of the fill distance $h$ and the separation radius $q$ of the node set. For quasi-uniform point sets (with $h/q$ bounded), the weights decay as the number of nodes grows and the $\ell^1$ stability constant $\sum_{i=1}^n |w_i|$ grows at most at a rate determined by $s$ and $d$ (Karvonen et al., 2018).
These results guide practical node selection strategies: gradient descent minimization of posterior variance and quasi-uniform sampling deliver robust, stable rules. Challenges remain in multivariate settings, especially for ensuring positivity of weights.
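These magnitude results can be probed numerically. The sketch below is an illustrative setup (not taken from the cited paper): it uses the exponential (Matérn-1/2) kernel, whose RKHS is norm-equivalent to the Sobolev space $H^1$ on $[0,1]$, with the uniform measure, for which the kernel mean $z(x) = \ell\,(2 - e^{-x/\ell} - e^{-(1-x)/\ell})$ is analytic, and checks that refining a quasi-uniform design shrinks the maximal weight while the $\ell^1$ norm stays moderate:

```python
import numpy as np

def exp_kernel(x, y, ell=0.5):
    # Matérn-1/2 (exponential) kernel k(x, y) = exp(-|x - y| / ell)
    return np.exp(-np.abs(x[:, None] - y[None, :]) / ell)

def kernel_mean_unif(x, ell=0.5):
    # z(x) = ∫_0^1 exp(-|x - t| / ell) dt, closed form for the uniform measure
    return ell * (2.0 - np.exp(-x / ell) - np.exp(-(1.0 - x) / ell))

def bq_weights(nodes, ell=0.5, jitter=1e-12):
    K = exp_kernel(nodes, nodes, ell) + jitter * np.eye(len(nodes))
    return np.linalg.solve(K, kernel_mean_unif(nodes, ell))  # w = K^{-1} z

w10 = bq_weights(np.linspace(0.0, 1.0, 10))
w40 = bq_weights(np.linspace(0.0, 1.0, 40))
max10, max40 = np.max(np.abs(w10)), np.max(np.abs(w40))
l1_40 = np.sum(np.abs(w40))
```

On this example the maximal weight magnitude decreases roughly like the fill distance, and the $\ell^1$ norm of the weights remains close to the total mass of the uniform measure.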
4. Theoretical Error Bounds and Rates
Sharp average-case and minimax error rates can be established for BQ in RKHSs associated with Matérn and squared exponential (SE) kernels:
- Matérn Kernel: For integrands drawn from a GP with a Matérn-$\nu$ kernel on $[0,1]^d$, observed with noise of standard deviation $\sigma$, optimally designed BQ achieves average-case error

$$\mathbb{E}\,\big|I[f] - \hat{I}_n\big| = \Theta\!\left(n^{-\nu/d - 1/2} + \sigma\, n^{-1/2}\right),$$

matching the information-theoretic lower bound up to constants, where $\sigma$ is the noise standard deviation (Cai et al., 2022).
- Squared Exponential (SE) Kernel: For the SE kernel, the average-case error decays sub-exponentially,

$$\mathbb{E}\,\big|I[f] - \hat{I}_n\big| = O\!\left(e^{-c\, n^{1/d}}\right)$$

for some constant $c > 0$, up to logarithmic factors in the exponent (Cai et al., 2022).
- Misspecification: If the GP prior is misspecified (kernel smoother than the true integrand), convergence slows, with multiple decay terms reflecting both smoothness mismatch and the noise floor (Cai et al., 2022).
These rates are tight; no method (even adaptive or randomized) can systematically outperform the optimal Matérn rate (Cai et al., 2022).
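The polynomial decay for Sobolev-type kernels can be observed directly. The sketch below is an illustrative, noiseless 1-D experiment (not from the cited paper): BQ with the exponential (Matérn-1/2) kernel and uniform measure on $[0,1]$, where the kernel mean $z(x) = \ell\,(2 - e^{-x/\ell} - e^{-(1-x)/\ell})$ is available in closed form, applied to $f(x) = x^2$ with true integral $1/3$:

```python
import numpy as np

def exp_kernel(x, y, ell=0.5):
    # Matérn-1/2 (exponential) kernel on the line
    return np.exp(-np.abs(x[:, None] - y[None, :]) / ell)

def kernel_mean_unif(x, ell=0.5):
    # ∫_0^1 exp(-|x - t| / ell) dt for the uniform measure (closed form)
    return ell * (2.0 - np.exp(-x / ell) - np.exp(-(1.0 - x) / ell))

def bq_integral(f, n, ell=0.5, jitter=1e-12):
    nodes = np.linspace(0.0, 1.0, n)
    K = exp_kernel(nodes, nodes, ell) + jitter * np.eye(n)
    w = np.linalg.solve(K, kernel_mean_unif(nodes, ell))  # BQ weights
    return w @ f(nodes)

truth = 1.0 / 3.0
errs = [abs(bq_integral(lambda x: x ** 2, n) - truth) for n in (5, 10, 20, 40)]
```

The error drops steadily as the design is refined, consistent with polynomial rates for Matérn-type kernels; noisy observations would additionally introduce a noise-dependent error floor.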
5. Batch, Parallel, and High-Dimensional Bayesian Quadrature
Serial BQ is limited by costly sequential sampling. Recent advances enable efficient batching and parallelization:
- Batch Acquisition: Batch-BQ algorithms select batches of $b > 1$ points per iteration using Kriging-Believer or local penalization strategies, avoiding point clustering and achieving near-linear speedups in wall-clock time (Wagstaff et al., 2018). Kernel recombination and convex optimization frameworks (as in BASQ and SOBER) support batch selection of quadrature rules that minimize the RKHS worst-case error and scale to large candidate sets, discrete/mixed domains, and highly parallel compute (Adachi et al., 2022, Adachi et al., 2023).
- High-Dimensional & Non-GP Models: Bayesian Stein Networks (neural alternatives to GP-BQ) use Stein operators and Laplace-approximated posteriors to achieve scalable, uncertainty-quantified BQ at nearly linear cost in the number of function evaluations (Ott et al., 2023). Tree-based priors (e.g., BART-Int) scale to discontinuous and high-dimensional integrands, outperforming standard GP-BQ where smoothness assumptions are violated (Zhu et al., 2020).
BQ further extends to invariant-predictive models (encoding known data symmetries via group-averaged kernels) (Naslidnyk et al., 2021), integration on manifolds via geometry-aware GPs (Fröhlich et al., 2021), and kernel-marginalization for robust Bayesian model evidence with hyper-kernels over kernel parameters (Hamid et al., 2021).
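A minimal sketch of batch selection in the Kriging-Believer spirit, assuming an illustrative Gaussian-kernel, standard-normal-measure setup: because the GP posterior variance does not depend on the hallucinated function values, "believing" the current mean reduces to conditioning the covariance on each chosen point before picking the next. Scoring candidates by $\pi(x)\,\sigma_n^2(x)$ is a simple uncertainty-sampling stand-in for the acquisition functions used in the cited works:

```python
import numpy as np

def rbf(x, y, ell=1.0):
    # Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def select_batch(observed, candidates, b, ell=1.0, jitter=1e-10):
    # Greedily pick b points; each pick conditions the covariance only,
    # since hallucinated y-values never enter a variance-based score.
    batch, X = [], np.asarray(observed, dtype=float)
    density = np.exp(-0.5 * candidates ** 2) / np.sqrt(2.0 * np.pi)  # π = N(0, 1)
    for _ in range(b):
        K = rbf(X, X, ell) + jitter * np.eye(len(X))
        Kc = rbf(X, candidates, ell)
        var = 1.0 - np.sum(Kc * np.linalg.solve(K, Kc), axis=0)  # σ²(x) at candidates
        j = int(np.argmax(density * np.clip(var, 0.0, None)))
        batch.append(candidates[j])
        X = np.append(X, candidates[j])
    return np.array(batch)

batch = select_batch([0.0], np.linspace(-3.0, 3.0, 61), b=4)
```

Each selected point zeroes its own posterior variance, so subsequent picks spread over high-density, high-uncertainty regions rather than clustering — the failure mode that Kriging-Believer and local penalization are designed to avoid.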
6. Active Learning, Acquisition Strategies, and Applications
Active sampling via acquisition functions (variance reduction, mutual information, expected reduction in uncertainty) is central to efficient BQ:
- Acquisition Optimization: Acquisition functions quantify the utility of evaluating $f$ at candidate points, targeting maximal reduction in posterior integral variance or information gain about the integral (Song et al., 2025, Wagstaff et al., 2018).
- Multi-output & Multi-source BQ: BQ generalizes to vector-valued functions with multiple correlated outputs, sharing information between related integrals (e.g., multi-fidelity models, multi-source simulations). Optimal information transfer can be achieved through cost-sensitive multi-source acquisition, balancing fidelity, cost, and informativeness (Xi et al., 2018, Gessner et al., 2019).
- Model Evidence & Selection: BQ provides sample-efficient computation of normalizing constants and Bayesian model evidence, outperforming MCMC in regimes with expensive likelihoods. Tailored mutual information acquisition can focus computational effort on the models and parameter regimes most relevant for posterior discrimination (Chai et al., 2019).
Key application domains include Bayesian inference (evidence computation), scientific computing (integration of expensive black-box models), probabilistic numerics, and accelerating simulation-based inference.
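As a toy instance of evidence computation, the sketch below (illustrative; the closed-form kernel mean assumes a Gaussian kernel and a standard normal prior, and the likelihood width is an arbitrary choice) estimates $Z = \int \mathcal{N}(y \mid \theta, \sigma_\ell^2)\,\mathcal{N}(\theta \mid 0, 1)\,\mathrm{d}\theta$, whose true value $\mathcal{N}(y \mid 0, \sigma_\ell^2 + 1)$ is known, and compares BQ against plain Monte Carlo at the same evaluation budget:

```python
import numpy as np

def rbf(x, y, ell=1.0):
    # Gaussian kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))
    return np.exp(-0.5 * (x[:, None] - y[None, :]) ** 2 / ell ** 2)

def kernel_mean(x, ell=1.0):
    # ∫ k(x, t) N(t; 0, 1) dt for the Gaussian kernel (closed form)
    return ell / np.sqrt(ell ** 2 + 1.0) * np.exp(-0.5 * x ** 2 / (ell ** 2 + 1.0))

y_obs, sig = 0.5, 0.8  # observed datum and (hypothetical) likelihood scale
lik = lambda t: np.exp(-0.5 * (y_obs - t) ** 2 / sig ** 2) / np.sqrt(2 * np.pi * sig ** 2)
truth = np.exp(-0.5 * y_obs ** 2 / (sig ** 2 + 1)) / np.sqrt(2 * np.pi * (sig ** 2 + 1))

nodes = np.linspace(-3.0, 3.0, 15)
K = rbf(nodes, nodes) + 1e-10 * np.eye(len(nodes))
w = np.linalg.solve(K, kernel_mean(nodes))
z_bq = w @ lik(nodes)                                        # BQ evidence, 15 evaluations
z_mc = lik(np.random.default_rng(1).normal(size=15)).mean()  # MC at the same budget
err_bq, err_mc = abs(z_bq - truth), abs(z_mc - truth)
```

With a handful of likelihood evaluations, the BQ estimate is typically orders of magnitude more accurate than Monte Carlo at the same budget — the regime where BQ pays off for expensive likelihoods.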
7. Extensions, Future Directions, and Open Challenges
Recent directions and outstanding questions include:
- Noise-Robust and Adaptive Designs: Advanced schemes achieve minimax optimal rates in noisy and misspecified settings, while avoiding costly repeated sampling at each site (Cai et al., 2022). Extensions to heavy-tailed noise models and adaptive designs that simultaneously optimize fill distance and error averaging are open problems.
- Integration of Derivative Information: Incorporating gradient and Hessian data into BQ accelerates uncertainty reduction and convergence, especially for highly structured integrands (Wu et al., 2017).
- Stability, Positivity, and High-Dimensionality: Ensuring positive, well-scaled weights remains a challenge in high-dimensions and for infinitely differentiable kernels (e.g., Gaussian); controlling the “Runge phenomenon” and sharp transitions in weight magnitudes requires further analysis (Karvonen et al., 2018).
- Combination with Bayesian Optimization: Estimating log-partition functions $\log \int e^{\lambda f(x)}\,\mathrm{d}x$ interpolates between BQ and Bayesian optimization regimes, with the difficulty transitioning from statistical integration to pure optimization as the inverse temperature $\lambda$ increases (Cai et al., 2024).
The BQ field remains active, with continued developments in theoretical foundations, scalable algorithms, probabilistic numerics, and cross-disciplinary applications.
Key references:
- (Huszár et al., 2012): Optimally-Weighted Herding is Bayesian Quadrature
- (Karvonen et al., 2018): On the positivity and magnitudes of Bayesian quadrature weights
- (Cai et al., 2022): On Average-Case Error Bounds for Kernel-Based Bayesian Quadrature
- (Adachi et al., 2022): Fast Bayesian Inference with Batch Bayesian Quadrature via Kernel Recombination
- (Wagstaff et al., 2018): Batch Selection for Parallelisation of Bayesian Quadrature
- (Fröhlich et al., 2021): Bayesian Quadrature on Riemannian Data Manifolds
- (Song et al., 2025): Bayesian Model Inference using Bayesian Quadrature: the Art of Acquisition Functions and Beyond
- (Ott et al., 2023): Bayesian Numerical Integration with Neural Networks
- (Zhu et al., 2020): Bayesian Probabilistic Numerical Integration with Tree-Based Models
- (Xi et al., 2018): Bayesian Quadrature for Multiple Related Integrals
- (Naslidnyk et al., 2021): Invariant Priors for Bayesian Quadrature
- (Gessner et al., 2019): Active Multi-Information Source Bayesian Quadrature
- (Chai et al., 2019): Automated Model Selection with Bayesian Quadrature
- (Cai et al., 2024): Kernelized Normalizing Constant Estimation: Bridging Bayesian Quadrature and Bayesian Optimization
- (Hamid et al., 2021): Marginalising over Stationary Kernels with Bayesian Quadrature