
Bayesian Optimization Framework

Updated 2 February 2026
  • Bayesian optimization (BO) is a model-based, sequential global optimization framework that efficiently locates optima of expensive, noisy, derivative-free black-box functions.
  • It integrates Gaussian process surrogates with acquisition functions like Expected Improvement, Probability of Improvement, and Upper Confidence Bound to balance exploration and exploitation.
  • Its practical applications include chemical reaction optimization, catalyst discovery, and control system tuning, demonstrating significant improvements in sample efficiency.

Bayesian optimization (BO) is a model-based, sequential global optimization framework designed to efficiently locate high-quality optima of expensive, noisy, and derivative-free black-box functions. The core methodology involves constructing a probabilistic surrogate—typically a Gaussian process (GP)—for the objective function, and using an acquisition function to decide where to collect the next data point. BO has become a cornerstone for design and decision-making in domains where direct evaluation is costly, enabling automation and efficiency in science, engineering, manufacturing, and process systems. The following exposition outlines the mathematical principles, algorithmic structure, extensions for structured domains, representative applications, and current research frontiers in BO (Paulson et al., 2024).

1. Mathematical Structure and Sequential Algorithm

Let $f:\mathbb{R}^d\rightarrow\mathbb{R}$ denote the expensive black-box objective, defined over a compact domain $\mathcal{X}\subset\mathbb{R}^d$ (e.g., box-constrained). The design goal is to identify

$$x^* \in \underset{x\in\mathcal{X}}{\arg\max}~f(x)$$

via a minimal number of function evaluations. The standard BO protocol is as follows:

  1. Surrogate modeling: Condition a probabilistic model $p(f\mid\mathcal{D}_k)$ on data $\mathcal{D}_k = \{(x_i, y_i)\}_{i=1}^k$, where $y_i = f(x_i) + \epsilon_i$ with $\epsilon_i$ i.i.d. noise.
  2. Acquisition optimization: Define an acquisition function $\alpha_k(x)$ that scores the value of querying $x$ under the current model.
  3. Sample selection: Solve $x_{k+1} = \underset{x\in\mathcal{X}}{\arg\max}~\alpha_k(x)$.
  4. Function evaluation: Measure $y_{k+1} = f(x_{k+1}) + \epsilon_{k+1}$ and augment the data set to form $\mathcal{D}_{k+1}$.
  5. Repeat: Iterate until the evaluation budget is exhausted or a convergence criterion is met.

This procedure allocates evaluations to the regions of highest prospective utility, and typically outperforms grid search, random search, and derivative-free algorithms in sample efficiency.
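The five-step loop above can be sketched end-to-end in a few dozen lines. This is a minimal illustration rather than a production implementation: the RBF kernel with a fixed length-scale, the grid-based acquisition maximization, and the toy objective are all assumptions made for the sketch.

```python
# Minimal sketch of the sequential BO loop (steps 1-5), NumPy/SciPy only.
import numpy as np
from scipy.stats import norm

def rbf_kernel(A, B, length_scale=0.2):
    """Squared-exponential kernel k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * length_scale**2))

def gp_posterior(X, y, Xq, noise=1e-6):
    """Closed-form GP posterior mean/variance at query points Xq (zero prior mean)."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, f_best, xi=0.01):
    sigma = np.sqrt(var)
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, bounds=(0.0, 1.0), n_init=3, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, size=(n_init, 1))         # initial design
    y = np.array([f(x[0]) for x in X])
    grid = np.linspace(*bounds, 512)[:, None]          # candidate queries
    for _ in range(n_iter):
        mu, var = gp_posterior(X, y, grid)             # step 1: surrogate
        acq = expected_improvement(mu, var, y.max())   # step 2: acquisition
        x_next = grid[np.argmax(acq)]                  # step 3: selection
        y_next = f(x_next[0])                          # step 4: evaluation
        X = np.vstack([X, x_next])                     # step 5: augment data
        y = np.append(y, y_next)
    return X[np.argmax(y)], y.max()

# Toy objective with known maximizer x* = 0.3:
x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2)
print(x_best, y_best)
```

The inner `argmax` over a dense grid stands in for the gradient-based acquisition optimization discussed below; in more than a few dimensions, multi-start gradient ascent replaces the grid.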

2. Core Components: Probabilistic Surrogates and Acquisition Functions

2.1 Gaussian Process Surrogate

The most commonly employed surrogate is a GP prior over $f$: $f(\cdot)\sim\mathcal{GP}(m(x), k(x,x'))$, where $m(x)$ is the prior mean (often zero) and $k(x,x')$ is a positive-definite kernel. Conditioned on $\mathcal{D}_k$, the predictive posterior at $x$ has

$$\begin{aligned} \mu_k(x) &= m(x) + k(x,X)[K+\sigma_n^2 I]^{-1}(y-m(X)) \\ \sigma_k^2(x) &= k(x,x) - k(x,X)[K+\sigma_n^2 I]^{-1}k(X,x) \end{aligned}$$

where $K$ is the kernel matrix over the observed inputs $X$, $k(x,X)$ is the vector of covariances between $x$ and the observed inputs, and $\sigma_n^2$ is the noise variance. Hyperparameters are estimated via marginal-likelihood maximization or fully Bayesian inference.
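As a concrete check of the posterior formulas, the following sketch (assuming a zero prior mean and a squared-exponential kernel, both illustrative choices) verifies that with negligible noise the posterior interpolates the data:

```python
# Direct evaluation of the GP posterior mean/variance equations above.
import numpy as np

def kernel(A, B, ell=0.5):
    """k(x, x') = exp(-(x - x')^2 / (2 ell^2)) for 1-D inputs."""
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * ell**2))

X = np.array([0.1, 0.4, 0.9])        # observed inputs
y = np.sin(2 * np.pi * X)            # (near) noise-free observations
sigma_n2 = 1e-8                      # tiny nugget for numerical stability

K = kernel(X, X) + sigma_n2 * np.eye(3)
xq = np.array([0.4])                 # query coincides with an observed input

mu = kernel(xq, X) @ np.linalg.solve(K, y)
var = kernel(xq, xq) - kernel(xq, X) @ np.linalg.solve(K, kernel(X, xq))

# At an observed input with negligible noise, the posterior interpolates:
print(mu[0], var[0, 0])   # mu ~ sin(0.8*pi), var ~ 0
```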

2.2 Acquisition Functions

Acquisition functions balance exploration (sampling where uncertainty is high) and exploitation (sampling where the surrogate is optimal). Common choices include:

  • Probability of Improvement (PI):

$$\mathrm{PI}(x) = \Phi\left(\frac{\mu_k(x)-f^+-\xi}{\sigma_k(x)}\right)$$

where $f^+ = \max_{i\leq k} y_i$ is the incumbent best observation and $\xi \geq 0$ is an exploration offset.

  • Expected Improvement (EI):

$$\mathrm{EI}(x) = (\mu_k(x)-f^+-\xi)\,\Phi(Z) + \sigma_k(x)\,\phi(Z)$$

with $Z=(\mu_k(x)-f^+-\xi)/\sigma_k(x)$, where $\Phi$ and $\phi$ are the standard Gaussian CDF and PDF.

  • Upper Confidence Bound (UCB):

$$\mathrm{UCB}(x) = \mu_k(x) + \sqrt{\beta_k}\,\sigma_k(x)$$

where $\beta_k$ tunes the exploration–exploitation bias.

The chosen acquisition $\alpha_k(x)$ is maximized over $\mathcal{X}$ at each iteration, with gradient-based methods facilitating this inner optimization for moderate dimensions ($d \lesssim 20$).
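The three acquisition functions above translate directly from their formulas. In this sketch, `mu`, `sigma`, and `f_best` stand for the GP posterior mean, posterior standard deviation, and the incumbent $f^+$; the function names are our own:

```python
# Illustrative implementations of PI, EI, and UCB over arrays of candidates.
import numpy as np
from scipy.stats import norm

def prob_improvement(mu, sigma, f_best, xi=0.0):
    """PI(x) = Phi((mu - f+ - xi) / sigma)."""
    return norm.cdf((mu - f_best - xi) / sigma)

def expected_improvement(mu, sigma, f_best, xi=0.0):
    """EI(x) = (mu - f+ - xi) Phi(Z) + sigma phi(Z)."""
    z = (mu - f_best - xi) / sigma
    return (mu - f_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def ucb(mu, sigma, beta=4.0):
    """UCB(x) = mu + sqrt(beta) sigma."""
    return mu + np.sqrt(beta) * sigma

# Two candidates with equal uncertainty; the one with higher posterior
# mean scores higher under all three criteria:
mu = np.array([0.0, 0.5])
sigma = np.array([0.3, 0.3])
ei = expected_improvement(mu, sigma, f_best=0.2)
pi_vals = prob_improvement(mu, sigma, f_best=0.2)
u = ucb(mu, sigma)
print(ei, pi_vals, u)
```

Note that EI is strictly positive wherever $\sigma_k(x) > 0$, even when the posterior mean lies below the incumbent; this is what drives exploration of uncertain regions.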

3. Incorporation of Structural and Domain-specific Features

BO’s flexibility arises from its ability to encode structural characteristics of practical design problems:

  • Constraints:
    • Known (white-box) constraints: Incorporated directly into $\mathcal{X}$ or enforced within the acquisition optimization.
    • Unknown (black-box) constraints: Modeled as secondary GPs; feasibility is handled via acquisition modification (e.g., EI scaled by the probability of feasibility) or by safe BO that maintains confidence-bound safe sets.
  • Multi-fidelity/Multi-information source:

When multiple models of varying cost and fidelity $f_1, f_2, \ldots$ are available, a co-kriging GP or an input augmented with a fidelity index allows the acquisition function to select both $x$ and the fidelity level at each step.

  • Multi-objective:

BO for $f:\mathbb{R}^d \rightarrow \mathbb{R}^m$ (with $m>1$) leverages scalarization (e.g., weighted Tchebycheff) or hypervolume-based acquisition functions to approximate the Pareto frontier.

  • Discrete/hybrid design spaces:

Specialized surrogate kernels or tree-based models (e.g., random forests) manage categorical/integer input spaces.
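As an illustration of the black-box constraint handling described above (EI weighted by the probability of feasibility from a second GP), the following sketch uses made-up posterior summaries for a small batch of candidates; note how a high-EI but likely-infeasible candidate is demoted:

```python
# Constrained acquisition: EI(x) * P[c(x) <= 0], with both factors computed
# from GP posterior summaries. All numbers below are illustrative stand-ins
# for posterior means/stds at three candidate points.
import numpy as np
from scipy.stats import norm

mu_f = np.array([0.8, 1.2, 1.1])     # objective GP posterior mean
sd_f = np.array([0.2, 0.2, 0.3])     # objective GP posterior std
mu_c = np.array([-0.5, 0.6, -0.1])   # constraint GP posterior mean (c <= 0 feasible)
sd_c = np.array([0.3, 0.3, 0.3])     # constraint GP posterior std
f_best = 1.0                          # best feasible value observed so far

z = (mu_f - f_best) / sd_f
ei = (mu_f - f_best) * norm.cdf(z) + sd_f * norm.pdf(z)
p_feas = norm.cdf(-mu_c / sd_c)       # P[c(x) <= 0] under the constraint GP
constrained_acq = ei * p_feas

# Candidate 1 has the highest raw EI but is probably infeasible, so the
# feasibility-weighted acquisition prefers candidate 2 instead.
print(np.argmax(ei), np.argmax(constrained_acq))
```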

4. Practical Optimization, Performance, and Challenges

  • Surrogate/model reliability: Overconfident GPs can mislead the acquisition—cross-validation or Bayesian hyperparameter inference mitigates risk.
  • Computational scalability: Kernel-matrix inversion is cubic in the number of observations $k$; sparse/inducing-variable approximations reduce this to $O(km^2)$ with $m \ll k$ inducing points.
  • Dimensionality and sample efficiency: Reliability is best for moderate $d$ (up to roughly 20–30); higher dimensions necessitate dimension reduction, random subspaces, or trust-region methods.
  • Constraint enforcement: Careful treatment is essential, especially in hybrid and high-dimensional domains.

5. Illustrative Applications in Sustainable Process Systems

  • Self-driving laboratory for reaction yield: Shields et al. demonstrated that BO required roughly 30 experiments to reach 95% yield versus more than 200 for grid search, a greater-than-sixfold gain in sample efficiency.
  • Catalytic material discovery: Multi-fidelity GP BO using both DFT simulations and lab synthesis identified optimal catalysts in under 50 high-cost experiments, whereas standard approaches required fourfold more (Paulson et al., 2024).
  • Distributed control design: PID controller tuning via BO in under 100 simulated trials matched human-expert performance at a fraction of human labor.

6. Open Research Directions

Key frontiers for BO include:

  • Suboptimality Theory: Quantifying the performance gap between tractable acquisition policies and the Bayesian optimal—i.e., the full solution to the dynamic programming problem—remains open.
  • Unified frameworks for complex structure: Simultaneously handling multi-objective, constraints, fidelity, and large dd in a statistically principled, scalable way is an unresolved challenge.
  • Novel problem types: BO with human-in-the-loop preferences, under causal feedback, and over combinatorial/graph domains (e.g., molecules, materials) is an area of active development.
  • Meta-learning/Transfer: Leveraging prior BO runs across related tasks can substantially reduce sample requirements in new design problems.

7. Summary Table of Main Mathematical Objects

| Component | Mathematical Representation | Comments / Key Equations |
| --- | --- | --- |
| Surrogate model | $f(\cdot)\sim\mathcal{GP}(m(x), k(x,x'))$ | Closed-form posterior $\mu_k(x)$, $\sigma_k^2(x)$ |
| Acquisition (PI) | $\mathrm{PI}(x) = \Phi\big((\mu_k(x)-f^+-\xi)/\sigma_k(x)\big)$ | Exploration–exploitation tradeoff via $\xi$ |
| Acquisition (EI) | $\mathrm{EI}(x) = (\mu_k(x)-f^+-\xi)\Phi(Z) + \sigma_k(x)\phi(Z)$ | $Z=(\mu_k(x)-f^+-\xi)/\sigma_k(x)$ |
| Acquisition (UCB) | $\mathrm{UCB}(x) = \mu_k(x) + \sqrt{\beta_k}\,\sigma_k(x)$ | Theoretical regret bounds under RKHS assumptions |
| Constraints | GP for $c(x)$; acquisition weighted by $P[c(x)\leq 0]$ or safe-set maintenance | Black-box feasibility via auxiliary surrogate |
| Multi-fidelity | Co-kriging or augmented input $[x, s]$ | Acquisition chooses $(x, s)$ for gain per cost |
| Multi-objective | Scalarization or hypervolume-based acquisition on a GP over vector-valued $f(x)$ | Approximates the Pareto frontier efficiently |
| Discrete/hybrid | Tree kernels, random forests, categorical GPs | Accommodates non-continuous design spaces |

8. Concluding Remarks

The Bayesian optimization framework provides a principled and extensible paradigm for global optimization under severe function evaluation constraints. By uniting probabilistic surrogate modeling (typically GPs) with acquisition functions and mechanisms for exploiting structured domain knowledge (e.g., constraints, multi-fidelity, multiple objectives), BO achieves order-of-magnitude improvements in the sample efficiency of design and discovery tasks. Addressing scalability and unifying treatment of emerging application structures, as well as advancing the connection to dynamic programming theory, remain important directions (Paulson et al., 2024).
