Function-on-Function Bayesian Optimization

Updated 18 November 2025

Function-on-Function Bayesian Optimization (FFBO) is a framework that models both inputs and outputs as functions in infinite-dimensional spaces.
It employs surrogate models such as function-on-function Gaussian processes and neural operator networks to quantify uncertainty efficiently.
FFBO uses scalarization and functional gradient methods to optimize expensive black-box mappings in applications such as PDE simulations and engineering design.

Function-on-Function Bayesian Optimization (FFBO) refers to a class of Bayesian optimization methods in which both the inputs and the outputs of the target function are functions defined on continuous domains, so that both live in (typically infinite-dimensional) function spaces. FFBO generalizes classical Bayesian optimization from finite-dimensional (vector) inputs to the functional parameters, controls, and outputs that arise in scientific and engineering systems such as PDE-based simulators, functional mechanical designs, and scientific operator learning. It introduces new surrogate modeling, acquisition, and optimization strategies to address the mathematical and algorithmic challenges of infinite-dimensional function spaces (Huang et al., 16 Nov 2025; Guilhoto et al., 2024; Jain et al., 2023).

1. Problem Formulation in FFBO

The FFBO framework addresses the global optimization of a mapping

$f: \mathcal{X}^p \to \mathcal{Y},$

where both the input $\bm{x} = (x_1, \ldots, x_p) \in \mathcal{X}^p \subset [L^2(\Omega_x)]^p$ and the output $f(\bm{x}) \in \mathcal{Y} = L^2(\Omega_y)$ are functions over compact domains $\Omega_x, \Omega_y \subset \mathbb{R}^d$ . The typical objective is to maximize a user-specified linear functional of the output,

$L_\phi f(\bm{x}) = \int_{\Omega_y} \phi(t)\, f(\bm{x})(t)\,dt,$

with $\phi \in L^2(\Omega_y)$ acting as a weighting function (e.g., a uniform weight, a smoothing kernel, or, as a limiting case, a Dirac delta, which picks out a single output location but is not itself an element of $L^2$). The black-box mapping $f$ is expensive to evaluate, and observations are noisy realizations in function space:

$y_i = f(\bm{x}_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \tau^2 I_\mathcal{Y}), \qquad i = 1, \ldots, n.$

The optimization goal is to find

$\bm{x}^\star = \arg\max_{\bm{x} \in \mathcal{X}^p} L_\phi f(\bm{x}),$

with as few expensive queries as possible, leveraging structure in $f$ and prior information (Huang et al., 16 Nov 2025).
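To make the scalarization concrete, the sketch below discretizes $\Omega_y = [0, 1]$ on a uniform grid and approximates $L_\phi f(\bm{x})$ with the trapezoidal rule for a single input function ($p = 1$). The operator `toy_operator` and the uniform weight `phi` are hypothetical stand-ins for the expensive black box and the user's weighting function; they are not taken from the cited papers.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal-rule approximation of the integral of sampled values over grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

def scalarize(output_fn, phi, grid):
    """Approximate L_phi f(x) = int phi(t) f(x)(t) dt for an output function f(x)."""
    return trapezoid([phi(t) * output_fn(t) for t in grid], grid)

def toy_operator(x):
    """Hypothetical cheap stand-in for the black box: f(x)(t) = x(t)^2 + t."""
    return lambda t: x(t) ** 2 + t

grid = [i / 200 for i in range(201)]   # uniform grid on Omega_y = [0, 1]
phi = lambda t: 1.0                    # uniform weighting function
x = lambda s: math.sin(math.pi * s)    # one input function on Omega_x = [0, 1]

value = scalarize(toy_operator(x), phi, grid)
print(f"L_phi f(x) = {value:.4f}")     # 1.0000 (exact: 1/2 + 1/2)
```

Replacing `phi` with a narrow Gaussian bump centered at some $t_0$ approximates the point evaluation $f(\bm{x})(t_0)$, which is the Dirac-delta case.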

2. Surrogate Modeling Approaches

2.1 Function-on-Function Gaussian Processes

The function-on-function Gaussian process (FFGP) surrogate models the mapping $f: \mathcal{X}^p \to \mathcal{Y}$ directly in function space. The FFGP places a Gaussian process prior $f \sim \mathcal{GP}(m, K)$, where $m: \mathcal{X}^p \to \mathcal{Y}$ is the mean function and $K: \mathcal{X}^p \times \mathcal{X}^p \to \mathcal{B}(\mathcal{Y})$ is an operator-valued kernel taking values in the bounded linear operators on $\mathcal{Y}$. The standard choice is separable, $K(\bm{x}, \bm{x}') = k(\bm{x}, \bm{x}')\, T$, with $k$ a positive-definite scalar kernel over function inputs (built on the $L^2$ distance) and $T: \mathcal{Y} \to \mathcal{Y}$ a self-adjoint, positive Hilbert–Schmidt operator (e.g., an integral operator $(Tg)(s) = \int_{\Omega_y} c(s, s')\, g(s')\, ds'$ with a positive-definite kernel $c$) (Huang et al., 16 Nov 2025).

Posterior inference yields a posterior mean function and covariance operator in $\mathcal{Y}$, with explicit expressions for practical computation based on truncated eigenbasis decompositions.
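A minimal sketch of a truncated-eigenbasis posterior, assuming a zero prior mean, a single input function ($p = 1$) on $[0, 1]$, a squared-exponential scalar kernel $k$ on the $L^2$ distance, white observation noise, and a hypothetical operator $T$ with eigenpairs $\lambda_j = 1/j^2$, $e_j(t) = \sqrt{2}\sin(j\pi t)$. Projecting the observations onto $e_j$ turns the separable model into independent scalar GPs, one per mode, each with prior covariance $\lambda_j k$ and noise variance $\tau^2$. This illustrates the idea only; it is not the reference implementation of Huang et al.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal-rule integral of sampled values over a 1-D grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def k_scalar(u, v, grid, ell=0.5):
    """Squared-exponential kernel on the L2 distance between two sampled functions."""
    d2 = trapezoid([(a - b) ** 2 for a, b in zip(u, v)], grid)
    return math.exp(-d2 / (2 * ell ** 2))

def ffgp_posterior_modes(X, Y, x_star, grid, n_modes=5, tau2=1e-4):
    """Posterior mean/variance of <f(x_star), e_j> for j = 1..n_modes.

    Assumes zero prior mean, K(x, x') = k(x, x') T, white noise of variance tau2,
    and a hypothetical T with lambda_j = 1 / j^2, e_j(t) = sqrt(2) sin(j pi t).
    """
    n = len(X)
    K = [[k_scalar(X[a], X[b], grid) for b in range(n)] for a in range(n)]
    k_star = [k_scalar(x_star, X[a], grid) for a in range(n)]
    k_ss = k_scalar(x_star, x_star, grid)
    means, variances = [], []
    for j in range(1, n_modes + 1):
        lam = 1.0 / j ** 2
        e_j = [math.sqrt(2) * math.sin(j * math.pi * t) for t in grid]
        # Project each observed output function onto the j-th eigenfunction.
        c = [trapezoid([y * e for y, e in zip(Y[a], e_j)], grid) for a in range(n)]
        # Mode j is a scalar GP with covariance lam * K plus noise tau2.
        A = [[lam * K[a][b] + (tau2 if a == b else 0.0) for b in range(n)] for a in range(n)]
        L = cholesky(A)
        w = chol_solve(L, c)
        v = chol_solve(L, k_star)
        means.append(lam * sum(ks * wi for ks, wi in zip(k_star, w)))
        variances.append(lam * k_ss - lam ** 2 * sum(ks * vi for ks, vi in zip(k_star, v)))
    return means, variances

grid = [i / 100 for i in range(101)]
# Noiseless toy data: inputs x_a(s) = a * s, outputs f(x_a)(t) = a * sin(pi t).
X = [[a * s for s in grid] for a in (0.0, 1.0, 2.0)]
Y = [[a * math.sin(math.pi * t) for t in grid] for a in (0.0, 1.0, 2.0)]
means, variances = ffgp_posterior_modes(X, Y, X[1], grid)
```

At the training input `X[1]`, the first modal mean is close to the observed coefficient $1/\sqrt{2}$ and every posterior variance is at most about $\tau^2$; moving `x_star` away from the data lets the variances grow back toward their prior values $\lambda_j$.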

2.2 Neural Operator Surrogates

An alternative surrogate uses operator-learning neural networks such as NEON (Neural Epistemic Operator Networks) (Guilhoto et al., 2024). NEON comprises:

An encoder-decoder backbone: an encoder $\mathcal{E}$ that maps an input to a finite-dimensional latent representation and a decoder $\mathcal{D}$ that maps this representation to the output function, together modeling the deterministic operator $\mathcal{G} = \mathcal{D} \circ \mathcal{E}$.
An epistemic uncertainty quantifier ("EpiNet"): a small network $\epsilon_\eta(\cdot, z)$ indexed by a random epistemic index $z$ (e.g., drawn from a standard Gaussian), providing an ensemble of predictions for efficient uncertainty quantification.

The NEON surrogate is efficient to parameterize and train, generates ensembles through the EpiNet, and typically requires 1–2 orders of magnitude fewer trainable parameters than deep ensembles with similar performance (Guilhoto et al., 2024).
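The EpiNet mechanism can be illustrated with a toy, assuming scalar inputs and already-scalarized predictions: a fixed base predictor plays the role of the trained encoder-decoder, and a correction term linear in a Gaussian index $z$ plays the role of the EpiNet, so each draw of $z$ yields one ensemble member. `base_net`, `epinet`, and the damping that shrinks the spread near a pretend observation at $x = 0.3$ are all hypothetical; in NEON both parts are trained networks. The same samples then give a Monte Carlo estimate of EI, the kind of acquisition estimate NEON-based FFBO relies on.

```python
import math
import random

def base_net(x):
    """Hypothetical deterministic prediction (stand-in for the trained encoder-decoder)."""
    return -(x - 0.3) ** 2

def epinet(x, z):
    """Hypothetical EpiNet-style term: linear in the index z, damped near the datum at x = 0.3."""
    features = [math.sin(3 * x), math.cos(3 * x), x]
    damping = 1.0 - math.exp(-((x - 0.3) ** 2) / 0.02)
    return damping * sum(zk * fk for zk, fk in zip(z, features))

def ensemble_predictions(x, n_samples=2000, dim_z=3, seed=0):
    """One prediction per sampled epistemic index z ~ N(0, I)."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim_z)]
        preds.append(base_net(x) + epinet(x, z))
    return preds

def mc_expected_improvement(x, best, **kwargs):
    """Monte Carlo EI: average improvement over the sampled ensemble members."""
    preds = ensemble_predictions(x, **kwargs)
    return sum(max(p - best, 0.0) for p in preds) / len(preds)

for x in (0.3, 0.8):
    print(x, mc_expected_improvement(x, best=0.0))
```

At $x = 0.3$ every ensemble member agrees, so the estimated EI is zero; away from the pretend observation the members disagree and EI becomes positive even though the base prediction is worse.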

2.3 Composite Function Surrogates

In settings where $f$ decomposes as a composition $f(\bm{x}) = h(g(\bm{x}))$, where $g$ is unknown and expensive while $h$ is known and cheap, the surrogate models $g$ directly and explicitly propagates its uncertainty through $h$ (Guilhoto et al., 2024; Jain et al., 2023).
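A Monte Carlo sketch of this propagation, assuming independent Gaussian posterior marginals for two intermediate quantities $g_1(\bm{x}), g_2(\bm{x})$ at one candidate input and a hypothetical known outer function $h(g) = g_1 g_2$. Draws of $g$ are pushed through $h$, and the empirical mean, variance, and improvement give UCB- and EI-style scores in the spirit of cUCB and cEI; the exact estimators of Jain et al. may differ.

```python
import math
import random

def h(g):
    """Known, cheap outer function (hypothetical): product of two modeled quantities."""
    return g[0] * g[1]

def propagate(mu, sigma, n_samples=20000, seed=0):
    """Sample independent Gaussian marginals of g(x) and push each draw through h."""
    rng = random.Random(seed)
    vals = [h([rng.gauss(m, s) for m, s in zip(mu, sigma)]) for _ in range(n_samples)]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
    return vals, mean, var

def composite_ucb(mu, sigma, beta=4.0, **kwargs):
    """UCB-style score from the empirical mean and standard deviation of h(g(x))."""
    _, mean, var = propagate(mu, sigma, **kwargs)
    return mean + math.sqrt(beta) * math.sqrt(var)

def composite_ei(mu, sigma, best, **kwargs):
    """EI-style score: empirical mean improvement of h(g(x)) over the incumbent."""
    vals, _, _ = propagate(mu, sigma, **kwargs)
    return sum(max(v - best, 0.0) for v in vals) / len(vals)

# Posterior marginals at one candidate x: g_1(x) ~ N(2, 0.1^2), g_2(x) ~ N(3, 0.2^2).
vals, mean, var = propagate([2.0, 3.0], [0.1, 0.2])
print(mean, var, composite_ucb([2.0, 3.0], [0.1, 0.2]))
```

For this product of independent Gaussians the exact moments are $\mu_1\mu_2 = 6$ and $\sigma_1^2\sigma_2^2 + \mu_1^2\sigma_2^2 + \mu_2^2\sigma_1^2 = 0.2504$, which the Monte Carlo estimates should approach.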

3. Acquisition Function Construction in Function Space

FFBO adopts scalarization strategies to reduce function-valued outputs to scalar acquisition criteria:

Operator-Based Scalarization: Using a weighting function $\phi \in L^2(\Omega_y)$, define

$F_\phi(\bm{x}) = L_\phi f(\bm{x}) = \int_{\Omega_y} \phi(t)\, f(\bm{x})(t)\,dt,$

which inherits a scalar-valued GP structure from the FFGP model; this enables use of established acquisition strategies (Huang et al., 16 Nov 2025).

Probabilistic Acquisition Functions: Expected Improvement (EI), Leaky EI (L-EI), and Upper Confidence Bound (UCB) are extended to function-on-function surrogates by propagating the stochastic surrogate's uncertainty through the scalarization. NEON-based approaches estimate acquisition function values via Monte Carlo sampling over epistemic indices $z$ (Guilhoto et al., 2024).
Composite EI and UCB: In the composition setting, composite EI (cEI) and composite UCB (cUCB) acquisition functions are constructed by propagating the GP uncertainty of intermediate functions through the known composition structure. Empirical mean and variance are combined to derive acquisition values (Jain et al., 2023).
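A sketch of the operator-based scalarization for a GP posterior represented in a truncated eigenbasis $\{e_j\}$, assuming the modal coefficients of $f(\bm{x})$ are independent Gaussians with means $\mu_j(\bm{x})$ and variances $\sigma_j^2(\bm{x})$ (as for a separable kernel with white noise). With $\phi_j = \langle \phi, e_j \rangle$, the scalarized output is Gaussian with mean $\sum_j \phi_j \mu_j$ and variance $\sum_j \phi_j^2 \sigma_j^2$, which feeds directly into closed-form EI and UCB. The helper names and the UCB weight `beta` are illustrative choices, not taken from Huang et al.

```python
import math

def norm_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def norm_cdf(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def scalarize_posterior(mode_means, mode_vars, phi_coeffs):
    """Mean and std of L_phi f(x) = sum_j phi_j a_j with independent a_j ~ N(mu_j, s_j^2)."""
    mean = sum(p * m for p, m in zip(phi_coeffs, mode_means))
    var = sum(p * p * v for p, v in zip(phi_coeffs, mode_vars))
    return mean, math.sqrt(max(var, 0.0))

def expected_improvement(mean, std, best):
    """Closed-form EI for a Gaussian prediction (maximization)."""
    if std == 0.0:
        return max(mean - best, 0.0)
    u = (mean - best) / std
    return (mean - best) * norm_cdf(u) + std * norm_pdf(u)

def upper_confidence_bound(mean, std, beta=4.0):
    """UCB with exploration weight sqrt(beta); beta is an arbitrary illustrative value."""
    return mean + math.sqrt(beta) * std

mean, std = scalarize_posterior([1.0, 0.5], [0.04, 0.01], phi_coeffs=[1.0, 2.0])
print(mean, std, expected_improvement(mean, std, best=1.8), upper_confidence_bound(mean, std))
```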

4. Optimization over Function Spaces

Optimization in FFBO operates over infinite-dimensional domains. The primary optimization procedure is functional gradient ascent (FGA), relying on Fréchet derivatives computed in the Banach or Hilbert space of functions (Huang et al., 16 Nov 2025).

Algorithmic steps in FGA:

Initialize the function input $\bm{x}^{(0)} \in \mathcal{X}^p$.
Iteratively update via

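$\bm{x}^{(k+1)} = \bm{x}^{(k)} + \eta_k\, \nabla_{\bm{x}} \alpha\big(\bm{x}^{(k)}\big),$

where $\alpha$ is the acquisition function, $\eta_k > 0$ is a step size, and $\nabla_{\bm{x}} \alpha$ is the analytic gradient of $\alpha$ with respect to the input functions (the Riesz representer of its Fréchet derivative).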

Terminate after a fixed number of iterations or upon convergence, and return the maximizing iterate as the next query point $\bm{x}_{n+1}$.
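The steps above can be sketched on a grid, assuming a single input function on $[0, 1]$ and a stand-in acquisition $\alpha(x) = \exp\big(-\lVert x - c \rVert_{L^2}^2 / (2\ell^2)\big)$ whose maximizer $c$ is known, so convergence is easy to check. Its $L^2$ gradient is $-\alpha(x)\,(x - c)/\ell^2$, applied pointwise on the grid. The acquisition, the target `c`, the step size, and the tolerance are hypothetical; in FFBO the acquisition would come from the surrogate posterior.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal-rule integral of sampled values over a 1-D grid."""
    return sum(0.5 * (values[i] + values[i + 1]) * (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

def l2_norm2(u, v, grid):
    """Squared L2 distance between two functions sampled on grid."""
    return trapezoid([(a - b) ** 2 for a, b in zip(u, v)], grid)

def acquisition(x, c, grid, ell=0.5):
    """Hypothetical smooth acquisition: squared-exponential in the L2 distance to c."""
    return math.exp(-l2_norm2(x, c, grid) / (2 * ell ** 2))

def l2_gradient(x, c, grid, ell=0.5):
    """Riesz representer of the Frechet derivative: -acquisition(x) * (x - c) / ell^2."""
    a = acquisition(x, c, grid, ell)
    return [-a * (xi - ci) / ell ** 2 for xi, ci in zip(x, c)]

def functional_gradient_ascent(x0, c, grid, step=0.1, max_iter=500, tol=1e-8):
    """Iterate x <- x + step * grad until the L2 size of the update drops below tol."""
    x = list(x0)
    for _ in range(max_iter):
        g = l2_gradient(x, c, grid)
        x_new = [xi + step * gi for xi, gi in zip(x, g)]
        if math.sqrt(l2_norm2(x_new, x, grid)) < tol:
            return x_new
        x = x_new
    return x

grid = [i / 100 for i in range(101)]
c = [math.sin(math.pi * t) for t in grid]  # hypothetical maximizer of the acquisition
x_opt = functional_gradient_ascent([0.0] * len(grid), c, grid)
print(math.sqrt(l2_norm2(x_opt, c, grid)), acquisition(x_opt, c, grid))
```

Using the analytic $L^2$ gradient, rather than the derivative with respect to the raw grid values, keeps the update independent of grid resolution: differentiating the discretized integral would scale each component by its quadrature weight.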

In NEON-based FFBO, the acquisition function is typically optimized via multi-start L-BFGS-B or hybrid local/global strategies, with each candidate $\bm{x}$ evaluated across multiple stochastic samples of the epistemic index $z$ for reliable acquisition estimation (Guilhoto et al., 2024).

5. Theoretical Analysis and Regret Bounds

Under mild assumptions, FFBO attains sublinear cumulative regret, with a bound of the form $R_N = \sum_{n=1}^{N} \big[ L_\phi f(\bm{x}^\star) - L_\phi f(\bm{x}_n) \big] \le C_1 \sqrt{N \beta_N \gamma_N} + C_2,$ where $C_1$, $C_2$, and $\beta_N$ are problem-dependent quantities and $\gamma_N$ is the information gain after $N$ queries (Huang et al., 16 Nov 2025). The FFGP posterior covariance operator is shown to be trace-class, and the finite-mode posterior approximations converge in $\mathcal{Y}$ as the number of eigenmodes $M \to \infty$.
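In GP regret analyses, $\gamma_N$ is usually the maximum, over sets of $N$ queries, of the mutual information between the noisy observations and $f$, which for Gaussian noise equals $\tfrac{1}{2}\log\det(I + \tau^{-2}\Sigma)$ with $\Sigma$ the prior covariance of the observed values. The sketch below evaluates this quantity for one fixed set of queries under the separable kernel $k \cdot T$ with a truncated spectrum of $T$, where the Kronecker structure splits it into one scalar determinant per eigenmode. The kernel matrix and the eigenvalues $1/j^2$ are hypothetical, and the maximization over query sets that defines $\gamma_N$ is not performed.

```python
import math

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_det(A):
    """log det of a symmetric positive-definite matrix via its Cholesky factor."""
    L = cholesky(A)
    return 2.0 * sum(math.log(L[i][i]) for i in range(len(A)))

def info_gain_separable(K, eigvals, tau2):
    """0.5 * log det(I + (K kron T) / tau2), computed one eigenmode of T at a time."""
    n = len(K)
    total = 0.0
    for lam in eigvals:
        A = [[(1.0 if a == b else 0.0) + lam * K[a][b] / tau2 for b in range(n)]
             for a in range(n)]
        total += 0.5 * log_det(A)
    return total

K = [[1.0, 0.5], [0.5, 1.0]]                    # k(x_a, x_b) for two queried inputs
eigvals = [1.0 / j ** 2 for j in range(1, 6)]  # hypothetical truncated spectrum of T
print(info_gain_separable(K, eigvals, tau2=0.1))
```

Each retained eigenmode adds a nonnegative term, so the information gain, and with it the cost of evaluating it, grows with the truncation level $M$.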

A regularity condition is that the function $f$ lies in the reproducing kernel Hilbert space (RKHS) defined by the operator-valued kernel, with Gaussian observation noise $\varepsilon_i \sim \mathcal{N}(0, \tau^2 I_\mathcal{Y})$.

A theorem establishes an approximate equivalence between the L-EI and EI acquisition functions for bounded objectives, showing that L-EI can be made arbitrarily close to EI by choosing the leaky slope $\delta$ sufficiently small (Guilhoto et al., 2024).
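Assuming L-EI is defined by replacing the ramp $\max(u, 0)$ inside EI with a leaky ReLU $\max(u, \delta u)$ of slope $0 \le \delta < 1$ (an assumed definition, not a quotation of Guilhoto et al.), the identity $\max(u, \delta u) = \delta u + (1 - \delta)\max(u, 0)$ gives, for a Gaussian prediction with mean $\mu$ and incumbent $f^*$, $\text{L-EI} = \delta(\mu - f^*) + (1 - \delta)\,\text{EI}$. Hence $\lvert \text{L-EI} - \text{EI} \rvert = \delta\,\lvert \mu - f^* - \text{EI} \rvert$, which vanishes as $\delta \to 0$. The sketch checks this closed form against Monte Carlo.

```python
import math
import random

def ei(mu, sigma, best):
    """Closed-form EI for a Gaussian prediction N(mu, sigma^2) (maximization)."""
    u = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf

def leaky_ei(mu, sigma, best, delta):
    """Closed form via max(u, delta*u) = delta*u + (1 - delta)*max(u, 0), 0 <= delta < 1."""
    return delta * (mu - best) + (1.0 - delta) * ei(mu, sigma, best)

def leaky_ei_mc(mu, sigma, best, delta, n=100000, seed=0):
    """Monte Carlo estimate of E[max(u, delta*u)] with u = y - best, y ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.gauss(mu, sigma) - best
        total += max(u, delta * u)
    return total / n

print(ei(0.0, 1.0, 1.0), leaky_ei(0.0, 1.0, 1.0, 0.01), leaky_ei_mc(0.0, 1.0, 1.0, 0.01))
```

Unlike EI, L-EI keeps a nonzero slope where the improvement is unlikely, which gives gradient-based acquisition optimizers a usable signal; the price is the small bias $\delta(\mu - f^* - \text{EI})$.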

6. Empirical Performance and Practical Applications

Synthetic and real-world experiments reported in the cited works support FFBO's effectiveness:

Synthetic Benchmarks: FFBO achieves the lowest simple regret and fastest convergence versus alternative methods (FIBO, FOBO, MTBO, scalarized BO) on one-dimensional function input/output tasks, outperforming models based on functional principal components or fixed-dimensional parameterizations (Huang et al., 16 Nov 2025).
Operator Learning Tasks: NEON-based FFBO converges 2–5× faster than standard GP-BO or deep ensemble methods on diffusion-inverse and PDE problems, achieving similar or better final objectives (Guilhoto et al., 2024).
Engineering Design: In a 3D-printed aortic-valve case study, FFBO exhibits faster regret reduction and improved solutions compared to baselines.
Telecommunications and Optics: NEON-FFBO, with substantially fewer parameters, outperforms deep ensemble surrogates on optical interferometer alignment and cell-tower coverage tasks (Guilhoto et al., 2024).
Dynamic Pricing: FFBO for function compositions, using independent GPs per constituent, outperforms vanilla BO and multi-output GP methods in revenue management applications by leveraging decomposition and scalarization (Jain et al., 2023).

Parameter and Computational Considerations

FFBO Variant	Model Structure	Acquisition Estimation	Parameter / Compute Efficiency
FFGP (Huang et al., 16 Nov 2025)	Block-operator GP	Scalarization + UCB/EI	High (truncated eigenbasis)
NEON (Guilhoto et al., 2024)	Neural operator + EpiNet	Monte Carlo over ensemble members	10–100× fewer parameters than DE
Composite GP (Jain et al., 2023)	Independent GPs per component	Marginals → composite via h	O(M n³) per iteration

This table summarizes differences in model structure and resource requirements; "DE" denotes deep ensembles.

7. Strengths, Limitations, and Research Directions

FFBO's strengths include:

Direct infinite-dimensional surrogate modeling without ad hoc discretization or functional principal component analysis (FPCA) truncation.
Operator-valued kernels and neural operator frameworks enabling rich modeling of input–output dependencies in function spaces.
Principled scalarization strategies supporting theoretical acquisition guarantees.
Demonstrated empirical advantages in convergence rate and sample efficiency across domains.

Limitations include:

Computational cost scales with the eigenmode truncation level ($M$), the sample size ($n$), and the need for high-precision numerical integration in $L^2$.
Selecting the operator $T$ and the weighting function $\phi$ often requires domain expertise.
Extensions under exploration include nonseparable kernels, multi-objective FFBO, batch query schemes, and automated estimation of scalarization weights (Huang et al., 16 Nov 2025).

A plausible implication is that advances in scalable operator learning and efficient functional optimization could further enhance the applicability of FFBO to scientific and engineering problems involving complex, structured functional relationships.


References

Huang et al. (2025). Function-on-Function Bayesian Optimization.

Guilhoto et al. (2024). Composite Bayesian Optimization In Function Spaces Using NEON -- Neural Epistemic Operator Networks.

Jain et al. (2023). Bayesian Optimization for Function Compositions with Applications to Dynamic Pricing.
