
Function-on-Function Bayesian Optimization

Updated 18 November 2025
  • Function-on-Function Bayesian Optimization is a framework that models both inputs and outputs as functions in infinite-dimensional spaces.
  • It employs novel surrogate models like function-on-function Gaussian processes and neural operator networks to efficiently quantify uncertainty.
  • FFBO uses scalarization and functional gradient methods to optimize expensive black-box mappings in applications such as PDE simulations and engineering design.

Function-on-Function Bayesian Optimization (FFBO) refers to a class of Bayesian optimization methods in which both the inputs and outputs of the target function are functions defined on continuous domains, often infinite-dimensional. FFBO generalizes classical Bayesian optimization beyond finite-dimensional (vector) spaces to settings arising in advanced scientific and engineering systems featuring functional parameters, controls, or outputs, such as PDE-based simulators, functional mechanical designs, and scientific operator learning. FFBO introduces new surrogate modeling, acquisition, and optimization strategies to address the mathematical and algorithmic challenges inherent to infinite-dimensional function spaces (Huang et al., 16 Nov 2025, Guilhoto et al., 2024, Jain et al., 2023).

1. Problem Formulation in FFBO

The FFBO framework addresses the global optimization of a mapping

$$f: \mathcal{X}^p \to \mathcal{Y},$$

where both the input $\bm{x} = (x_1, \ldots, x_p) \in \mathcal{X}^p \subset [L^2(\Omega_x)]^p$ and the output $f(\bm{x}) \in \mathcal{Y} = L^2(\Omega_y)$ are functions over compact domains $\Omega_x, \Omega_y \subset \mathbb{R}^d$. The typical objective is to maximize a user-specified linear functional of the output,

$$L_\phi f(\bm{x}) = \int_{\Omega_y} \phi(t)\, f(\bm{x})(t)\,dt,$$

with $\phi \in L^2(\Omega_y)$ acting as a weighting function (e.g., a uniform weight, a smoothing kernel, or, as a limiting case outside $L^2$, a Dirac delta). The black-box mapping $f$ is expensive to evaluate, and observations are noisy realizations in function space:

$$y_i = f(\bm{x}_i) + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \tau^2 I_\mathcal{Y}), \quad i = 1, \ldots, n.$$

The optimization goal is to find

$$\bm{x}^\star = \arg\max_{\bm{x} \in \mathcal{X}^p} L_\phi f(\bm{x}),$$

with as few expensive queries as possible, leveraging structure in $f$ and prior information (Huang et al., 16 Nov 2025).
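On a computer the linear functional is evaluated by quadrature on a grid over the output domain. The following is a minimal sketch, assuming a one-dimensional output domain [0, 1], a trapezoidal rule, and a stand-in output curve; the helper names `trapezoid` and `scalarize` are illustrative and do not come from the cited papers.

```python
import math


def trapezoid(values, grid):
    """Composite trapezoidal rule for samples `values` on a sorted `grid`."""
    total = 0.0
    for k in range(len(grid) - 1):
        total += 0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
    return total


def scalarize(phi, output, grid):
    """Approximate L_phi f(x) = integral over Omega_y of phi(t) * f(x)(t) dt."""
    integrand = [phi(t) * output(t) for t in grid]
    return trapezoid(integrand, grid)


# Hypothetical setup: Omega_y = [0, 1], uniform weight phi(t) = 1, and a
# stand-in output function f(x)(t) = sin(pi * t) whose integral is 2 / pi.
grid = [k / 200 for k in range(201)]
value = scalarize(lambda t: 1.0, lambda t: math.sin(math.pi * t), grid)
print(abs(value - 2.0 / math.pi) < 1e-4)  # True: quadrature error is small
```

Replacing the uniform weight with a narrow bump centered at a point of interest approximates a pointwise objective, which is how a Dirac-type weighting would be handled in a discretized setting.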

2. Surrogate Modeling Approaches

2.1 Function-on-Function Gaussian Processes

The function-on-function Gaussian process (FFGP) surrogate models the mapping $f$ directly in the function space. The FFGP places a Gaussian process prior

$$f \sim \mathcal{GP}(\mu, K),$$

where $\mu$ is the mean and $K$ is an operator-valued kernel. The standard choice is separable:

$$K(\bm{x}, \bm{x}') = k(\bm{x}, \bm{x}')\, T,$$

with $k$ a positive-definite scalar kernel over function inputs (using the $L^2$ distance), and $T$ a self-adjoint, positive Hilbert–Schmidt operator on $\mathcal{Y}$ (e.g., an integral operator with a kernel defined on $\Omega_y \times \Omega_y$) (Huang et al., 16 Nov 2025).
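A separable operator-valued kernel can be sketched on discretized functions as a scalar kernel on the inputs multiplied by an integral operator on the outputs. The example below is only an illustration under stated assumptions: the squared-exponential forms, the `lengthscale` and `width` parameters, and all function names are hypothetical choices, not the kernels used by Huang et al.

```python
import math


def trapezoid(values, grid):
    """Composite trapezoidal rule on a sorted grid."""
    total = 0.0
    for k in range(len(grid) - 1):
        total += 0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
    return total


def l2_distance_sq(x1, x2, grid):
    """Squared L2 distance between two input functions sampled on `grid`."""
    return trapezoid([(a - b) ** 2 for a, b in zip(x1, x2)], grid)


def input_kernel(x1, x2, grid, lengthscale=1.0):
    """Scalar kernel on function inputs (assumed squared-exponential in L2)."""
    return math.exp(-l2_distance_sq(x1, x2, grid) / (2.0 * lengthscale ** 2))


def apply_output_operator(u, grid, width=0.2):
    """Integral operator (T u)(s) = integral of k_y(s, t) u(t) dt, Gaussian k_y."""
    result = []
    for s in grid:
        integrand = [math.exp(-((s - t) ** 2) / (2.0 * width ** 2)) * ut
                     for t, ut in zip(grid, u)]
        result.append(trapezoid(integrand, grid))
    return result


def separable_kernel_apply(x1, x2, u, grid_x, grid_y):
    """Apply the separable kernel: scalar input kernel times T acting on u."""
    scale = input_kernel(x1, x2, grid_x)
    return [scale * v for v in apply_output_operator(u, grid_y)]


grid = [k / 100 for k in range(101)]
x_a = [math.sin(2 * math.pi * s) for s in grid]
x_b = [math.cos(2 * math.pi * s) for s in grid]
k_ab = input_kernel(x_a, x_b, grid)  # ||x_a - x_b||^2 = 1, so k = exp(-1/2)
image = separable_kernel_apply(x_a, x_b, [1.0] * len(grid), grid, grid)
```

Because the input and output parts factor, the input Gram matrix and the output operator can be handled separately, which is the main computational appeal of the separable form.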

Posterior inference yields a posterior mean and covariance operator on the output space, with explicit expressions for practical computation using truncated eigenbasis decompositions.

2.2 Neural Operator Surrogates

An alternative surrogate uses operator-learning neural networks such as NEON (Neural Epistemic Operator Networks) (Guilhoto et al., 2024). NEON comprises:

  • An encoder-decoder backbone: an encoder that maps the input function to a latent representation and a decoder that maps that representation to the output function, together modeling the deterministic operator.
  • An epistemic uncertainty quantifier ("EpiNet"): a small network indexed by a random epistemic index $z$, providing an ensemble of predictions for efficient uncertainty quantification.

The NEON surrogate allows efficient parameterization and training, with ensembles generated via the EpiNet. It typically requires 1–2 orders of magnitude fewer trainable parameters than deep ensembles with similar performance (Guilhoto et al., 2024).

2.3 Composite Function Surrogates

In settings where the objective decomposes as a composition of an inner map that is unknown and expensive and an outer map that is known and cheap, the surrogate models the inner map directly and explicitly propagates its uncertainty through the outer map (Guilhoto et al., 2024, Jain et al., 2023).
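Uncertainty propagation through a known outer map can be illustrated with plain Monte Carlo: draw samples of the inner map's output from the surrogate posterior and push each one through the outer map. This is a sketch under loud assumptions: the posterior is taken to be independent Gaussian at each grid point, the outer map is a negative squared misfit to a target curve, and `outer_map` and `propagate` are hypothetical names.

```python
import random
import statistics


def outer_map(y, target, grid):
    """Known, cheap outer map: negative squared L2 misfit to a target curve."""
    total = 0.0
    for k in range(len(grid) - 1):
        left = (y[k] - target[k]) ** 2
        right = (y[k + 1] - target[k + 1]) ** 2
        total += 0.5 * (left + right) * (grid[k + 1] - grid[k])
    return -total


def propagate(mean, std, outer, n_samples=500, seed=0):
    """Monte Carlo mean and standard deviation of outer(inner output)."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_samples):
        # Assumed posterior: independent Gaussian marginals on the grid.
        sample = [rng.gauss(m, s) for m, s in zip(mean, std)]
        values.append(outer(sample))
    return statistics.fmean(values), statistics.stdev(values)


grid = [k / 50 for k in range(51)]
target = [t * (1.0 - t) for t in grid]
mean = list(target)           # hypothetical posterior mean of the inner map
std = [0.05] * len(grid)      # hypothetical pointwise posterior std
plug_in = outer_map(mean, target, grid)
mc_mean, mc_std = propagate(mean, std, lambda y: outer_map(y, target, grid))
```

In this toy case the plug-in value `outer_map(mean, ...)` is zero while the Monte Carlo mean is negative, which shows why pushing only the posterior mean through a nonlinear outer map can misstate the expected objective.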

3. Acquisition Function Construction in Function Space

FFBO adopts scalarization strategies to reduce function-valued outputs to scalar acquisition criteria:

  • Operator-Based Scalarization: Using a weighting function $\phi$, define

$$L_\phi f(\bm{x}) = \int_{\Omega_y} \phi(t)\, f(\bm{x})(t)\,dt,$$

which inherits a scalar-valued GP structure from the FFGP model; this enables use of established acquisition strategies (Huang et al., 16 Nov 2025).

  • Probabilistic Acquisition Functions: Expected Improvement (EI), Leaky EI (L-EI), and Upper Confidence Bound (UCB) are extended to function-on-function surrogates by propagating the stochastic surrogate's uncertainty through the scalarization. NEON-based approaches estimate acquisition function values via Monte Carlo sampling over epistemic indices $z$ (Guilhoto et al., 2024).
  • Composite EI and UCB: In the composition setting, composite EI (cEI) and composite UCB (cUCB) acquisition functions are constructed by propagating the GP uncertainty of intermediate functions through the known composition structure. Empirical mean and variance are combined to derive acquisition values (Jain et al., 2023).
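The acquisition functions listed above can be estimated from a finite set of scalarized surrogate predictions at a candidate input, for example one prediction per sampled epistemic index. The sketch below assumes the samples are already scalarized real numbers and uses the common sample-mean-plus-scaled-spread form for UCB; the function names and the `beta` parameter are illustrative, not taken from the cited papers.

```python
import statistics


def mc_expected_improvement(samples, best):
    """Monte Carlo EI: average positive part of (sample - incumbent best)."""
    return statistics.fmean([max(s - best, 0.0) for s in samples])


def mc_ucb(samples, beta=2.0):
    """Monte Carlo UCB: sample mean plus beta times the sample spread."""
    return statistics.fmean(samples) + beta * statistics.pstdev(samples)


# Hypothetical scalarized predictions at one candidate input, one value
# per ensemble member, and a hypothetical incumbent best value.
samples = [0.8, 1.0, 1.2, 1.4]
best_so_far = 1.0
ei = mc_expected_improvement(samples, best_so_far)
ucb = mc_ucb(samples, beta=2.0)
```

Only two of the four samples exceed the incumbent, so the EI estimate averages their improvements with two zeros; the UCB estimate rewards both a high mean and disagreement among ensemble members.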

4. Optimization over Function Spaces

Optimization in FFBO operates over infinite-dimensional domains. The primary optimization procedure is functional gradient ascent (FGA), relying on Fréchet derivatives computed in the Banach or Hilbert space of functions (Huang et al., 16 Nov 2025).

Algorithmic steps in FGA:

  • Initialize the function input $\bm{x}^{(0)}$.
  • Iteratively update via

$$\bm{x}^{(k+1)} = \bm{x}^{(k)} + \eta_k \, \nabla_{\bm{x}} \alpha\big(\bm{x}^{(k)}\big),$$

where $\alpha$ denotes the acquisition function and $\eta_k > 0$ a step size, employing the analytic gradients of the acquisition function with respect to the input functions.

  • Terminate after a fixed number of iterations or upon convergence; return the maximizing input $\bm{x}$ as the next query point.
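The steps above can be sketched on a discretized input function. The example is a toy under stated assumptions: the acquisition is a stand-in, the negative squared $L^2$ distance to a hypothetical target curve, whose $L^2$ gradient is $-2(x - x_{\text{target}})$; a real FFBO acquisition would supply its own analytic gradient. All names and the fixed step size are illustrative.

```python
import math


def acquisition_gradient(x, target):
    """L2 gradient of the toy acquisition -||x - target||^2 at grid points."""
    return [-2.0 * (xi - ti) for xi, ti in zip(x, target)]


def functional_gradient_ascent(x0, gradient, step=0.25, max_iter=200, tol=1e-10):
    """Repeat x <- x + step * gradient(x) until the gradient is tiny."""
    x = list(x0)
    for _ in range(max_iter):
        g = gradient(x)
        if max(abs(gi) for gi in g) < tol:  # sup-norm convergence check
            break
        x = [xi + step * gi for xi, gi in zip(x, g)]
    return x


grid = [k / 100 for k in range(101)]
target = [math.sin(math.pi * s) for s in grid]  # hypothetical maximizer
x_start = [0.0] * len(grid)                     # initial input function
x_next = functional_gradient_ascent(
    x_start, lambda x: acquisition_gradient(x, target)
)
max_error = max(abs(a - b) for a, b in zip(x_next, target))
```

With this step size each iteration halves the pointwise error, so the loop stops well before `max_iter`. For non-concave acquisitions a single run only finds a local maximizer, which is one reason multi-start strategies are used in practice.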

In NEON-based FFBO, optimization is typically performed via multi-start L-BFGS-B or hybrid local/global strategies, with each candidate input $\bm{x}$ evaluated across multiple stochastic samples of the epistemic index $z$ for reliable acquisition estimation (Guilhoto et al., 2024).

5. Theoretical Analysis and Regret Bounds

FFBO admits sublinear cumulative regret under mild assumptions. The bound involves problem-dependent constants and the information gain accumulated over the queries made so far (Huang et al., 16 Nov 2025). The FFGP posterior covariance operator is trace-class, and the finite-mode posterior approximations converge as the number of retained eigenmodes tends to infinity.

A regularity condition is that the target function $f$ lies in the RKHS defined by the operator-valued kernel, with Gaussian observation noise of variance $\tau^2$.

A theorem relates the L-EI and EI acquisition functions for bounded objectives, showing that L-EI can be made arbitrarily close to EI by an appropriate choice of the leaky slope (Guilhoto et al., 2024).
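The closeness of L-EI and EI can be checked numerically. The sketch assumes that L-EI replaces the positive-part function in EI with a leaky variant that multiplies negative improvements by a small slope; this definition, the sample values, and the function names are assumptions made for illustration. Under that assumption the gap EI − L-EI equals the slope times the average shortfall below the incumbent, so it shrinks linearly with the slope when the objective is bounded.

```python
import statistics


def expected_improvement(samples, best):
    """Monte Carlo EI with the usual positive-part (ReLU) improvement."""
    return statistics.fmean([max(s - best, 0.0) for s in samples])


def leaky_expected_improvement(samples, best, slope=0.01):
    """Monte Carlo L-EI: negative improvements are scaled by `slope`."""
    values = []
    for s in samples:
        u = s - best
        values.append(u if u > 0.0 else slope * u)
    return statistics.fmean(values)


# Hypothetical scalarized predictions and incumbent best value.
samples = [0.2, 0.6, 1.1, 1.5]
best_so_far = 1.0
ei = expected_improvement(samples, best_so_far)
gaps = {
    slope: ei - leaky_expected_improvement(samples, best_so_far, slope)
    for slope in (0.1, 0.01, 0.001)
}
```

One motivation for a leaky variant is that it keeps a nonzero gradient signal in regions where every sample falls below the incumbent and plain EI is exactly zero.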

6. Empirical Performance and Practical Applications

Extensive synthetic and real-world experiments establish FFBO's effectiveness:

  • Synthetic Benchmarks: FFBO achieves the lowest simple regret and fastest convergence versus alternative methods (FIBO, FOBO, MTBO, scalarized BO) on one-dimensional function input/output tasks, outperforming models based on functional principal components or fixed-dimensional parameterizations (Huang et al., 16 Nov 2025).
  • Operator Learning Tasks: NEON-based FFBO converges 2–5× faster than standard GP-BO or deep ensemble methods on diffusion-inverse and PDE problems, achieving similar or better final objectives (Guilhoto et al., 2024).
  • Engineering Design: In a 3D-printed aortic-valve case study, FFBO exhibits faster regret reduction and improved solutions compared to baselines.
  • Telecommunications and Optics: NEON-FFBO, with substantially fewer parameters, outperforms deep ensemble surrogates on optical interferometer alignment and cell-tower coverage tasks (Guilhoto et al., 2024).
  • Dynamic Pricing: FFBO for function compositions, using independent GPs per constituent, outperforms vanilla BO and multi-output GP methods in revenue management applications by leveraging decomposition and scalarization (Jain et al., 2023).

Parameter and Computational Considerations

| FFBO variant | Model structure | Acquisition estimation | Parameter efficiency / cost |
|---|---|---|---|
| FFGP (Huang et al., 16 Nov 2025) | Block-operator GP | Scalarization + UCB/EI | High (truncated eigenbasis) |
| NEON (Guilhoto et al., 2024) | Neural operator + EpiNet | Monte Carlo over ensembles | 10–100× fewer parameters than DE |
| Composite GP (Jain et al., 2023) | Independent GPs per component | Marginal → composite via h | O(M n³) per iteration |

This table summarizes differences in model structure and resource requirements; "DE" denotes deep ensembles.

7. Strengths, Limitations, and Research Directions

FFBO's strengths include:

  • Direct infinite-dimensional surrogate modeling without ad hoc discretization or FPCA truncation.
  • Operator-valued kernels and neural operator frameworks enabling rich modeling of input–output dependencies in function spaces.
  • Principled scalarization strategies supporting theoretical acquisition guarantees.
  • Demonstrated empirical advantages in convergence rate and sample efficiency across domains.

Limitations include:

  • Computational cost scales with the eigenmode truncation level, the sample size $n$, and the need for high-precision numerical integration.
  • Selection of the output-space operator in the kernel and of the weighting function $\phi$ often requires domain expert input.

Extensions under exploration include nonseparable kernels, multi-objective FFBO, batch query schemes, and automated estimation of scalarization weights (Huang et al., 16 Nov 2025).

A plausible implication is that advances in scalable operator learning and efficient functional optimization could further enhance the applicability of FFBO to scientific and engineering problems involving complex, structured functional relationships.
