
Multioutput Gaussian Processes

Updated 2 January 2026
  • Multioutput Gaussian processes are nonparametric frameworks that model vector-valued functions by jointly capturing correlations among the outputs; in tensor decomposition they serve as priors over mode-specific latent factor functions.
  • Paired with hierarchical shrinkage priors, they enable automatic rank selection by pruning inactive components for efficient modeling.
  • Variational inference with closed-form updates yields scalable learning and robust performance in tensor completion and factorization tasks.

A multioutput Gaussian process (MOGP) is a nonparametric probabilistic framework for modeling vector-valued functions, allowing flexible, joint treatment of correlated outputs or tensor-valued signals over continuous index sets. Within functional tensor decomposition and completion, MOGPs serve as priors over latent factor functions, encoding both intra-factor smoothness and inter-factor dependencies while supporting scalable inference, automatic rank selection, and functional universality.

1. Mathematical Foundations of Multioutput GPs in Tensor Decomposition

Consider a $K$-mode functional tensor $\mathcal{Y}(x^{(1)}, \dots, x^{(K)})$ defined over $\mathcal{X}^{(1)} \times \cdots \times \mathcal{X}^{(K)}$, with each $\mathcal{X}^{(k)} \subset \mathbb{R}$. The canonical polyadic (CP)-type decomposition of the noise-free field is

$$x_{\boldsymbol{i}} = \sum_{r=1}^{R} \prod_{k=1}^{K} u_r^{(k)}(i_k)$$

where each $u_r^{(k)} : \mathcal{X}^{(k)} \to \mathbb{R}$ is a latent function. For parsimonious, flexible modeling of $\mathbf{U}^{(k)}(\cdot) = [u_1^{(k)}(\cdot), \ldots, u_R^{(k)}(\cdot)]$, an $R$-output vector of functions, the multioutput Gaussian process prior is

$$\mathbf{U}^{(k)}(\cdot) \sim \mathcal{MGP}\left(\mathbf{0}, \varsigma_k(\cdot, \cdot), \Gamma^{-1}\right)$$

with a positive-definite kernel $\varsigma_k$ capturing smoothness over $\mathcal{X}^{(k)}$, and row covariance $\Gamma^{-1} = \mathrm{diag}(\gamma_1^{-1}, \ldots, \gamma_R^{-1})$ controlling power across CP components. The observed data $y_n$ are then

$$y_n \mid \{u_r^{(k)}\}, \tau \sim \mathcal{N}\left( \sum_{r=1}^R \prod_{k=1}^K u_r^{(k)}(i_k^{n}),\ \tau^{-1} \right)$$

Each MOGP thus defines a prior over multivariate, mode-specific latent factors, enabling the application of the full machinery of Gaussian process-based function learning for both discrete and continuous-domain tensor settings (Li et al., 25 Dec 2025).
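
The generative model above is simple to simulate. Below is a minimal sketch in NumPy, assuming squared-exponential kernels; the grid sizes, lengthscale, $\tau$, and $\Gamma$ values are all illustrative rather than taken from the paper:

```python
import numpy as np

def rbf_kernel(x, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel matrix over a 1-D grid."""
    d = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
K, R = 3, 4                                  # number of modes, CP rank (illustrative)
grids = [np.linspace(0.0, 1.0, 30) for _ in range(K)]
gamma = np.ones(R)                           # component precisions (diagonal of Gamma)
tau = 100.0                                  # observation precision (assumed value)

# Draw each mode's factor matrix: column u_r^{(k)} ~ N(0, gamma_r^{-1} Sigma_k)
U = []
for x in grids:
    Sigma = rbf_kernel(x) + 1e-8 * np.eye(len(x))  # jitter for numerical stability
    L = np.linalg.cholesky(Sigma)
    U.append((L @ rng.standard_normal((len(x), R))) / np.sqrt(gamma))

# Noise-free CP field x_i = sum_r prod_k u_r^{(k)}(i_k), then Gaussian noise
X = np.einsum('ir,jr,kr->ijk', *U)
Y = X + rng.standard_normal(X.shape) / np.sqrt(tau)
```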

2. Hierarchical Shrinkage Priors and Automatic Rank Determination

Extending basic MOGPs, a hierarchical Bayesian scheme is employed for rank learning. Discretizing $\mathbf{U}^{(k)}(\cdot)$ at grids $\mathcal{S}_k$, let $U^{(k)} \in \mathbb{R}^{N_k \times R}$ be the sampled factor matrix. The prior is:

$$p\left(\{U^{(k)}\}_{k=1}^{K} \mid \boldsymbol{\gamma}\right) = \prod_{k=1}^{K} \mathcal{MN}\left(U^{(k)}; 0, \Sigma_k, \Gamma^{-1}\right)$$

where $\Sigma_k$ derives from $\varsigma_k$. Componentwise, $u_r^{(k)} \sim \mathcal{N}(0, \gamma_r^{-1} \Sigma_k)$, and each shrinkage parameter $\gamma_r$ has a Gamma prior:

$$p(\gamma_r) = \mathrm{Gam}(\gamma_r \mid a_r, b_r)$$

As posterior inference proceeds, some $\gamma_r$ diverge, shrinking the corresponding rank-1 components toward zero; the effective rank is the number of "active" $r$ for which $\gamma_r$ remains finite. This induces a variational form of sparse Bayesian learning over a superposition of MOGP terms, and automatic rank selection is thereby achieved (Li et al., 25 Dec 2025).
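
A sketch of how this mechanism can look in code, assuming the standard Gaussian–Gamma conjugacy; the function names, pruning threshold, and exact update terms here are illustrative, not the paper's:

```python
import numpy as np

def update_gamma_r(a0, b0, means, covs, Sigma_invs):
    """Conjugate Gamma update for one shrinkage precision gamma_r.

    means[k]:      variational mean of u_r^{(k)}, shape (N_k,)
    covs[k]:       variational covariance of u_r^{(k)}, shape (N_k, N_k)
    Sigma_invs[k]: inverse kernel matrix Sigma_k^{-1}
    """
    a = a0 + 0.5 * sum(len(m) for m in means)
    # E[u^T Sigma^{-1} u] = m^T Sigma^{-1} m + tr(Sigma^{-1} Psi)
    b = b0 + 0.5 * sum(m @ Si @ m + np.trace(Si @ P)
                       for m, P, Si in zip(means, covs, Sigma_invs))
    return a, b  # q(gamma_r) = Gam(a, b), so E[gamma_r] = a / b

def active_components(gamma_params, threshold=1e6):
    """Indices of rank-1 terms to keep; terms whose expected precision
    has diverged past the (illustrative) threshold are pruned."""
    return [r for r, (a, b) in enumerate(gamma_params) if a / b < threshold]
```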

3. Variational Inference and Efficient Algorithmic Realizations

Inference is performed via mean-field variational Bayes, maximizing the evidence lower bound (ELBO):

$$\mathcal{L}(q) = \mathbb{E}_{q}[\ln p(Y, \Theta)] - \mathbb{E}_q[\ln q(\Theta)]$$

where $q(\Theta) = q(\tau)\, q(\boldsymbol{\gamma}) \prod_{k,r} q(u_r^{(k)})$, and blockwise coordinate-ascent updates are derived for each factor. Crucially:

  • $q(u_r^{(k)}) = \mathcal{N}(m_r^{(k)}, \Psi_r^{(k)})$, with closed-form updates for $(m_r^{(k)}, \Psi_r^{(k)})$ that involve expectations over Khatri–Rao products and GP kernel inverses.
  • $q(\gamma_r)$ and $q(\tau)$ remain conjugate Gamma distributions.

The dominant computational cost is inverting $N_k \times N_k$ matrices per mode $k$ and factor $r$, scaling as $\mathcal{O}(R \max_k N_k^3)$. Unused CP terms are pruned as their $\gamma_r$ explode, yielding a practical reduction in complexity and improved convergence (Li et al., 25 Dec 2025).
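
As a concrete illustration of where the $\mathcal{O}(N_k^3)$ inversion arises, here is a simplified version of the Gaussian factor update, holding the other modes at their current means rather than carrying the full variational expectations; this is a sketch of the update's structure, not the paper's exact formula, and the Khatri–Rao ordering is an assumption:

```python
import numpy as np

def kr_column(U_list, k, r):
    """Column r of the Khatri-Rao product over all modes except k.
    (The Kronecker ordering must match the unfolding convention used
    for Yk; the ordering here is an assumption of this sketch.)"""
    cols = [U[:, r] for j, U in enumerate(U_list) if j != k]
    v = cols[0]
    for c in cols[1:]:
        v = np.kron(v, c)
    return v

def update_factor_column(Yk, U_list, k, r, Sigma_inv_k, gamma_r, tau):
    """Simplified closed-form Gaussian update for q(u_r^{(k)}); the full
    variational update adds covariance corrections to these moments.

    Yk: mode-k unfolding of the data, shape (N_k, prod of other N_j)
    """
    R = U_list[k].shape[1]
    w = kr_column(U_list, k, r)
    # Residual unfolding with component r's own contribution removed
    E = Yk - sum(np.outer(U_list[k][:, s], kr_column(U_list, k, s))
                 for s in range(R) if s != r)
    # Prior precision gamma_r * Sigma_k^{-1} plus the likelihood term
    Prec = tau * (w @ w) * np.eye(Yk.shape[0]) + gamma_r * Sigma_inv_k
    Psi = np.linalg.inv(Prec)          # the O(N_k^3) step, done per (k, r)
    m = tau * Psi @ (E @ w)
    return m, Psi
```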

4. Universality and Theoretical Properties

If each kernel $\varsigma_k$ defines a universal reproducing kernel Hilbert space (RKHS) on a compact domain, then the CP sum of products of MOGP mean functions can uniformly approximate any continuous function on that domain. Specifically, the RR-FBTC model has the universal approximation property: for any $g \in C(\mathcal{Z})$ and $\epsilon > 0$, there exist parameters such that the posterior mean

$$f(\boldsymbol{x}) = \sum_{r=1}^R \prod_{k=1}^K \bar{u}_r^{(k)}(x_k)$$

satisfies $\| f - g \|_{\infty} < \epsilon$, provided $R$ is sufficiently large. This extends the universality of kernel methods to the tensor-valued, multioutput setting, supporting arbitrarily expressive models for continuous tensor signals (Li et al., 25 Dec 2025).
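
A practical consequence is that posterior-mean factors learned on grids extend to arbitrary continuous indices through GP conditioning. A minimal sketch, assuming RBF kernels and noise-free conditioning on the grid values (lengthscales and jitter are hypothetical):

```python
import numpy as np

def rbf(a, b, lengthscale=0.2):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def factor_at(x_new, grid, u_grid, lengthscale=0.2, jitter=1e-8):
    """GP posterior-mean interpolation of one factor function from its
    learned values on the training grid (noise-free conditioning)."""
    Kxx = rbf(grid, grid, lengthscale) + jitter * np.eye(len(grid))
    Ksx = rbf(np.atleast_1d(np.asarray(x_new, dtype=float)), grid, lengthscale)
    return (Ksx @ np.linalg.solve(Kxx, u_grid))[0]

def f_continuous(x, grids, U_means, lengthscale=0.2):
    """Posterior-mean CP evaluation f(x) = sum_r prod_k u_r^{(k)}(x_k)
    at an arbitrary continuous index x -- the object the universality
    result concerns."""
    R = U_means[0].shape[1]
    vals = np.ones(R)
    for xk, grid, U in zip(x, grids, U_means):
        vals *= np.array([factor_at(xk, grid, U[:, r], lengthscale)
                          for r in range(R)])
    return float(vals.sum())
```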

5. Empirical Performance and Benchmarking

RR-FBTC and related MOGP-based tensor decomposition frameworks demonstrate state-of-the-art results across synthetic and real-world data. Empirical evaluations include:

  • Synthetic tensors (e.g., $30 \times 30 \times 30$, various ranks), US-Temperature, 3D sound-speed, and image inpainting.
  • Key metrics: relative root square error (RRSE), root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM); a short metric sketch follows this list.
  • RR-FBTC consistently achieves lower RRSE and RMSE than Bayesian CP, neural network CP, and single-output GP baselines, with superior rank recovery even at high noise and low observation rates.
  • The learned basis functions from RR-FBTC reflect physically meaningful patterns (e.g., latitude/longitude temperature variation) and are robust to the initial rank overparameterization.
  • Computationally, RR-FBTC exhibits competitive or superior run-times relative to continuous-CP neural methods and earlier functional-GP approaches (Li et al., 25 Dec 2025).
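
For reference, the scalar metrics above are simple to compute. A minimal sketch; the RRSE normalization shown is one common convention in the tensor-completion literature and may differ from the paper's exact definition, and SSIM is typically taken from an off-the-shelf implementation such as skimage.metrics.structural_similarity:

```python
import numpy as np

def rrse(y_hat, y):
    """Relative root square error, here ||error||_F / ||truth||_F
    (a common convention; exact normalizations vary by paper)."""
    return np.linalg.norm(y_hat - y) / np.linalg.norm(y)

def rmse(y_hat, y):
    """Root mean square error over all observed entries."""
    return np.sqrt(np.mean((y_hat - y) ** 2))

def psnr(y_hat, y, peak=1.0):
    """Peak signal-to-noise ratio in dB, for data scaled to [0, peak]."""
    return 10.0 * np.log10(peak ** 2 / np.mean((y_hat - y) ** 2))
```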

Other Bayesian tensor models generalize or complement MOGP-based approaches:

  • Global–local priors such as the Horseshoe (“one-group” priors) induce rank-sparse decomposition via heavy-tailed shrinkage on CP or TT components, achieving “tuning-free” model adaptation and strong finite-sample performance (Gilbert et al., 2019).
  • Bayesian tensor train (TT) factorization with Gaussian-product-Gamma hyperpriors supports automatic slicing-rank determination and scalable variational inference (Xu et al., 2020).
  • Hierarchical sparsity-inducing priors on non-CP representations, e.g., re-weighted Laplace or mixture-of-Gaussians models, provide mechanisms for modeling non-low-rank residual structure and outlier effects (Zhang et al., 2017).

These developments integrate MOGP machinery with probabilistic low-rank tensor learning, rank regularization, and flexible prior architectures.


References:

  • "When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning" (Li et al., 25 Dec 2025)
  • "Tuning Free Rank-Sparse Bayesian Matrix and Tensor Completion with Global-Local Priors" (Gilbert et al., 2019)
  • "Beyond Low Rank: A Data-Adaptive Tensor Completion Method" (Zhang et al., 2017)
  • "Tensor Train Factorization and Completion under Noisy Data with Prior Analysis and Rank Estimation" (Xu et al., 2020)
  • "Rank regularization and Bayesian inference for tensor completion and extrapolation" (Bazerque et al., 2013)
