Multidimensional Gaussian Process Regression
- Multidimensional Gaussian Process Regression extends traditional GP methods to handle vector, matrix, or tensor outputs with structured dependencies.
- It employs techniques like additive models, projection pursuit, and sparse methods to manage high-dimensional inputs and overcome computational challenges.
- Applications span multi-task learning, simulator emulation, and spatiotemporal modeling, offering scalable and accurate predictive tools.
Multidimensional Gaussian Process Regression encompasses a spectrum of models and computational techniques for addressing regression problems where either the inputs, outputs, or both possess multidimensional structure. This includes modeling vector or tensor-valued outputs, exploiting structured dependencies in high-dimensional inputs, and leveraging efficient inference for large-scale or structured datasets. The category unifies classical scalar-valued Gaussian Process (GP) regression, tensor-variate GPs, multi-output models, additive/random projection schemes, projection pursuit formulations, sparse/inducing-point and Kronecker methods, as well as models adapted to distributional or manifold-structured data.
1. Core Principles and Models
Multidimensional GP regression generalizes the standard GP framework, which assumes scalar outputs $y_i = f(x_i) + \varepsilon_i$ with $f$ drawn from a zero-mean GP, $f \sim \mathcal{GP}(0, k(x, x'))$, and $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$, to settings with vector or tensor outputs and/or high-dimensional input spaces.
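As a reference point for everything that follows, the scalar baseline can be sketched in a few lines of NumPy (a minimal illustration; the names `rbf` and `gp_posterior` are ours, not from any cited paper):

```python
import numpy as np

def rbf(X1, X2, ls=0.2, var=1.0):
    # Squared-exponential kernel k(x, x') = var * exp(-||x - x'||^2 / (2 ls^2)).
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d2 / ls**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Exact GP posterior mean/variance; O(n^3) in the number of training points.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    Kss = rbf(Xs, Xs)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
mu, var = gp_posterior(X, y, X)
```

The cubic solve against `K` is the bottleneck that every strategy in this article (sparsity, additivity, Kronecker structure) is designed to avoid.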
Multi-Output Gaussian Processes
Models such as the Gaussian Process Regression Network (GPRN) (Wilson et al., 2011) address $p$-dimensional outputs $y(x)$ by positing $q$ latent "node" GPs $f(x) = (f_1(x), \dots, f_q(x))^\top$ and an input-dependent $p \times q$ matrix of "mixing" GPs $W(x)$. The generative structure is $y(x) = W(x)\,[f(x) + \sigma_f \varepsilon] + \sigma_y z$, with independent GP priors on the entries of $W$ and $f$, supporting input-varying signal/noise covariance and heavy-tailed predictive distributions.
Autoregressive models (GPAR) (Requeima et al., 2018) decompose the joint distribution over outputs into a product of conditionals, $p(y_1, \dots, y_p \mid x) = \prod_{i=1}^{p} p(y_i \mid x, y_{1:i-1})$, modeling each conditional as a GP whose input is augmented by the preceding outputs, $y_i = f_i(x, y_{1:i-1}) + \varepsilon_i$, thereby learning nonlinear, input-varying output dependencies.
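A minimal sketch of the GPAR idea on a toy two-output problem, with a bare-bones GP fit (all names and settings here are illustrative, not from Requeima et al.):

```python
import numpy as np

def rbf(X1, X2, ls=0.5):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def gp_fit_predict(X, y, Xs, noise=1e-4):
    # Posterior mean of a zero-mean GP with SE kernel.
    K = rbf(X, X) + noise * np.eye(len(X))
    return rbf(Xs, X) @ np.linalg.solve(K, y)

# GPAR-style chain: model y1 | x, then y2 | (x, y1). At test time the
# prediction for y1 is fed into the input of the second conditional.
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, (40, 1))
y1 = np.sin(3 * X[:, 0])
y2 = y1 ** 2 + 0.1 * X[:, 0]            # depends nonlinearly on y1

Xs = np.linspace(-1, 1, 50)[:, None]
y1_pred = gp_fit_predict(X, y1, Xs)
aug_train = np.hstack([X, y1[:, None]])       # input augmented with y1
aug_test = np.hstack([Xs, y1_pred[:, None]])
y2_pred = gp_fit_predict(aug_train, y2, aug_test)
```

The augmentation step is the key design choice: the second GP sees the first output as an extra input coordinate, so output dependencies are learned as ordinary input dependence.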
Tensor-variate GPs (TvGPs) (Semochkina et al., 14 Feb 2025) generalize to tensor outputs $\mathcal{Y}(x) \in \mathbb{R}^{d_1 \times \cdots \times d_K}$, with a Kronecker-structured covariance $\mathrm{Cov}[\operatorname{vec}\mathcal{Y}(x), \operatorname{vec}\mathcal{Y}(x')] = k(x, x')\, \Sigma_K \otimes \cdots \otimes \Sigma_1$.
High-Dimensional Input Strategies
Standard GP regression suffers from poor scaling with input dimension due to the curse of dimensionality and from computational limitations (typically $O(n^3)$ time for $n$ data points). Several approaches mitigate these problems:
- Additive/Backfitting Models: Assume additivity, $f(x) = \sum_{d=1}^{D} f_d(x_d)$, reducing complexity to a sum of 1D GPs (Gilboa et al., 2012).
- Projection Pursuit GP Regression (PPGPR): Model $f(x) = \sum_{m=1}^{M} g_m(w_m^\top x)$ with learned projections $w_m$ and univariate GPs $g_m$, enabling higher expressivity at the cost of $M$ univariate GP fits (Gilboa et al., 2012, Chen et al., 2020).
- Randomly Projected Additive GPs: Sum GPs over random (or optimized) 1D projections ("Editor's term": RPA-GP), approximating full-dimensional kernels with few subkernels and enabling scaling to high input dimension (Delbridge et al., 2019).
- Sparse/Inducing-Point Methods: Employ $m \ll n$ inducing variables to obtain $O(nm^2)$ scaling, with theoretical guarantees that $m = O(\log^D n)$ suffices for SE kernels and Gaussian inputs (Burt et al., 2019).
- Exact Kronecker Solvers: For grid-structured inputs and factorized kernels, exploit Kronecker algebra to reduce cost to roughly $O(\sum_{j=1}^{d} n_j^3)$ (where the grid has $d$ input factors with $n_j$ points per factor) (Belyaev et al., 2014).
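The random-projection idea can be illustrated directly at the kernel level (a sketch; `rpa_kernel` and the $1/J$ averaging convention are our own choices, not the exact parameterization of Delbridge et al.):

```python
import numpy as np

def rpa_kernel(X1, X2, W, ls=1.0):
    # Additive kernel over J one-dimensional random projections:
    # k(x, x') = (1/J) * sum_j exp(-(w_j.x - w_j.x')^2 / (2 ls^2)).
    P1, P2 = X1 @ W.T, X2 @ W.T              # (n, J) projected inputs
    d2 = (P1[:, None, :] - P2[None, :, :]) ** 2
    return np.exp(-0.5 * d2 / ls**2).mean(axis=2)

rng = np.random.RandomState(0)
D, J = 50, 200
W = rng.randn(J, D) / np.sqrt(D)             # random unit-scale projections
X = rng.randn(30, D)
K = rpa_kernel(X, X, W)                      # sum of PSD kernels -> PSD
```

Each summand is a valid 1D kernel, so the average is automatically positive semi-definite; as $J$ grows, the average concentrates around an isotropic limiting kernel.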
2. Inference and Computational Strategies
Efficient inference in multidimensional GP regression leverages both model structure and numerics.
Tensor and Kronecker Methods
For factorial input grids and separable kernels, the Gram matrix factorizes as $K = K_1 \otimes \cdots \otimes K_d$, allowing eigendecomposition and inversion at $O(\sum_j n_j^3)$ cost (Belyaev et al., 2014). For tensor-variate outputs with Kronecker coregionalization, $\mathrm{Cov}[\operatorname{vec}\mathcal{Y}(x), \operatorname{vec}\mathcal{Y}(x')] = k(x, x')\, \Sigma_K \otimes \cdots \otimes \Sigma_1$, enabling posterior mean and covariance computation using mode-1 tensor–matrix products and Kronecker algebra (Semochkina et al., 14 Feb 2025).
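The eigendecomposition trick for Kronecker-structured systems can be verified in a few lines (a sketch for a two-factor grid; `kron_solve` is a hypothetical helper, not code from the cited papers):

```python
import numpy as np

def kron_solve(K1, K2, Y, noise):
    # Solves (K1 (x) K2 + noise * I) a = vec(Y) using per-factor
    # eigendecompositions, never forming the full Kronecker matrix.
    l1, Q1 = np.linalg.eigh(K1)
    l2, Q2 = np.linalg.eigh(K2)
    S = Q1.T @ Y @ Q2                        # rotate into the joint eigenbasis
    S = S / (np.outer(l1, l2) + noise)       # diagonal solve
    return Q1 @ S @ Q2.T                     # rotate back; vec of this is the answer

rng = np.random.RandomState(1)
A1, A2 = rng.randn(6, 6), rng.randn(5, 5)
K1, K2 = A1 @ A1.T, A2 @ A2.T                # SPD kernel factors
Y = rng.randn(6, 5)
alpha = kron_solve(K1, K2, Y, noise=0.1)

# Check against the dense solve (row-major vec: (A (x) B) vec(X) = vec(A X B^T)).
dense = np.linalg.solve(np.kron(K1, K2) + 0.1 * np.eye(30), Y.ravel())
assert np.allclose(alpha.ravel(), dense)
```

The dense solve is $O((n_1 n_2)^3)$; the factored version costs two small eigendecompositions plus matrix products, which is the source of the Kronecker speedup.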
Sparse Variational Approaches
Given $m$ inducing points, variational posteriors match exact GP posteriors in the induced subspace, with the approximation quality controlled by spectral decay. For the squared exponential kernel with Gaussian inputs, the number of inducing points needed for vanishing process KL grows only polylogarithmically, $m = O(\log^D n)$ (Burt et al., 2019).
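A subset-of-regressors sketch conveys the $O(nm^2)$ structure of inducing-point methods (this is the simplest such approximation, chosen for brevity, not the full variational scheme of Burt et al.):

```python
import numpy as np

def rbf(X1, X2, ls=0.2):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def sor_predict(X, y, Z, Xs, noise=1e-2):
    # Subset-of-regressors predictive mean with m inducing inputs Z:
    # mu(x*) = k(x*, Z) (Kzx Kxz + noise * Kzz)^{-1} Kzx y,
    # costing O(n m^2) rather than O(n^3).
    Kzz = rbf(Z, Z) + 1e-8 * np.eye(len(Z))
    Kzx = rbf(Z, X)
    A = Kzx @ Kzx.T + noise * Kzz            # m x m system
    w = np.linalg.solve(A, Kzx @ y)
    return rbf(Xs, Z) @ w

rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.randn(200)
Z = np.linspace(0, 1, 15)[:, None]           # m = 15 inducing inputs
mu = sor_predict(X, y, Z, X)
```

With only 15 inducing points the smooth target is recovered well, which is the practical face of the polylogarithmic $m$ guarantee.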
Additive, Random, and Projection Pursuit
Additive and projection models scale with the cost of $D$ and $M$ univariate GP fits, respectively (Gilboa et al., 2012). RPA-GP and DPA-GP achieve near-parity with full GP performance using few projections, with empirical convergence of the additive kernel to RBF/IMQ kernels (Delbridge et al., 2019). PPGPR with "dimension expansion" ($M > D$) substantially improves accuracy for non-additive, high-dimensional targets (Chen et al., 2020).
Table: Complexity scaling of representative strategies
| Model/strategy | Complexity | Dimensionality regime |
|---|---|---|
| Full GP | $O(n^3)$ | low/moderate $D$ |
| Sparse GP (inducing pts) | $O(nm^2)$ | large $n$, moderate $D$ |
| Additive GP (backfit) | cost of $D$ univariate GP fits | high $D$, functionally separable |
| RPA-GP/DPA-GP | cost of $J$ univariate GP fits | high $D$, additivity/low-d proj. |
| Kronecker (input grid) | $O(\sum_j n_j^3)$ | large $n$ on grid, $d$ axes |
| Tensor-variate GP | Kronecker algebra per output mode | matrix/tensor outputs |
3. Structured Output: Tensor-Variate and Multi-Output Emulation
When the output space is structured (vector, matrix, or higher-order tensor-valued), appropriate covariance parameterization is required for capturing inter-output dependencies.
Tensor-Variate Models
The tensor-variate GP (TvGP) (Semochkina et al., 14 Feb 2025) assigns a separable Kronecker product covariance across the output modes, $\mathrm{Cov}[\operatorname{vec}\mathcal{Y}(x), \operatorname{vec}\mathcal{Y}(x')] = k(x, x')\, \Sigma_K \otimes \cdots \otimes \Sigma_1$. Special cases include the Outer-Product Emulator (OPE) and the Parallel-Partial Emulator (PPE) for 2D outputs.
- OPE: Full Kronecker covariance over outputs, with mean and residuals exploiting separable structure, enabling strong information sharing. Advantageous for smooth, highly correlated outputs.
- PPE: Independent GPs for each output index, no inter-output residual correlation, scaling to large output domains or unstructured outputs.
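The PPE's computational appeal, one shared input kernel serving independent per-index GPs, can be sketched as follows (an illustrative toy example; a single factorization of the input kernel serves every output column at once):

```python
import numpy as np

def rbf(X1, X2, ls=0.3):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

# PPE-style emulation: one GP per output index, all sharing the same
# input kernel, so one linear solve handles all outputs simultaneously.
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, (30, 2))
Y = np.column_stack([np.sin(4 * X[:, 0]),    # output index 1
                     np.cos(4 * X[:, 1]),    # output index 2
                     X[:, 0] * X[:, 1]])     # output index 3

K = rbf(X, X) + 1e-6 * np.eye(len(X))
alpha = np.linalg.solve(K, Y)                # one solve, all output columns
Xs = rng.uniform(0, 1, (10, 2))
mean = rbf(Xs, X) @ alpha                    # (10, 3) predictive means
```

Because no inter-output covariance is modeled, the cost grows only with the number of output indices through cheap matrix-matrix products, which is what lets PPE scale to large output domains.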
Empirical findings (Semochkina et al., 14 Feb 2025):
- OPE yields inferior calibration when output heteroskedasticity is present but provides lower RMSPE when data is scarce and outputs are highly correlated across indices.
- PPE achieves better MASPE calibration and is preferable with large numbers of uncorrelated outputs.
Gaussian Process Regression Networks
GPRN (Wilson et al., 2011) extends the GP framework to multi-output regression and volatility modeling, with each output as a sum over input-dependent mixtures of latent GPs. This model can capture time-varying covariances, input-dependent correlations, and nonlinear noise models.
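The GPRN generative process is easy to simulate (an illustrative NumPy sketch; the dimensions, length-scales, and noise level are arbitrary choices, not values from Wilson et al.):

```python
import numpy as np

def rbf(X1, X2, ls=0.2):
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

rng = np.random.RandomState(0)
n, p, q = 100, 3, 2                     # n inputs, p outputs, q latent GPs
X = np.linspace(0, 1, n)[:, None]
L = np.linalg.cholesky(rbf(X, X) + 1e-8 * np.eye(n))

F = L @ rng.randn(n, q)                              # latent "node" functions f(x)
W = (L @ rng.randn(n, p * q)).reshape(n, p, q)       # input-dependent mixing weights
Y = np.einsum('npq,nq->np', W, F) + 0.05 * rng.randn(n, p)
# Because W varies with x, the induced output covariance W(x) W(x)^T
# is itself input-dependent, giving time-varying correlations.
```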
4. Extensions: Distributional Inputs, Manifolds, and Adaptive GPs
GPs have been extended to non-vectorial and adaptive settings.
Distributional Inputs
For regression with probability measure-valued inputs, GPs can be constructed via optimal transport embeddings to a Wasserstein barycenter, reducing the problem to radial positive definite kernels in Hilbert spaces (Bachoc et al., 2018). For Gaussian input distributions, kernels can be explicitly characterized in terms of covariance differences.
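For the Gaussian case the construction is explicit: in 1D, the 2-Wasserstein distance between $N(m_1, s_1^2)$ and $N(m_2, s_2^2)$ has a closed form, and a radial kernel can be built on top of it (a sketch; positive definiteness of such kernels is precisely what Bachoc et al. analyze, and in 1D the metric is Hilbertian):

```python
import numpy as np

def w2_sq_gauss1d(m1, s1, m2, s2):
    # Squared 2-Wasserstein distance between 1-D Gaussians:
    # W2^2 = (m1 - m2)^2 + (s1 - s2)^2.
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def w2_kernel(params1, params2, ls=1.0):
    # Radial kernel on distribution space, k = exp(-W2^2 / (2 ls^2)).
    m1, s1 = params1[:, 0][:, None], params1[:, 1][:, None]
    m2, s2 = params2[:, 0][None, :], params2[:, 1][None, :]
    return np.exp(-0.5 * w2_sq_gauss1d(m1, s1, m2, s2) / ls**2)

P = np.array([[0.0, 1.0], [0.5, 1.2], [2.0, 0.5]])   # (mean, std) inputs
K = w2_kernel(P, P)
```

With this Gram matrix in hand, regression over distribution-valued inputs proceeds exactly as ordinary GP regression.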
Manifold and Locally-Adaptive Regression
Implicit Manifold GP regression (Fichera et al., 2023) leverages the manifold hypothesis by inferring implicit low-dimensional structure directly from the data, enabling scaling to extremely high ambient dimensionality and convergence to a Matérn GP on the learned manifold.
Adaptive nonstationary GP models (Mathews et al., 2021) represent spatial and temporal correlations via local (input-dependent) length-scales and adaptivity, as in the nonstationary Matérn kernel, with applications to multivariate spatiotemporal plasma data and uncertainty quantification.
5. Empirical Performance and Practical Guidelines
Empirical comparisons across multidimensional GP regression models demonstrate:
- GPRN achieves state-of-the-art performance in multi-output (up to 1000 dimensions) and multivariate volatility tasks, outperforming LMC and SLFM and remaining competitive with MGARCH models (Wilson et al., 2011).
- RPA-GP/DPA-GP rapidly converge to full GP accuracy in high dimension with only tens of 1D projections, outperforming ARD GPs when $D$ is large (e.g., image regression) (Delbridge et al., 2019).
- PPGPR with dimension expansion ($M > D$) matches or outperforms isotropic and product-structure GPs on complex benchmark functions and engineering datasets (Chen et al., 2020).
- Tensor-variate GP (OPE) is favored when outputs have metric structure and data is limited. If outputs are numerous and uncorrelated, PPE is computationally tractable and better calibrated (Semochkina et al., 14 Feb 2025).
Guidelines for model choice (Semochkina et al., 14 Feb 2025, Delbridge et al., 2019, Gilboa et al., 2012):
- Use OPE or Kronecker methods for grid-structured input/output with smooth, correlated outputs.
- Employ sparse/inducing-point or random projection strategies in high input dimension or for very large sample sizes.
- Choose additive or PPGPR models when full interactions are infeasible but significant structure can be captured by sum/projection representations.
- For manifold-structured or distributional inputs, utilize GP variants with appropriate embeddings or data-driven latent structure inference.
6. Theoretical Guarantees and Limitations
- Sparse GP bounds: For the ARD squared exponential kernel with Gaussian inputs, the KL divergence to the exact GP posterior can be made arbitrarily small with $m = O(\log^D n)$ inducing points (Burt et al., 2019).
- Random projection GPs: Approximate the original high-dimensional kernel to arbitrary accuracy as the number of projections $J \to \infty$; convergence is guaranteed under random projections (Delbridge et al., 2019).
- Manifold GPs: Implicit manifold approaches converge to classic GP regression on the underlying data manifold, improving calibration and scalability (Fichera et al., 2023) (subject to model details).
- Additive/projection methods: Projection pursuit and additive decomposition models are limited when the target is highly non-additive and the number of projections is insufficient, or when the learned directions are non-identifiable (Chen et al., 2020).
7. Applications and Benchmark Domains
Key applications of multidimensional GP regression include multi-task learning, emulation of computational simulators (e.g., spatio-temporal epidemiology, environmental modeling) (Semochkina et al., 14 Feb 2025), gene expression time series, geostatistics, volatility estimation (Wilson et al., 2011), and high-dimensional physics (e.g., plasma analysis (Mathews et al., 2021), molecular PES (Manzhos et al., 2021)). Performance is quantified via standardized MSE, log-likelihood, MAE, MASPE, RMSPE, and MGES metrics.
Table: Empirical accuracy for OPE vs. PPE (n=20 and n=50, influenza simulation) (Semochkina et al., 14 Feb 2025)
| Metric | OPE n=20 | PPE n=20 | OPE n=50 | PPE n=50 |
|---|---|---|---|---|
| MASPE (target ≈ 1) | 2.758 | 0.736 | 2.142 | 0.664 |
| RMSPE (lower is better) | 1.107 | 1.424 | 0.632 | 1.080 |
| MGES (higher is better) | -4586 | -270 | -2692 | -37 |
Interpretation: OPE yields more accurate but less well-calibrated predictions for small samples. PPE improves calibration with a sufficiently large training set or unstructured outputs.
Multidimensional Gaussian Process Regression encompasses a comprehensive methodological and computational toolkit for high-dimensional, multi-output, and structured-input regression scenarios. By combining model-driven structure (additivity, projections, tensorization) with numerically scalable inference (sparse, Kronecker, variational), these approaches extend the flexibility of GPs to a range of practically and theoretically challenging regimes across domains.