
Latent-Map Gaussian Process (LMGP)

Updated 9 February 2026
  • LMGP is a Gaussian Process extension that embeds qualitative variables into a low-dimensional latent space, unifying quantitative and categorical inputs.
  • It utilizes a joint kernel over augmented inputs and optimizes both latent embeddings and GP hyperparameters to achieve superior predictive accuracy.
  • Empirical studies show LMGP outperforms standard methods in multi-fidelity, calibration, and mixed-data modeling tasks.

Latent-Map Gaussian Process (LMGP) is a generalization of Gaussian process (GP) regression designed to seamlessly incorporate both quantitative and qualitative (categorical) inputs by embedding the latter into a learned low-dimensional latent space. LMGP provides a unified, likelihood-based framework that allows standard GP methodologies to be extended naturally to problems involving mixed data types, data fusion, multi-fidelity modeling, and calibration. Central to LMGP is the principle that each categorical level or combination is mapped to a continuous vector in latent space, and the kernel function acts jointly on the augmented numerical-plus-latent input. This approach unifies the treatment of quantitative and qualitative inputs, automatically learns category relationships, and offers interpretability and efficiency advantages over conventional multiresponse GP or specialized covariance approaches.

1. Latent Map Representation of Qualitative Inputs

LMGP introduces a mapping from each level $\ell$ of a qualitative factor to a vector $z_j(\ell) \in \mathbb{R}^d$ in a latent space, where $d$ is low (typically $d=2$ is sufficient for practical and interpretability reasons) (Zhang et al., 2018, Oune et al., 2021). For a problem with $p$ quantitative variables $x \in \mathbb{R}^p$ and $q$ qualitative factors $t = (t_1,\dots,t_q)$, where factor $j$ takes $m_j$ levels, the combined input is mapped as:

$$(x, t) \mapsto (x, z(t)), \qquad z(t) = (z_1(t_1),\dots, z_q(t_q)).$$

For models involving combinations of categorical variables or multi-source data, each category or source ID $t$ is one-hot encoded as $\phi(t)$ and multiplied by a learned matrix $A$ to yield $z(t) = \phi(t) A$, where $A \in \mathbb{R}^{m \times d}$ and $m$ is the number of categories (Oune et al., 2021, Deng et al., 2022).

To prevent non-identifiability in the latent embedding, certain coordinates are fixed, e.g., setting $z_j(1) = (0,0)$ and $z_j(2) = (c,0)$ for some constant $c$ (Zhang et al., 2018). This mapping is jointly optimized with the GP kernel hyperparameters.
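A minimal sketch of this latent map, assuming a single qualitative factor with $m$ levels embedded in $d=2$ dimensions (the helper names `latent_map` and `constrain` are illustrative, not from the cited papers):

```python
import numpy as np

def latent_map(levels, A):
    """Map integer category levels (0..m-1) to latent vectors via one-hot * A.

    A is an (m, d) matrix of latent coordinates; row j is z(level j).
    """
    m, d = A.shape
    phi = np.eye(m)[levels]  # one-hot encoding phi(t), shape (n, m)
    return phi @ A           # latent coordinates z(t), shape (n, d)

def constrain(A_free, c=1.0):
    """Impose the identifiability constraints z(1) = (0,0), z(2) = (c,0).

    A_free holds the remaining free coordinates for levels 3..m,
    which are optimized jointly with the kernel hyperparameters.
    """
    fixed = np.array([[0.0, 0.0], [c, 0.0]])
    return np.vstack([fixed, A_free])
```

Fixing the first two levels removes the translation and rotation symmetries of the latent space that would otherwise make the embedding non-unique.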

2. LMGP Model Formulation and Kernel Structure

The LMGP model treats the response $y$ as a GP over the augmented space $(x, z)$:

$$y(x, t) \sim \text{GP}\big(\mu, K((x, t), (x', t'))\big),$$

where the kernel incorporates both quantitative and latent-encoded qualitative input differences:

$$K((x, t), (x', t')) = \sigma^2 \exp\Big(-\sum_{k=1}^p \phi_k (x_k - x'_k)^2 - \sum_{j=1}^q \|z_j(t_j) - z_j(t'_j)\|_2^2 \Big).$$

Variants use anisotropic (ARD) squared-exponential or Matérn forms with separate length-scale parameters for quantitative and latent dimensions (Oune et al., 2021, Deng et al., 2022). For multi-fidelity or multi-source tasks, the kernel allows extensions such as:

$$K((x, t, \theta), (x', t', \theta')) = \sigma^2 \exp\Big(-\|z(t) - z(t')\|^2 - (x-x')^\top \Lambda_x (x-x') - (\theta-\theta')^\top \Lambda_\theta (\theta-\theta') \Big),$$

where $\theta$ are shared calibration parameters (Oune et al., 2021).

A constant mean or a general basis $f(x)^\top \beta$ is used, with $\beta$ estimated in closed form.
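The joint kernel over augmented inputs can be sketched as follows (a squared-exponential form matching the first kernel above; variable names are illustrative, and the latent dimensions carry no separate length scales since those are absorbed into the learned map):

```python
import numpy as np

def lmgp_kernel(X, Z, Xp, Zp, sigma2, phi):
    """Squared-exponential kernel on augmented inputs (x, z(t)).

    X, Xp: quantitative inputs, shapes (n, p) and (n', p)
    Z, Zp: latent coordinates of the qualitative inputs, shapes (n, d), (n', d)
    phi:   per-dimension scale parameters for the quantitative part, shape (p,)
    """
    # weighted squared distances over the quantitative dimensions
    dx = X[:, None, :] - Xp[None, :, :]
    qx = np.einsum('ijk,k->ij', dx**2, phi)
    # unweighted squared distances over the latent dimensions
    dz = Z[:, None, :] - Zp[None, :, :]
    qz = (dz**2).sum(-1)
    return sigma2 * np.exp(-qx - qz)
```

ARD or Matérn variants would replace the exponent with the corresponding form while keeping the same augmented-input structure.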

3. Parameter Estimation and Training

Parameter estimation proceeds by maximizing the GP marginal likelihood with respect to both native GP hyperparameters (correlation lengths, process variance, nugget/noise) and all latent coordinates or mapping matrices $A$ (and any calibration parameters, if present) (Zhang et al., 2018, Oune et al., 2021, Deng et al., 2022). The negative log marginal likelihood for $n$ observations $\{(x^{(i)}, t^{(i)}, y_i)\}_{i=1}^n$ is:

$$\mathcal{L} = \frac{1}{2}(y - F\beta)^\top K^{-1}(y - F\beta) + \frac{1}{2} \log |K| + \frac{n}{2}\log 2\pi,$$

where $K$ is the augmented kernel matrix and $F$ is the design matrix for the mean basis.

The optimization is carried out using gradient-based algorithms (e.g., L-BFGS or interior-point solvers), with gradients of $\mathcal{L}$ computed analytically with respect to all entries of $A$, latent priors, and kernel length scales (Oune et al., 2021). Multiple random restarts and constraints on the latent coordinates or mapping weights are employed to avoid local minima and numerical instability.

For the calibration setting, unknown parameters $\theta$ are treated as hyperparameters and included in the optimization. A fully Bayesian alternative, involving priors and, e.g., MCMC, is conceptually possible but not implemented in the referenced works (Oune et al., 2021).
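The objective above can be sketched as follows, with $\beta$ profiled out in closed form via generalized least squares (a Cholesky-based sketch under the stated formulation; the function name is illustrative, and the kernel matrix is assumed to already include the nugget):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def neg_log_marginal_likelihood(K, y, F):
    """Negative log marginal likelihood with beta estimated in closed form.

    K: (n, n) augmented kernel matrix (nugget/noise already added)
    y: (n,) responses
    F: (n, k) design matrix for the mean basis (a column of ones
       for a constant mean)
    """
    n = len(y)
    L = cho_factor(K, lower=True)
    # generalized least squares: beta = (F' K^-1 F)^-1 F' K^-1 y
    KiF = cho_solve(L, F)
    Kiy = cho_solve(L, y)
    beta = np.linalg.solve(F.T @ KiF, F.T @ Kiy)
    r = y - F @ beta
    quad = r @ cho_solve(L, r)
    logdet = 2.0 * np.sum(np.log(np.diag(L[0])))
    return 0.5 * quad + 0.5 * logdet + 0.5 * n * np.log(2 * np.pi)
```

In practice this would be wrapped in an optimizer such as `scipy.optimize.minimize` with `method="L-BFGS-B"` over the flattened vector of free latent coordinates and kernel hyperparameters, restarted from multiple random initializations.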

4. Theoretical Foundations and Interpretation

The justification for the latent embedding of qualitative/categorical factors is grounded in sufficient dimension reduction: any effect of a qualitative setting $t$ on a physical (or engineered) system must arise from some latent numerical mechanism or variable $v(t)$, which, after a (possibly nonlinear) mapping, determines the system output. If the response depends on a low-dimensional summary $h(v) \in \mathbb{R}^d$, then learning a low-dimensional $z(t) = h(v(t))$ captures all relevant distinctions among categories (Zhang et al., 2018).

The learned latent representation provides interpretability:

  • Categorical levels with similar effects cluster in latent space.
  • For multiple factors, the latent embedding often forms grid or simplex structures, indicating independence or symmetric relations.
  • In multi-fidelity setups, data sources with strong mutual correlation are placed close together in latent space, helping diagnose which surrogates or data sources can be trusted (Oune et al., 2021, Deng et al., 2022).
  • In physical problems (e.g., beam bending), latent coordinates recover well-known mechanistic summaries (e.g., moment of inertia) (Zhang et al., 2018).

LMGP is mathematically equivalent to introducing a neural-network layer from the category encoding to the latent coordinates, followed by a GP covariance on the augmented input, yielding an explainable neural-network representation (Oune et al., 2021).

5. Practical Implementation and Algorithmic Steps

The typical LMGP workflow consists of:

  • Assigning each qualitative/categorical combination a latent vector (learned) or one-hot code for linear mapping.
  • Initializing mapping parameters and kernel hyperparameters.
  • Iteratively:
    • Computing the augmented inputs $(x, z)$ (or a composite input $u$).
    • Building the kernel matrix KK.
    • Evaluating the marginal likelihood and gradients.
    • Updating parameters.
  • Training halts at convergence, after which predictions at a new point $(x^*, t^*)$ use the learned mapping for $t^*$ and apply the standard GP predictor formulas.

Pseudocode variants for training and prediction are detailed in (Oune et al., 2021, Deng et al., 2022), highlighting modularity and scalability to multi-source, multi-fidelity, or mixed-variable settings.
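The prediction step can be sketched as the standard GP predictor applied on the augmented input (names are illustrative; `k_star` would be computed with the trained kernel between the training set and the mapped new point):

```python
import numpy as np

def gp_predict(K, k_star, k_ss, y, mu):
    """Standard GP predictive mean and variance at one new augmented input.

    K:      (n, n) training kernel matrix (nugget included)
    k_star: (n,) cross-covariances between training points and the new point
    k_ss:   scalar prior variance at the new point
    y, mu:  training responses and the (constant) estimated mean
    """
    alpha = np.linalg.solve(K, y - mu)
    mean = mu + k_star @ alpha
    var = k_ss - k_star @ np.linalg.solve(K, k_star)
    return mean, max(var, 0.0)  # clip tiny negative values from round-off
```

Because the qualitative input enters only through its learned latent coordinates, predicting for a new level combination requires nothing beyond evaluating the trained map before the kernel.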

Computational complexity matches standard GP regression ($\mathcal{O}(n^3)$ per likelihood evaluation for $n$ data points), with typically modest hyperparameter counts (of order $\sum_j (2m_j - 3) + p$). LMGP can be combined with sparse GP or inducing-point methods for large-scale problems (Oune et al., 2021, Deng et al., 2022).

6. Empirical Performance and Applications

LMGP achieves state-of-the-art predictive accuracy and versatility across a range of tasks involving mixed and categorical inputs, multi-fidelity modeling, and surrogate calibration (Zhang et al., 2018, Oune et al., 2021, Deng et al., 2022). Notable results include:

  • Superior hold-out RMSE compared to alternative GP methods for categorical/mixed data (unrestrictive covariance, multiplicative covariance, additive-UC), often by an order of magnitude, at modest $n \approx 50$–$100$ (Zhang et al., 2018).
  • Effective fusion of high- and low-fidelity simulators, yielding 2–10 times lower MSE than GPs on high-fidelity data alone, and 2–5 times lower than Kennedy-O'Hagan co-kriging, with robust uncertainty quantification (Oune et al., 2021).
  • In calibration tasks, LMGP estimates of shared parameters are unbiased and have lower variance than classic modular calibration, particularly under nontrivial noise or model misspecification (Oune et al., 2021).
  • In complex multiscale engineering contexts, LMGP enables acceleration of expensive direct numerical simulations by up to 240× via ROM surrogates, and its learned latent embedding separates response types and fidelity levels for enhanced interpretability (Deng et al., 2022).
  • In combinatorial design and Bayesian optimization, LMGP outperforms standard GPs that rely on manual featurization or ignore qualitative structure, enabling more efficient search over heterogeneous design spaces (Oune et al., 2021).

7. Limitations and Extensions

The principal limitation of LMGP is the cubic scaling in the number of data points without approximation, as with all classical GPs. The number of hyperparameters increases linearly with the number of categorical levels and quantitative variables. The choice of latent dimension $d$ is critical: $d=1$ can be too restrictive, while $d=2$ is sufficient for most applications, balancing representational power, numerical stability, and parsimony. In principle, one could increase $d$ up to $m_j - 1$ per categorical factor, but this incurs $O(m_j^2)$ parameters, defeating the purpose of a parsimonious embedding (Zhang et al., 2018).

LMGP does not, in its standard form, enable exact fully Bayesian inference over the mapping or kernel hyperparameters, though this extension is conceptually possible.

A plausible implication is that LMGP can be generalized to deep architectures by replacing the linear latent mapping with multilayer (possibly nonlinear) neural networks, integrating with recent developments in deep kernel learning (Oune et al., 2021). The structure of the learned latent space provides a direct visualization and diagnosis tool for source/model trust and for understanding factor interactions across application domains (Oune et al., 2021, Deng et al., 2022).
