Latent-Map Gaussian Process (LMGP)
- LMGP is a Gaussian Process extension that embeds qualitative variables into a low-dimensional latent space, unifying quantitative and categorical inputs.
- It utilizes a joint kernel over augmented inputs and optimizes both latent embeddings and GP hyperparameters to achieve superior predictive accuracy.
- Empirical studies show LMGP outperforms standard methods in multi-fidelity, calibration, and mixed-data modeling tasks.
Latent-Map Gaussian Process (LMGP) is a generalization of Gaussian process (GP) regression designed to seamlessly incorporate both quantitative and qualitative (categorical) inputs by embedding the latter into a learned low-dimensional latent space. LMGP provides a unified, likelihood-based framework that allows standard GP methodologies to be extended naturally to problems involving mixed data types, data fusion, multi-fidelity modeling, and calibration. Central to LMGP is the principle that each categorical level or combination is mapped to a continuous vector in latent space, and the kernel function acts jointly on the augmented numerical-plus-latent input. This approach unifies the treatment of quantitative and qualitative inputs, automatically learns category relationships, and offers interpretability and efficiency advantages over conventional multiresponse GP or specialized covariance approaches.
1. Latent Map Representation of Qualitative Inputs
LMGP introduces a mapping from each level of a qualitative factor to a vector $\mathbf{z} \in \mathbb{R}^{d_z}$ in a latent space, where $d_z$ is low (typically $d_z = 2$ is sufficient for practical and interpretability reasons) (Zhang et al., 2018, Oune et al., 2021). For a problem with $p$ quantitative variables $\mathbf{x} = (x_1, \ldots, x_p)$ and $q$ qualitative factors $t_1, \ldots, t_q$, where factor $t_j$ takes $l_j$ levels, the combined input is mapped as:

$$(\mathbf{x}, t_1, \ldots, t_q) \;\mapsto\; \big(\mathbf{x}, \mathbf{z}(t_1), \ldots, \mathbf{z}(t_q)\big).$$
For models involving combinations of categorical variables or multi-source data, each category or source ID is one-hot encoded and multiplied by a learned matrix $\mathbf{A} \in \mathbb{R}^{\kappa \times d_z}$ to yield $\mathbf{z}(\mathbf{t}) = \boldsymbol{\zeta}(\mathbf{t})\,\mathbf{A}$, where $\boldsymbol{\zeta}(\mathbf{t})$ is the one-hot encoding and $\kappa$ is the total number of categories (or combinations) (Oune et al., 2021, Deng et al., 2022).
To prevent non-identifiability in the latent embedding, certain coordinates are fixed, e.g., setting $\mathbf{z}(t^{(1)}) = \mathbf{0}$ for a reference level and $z_2(t^{(2)}) = c$ for some constant $c$ (Zhang et al., 2018). This mapping is jointly optimized with the GP kernel hyperparameters.
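The latent map above can be sketched in a few lines. This is a minimal illustration, not the reference implementation: the mapping matrix is random here (in LMGP it is fit by maximum likelihood), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_levels = 4  # levels of one qualitative factor (hypothetical example)
d_z = 2       # latent dimension; d_z = 2 is the typical choice

# Learned mapping matrix A (random here; optimized jointly with the
# kernel hyperparameters in actual LMGP training).
A = rng.normal(size=(n_levels, d_z))

# Identifiability constraints: pin one reference level to the origin and
# zero out one coordinate of a second level (one common convention; the
# exact constraint varies across implementations).
A[0, :] = 0.0
A[1, 1] = 0.0

def latent_map(t):
    """Map a level index t to latent coordinates via one-hot encoding."""
    one_hot = np.eye(n_levels)[t]  # zeta(t), shape (n_levels,)
    return one_hot @ A             # z(t) in R^{d_z}
```

Each level is thus a row of `A`, and distances between rows encode learned similarity between categories.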
2. LMGP Model Formulation and Kernel Structure
The LMGP model treats the response as a GP over the augmented space $(\mathbf{x}, \mathbf{z})$:

$$y(\mathbf{x}, \mathbf{t}) \sim \mathcal{GP}\Big(m(\mathbf{x}),\; c\big((\mathbf{x}, \mathbf{z}(\mathbf{t})), (\mathbf{x}', \mathbf{z}(\mathbf{t}'))\big)\Big),$$

where the kernel incorporates both quantitative and latent-encoded qualitative input differences, e.g., in Gaussian form:

$$c\big((\mathbf{x}, \mathbf{z}), (\mathbf{x}', \mathbf{z}')\big) = \sigma^2 \exp\Big(-\sum_{i=1}^{p} \omega_i (x_i - x_i')^2 \;-\; \|\mathbf{z} - \mathbf{z}'\|_2^2\Big).$$
Variants use anisotropic (ARD) squared-exponential or Matérn forms with separate length-scale parameters for the quantitative and latent dimensions (Oune et al., 2021, Deng et al., 2022). For multi-fidelity or multi-source tasks, the kernel allows extensions such as appending a source indicator $s$ and shared inputs to form the augmented input $(\mathbf{x}, \boldsymbol{\theta}, \mathbf{z}(s))$, where $\boldsymbol{\theta}$ are shared calibration parameters (Oune et al., 2021).
A constant mean $m(\mathbf{x}) = \beta$ or a general basis expansion $m(\mathbf{x}) = \mathbf{f}(\mathbf{x})^{\top}\boldsymbol{\beta}$ is used, with $\boldsymbol{\beta}$ estimated in closed form.
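The augmented kernel can be sketched as follows. This is a minimal sketch of the Gaussian (squared-exponential) variant, assuming per-dimension scales `omega` on the quantitative part only; following a convention in the LMGP literature, the latent coordinates carry no separate length scales because the learned mapping absorbs their scale.

```python
import numpy as np

def lmgp_kernel(x1, z1, x2, z2, omega, sigma2=1.0):
    """Gaussian kernel on the augmented input (x, z).

    x1, x2 : quantitative inputs, shape (p,)
    z1, z2 : latent codes of the qualitative inputs, shape (d_z,)
    omega  : ARD scale parameters for the quantitative dimensions, shape (p,)
    sigma2 : process variance
    """
    dx2 = np.sum(omega * (x1 - x2) ** 2)   # weighted quantitative distance
    dz2 = np.sum((z1 - z2) ** 2)           # plain latent-space distance
    return sigma2 * np.exp(-dx2 - dz2)
```

At identical augmented inputs the kernel returns `sigma2`, and it decays with both the weighted quantitative distance and the latent distance between categories.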
3. Parameter Estimation and Training
Parameter estimation proceeds by maximizing the GP marginal likelihood with respect to both native GP hyperparameters (correlation lengths, process variance, nugget/noise) and all latent coordinates or mapping matrices (and any calibration parameters, if present) (Zhang et al., 2018, Oune et al., 2021, Deng et al., 2022). The negative log-marginal likelihood for $n$ observations $\mathbf{y}$ is:

$$L = \frac{1}{2}\log|\mathbf{C}| + \frac{1}{2}(\mathbf{y} - \mathbf{F}\boldsymbol{\beta})^{\top}\mathbf{C}^{-1}(\mathbf{y} - \mathbf{F}\boldsymbol{\beta}) + \frac{n}{2}\log 2\pi,$$

where $\mathbf{C}$ is the augmented kernel matrix and $\mathbf{F}$ is the design matrix for the mean basis.
The optimization is carried out using gradient-based algorithms (e.g., L-BFGS or interior-point solvers), with gradients of $L$ computed analytically with respect to all entries of the mapping matrix $\mathbf{A}$, the latent coordinates, and the kernel length scales (Oune et al., 2021). Multiple random restarts and constraints on the latent coordinates or mapping weights are employed to avoid local minima and numerical instability.
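The objective above can be sketched as a profile negative log-marginal likelihood in which $\boldsymbol{\beta}$ is concentrated out in closed form by generalized least squares. This is standard GP algebra rather than the papers' exact parameterization; the jitter term and function names are illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def neg_log_marginal_likelihood(y, C, F):
    """Profile negative log-marginal likelihood of a GP.

    y : responses, shape (n,)
    C : augmented kernel matrix, shape (n, n)
    F : design matrix of the mean basis, shape (n, m)
    beta is estimated in closed form (GLS) before evaluating L.
    """
    n = len(y)
    chol = cho_factor(C + 1e-8 * np.eye(n))       # jitter for stability
    Ci_F = cho_solve(chol, F)
    Ci_y = cho_solve(chol, y)
    beta = np.linalg.solve(F.T @ Ci_F, F.T @ Ci_y)  # GLS estimate of beta
    r = y - F @ beta                                # residual from the mean
    Ci_r = cho_solve(chol, r)
    logdet = 2.0 * np.sum(np.log(np.diag(chol[0])))
    return 0.5 * (logdet + r @ Ci_r + n * np.log(2.0 * np.pi))
```

In a full training loop, this function would be wrapped by `scipy.optimize.minimize` over the kernel hyperparameters and the entries of the latent mapping, with `C` rebuilt from the augmented inputs at each evaluation.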
For the calibration setting, the unknown calibration parameters $\boldsymbol{\theta}$ are treated as hyperparameters and included in the optimization. A fully Bayesian alternative, involving priors and, e.g., MCMC sampling, is conceptually possible but not implemented in the referenced works (Oune et al., 2021).
4. Theoretical Foundations and Interpretation
The justification for the latent embedding of qualitative/categorical factors is grounded in sufficient dimension reduction: any effect of a qualitative setting on a physical (or engineered) system must arise from some latent numerical mechanism or variable $\mathbf{v}$, which, after a (possibly nonlinear) mapping, determines the system output. If the response depends only on a low-dimensional summary $\mathbf{z} = g(\mathbf{v})$, then learning a low-dimensional $\mathbf{z}(t)$ captures all relevant distinctions among categories (Zhang et al., 2018).
The learned latent representation provides interpretability:
- Categorical levels with similar effects cluster in latent space.
- For multiple factors, the latent embedding often forms grid or simplex structures, indicating independence or symmetric relations.
- In multi-fidelity setups, data sources with strong mutual correlation are placed close together in latent space, diagnosing trusted surrogates or untrustworthy sources (Oune et al., 2021, Deng et al., 2022).
- In physical problems (e.g., beam bending), latent coordinates recover well-known mechanistic summaries (e.g., moment of inertia) (Zhang et al., 2018).
LMGP is mathematically equivalent to introducing a neural-network layer from the category encoding to the latent coordinates, followed by a GP covariance on the augmented input, yielding an explainable neural-network representation (Oune et al., 2021).
5. Practical Implementation and Algorithmic Steps
The typical LMGP workflow consists of:
- Assigning each qualitative/categorical combination a latent vector (learned) or one-hot code for linear mapping.
- Initializing mapping parameters and kernel hyperparameters.
- Iteratively:
- Computing the augmented inputs $(\mathbf{x}, \mathbf{z}(\mathbf{t}))$ (or the composite mapping $\boldsymbol{\zeta}(\mathbf{t})\,\mathbf{A}$).
- Building the kernel matrix $\mathbf{C}$.
- Evaluating the marginal likelihood and gradients.
- Updating parameters.
- Training halts at convergence, after which predictions at a new input $(\mathbf{x}^*, \mathbf{t}^*)$ use the learned mapping to obtain $\mathbf{z}(\mathbf{t}^*)$ and apply the standard GP predictor formulas.
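The prediction step above reduces to the generic GP predictor on the augmented input $\mathbf{u}^* = (\mathbf{x}^*, \mathbf{z}(\mathbf{t}^*))$. A minimal sketch, with illustrative names and a dense inverse that would be replaced by a Cholesky solve at scale:

```python
import numpy as np

def gp_predict(c_star, k_star_star, C, y, F, f_star):
    """Standard GP predictive mean and variance at one augmented input u*.

    c_star      : covariances between u* and the n training inputs, shape (n,)
    k_star_star : prior variance at u*
    C, y, F     : training covariance, responses, mean-basis design matrix
    f_star      : mean basis evaluated at the new input, shape (m,)
    """
    Ci = np.linalg.inv(C)  # fine for small n; use a Cholesky factor at scale
    beta = np.linalg.solve(F.T @ Ci @ F, F.T @ Ci @ y)  # closed-form mean coef.
    r = y - F @ beta                                    # detrended residuals
    mean = f_star @ beta + c_star @ Ci @ r
    var = k_star_star - c_star @ Ci @ c_star
    return mean, var
```

When `c_star` is zero (a test point uncorrelated with the data), the predictor falls back to the estimated mean and the prior variance, as expected.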
Pseudocode variants for training and prediction are detailed in (Oune et al., 2021, Deng et al., 2022), highlighting modularity and scalability to multi-source, multi-fidelity, or mixed-variable settings.
Computational complexity matches standard GP regression ($\mathcal{O}(n^3)$ per likelihood evaluation for $n$ data points), with typically modest hyperparameter counts (on the order of $p + d_z \sum_j l_j$). LMGP can be combined with sparse GP or inducing-point methods for large-scale problems (Oune et al., 2021, Deng et al., 2022).
6. Empirical Performance and Applications
LMGP achieves state-of-the-art predictive accuracy and versatility across a range of tasks involving mixed and categorical inputs, multi-fidelity modeling, and surrogate calibration (Zhang et al., 2018, Oune et al., 2021, Deng et al., 2022). Notable results include:
- Superior hold-out RMSE compared to alternative GP methods for categorical/mixed data (unrestrictive covariance, multiplicative covariance, additive-UC), often by an order of magnitude, at modest sample sizes of at most $100$ (Zhang et al., 2018).
- Effective fusion of high- and low-fidelity simulators, yielding 2–10 times lower MSE than GPs on high-fidelity data alone, and 2–5 times lower than Kennedy-O'Hagan co-kriging, with robust uncertainty quantification (Oune et al., 2021).
- In calibration tasks, LMGP estimates of shared parameters are unbiased and have lower variance than classic modular calibration, particularly under nontrivial noise or model misspecification (Oune et al., 2021).
- In complex multiscale engineering contexts, LMGP enables acceleration of expensive direct numerical simulations by up to 240× via ROM surrogates, and its learned latent embedding separates response types and fidelity levels for enhanced interpretability (Deng et al., 2022).
- In combinatorial design and Bayesian optimization, LMGP outperforms standard GPs that rely on manual featurization or ignore qualitative structure, enabling more efficient search over heterogeneous design spaces (Oune et al., 2021).
7. Limitations and Extensions
The principal limitation of LMGP is the cubic scaling in the number of data points without approximation, as with all classical GPs. The number of hyperparameters increases linearly with the number of categorical levels and quantitative variables. The choice of latent dimension $d_z$ is critical: $d_z = 1$ can be too restrictive, while $d_z = 2$ is sufficient for most applications, balancing representational power, numerical stability, and parsimony. In principle, one could increase $d_z$ up to the number of levels $l_j$ per categorical factor, but this incurs $\mathcal{O}(l_j^2)$ parameters, defeating the purpose of a parsimonious embedding (Zhang et al., 2018).
LMGP does not, in its standard form, enable exact fully Bayesian inference over the mapping or kernel hyperparameters, though this extension is conceptually possible.
A plausible implication is that LMGP can be generalized to deep architectures by replacing the linear latent mapping with multilayer (possibly nonlinear) neural networks, integrating with recent developments in deep kernel learning (Oune et al., 2021). The structure of the learned latent space provides a direct visualization and diagnosis tool for source/model trust and for understanding factor interactions across application domains (Oune et al., 2021, Deng et al., 2022).