Robust Local Gaussian Process (RLGP)
- RLGP is a scalable, localized Gaussian process regression technique that uses adaptive neighborhood selection and robust outlier correction for nonstationary data.
- It employs mean-shift correction, convex perspective loss, and ℓ₀-based trimming to mitigate the impact of outliers in complex datasets.
- Using block-coordinate descent, RLGP achieves high predictive accuracy and computational efficiency in high-dimensional, heterogeneous environments.
Robust Local Gaussian Process (RLGP) methods provide a scalable, adaptive, and outlier-resistant approach to Gaussian process regression, particularly effective for modeling multidimensional response surfaces that are nonstationary, piecewise-smooth, or exhibit abrupt discontinuities. RLGP integrates adaptive neighborhood selection, robustification mechanisms such as mean-shift correction and perspective transforms, and ℓ₀-based outlier trimming to ensure both predictive accuracy and computational efficiency, especially in high-dimensional or heterogeneous data settings (Adjetey et al., 14 Dec 2025).
1. Foundation: Local Gaussian Process and Adaptive Neighborhoods
RLGP operates by eschewing a global Gaussian process (GP) fit on the full dataset in favor of localized models. For each test input $x_*$, the method selects its $k$ Euclidean nearest neighbors from the training set $\{(x_i, y_i)\}_{i=1}^n$:

$$\mathcal{N}(x_*) = \{\, i : x_i \text{ is among the } k \text{ nearest neighbors of } x_* \,\}.$$

A GP prior with mean $\mu$ and squared-exponential kernel $K_\theta$ is posited over the local neighborhood. The observed data vector $y \in \mathbb{R}^k$ is modeled via

$$y \sim \mathcal{N}\!\left(\mu \mathbf{1},\, \Sigma_\theta\right), \qquad \Sigma_\theta = K_\theta + \sigma^2 I,$$

where $K_\theta$ is the kernel matrix over the neighborhood and $\sigma^2$ is the nugget. Traditional local GPs minimize the negative log-marginal likelihood; however, such approaches are vulnerable to bias from neighborhood outliers when the local region straddles a sharp feature or jump.
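The local step above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the parameter names (`k`, `lengthscale`, `signal`, `nugget`) and the unit-signal squared-exponential kernel are assumptions for concreteness:

```python
import numpy as np

def local_gp_nll(y, X, x_star, k=50, mu=0.0, lengthscale=1.0, signal=1.0, nugget=1e-2):
    """Negative log-marginal likelihood of a local GP on the k nearest neighbors."""
    # Select the k Euclidean nearest neighbors of x_star.
    d = np.linalg.norm(X - x_star, axis=1)
    idx = np.argsort(d)[:k]
    Xk, yk = X[idx], y[idx]
    # Squared-exponential kernel over the neighborhood, plus the nugget.
    sq = np.sum((Xk[:, None, :] - Xk[None, :, :]) ** 2, axis=-1)
    Sigma = signal * np.exp(-0.5 * sq / lengthscale**2) + nugget * np.eye(k)
    # NLL (up to an additive constant): 0.5 r' Sigma^{-1} r + 0.5 log det Sigma.
    r = yk - mu
    L = np.linalg.cholesky(Sigma)
    alpha = np.linalg.solve(L, r)
    return 0.5 * alpha @ alpha + np.sum(np.log(np.diag(L)))
```

A badly mis-specified local mean inflates this objective, which is exactly the sensitivity that the robustification in the next section targets.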
2. Robustification: Mean-Shift Correction and Perspective Loss
RLGP addresses this vulnerability through a combination of observation-specific mean-shift parameters and a convex perspective transform of the loss. Building upon Huber's robust estimation concepts, RLGP replaces the typical log-determinant penalized quadratic form

$$\ell(r, \Sigma_\theta) = r^\top \Sigma_\theta^{-1} r + \log\det \Sigma_\theta$$

with a multivariate Effros–Hansen perspective of this loss, where $r = y - \mu\mathbf{1}$ denotes the residual vector over the neighborhood. To further neutralize gross outliers, an observation-specific shift vector $\delta \in \mathbb{R}^k$ is introduced, modifying the residuals to $r - \delta$. The robustified RLGP objective becomes

$$\min_{\mu,\,\delta,\,\theta}\; (y - \mu\mathbf{1} - \delta)^\top \Sigma_\theta^{-1} (y - \mu\mathbf{1} - \delta) + \log\det \Sigma_\theta \quad \text{subject to } \|\delta\|_0 \le s.$$

The hard constraint, $\|\delta\|_0 \le s$, limits the number of shifted (trimmed) outliers (Adjetey et al., 14 Dec 2025).
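The robustified objective can be evaluated directly. The sketch below assumes $\Sigma_\theta$ is given and treats the shift $\delta$ as the decision variable; the function name and signature are illustrative:

```python
import numpy as np

def robust_nll(y, mu, Sigma, delta, s):
    """Gaussian NLL on mean-shifted residuals, under the hard cap ||delta||_0 <= s."""
    assert np.count_nonzero(delta) <= s, "l0 constraint violated"
    rs = y - mu - delta                      # shifted residuals r - delta
    L = np.linalg.cholesky(Sigma)
    a = np.linalg.solve(L, rs)
    return 0.5 * a @ a + np.sum(np.log(np.diag(L)))
```

Shifting a gross outlier into $\delta$ removes its contribution to the quadratic form, so the objective drops sharply once the contaminated observation is absorbed.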
3. ℓ₀-Sparsity and Robust Neighborhood Trimming
Robust trimming in RLGP is implemented via an explicit “counting” constraint $\|\delta\|_0 \le s$ on the mean-shift vector $\delta$, ensuring only the $s$ most severe outliers within the neighborhood are shifted, all others being anchored at zero. This sparsity mechanism leads to a robust local model by effectively discounting (but not discarding) outlier responses, improving both trend and covariance estimates near boundaries and discontinuities. The method is distinct from $\ell_1$-penalized alternatives (which minimize a penalty of the form $\lambda\|\delta\|_1$) in that it guarantees a hard cap on the number of local outliers, maintaining model identifiability even for heavily contaminated neighborhoods (Adjetey et al., 14 Dec 2025).
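Under a diagonal (or well-conditioned) covariance, the ℓ₀-constrained update reduces to hard thresholding: shift exactly the $s$ largest-magnitude residuals and anchor the rest at zero. A sketch of this projection (a simplification of the paper's quantile-thresholding step):

```python
import numpy as np

def l0_mean_shift(residuals, s):
    """Project onto the l0 ball: shift only the s largest-magnitude residuals."""
    delta = np.zeros_like(residuals)
    if s > 0:
        idx = np.argsort(np.abs(residuals))[-s:]   # the s most severe outliers
        delta[idx] = residuals[idx]                # absorb them exactly; others stay 0
    return delta
```

Note that the shifted observations are discounted rather than deleted: they still contribute to the covariance structure through $\Sigma_\theta$.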
4. Block-Coordinate Descent and Computational Workflow
Model fitting at a single test point employs a block-coordinate descent algorithm, iteratively updating $\delta$, $\mu$, and the hyperparameters $\theta$:
- Neighborhood Extraction: Select the $k$ nearest neighbors of $x_*$.
- Initialization: Set $\delta = 0$ and $\mu = \mathrm{median}(y)$; initialize the remaining parameters via robust statistics.
- Iterative Updates:
  - $\delta$-block: $\ell_0$-constrained quadratic update via quantile-thresholding.
  - $\mu$-block: closed-form update given the current $\delta$ and $\theta$.
  - Hyperparameter block: gradient or quasi-Newton optimization.
  - Recompute $\Sigma_\theta$ from the updated $\theta$.
- Prediction: Compute the posterior mean and predictive variance at $x_*$ using the fitted local hyperparameters and mean-shift.
Each iteration is guaranteed to decrease the objective. The per-point computational cost is $O(k^3)$, with $k$ typically in the range 30–200, making the cubic scaling tractable and independent of the global dataset size (Adjetey et al., 14 Dec 2025).
5. Computational Complexity and Scalability
Key complexity characteristics:
- Neighbor search: $O(nd)$ (brute-force) or roughly $O(d \log n)$ (kd-tree) per test point for $d$-dimensional input.
- Model fitting: $O(k^3)$ per test point due to covariance inversion, but with $k \ll n$.
- Storage: $O(k^2)$ per test point (cache for local Gram matrices).
- Linear scaling in the feature dimension $d$ (entering only through neighbor distance computation), avoiding the “curse” typical of global GPs.
- Massive parallelism: Each test prediction is independent, enabling embarrassingly parallel execution.
This structure ensures RLGP remains practical even for $d$ up to several hundred and $n$ in the millions, with runtime and RAM requirements that scale favorably in all relevant axes (Adjetey et al., 14 Dec 2025, Allison et al., 2023, Gogolashvili et al., 2022).
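The $O(nd)$ brute-force neighbor search can be implemented with a partial selection instead of a full sort (a kd-tree, e.g. `scipy.spatial.cKDTree`, is preferable at low $d$). A minimal numpy sketch:

```python
import numpy as np

def knn_indices(X, x_star, k):
    # Brute-force: O(n d) for the squared distances plus an O(n) partial
    # selection; np.argpartition avoids the full O(n log n) sort.
    d2 = np.sum((X - x_star) ** 2, axis=1)
    return np.argpartition(d2, k)[:k]      # indices of the k nearest (unordered)
```

Because each test point issues an independent query, predictions can be distributed across workers with no shared state.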
6. Empirical Performance and Application Contexts
RLGP demonstrates strong empirical performance on real-world and synthetic benchmarks:
- Sharp discontinuities: RLGP delivers the lowest Mean Squared Error (MSE) and the best Continuous Ranked Probability Score (CRPS), surpassing laGP, liGP, TGP, DynaTree, jump-GP, DeepGP, and Bayesian neural networks, with typical MSE improvements of 10–40%.
- Robustness to the trimming parameter $s$: Conservative settings of $s$ yield robust fits, and small mis-specification of $s$ has negligible impact on accuracy. Adaptive selection of $s$ via Tukey's MAD rule further enhances reliability without manual tuning.
- Scalability to high $d$: In synthetic tests ($d$ up to 500), RLGP maintains low MSE and calibration, with per-point CPU time of $0.2$–$0.5$ seconds and memory never exceeding $0.5$ GB.
- Broad applicability: Appropriate for response modeling with regime shifts, high-dimensional data when moderate local sample size is feasible, and in domains where credible uncertainty quantification (e.g., CRPS) is required (Adjetey et al., 14 Dec 2025).
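One plausible MAD-based rule for choosing $s$ adaptively (the paper's exact rule may differ) counts residuals beyond a few robust standard deviations, using the Gaussian-consistent factor $1.4826$:

```python
import numpy as np

def adaptive_s(residuals, c=3.0):
    """Set the trimming level s to the number of residuals beyond c robust
    standard deviations; 1.4826 * MAD estimates sigma for Gaussian noise."""
    med = np.median(residuals)
    sigma = 1.4826 * np.median(np.abs(residuals - med))
    return int(np.sum(np.abs(residuals - med) > c * max(sigma, 1e-12)))
```

Because the MAD ignores the outliers themselves, this estimate of $s$ is stable even when the contamination is severe.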
7. Connections to Related Local and Robust GP Methods
Several related developments parallel RLGP's combination of locality and robustness, including:
- Locally Smoothed GPR: Incorporating localization kernels to induce compactly supported, nonstationary posteriors that downweight distant (and likely outlying) training points, resulting in sparsity and robust predictions (Gogolashvili et al., 2022).
- Nearest-Neighbour GP (GPnn): Leveraging exclusively local neighborhoods with minimal hyperparameter reliance, yielding massive computational gains and theoretical robustness to kernel mis-specification as $n \to \infty$ (Allison et al., 2023).
- Modular/Variational Local GPs: Partitioning the parameter space with localized feature bases and ARD sparsity priors, naturally handling spatially varying smoothness and outlier omissions (Meier et al., 2014).
- Application-specific RLGP: Demonstrated for frequency response estimation (Fang et al., 2024), control-based continuation (Renson et al., 2019), and molecular simulation surrogates (Shanks et al., 2023), confirming generality and performance across scientific domains.
Summary Table: RLGP Key Features vs. Classical and Other Local GP Variants
| Aspect | RLGP (Adjetey et al., 14 Dec 2025, Allison et al., 2023) | laGP/liGP (Adjetey et al., 14 Dec 2025) | LSGPR (Gogolashvili et al., 2022) |
|---|---|---|---|
| Outlier robustness | Yes (ℓ₀ mean-shift, trimming) | Limited | Yes (localization) |
| Nonstationary adaptation | Yes (locality+robust objective) | Partial | Yes |
| Computational scaling | $O(k^3)$ per point, $k \ll n$ | $O(k^3)$ per point | Localized per point |
| Discontinuity modeling | Explicit | Smoothing artifacts | Possible |
| Hyperparameter tuning | Local, per neighborhood | Local | Local, kernel/BW |
| Uncertainty quantification | Yes (CRPS, local variances) | Yes | Yes |
RLGP methods constitute an adaptable, scalable, and theoretically grounded solution for regression on heterogeneous, discontinuous, and high-dimensional data—particularly excelling where traditional GPs or simple local methods are compromised by nonstationarity, outliers, or scale constraints (Adjetey et al., 14 Dec 2025, Allison et al., 2023, Gogolashvili et al., 2022).