
Robust Local Gaussian Process (RLGP)

Updated 16 December 2025
  • RLGP is a scalable, localized Gaussian process regression technique that uses adaptive neighborhood selection and robust outlier correction for nonstationary data.
  • It employs mean-shift correction, convex perspective loss, and ℓ₀-based trimming to mitigate the impact of outliers in complex datasets.
  • Using block-coordinate descent, RLGP achieves high predictive accuracy and computational efficiency in high-dimensional, heterogeneous environments.

Robust Local Gaussian Process (RLGP) methods provide a scalable, adaptive, and outlier-resistant approach to Gaussian process regression, particularly effective for modeling multidimensional response surfaces that are nonstationary, piecewise-smooth, or exhibit abrupt discontinuities. RLGP integrates adaptive neighborhood selection, robustification mechanisms such as mean-shift correction and perspective transforms, and ℓ₀-based outlier trimming to ensure both predictive accuracy and computational efficiency, especially in high-dimensional or heterogeneous data settings (Adjetey et al., 14 Dec 2025).

1. Foundation: Local Gaussian Process and Adaptive Neighborhoods

RLGP operates by eschewing a global Gaussian process (GP) fit on the full dataset in favor of localized models. For each test input $x_*$, the method selects its $n$ Euclidean nearest neighbors from the training set $D = \{(x_i, y_i)\}_{i=1}^N \subset \mathbb{R}^d \times \mathbb{R}$:

$$D_n(x_*) = \{(x_{(i)}, y_{(i)}) : i = 1, \ldots, n\}$$

A GP prior with mean $m(x) = \mu$ and squared-exponential kernel $k(x_i, x_j; \theta_0, \vartheta) = \theta_0 \exp\{-\vartheta \|x_i - x_j\|^2\}$ is posited over the local neighborhood. The observed data vector $y$ is modeled via:

$$y \sim \mathcal{N}(1\mu, \Sigma), \quad \Sigma = \nu I + C$$

where $C_{ij} = k(x_i, x_j; \theta_0, \vartheta)$ and $\nu$ is the nugget. Traditional local GPs minimize the negative log-marginal likelihood; however, such approaches are vulnerable to bias from neighborhood outliers when the local region straddles a sharp feature or jump.
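As a concrete sketch (illustrative NumPy code, not the authors' implementation; all function and parameter names are ours), the neighborhood extraction and local GP posterior under this model can be written as:

```python
import numpy as np

def local_gp_predict(X, y, x_star, n=30, mu=0.0, theta0=1.0, vartheta=1.0, nu=1e-2):
    """Select the n Euclidean nearest neighbors of x_star and compute the
    local GP posterior mean and variance under the squared-exponential
    kernel k(xi, xj) = theta0 * exp(-vartheta * ||xi - xj||^2)."""
    # Neighborhood extraction: n nearest neighbors of x_star.
    d2 = np.sum((X - x_star) ** 2, axis=1)
    idx = np.argsort(d2)[:n]
    Xn, yn = X[idx], y[idx]

    # Local covariance Sigma = nu*I + C over the neighborhood.
    D2 = np.sum((Xn[:, None, :] - Xn[None, :, :]) ** 2, axis=-1)
    Sigma = nu * np.eye(n) + theta0 * np.exp(-vartheta * D2)

    # Cross-covariance between x_star and the neighborhood.
    c_star = theta0 * np.exp(-vartheta * np.sum((Xn - x_star) ** 2, axis=1))

    alpha = np.linalg.solve(Sigma, yn - mu)
    mean = mu + c_star @ alpha
    var = theta0 + nu - c_star @ np.linalg.solve(Sigma, c_star)
    return mean, var
```

The $O(n^3)$ solve touches only the $n$ neighbors, which is what makes the method independent of the global dataset size.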

2. Robustification: Mean-Shift Correction and Perspective Loss

RLGP addresses this vulnerability through a combination of observation-specific mean-shift parameters and a convex perspective transform of the loss. Building upon Huber's robust estimation concepts, RLGP replaces the typical log-determinant penalized quadratic form with a multivariate Effros–Hansen perspective:

$$L_{\text{persp}}(\mu, \nu, \theta_0, \vartheta) = \frac{1}{2}(y - 1\mu)^\top S^{-1}(y - 1\mu) + \frac{1}{2}\,\mathrm{Tr}(S)$$

where $S^2 = \Sigma$. To further neutralize gross outliers, an observation-specific shift vector $\gamma \in \mathbb{R}^n$ is introduced, modifying the residuals to $r = y - 1\mu - \gamma$. The robustified RLGP objective becomes:

$$\min_{\mu, \gamma, \nu, \theta_0, \vartheta} L(\mu, \gamma, \nu, \theta_0, \vartheta) = \frac{1}{2}(y - 1\mu - \gamma)^\top S^{-1}(y - 1\mu - \gamma) + \frac{c_0}{2}\,\mathrm{Tr}(S)$$
$$\text{subject to } \|\gamma\|_0 \leq q, \quad S^2 = \Sigma = \nu I + C(\theta_0, \vartheta), \quad \nu > 0$$

The hard $\ell_0$ constraint, $q \leq n/2$, limits the number of shifted (trimmed) outliers (Adjetey et al., 14 Dec 2025).
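The perspective objective can be evaluated directly once $S = \Sigma^{1/2}$ is formed; a minimal sketch (our own helper, computing the square root by eigendecomposition) is:

```python
import numpy as np

def perspective_loss(y, mu, gamma, Sigma, c0=1.0):
    """Evaluate the robustified perspective objective
    L = 0.5 * r^T S^{-1} r + (c0/2) * Tr(S),
    with S = Sigma^{1/2} and shifted residuals r = y - mu - gamma."""
    # Symmetric PSD square root of Sigma via eigendecomposition.
    w, V = np.linalg.eigh(Sigma)
    S = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
    r = y - mu - gamma
    return 0.5 * r @ np.linalg.solve(S, r) + 0.5 * c0 * np.trace(S)
```

Because the quadratic form uses $S^{-1}$ rather than $\Sigma^{-1}$, large residuals are penalized more gently than in the standard Gaussian likelihood, which is the source of the robustness.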

3. ℓ₀-Sparsity and Robust Neighborhood Trimming

Robust trimming in RLGP is implemented via an explicit $\ell_0$ "counting" constraint on the mean-shift vector $\gamma$, ensuring only the $q$ most severe outliers within the neighborhood are shifted, all others being anchored at zero. This sparsity mechanism leads to a robust local model by effectively discounting (but not discarding) outlier responses, improving both trend and covariance estimates near boundaries and discontinuities. The method is distinct from penalized alternatives (which minimize $L + \lambda\|\gamma\|_0$) in that it guarantees a hard cap on the number of local outliers, maintaining model identifiability even for $q \ll n$ (Adjetey et al., 14 Dec 2025).
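The hard $\ell_0$ update has a simple form when the residual metric is diagonal: keep the $q$ largest-magnitude residuals and zero out the rest. A sketch of this simplified step (the full update in the paper weights residuals through $S^{-1}$):

```python
import numpy as np

def l0_trim(residuals, q):
    """Hard l0-constrained update: shift the q largest-magnitude residuals
    (gamma_i = r_i) and anchor all others at zero. This is a simplified,
    diagonal-metric version of the quantile-thresholding step."""
    gamma = np.zeros_like(residuals)
    if q > 0:
        idx = np.argsort(np.abs(residuals))[-q:]  # q most severe outliers
        gamma[idx] = residuals[idx]
    return gamma
```

Setting $\gamma_i = r_i$ makes the shifted residual exactly zero for the trimmed points, so they no longer pull on the trend estimate while still contributing to the covariance structure.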

4. Block-Coordinate Descent and Computational Workflow

Model fitting at a single test point $x_*$ employs a block-coordinate descent algorithm, iteratively updating $\gamma$, $\mu$, and the hyperparameters $(\nu, \theta_0, \vartheta)$:

  1. Neighborhood extraction: select the nearest $n$ neighbors.
  2. Initialization: set $\gamma^{(0)} = 0$, $\mu^{(0)} = \mathrm{median}(y)$, others via robust statistics.
  3. Iterative updates:
    • $\gamma$-block: $\ell_0$-constrained quadratic update via quantile-thresholding.
    • $\mu$-block: closed-form update given current $\gamma$.
    • Hyperparameter block: gradient or quasi-Newton optimization.
    • Recompute $S$ from updated $\nu, \theta_0, \vartheta$.
  4. Prediction: compute the posterior mean and predictive variance at $x_*$ using the fitted local hyperparameters and mean-shift.

Each iteration is guaranteed to decrease the objective. The per-point computational cost is $O(n^3 \cdot \#\text{hyper-iter}) + O(n^2 \cdot \#\gamma\text{-inner-iter})$, with $n$ typically in the range 30–200, making the cubic scaling tractable and independent of the global dataset size $N$ (Adjetey et al., 14 Dec 2025).
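The workflow above can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: the hyperparameters are held fixed, the $\gamma$-block uses the diagonal-metric thresholding, and the $\mu$-block is a closed-form generalized-least-squares update under the $S^{-1}$ metric.

```python
import numpy as np

def rlgp_fit(Xn, yn, q, theta0=1.0, vartheta=1.0, nu=1e-2, sweeps=20):
    """Sketch of the per-point block-coordinate descent (hyperparameters
    fixed for brevity; the full method re-optimizes them each sweep and
    recomputes S = Sigma^{1/2})."""
    n = len(yn)
    D2 = np.sum((Xn[:, None, :] - Xn[None, :, :]) ** 2, axis=-1)
    Sigma = nu * np.eye(n) + theta0 * np.exp(-vartheta * D2)
    # S^{-1} = Sigma^{-1/2} via eigendecomposition (Sigma is symmetric PD).
    w, V = np.linalg.eigh(Sigma)
    Sinv = (V / np.sqrt(w)) @ V.T

    gamma = np.zeros(n)            # gamma-block init
    mu = float(np.median(yn))      # robust mean init
    one = np.ones(n)
    for _ in range(sweeps):
        # gamma-block: l0-constrained update (diagonal-metric simplification).
        r = yn - mu
        gamma = np.zeros(n)
        if q > 0:
            idx = np.argpartition(np.abs(r), n - q)[n - q:]
            gamma[idx] = r[idx]
        # mu-block: closed-form GLS update given the current gamma.
        mu = float(one @ Sinv @ (yn - gamma)) / float(one @ Sinv @ one)
    return mu, gamma
```

With gross outliers injected into an otherwise flat response, the loop recovers the clean level and flags exactly the contaminated indices in $\gamma$.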

5. Computational Complexity and Scalability

Key complexity characteristics:

  • Neighbor search: $O(Nd)$ (brute-force) or $O(N\log N)$ (kd-tree) per test point for $d$-dimensional input.
  • Model fitting: $O(n^3)$ per test point due to covariance inversion, but with $n \ll N$.
  • Storage: $O(n^2)$ per test point (cache for local Gram matrices).
  • Linear scaling in feature dimension $d$ (only through neighbor distance computation), avoiding the $d^3$ "curse" typical in global GPs.
  • Massive parallelism: Each test prediction is independent, enabling embarrassingly parallel execution.

This structure keeps RLGP practical even for $d$ up to several hundred and $N$ in the millions, with runtime and memory requirements that scale favorably along all relevant axes (Adjetey et al., 14 Dec 2025, Allison et al., 2023, Gogolashvili et al., 2022).
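The $O(Nd)$ brute-force search is itself cheap in NumPy if a full sort is avoided; a minimal sketch (our own helper) uses partial selection so only the $n$ kept neighbors are sorted:

```python
import numpy as np

def knn_brute(X, x_star, n):
    """Brute-force O(Nd) neighbor search. np.argpartition selects the n
    nearest in O(N) after the O(Nd) distance pass; only the n survivors
    are fully sorted."""
    d2 = np.sum((X - x_star) ** 2, axis=1)   # O(Nd) distance pass
    idx = np.argpartition(d2, n)[:n]         # O(N) selection of n smallest
    return idx[np.argsort(d2[idx])]          # sort just the n kept indices
```

For repeated queries against a fixed training set, a kd-tree (e.g. `scipy.spatial.cKDTree`) amortizes the cost to roughly $O(\log N)$ per query in low to moderate $d$.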

6. Empirical Performance and Application Contexts

RLGP demonstrates strong empirical performance on real-world and synthetic benchmarks:

  • Sharp discontinuities: RLGP delivers the lowest mean squared error (MSE) and the best continuous ranked probability score (CRPS), surpassing laGP, liGP, TGP, DynaTree, jump-GP, DeepGP, and Bayesian neural networks, with typical MSE improvements of 10–40%.
  • Robustness to the trimming parameter $q$: setting $q = 0.15n$ yields robust fits; small mis-specification of $q$ ($\pm 5\%$) has negligible impact (accuracy drop $< 5\%$). Adaptive selection of $q$ via Tukey's MAD further enhances reliability without manual tuning.
  • Scalability to high $d$: in synthetic tests ($d$ up to 500), RLGP maintains low MSE and calibration, with per-point CPU time of 0.2–0.5 seconds and memory never exceeding 0.5 GB for $n \leq 500$.
  • Broad applicability: Appropriate for response modeling with regime shifts, high-dimensional data when moderate local sample size is feasible, and in domains where credible uncertainty quantification (e.g., CRPS) is required (Adjetey et al., 14 Dec 2025).
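A MAD-based adaptive choice of $q$ can be sketched as follows; this is one plausible reading of the rule (flag residuals beyond a multiple of the robust scale), not necessarily the paper's exact recipe:

```python
import numpy as np

def adaptive_q(residuals, k=3.0):
    """Choose q as the count of residuals flagged by a MAD rule:
    |r_i - median(r)| > k * 1.4826 * MAD(r). The factor 1.4826 makes the
    MAD a consistent Gaussian scale estimate; k = 3 is a common cutoff.
    (Illustrative rule; the paper's exact recipe may differ.)"""
    r = np.asarray(residuals, dtype=float)
    med = np.median(r)
    mad = 1.4826 * np.median(np.abs(r - med))
    if mad == 0.0:
        return 0
    return int(np.sum(np.abs(r - med) > k * mad))
```

Because the median and MAD each have a 50% breakdown point, the rule remains stable even when a sizable fraction of the neighborhood is contaminated.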

Several extensions and parallel developments in RLGP have emerged, including:

  • Locally Smoothed GPR: Incorporating localization kernels to induce compactly supported, nonstationary posteriors that downweight distant (and likely outlying) training points, resulting in sparsity and robust predictions (Gogolashvili et al., 2022).
  • Nearest-Neighbour GP (GPnn): Leveraging exclusively local neighborhoods with minimal hyperparameter reliance, yielding massive computational gains and theoretical robustness to kernel mis-specification as $N \rightarrow \infty$ (Allison et al., 2023).
  • Modular/Variational Local GPs: Partitioning the parameter space with localized feature bases and ARD sparsity priors, naturally handling spatially varying smoothness and outlier omissions (Meier et al., 2014).
  • Application-specific RLGP: Demonstrated for frequency response estimation (Fang et al., 2024), control-based continuation (Renson et al., 2019), and molecular simulation surrogates (Shanks et al., 2023), confirming generality and performance across scientific domains.

Summary Table: RLGP Key Features vs. Classical and Other Local GP Variants

| Aspect | RLGP (Adjetey et al., 14 Dec 2025; Allison et al., 2023) | laGP / liGP (Adjetey et al., 14 Dec 2025) | LSGPR (Gogolashvili et al., 2022) |
|---|---|---|---|
| Outlier robustness | Yes ($\ell_0$ mean-shift, trimming) | Limited | Yes (localization) |
| Nonstationary adaptation | Yes (locality + robust objective) | Partial | Yes |
| Computational scaling | $O(n^3)$ per point, $n \ll N$ | $O(n^3)$ per point | $O(S_0^3)$ per point |
| Discontinuity modeling | Explicit | Smoothing artifacts | Possible |
| Hyperparameter tuning | Local, per neighborhood | Local | Local, kernel/bandwidth |
| Uncertainty quantification | Yes (CRPS, local variances) | Yes | Yes |

RLGP methods constitute an adaptable, scalable, and theoretically grounded solution for regression on heterogeneous, discontinuous, and high-dimensional data—particularly excelling where traditional GPs or simple local methods are compromised by nonstationarity, outliers, or scale constraints (Adjetey et al., 14 Dec 2025, Allison et al., 2023, Gogolashvili et al., 2022).
