
Robust Local Gaussian Process (RLGP)

Updated 16 December 2025
  • RLGP is a scalable, localized Gaussian process regression technique that uses adaptive neighborhood selection and robust outlier correction for nonstationary data.
  • It employs mean-shift correction, convex perspective loss, and ℓ₀-based trimming to mitigate the impact of outliers in complex datasets.
  • Using block-coordinate descent, RLGP achieves high predictive accuracy and computational efficiency in high-dimensional, heterogeneous environments.

Robust Local Gaussian Process (RLGP) methods provide a scalable, adaptive, and outlier-resistant approach to Gaussian process regression, particularly effective for modeling multidimensional response surfaces that are nonstationary, piecewise-smooth, or exhibit abrupt discontinuities. RLGP integrates adaptive neighborhood selection, robustification mechanisms such as mean-shift correction and perspective transforms, and ℓ₀-based outlier trimming to ensure both predictive accuracy and computational efficiency, especially in high-dimensional or heterogeneous data settings (Adjetey et al., 14 Dec 2025).

1. Foundation: Local Gaussian Process and Adaptive Neighborhoods

RLGP operates by eschewing a global Gaussian process (GP) fit on the full dataset in favor of localized models. For each test input $x_*$, the method selects its $n$ Euclidean nearest neighbors from the training set $D = \{(x_i, y_i)\}_{i=1}^N \subset \mathbb{R}^d \times \mathbb{R}$:

$$D_n(x_*) = \{(x_{(i)}, y_{(i)}) : i = 1, \ldots, n\}$$

A GP prior with mean $m(x) = \mu$ and squared-exponential kernel $k(x_i, x_j; \theta_0, \vartheta) = \theta_0 \exp\{-\vartheta \|x_i - x_j\|^2\}$ is posited over the local neighborhood. The observed data vector $y$ is modeled via:

$$y \sim \mathcal{N}(1\mu, \Sigma), \quad \Sigma = \nu I + C$$

where $C_{ij} = k(x_i, x_j; \theta_0, \vartheta)$ and $\nu$ is the nugget. Traditional local GPs minimize the negative log-marginal likelihood; however, such approaches are vulnerable to bias from neighborhood outliers when the local region straddles a sharp feature or jump.
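As a concrete sketch (illustrative NumPy code, not the authors' implementation; all function and parameter names are ours), the neighborhood extraction and local GP posterior under this model can be written as:

```python
import numpy as np

def local_gp_predict(X, y, x_star, n=30, mu=0.0, theta0=1.0, vartheta=1.0, nu=1e-2):
    """Select the n Euclidean nearest neighbors of x_star and compute the
    local GP posterior mean and variance under the squared-exponential
    kernel k(xi, xj) = theta0 * exp(-vartheta * ||xi - xj||^2)."""
    # Neighborhood extraction: n nearest neighbors of x_star.
    d2 = np.sum((X - x_star) ** 2, axis=1)
    idx = np.argsort(d2)[:n]
    Xn, yn = X[idx], y[idx]

    # Local covariance Sigma = nu*I + C over the neighborhood.
    D2 = np.sum((Xn[:, None, :] - Xn[None, :, :]) ** 2, axis=-1)
    Sigma = nu * np.eye(n) + theta0 * np.exp(-vartheta * D2)

    # Cross-covariance between x_star and the neighborhood.
    c_star = theta0 * np.exp(-vartheta * np.sum((Xn - x_star) ** 2, axis=1))

    alpha = np.linalg.solve(Sigma, yn - mu)
    mean = mu + c_star @ alpha
    var = theta0 + nu - c_star @ np.linalg.solve(Sigma, c_star)
    return mean, var
```

The $O(n^3)$ solve touches only the $n$ neighbors, which is what makes the method independent of the global dataset size.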

2. Robustification: Mean-Shift Correction and Perspective Loss

RLGP addresses this vulnerability through a combination of observation-specific mean-shift parameters and a convex perspective transform of the loss. Building upon Huber's robust estimation concepts, RLGP replaces the typical log-determinant penalized quadratic form with a multivariate Effros–Hansen perspective:

$$L_{\text{persp}}(\mu, \nu, \theta_0, \vartheta) = \frac{1}{2}(y - 1\mu)^\top S^{-1}(y - 1\mu) + \frac{1}{2}\,\mathrm{Tr}(S)$$

where $S^2 = \Sigma$. To further neutralize gross outliers, an observation-specific shift vector $\gamma \in \mathbb{R}^n$ is introduced, modifying the residuals to $r = y - 1\mu - \gamma$. The robustified RLGP objective becomes:

$$\min_{\mu, \gamma, \nu, \theta_0, \vartheta} L(\mu, \gamma, \nu, \theta_0, \vartheta) = \frac{1}{2}(y - 1\mu - \gamma)^\top S^{-1}(y - 1\mu - \gamma) + \frac{c_0}{2}\,\mathrm{Tr}(S)$$
$$\text{subject to } \|\gamma\|_0 \leq q, \quad S^2 = \Sigma = \nu I + C(\theta_0, \vartheta), \quad \nu > 0$$

The hard $\ell_0$ constraint, $q \leq n/2$, limits the number of shifted (trimmed) outliers (Adjetey et al., 14 Dec 2025).
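The perspective objective can be evaluated directly once $S = \Sigma^{1/2}$ is formed; a minimal sketch (our own helper, computing the square root by eigendecomposition) is:

```python
import numpy as np

def perspective_loss(y, mu, gamma, Sigma, c0=1.0):
    """Evaluate the robustified perspective objective
    L = 0.5 * r^T S^{-1} r + (c0/2) * Tr(S),
    with S = Sigma^{1/2} and shifted residuals r = y - mu - gamma."""
    # Symmetric PSD square root of Sigma via eigendecomposition.
    w, V = np.linalg.eigh(Sigma)
    S = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
    r = y - mu - gamma
    return 0.5 * r @ np.linalg.solve(S, r) + 0.5 * c0 * np.trace(S)
```

Because the quadratic form uses $S^{-1}$ rather than $\Sigma^{-1}$, large residuals are penalized more gently than in the standard Gaussian likelihood, which is the source of the robustness.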

3. ℓ₀-Sparsity and Robust Neighborhood Trimming

Robust trimming in RLGP is implemented via an explicit $\ell_0$ "counting" constraint on the mean-shift vector $\gamma$, ensuring only the $q$ most severe outliers within the neighborhood are shifted, all others being anchored at zero. This sparsity mechanism leads to a robust local model by effectively discounting (but not discarding) outlier responses, improving both trend and covariance estimates near boundaries and discontinuities. The method is distinct from penalized alternatives (which minimize $L + \lambda\|\gamma\|_0$) in that it guarantees a hard cap on the number of local outliers, maintaining model identifiability even for $q \ll n$ (Adjetey et al., 14 Dec 2025).
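The hard $\ell_0$ update has a simple form when the residual metric is diagonal: keep the $q$ largest-magnitude residuals and zero out the rest. A sketch of this simplified step (the full update in the paper weights residuals through $S^{-1}$):

```python
import numpy as np

def l0_trim(residuals, q):
    """Hard l0-constrained update: shift the q largest-magnitude residuals
    (gamma_i = r_i) and anchor all others at zero. This is a simplified,
    diagonal-metric version of the quantile-thresholding step."""
    gamma = np.zeros_like(residuals)
    if q > 0:
        idx = np.argsort(np.abs(residuals))[-q:]  # q most severe outliers
        gamma[idx] = residuals[idx]
    return gamma
```

Setting $\gamma_i = r_i$ makes the shifted residual exactly zero for the trimmed points, so they no longer pull on the trend estimate while still contributing to the covariance structure.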

4. Block-Coordinate Descent and Computational Workflow

Model fitting at a single test point $x_*$ employs a block-coordinate descent algorithm, iteratively updating $\gamma$, $\mu$, and the hyperparameters $(\nu, \theta_0, \vartheta)$:

  1. Neighborhood extraction: select the nearest $n$ neighbors.
  2. Initialization: set $\gamma^{(0)} = 0$, $\mu^{(0)} = \mathrm{median}(y)$, others via robust statistics.
  3. Iterative updates:
    • $\gamma$-block: $\ell_0$-constrained quadratic update via quantile-thresholding.
    • $\mu$-block: closed-form update given current $\gamma$.
    • Hyperparameter block: gradient or quasi-Newton optimization.
    • Recompute $S$ from updated $\nu, \theta_0, \vartheta$.
  4. Prediction: compute the posterior mean and predictive variance at $x_*$ using the fitted local hyperparameters and mean-shift.

Each iteration is guaranteed to decrease the objective. The per-point computational cost is $O(n^3 \cdot \#\text{hyper-iter}) + O(n^2 \cdot \#\gamma\text{-inner-iter})$, with $n$ typically in the range 30–200, making the cubic scaling tractable and independent of the global dataset size $N$ (Adjetey et al., 14 Dec 2025).
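The workflow above can be sketched as follows. This is an illustrative simplification, not the paper's algorithm: the hyperparameters are held fixed, the $\gamma$-block uses the diagonal-metric thresholding, and the $\mu$-block is a closed-form generalized-least-squares update under the $S^{-1}$ metric.

```python
import numpy as np

def rlgp_fit(Xn, yn, q, theta0=1.0, vartheta=1.0, nu=1e-2, sweeps=20):
    """Sketch of the per-point block-coordinate descent (hyperparameters
    fixed for brevity; the full method re-optimizes them each sweep and
    recomputes S = Sigma^{1/2})."""
    n = len(yn)
    D2 = np.sum((Xn[:, None, :] - Xn[None, :, :]) ** 2, axis=-1)
    Sigma = nu * np.eye(n) + theta0 * np.exp(-vartheta * D2)
    # S^{-1} = Sigma^{-1/2} via eigendecomposition (Sigma is symmetric PD).
    w, V = np.linalg.eigh(Sigma)
    Sinv = (V / np.sqrt(w)) @ V.T

    gamma = np.zeros(n)            # gamma-block init
    mu = float(np.median(yn))      # robust mean init
    one = np.ones(n)
    for _ in range(sweeps):
        # gamma-block: l0-constrained update (diagonal-metric simplification).
        r = yn - mu
        gamma = np.zeros(n)
        if q > 0:
            idx = np.argpartition(np.abs(r), n - q)[n - q:]
            gamma[idx] = r[idx]
        # mu-block: closed-form GLS update given the current gamma.
        mu = float(one @ Sinv @ (yn - gamma)) / float(one @ Sinv @ one)
    return mu, gamma
```

With gross outliers injected into an otherwise flat response, the loop recovers the clean level and flags exactly the contaminated indices in $\gamma$.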

5. Computational Complexity and Scalability

Key complexity characteristics:

  • Neighbor search: $O(Nd)$ (brute-force) or $O(N\log N)$ (kd-tree) per test point for $d$-dimensional input.
  • Model fitting: $O(n^3)$ per test point due to covariance inversion, but with $n \ll N$.
  • Storage: $O(n^2)$ per test point (cache for local Gram matrices).
  • Linear scaling in feature dimension $d$ (only through neighbor distance computation), avoiding the $d^3$ "curse" typical in global GPs.
  • Massive parallelism: Each test prediction is independent, enabling embarrassingly parallel execution.

This structure keeps RLGP practical even for $d$ up to several hundred and $N$ in the millions, with runtime and memory requirements that scale favorably along all relevant axes (Adjetey et al., 14 Dec 2025, Allison et al., 2023, Gogolashvili et al., 2022).
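The $O(Nd)$ brute-force search is itself cheap in NumPy if a full sort is avoided; a minimal sketch (our own helper) uses partial selection so only the $n$ kept neighbors are sorted:

```python
import numpy as np

def knn_brute(X, x_star, n):
    """Brute-force O(Nd) neighbor search. np.argpartition selects the n
    nearest in O(N) after the O(Nd) distance pass; only the n survivors
    are fully sorted."""
    d2 = np.sum((X - x_star) ** 2, axis=1)   # O(Nd) distance pass
    idx = np.argpartition(d2, n)[:n]         # O(N) selection of n smallest
    return idx[np.argsort(d2[idx])]          # sort just the n kept indices
```

For repeated queries against a fixed training set, a kd-tree (e.g. `scipy.spatial.cKDTree`) amortizes the cost to roughly $O(\log N)$ per query in low to moderate $d$.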

6. Empirical Performance and Application Contexts

RLGP demonstrates strong empirical performance on real-world and synthetic benchmarks:

  • Sharp discontinuities: RLGP delivers the lowest mean squared error (MSE) and the best continuous ranked probability score (CRPS), surpassing laGP, liGP, TGP, DynaTree, jump-GP, DeepGP, and Bayesian neural networks, with typical MSE improvements of 10–40%.
  • Robustness to the trimming parameter $q$: setting $q = 0.15n$ yields robust fits; small mis-specification of $q$ ($\pm 5\%$) has negligible impact (accuracy drop $< 5\%$). Adaptive selection of $q$ via Tukey's MAD further enhances reliability without manual tuning.
  • Scalability to high $d$: in synthetic tests ($d$ up to 500), RLGP maintains low MSE and calibration, with per-point CPU time of 0.2–0.5 seconds and memory never exceeding 0.5 GB for $n \leq 500$.
  • Broad applicability: Appropriate for response modeling with regime shifts, high-dimensional data when moderate local sample size is feasible, and in domains where credible uncertainty quantification (e.g., CRPS) is required (Adjetey et al., 14 Dec 2025).
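A MAD-based adaptive choice of $q$ can be sketched as follows; this is one plausible reading of the rule (flag residuals beyond a multiple of the robust scale), not necessarily the paper's exact recipe:

```python
import numpy as np

def adaptive_q(residuals, k=3.0):
    """Choose q as the count of residuals flagged by a MAD rule:
    |r_i - median(r)| > k * 1.4826 * MAD(r). The factor 1.4826 makes the
    MAD a consistent Gaussian scale estimate; k = 3 is a common cutoff.
    (Illustrative rule; the paper's exact recipe may differ.)"""
    r = np.asarray(residuals, dtype=float)
    med = np.median(r)
    mad = 1.4826 * np.median(np.abs(r - med))
    if mad == 0.0:
        return 0
    return int(np.sum(np.abs(r - med) > k * mad))
```

Because the median and MAD each have a 50% breakdown point, the rule remains stable even when a sizable fraction of the neighborhood is contaminated.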

Several extensions and parallel developments in RLGP have emerged, including:

  • Locally Smoothed GPR: Incorporating localization kernels to induce compactly supported, nonstationary posteriors that downweight distant (and likely outlying) training points, resulting in sparsity and robust predictions (Gogolashvili et al., 2022).
  • Nearest-Neighbour GP (GPnn): Leveraging exclusively local neighborhoods with minimal hyperparameter reliance, yielding massive computational gains and theoretical robustness to kernel mis-specification as $N \rightarrow \infty$ (Allison et al., 2023).
  • Modular/Variational Local GPs: Partitioning the parameter space with localized feature bases and ARD sparsity priors, naturally handling spatially varying smoothness and outlier omissions (Meier et al., 2014).
  • Application-specific RLGP: Demonstrated for frequency response estimation (Fang et al., 2024), control-based continuation (Renson et al., 2019), and molecular simulation surrogates (Shanks et al., 2023), confirming generality and performance across scientific domains.

Summary Table: RLGP Key Features vs. Classical and Other Local GP Variants

| Aspect | RLGP (Adjetey et al., 14 Dec 2025; Allison et al., 2023) | laGP / liGP (Adjetey et al., 14 Dec 2025) | LSGPR (Gogolashvili et al., 2022) |
|---|---|---|---|
| Outlier robustness | Yes ($\ell_0$ mean-shift, trimming) | Limited | Yes (localization) |
| Nonstationary adaptation | Yes (locality + robust objective) | Partial | Yes |
| Computational scaling | $O(n^3)$ per point, $n \ll N$ | $O(n^3)$ per point | $O(S_0^3)$ per point |
| Discontinuity modeling | Explicit | Smoothing artifacts | Possible |
| Hyperparameter tuning | Local, per neighborhood | Local | Local, kernel/bandwidth |
| Uncertainty quantification | Yes (CRPS, local variances) | Yes | Yes |

RLGP methods constitute an adaptable, scalable, and theoretically grounded solution for regression on heterogeneous, discontinuous, and high-dimensional data—particularly excelling where traditional GPs or simple local methods are compromised by nonstationarity, outliers, or scale constraints (Adjetey et al., 14 Dec 2025, Allison et al., 2023, Gogolashvili et al., 2022).
