GeoShapley: Explainable AI for Spatial Modeling

Updated 21 January 2026
  • GeoShapley is a model-agnostic explainable AI approach that adapts Shapley values to spatial data by quantifying nonlinear effects and geographic heterogeneity.
  • It decomposes model predictions into intrinsic location effects, global feature contributions, and feature–location interactions to yield clear additive attributions.
  • GeoShapley leverages an adapted Kernel SHAP algorithm for efficient coalition sampling, offering interpretable outputs that support geospatial analysis and policy design.

GeoShapley is a model-agnostic explainable AI (XAI) approach for spatial machine learning models. It extends game-theoretic Shapley values to quantify both nonlinear feature effects and spatial heterogeneity in predictive modeling. GeoShapley treats geographic location as a single joint “player” in the Shapley value decomposition, which separates intrinsic spatial context, global feature contributions, and synergistic feature–location interactions. It is applied for post-hoc interpretation of black-box regressors (such as XGBoost, Random Forest, and TabNet) operating on georeferenced tabular data and spatial feature embeddings. Its outputs are additive decompositions of model predictions that map intrinsic location effects and spatially varying coefficients and provide tract-level attribution for geospatial analysis, regression, and policy design (Li, 2023; Lu et al., 17 Dec 2025; Li, 1 May 2025; Li et al., 16 Apr 2025; Liu, 2024).

1. Theoretical Foundation and Mathematical Formulation

GeoShapley generalizes the classic Shapley value framework from cooperative game theory to the spatial domain. For a model $f(x)$ with $p$ features, of which $g$ are geographic coordinates (typically $g=2$ for latitude and longitude), GeoShapley decomposes the prediction for each instance $i$ as

$$y_i = \Phi_0 + \Phi_{\text{GEO},i} + \sum_{j=1}^{p-g} \Phi_{j,i} + \sum_{j=1}^{p-g} \Phi_{(\text{GEO}, j),i}$$

where:

  • $\Phi_0$ is the global baseline (expected prediction).
  • $\Phi_{\text{GEO},i}$ is the intrinsic location effect, representing pure spatial context independent of other features.
  • $\Phi_{j,i}$ is the global, location-invariant effect of feature $j$, capturing nonlinear global relationships.
  • $\Phi_{(\text{GEO}, j),i}$ is the synergistic interaction between location (GEO) and feature $j$, quantifying spatial heterogeneity of feature effects (Li, 2023; Lu et al., 17 Dec 2025).

Mathematical expressions for each component employ combinatorial averaging over all feature orderings, adapting the denominators to account for the joint treatment of spatial features. Here $M$ denotes the set of players (the $p-g$ non-spatial features plus GEO) and $f_S(x_S)$ the model output when only the players in coalition $S$ are known. For example, the location effect is given by

$$\Phi_{\text{GEO}}(x, \text{geo}) = \sum_{S \subseteq M \setminus \{\text{GEO}\}} \frac{|S|!\,(p-|S|-g)!}{(p-g+1)!} \left[f_{S \cup \{\text{GEO}\}}(x_S, \text{geo}) - f_S(x_S)\right]$$
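As a minimal sketch of the joint-player idea, the following Python snippet enumerates all coalitions for a toy model and computes ordinary Shapley values with the coordinate columns grouped into one GEO player. The weight is the one in the location-effect formula above, written with $n = p - g + 1$ players. The toy model, the random background sample, and the function names `value` and `grouped_shapley` are illustrative assumptions, not part of the geoshapley package, and the sketch does not split out the separate $\Phi_{(\text{GEO}, j)}$ interaction terms.

```python
import itertools
import math

import numpy as np


def toy_model(X):
    # Hypothetical stand-in for a fitted regressor: columns are (u, v, x1, x2).
    u, v, x1, x2 = X.T
    return np.sin(u) + v + (1.0 + u) * x1 + x2 ** 2


def value(predict, x, background, present_cols):
    """v(S): mean prediction with the columns in S fixed to x and the
    remaining columns taken from the background rows."""
    data = background.copy()
    cols = list(present_cols)
    if cols:
        data[:, cols] = x[cols]
    return float(predict(data).mean())


def grouped_shapley(predict, x, background, players):
    """Exact Shapley values where each player is a tuple of column indices."""
    n = len(players)  # n = p - g + 1 when one player is the GEO block
    phi = np.zeros(n)
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # |S|! (n - |S| - 1)! / n!  ==  |S|! (p - |S| - g)! / (p - g + 1)!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for S in itertools.combinations(others, size):
                cols_S = [c for k in S for c in players[k]]
                cols_Si = cols_S + list(players[i])
                phi[i] += weight * (value(predict, x, background, cols_Si)
                                    - value(predict, x, background, cols_S))
    return phi


rng = np.random.default_rng(0)
background = rng.normal(size=(50, 4))
x = np.array([0.5, -1.0, 2.0, 0.3])
players = [(0, 1), (2,), (3,)]  # GEO = (u, v) as one player, then x1, x2

phi = grouped_shapley(toy_model, x, background, players)
baseline = float(toy_model(background).mean())
prediction = float(toy_model(x[None, :])[0])
print(phi, baseline + phi.sum(), prediction)
```

Because the grouped values are ordinary Shapley values of a $(p-g+1)$-player game, they sum to the prediction minus the baseline, which the final line lets you check.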

Interaction effects are computed using differences formed by including and excluding both location and a feature from the coalition:

$$A(\text{GEO}, j; S) = f_{S \cup \{\text{GEO}, j\}} - f_{S \cup \{\text{GEO}\}} - f_{S \cup \{j\}} + f_S$$

and

$$\Phi_{(\text{GEO}, j)}(x, \text{geo}) = \sum_{S \subseteq M \setminus \{\text{GEO}, j\}} \frac{|S|!\,(p-|S|-g-1)!}{(p-g+1)!} A(\text{GEO}, j; S)$$

This additive decomposition satisfies the efficiency property from cooperative game theory, so the components sum to the model prediction, and it provides unbiased local estimates for all spatial and non-spatial effect components (Li, 2023; Li, 1 May 2025).

2. Computation: Kernel-SHAP Algorithm and Practical Implementation

GeoShapley relies on an adapted Kernel SHAP estimator for efficient approximation of Shapley values in high-dimensional models. Key steps include:

  1. Model Fitting: Train a black-box ML model (e.g., XGBoost) on all features including spatial coordinates.
  2. Coalition Sampling: For each instance, sample $K$ random feature orderings, treating the GEO block as a joint player.
  3. Marginalization: For each coalition, marginalize absent features by imputing background reference values (bootstrapped or clustered).
  4. Computation: Evaluate model outputs with present features, accumulating marginal contributions for each coalition using the appropriate Shapley weights.
  5. Decomposition: Average across sampled coalitions to estimate $\Phi_0$, $\Phi_{\text{GEO}}$, $\Phi_j$, and $\Phi_{(\text{GEO}, j)}$ per instance.
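The sampling and marginalization steps can be sketched with plain permutation sampling, which is a simpler estimator than the weighted regression used by Kernel SHAP and is swapped in here only for illustration. The sketch assumes a generic `predict` callable, a small background array, and a list of players in which the coordinate columns form one GEO block; `toy_model` and `sampled_contributions` are hypothetical names, and the separate interaction terms are not estimated.

```python
import numpy as np


def toy_model(X):
    # Hypothetical stand-in for a fitted regressor's predict function.
    u, v, x1, x2 = X.T
    return np.cos(u) * v + (0.5 + v) * x1 + np.abs(x2)


def sampled_contributions(predict, x, background, players, n_orderings, seed=0):
    """Average marginal contributions over random player orderings.

    Absent players keep their background values (marginalization by
    imputation); present players are set to the values of instance x.
    """
    rng = np.random.default_rng(seed)
    n = len(players)
    phi = np.zeros(n)
    for _ in range(n_orderings):
        order = rng.permutation(n)
        data = background.copy()          # empty coalition: all from background
        previous = predict(data).mean()
        for k in order:
            cols = list(players[k])
            data[:, cols] = x[cols]       # player k joins the coalition
            current = predict(data).mean()
            phi[k] += current - previous  # marginal contribution of player k
            previous = current
    return phi / n_orderings


rng = np.random.default_rng(1)
background = rng.normal(size=(100, 4))   # background set of 100 rows
x = np.array([0.2, 1.5, -0.7, 0.9])
players = [(0, 1), (2,), (3,)]           # GEO block first, then x1 and x2

phi = sampled_contributions(toy_model, x, background, players, n_orderings=200)
phi_0 = float(toy_model(background).mean())
prediction = float(toy_model(x[None, :])[0])
print(phi_0, phi, prediction)
```

Each ordering telescopes from the empty coalition to the full one, so the estimates add up to the prediction minus the baseline regardless of how many orderings are drawn; only the split between players is subject to sampling noise.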

The geoshapley Python package implements these steps, supporting both exact enumeration (small $p-g$) and Monte Carlo kernel sampling (large $p-g$), with background set sizes typically in the 50–300 range for computational efficiency. For spatial coefficient surface estimation, outputs can be further smoothed by geographically weighted regression (GWR) (Li, 2023; Li et al., 16 Apr 2025; Li, 1 May 2025; Liu, 2024).

| Algorithm Step | Key Operation | Computational Complexity |
|---|---|---|
| Coalition sampling | Random orderings | $O(K \cdot \text{cost}(f))$ |
| Marginalization | Background imputation | $B \times$ model evaluations |
| Additive decomposition | Linear weighting | $O(\#~\text{features})$ |

Practical recommendations include bootstrapping for uncertainty quantification and careful validation of base model fidelity prior to GeoShapley analysis (Li, 1 May 2025; Li, 2023).

3. Interpretative Structure: Spatially Varying Effects and Additive Attribution

GeoShapley directly parallels the interpretive structure of spatially varying coefficient models (SVCMs) and additive models. In GWR, the regression is written as $y = \beta_0(u,v) + \sum_j \beta_j(u,v)x_j$; GeoShapley provides:

  • $\Phi_{\text{GEO}}(x)$ as the analog of $\beta_0(u,v)$ (spatial fixed effect).
  • $\Phi_j(x)$ as the global main effect (analogous to $g_j(x_j)$ in an additive model).
  • $\Phi_{(\text{GEO}, j)}(x)$ as a spatially varying coefficient surface for each feature, recoverable via univariate smoothing (Li, 2023; Li et al., 16 Apr 2025).
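To make the link to varying coefficients concrete, consider a synthetic model of the GWR form with one extra additive term, $f = \beta_0(u,v) + \beta_1(u,v)\,x_1 + \sin(x_2)$. If location and features are independent in the background data, the interaction difference $A(\text{GEO}, j; S)$ from Section 1 reduces to $(\beta_1(u,v) - \bar{\beta}_1)(x_1 - \bar{x}_1)$ for $x_1$ and to zero for $x_2$, whatever the coalition $S$. The sketch below checks this numerically; the coefficient functions, the full-factorial background, and all function names are assumptions chosen so that the independence holds exactly.

```python
import itertools

import numpy as np


def beta0(u, v):
    return u + v


def beta1(u, v):
    return 1.0 + 2.0 * u - v


def model(X):
    # Synthetic varying-coefficient model; columns are (u, v, x1, x2).
    u, v, x1, x2 = X.T
    return beta0(u, v) + beta1(u, v) * x1 + np.sin(x2)


# Full-factorial background: every location paired with every feature value,
# so location and features are exactly independent in the background set.
locations = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
x1_values = [-1.0, 0.0, 2.0]
x2_values = [0.5, 1.5]
background = np.array([[u, v, a, b] for (u, v), a, b
                       in itertools.product(locations, x1_values, x2_values)])

GEO = [0, 1]  # coordinate columns treated as one player


def value(x, cols):
    """Mean prediction with columns `cols` fixed to x, others from background."""
    data = background.copy()
    if cols:
        data[:, cols] = x[cols]
    return float(model(data).mean())


def interaction_difference(x, j, S):
    """A(GEO, j; S) for feature column j and a list S of other columns."""
    return (value(x, S + GEO + [j]) - value(x, S + GEO)
            - value(x, S + [j]) + value(x, S))


x = np.array([1.0, 0.0, 2.0, 0.5])
A_x1 = interaction_difference(x, 2, [])
A_x1_given_x2 = interaction_difference(x, 2, [3])
A_x2 = interaction_difference(x, 3, [])

# Dividing by the centered feature recovers the centered local coefficient.
mean_beta1 = np.mean([beta1(u, v) for u, v in locations])
recovered = A_x1 / (x[2] - np.mean(x1_values))
print(A_x1, A_x1_given_x2, A_x2, recovered, beta1(x[0], x[1]) - mean_beta1)
```

In this idealized linear setting a single division recovers the centered coefficient. With a fitted nonlinear model and correlated features the relation holds only approximately, which is one reason the interaction values are smoothed rather than read off directly.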

Unlike standard SHAP, which isolates only global main effects (and conflates spatial and non-spatial contributions), GeoShapley explicitly separates intrinsic geographic context from feature–location interactions, yielding interpretable tract- or pixel-level maps essential for policy and scientific inference (Lu et al., 17 Dec 2025).

4. Comparative Evaluation: SHAP, MGWR, Moran Eigenvectors, and Ensemble Extensions

GeoShapley has been comparatively evaluated against:

  • Standard SHAP: Captures only global feature effects ($\Phi_j$), failing to separate spatial heterogeneity or identify location–feature synergies.
  • MGWR: Produces parametric, smoothly varying coefficient surfaces using kernel bandwidth selection but is limited in nonlinear functional recovery and requires explicit kernel choice (Lu et al., 17 Dec 2025; Li, 1 May 2025; Liu, 2024).
  • Moran Eigenvector Spatial Filtering (ESF): ESF embeds spatial eigenvectors as features for controlling spatial autocorrelation but lacks direct marginal attribution of local spatial context; GeoShapley can treat high-dimensional spatial embeddings as the geo-player, with computational caveats when eigenvector blocks are large (Li et al., 16 Apr 2025).
  • XGeoML: Recent ensemble frameworks such as XGeoML generalize GeoShapley by aggregating multiple local models (GBR, RF, MLP, etc.) and multiple explainers (SHAP, LIME, FI) using spatial weighting and reliability-based ensemble aggregation. XGeoML enhances predictive and interpretive accuracy, particularly under strong geography–covariate nonlinearity (Liu, 2024).

| Method | Nonlinearity | Spatial Heterogeneity | Model Assumptions |
|---|---|---|---|
| SHAP | Yes | No | None |
| MGWR | Limited | Yes | Linear/Kernels |
| ESF | No | Indirect | Linear |
| GeoShapley | Yes | Yes | None |
| XGeoML | Yes | Yes (robust/ensemble) | None |

GeoShapley jointly explains nonlinearity and spatial heterogeneity while remaining model-agnostic (Lu et al., 17 Dec 2025; Li, 2023).

5. Case Studies and Empirical Applications

GeoShapley has been utilized in various domains:

  • Synthetic Data Validation: GeoShapley recovers ground-truth spatial intercepts and varying coefficients for simulated spatial fields, accurately reflecting both nonlinear and spatial processes (validated by spatial correlations exceeding 0.9 for coordinate-based models) (Li et al., 16 Apr 2025; Li, 2023).
  • Traffic Crash Density (Florida): GeoShapley reveals nonlinear effects such as threshold-driven crash risk in compact neighborhoods (score > 7), spatially amplifies urban crash contributions (Miami, Orlando, Tampa, Jacksonville), and identifies corridor-specific susceptibility (I-95, downtown cores). Interaction terms elucidate spatial variability in risk factors, facilitating targeted interventions (traffic calming, speed management, equity-based policy) (Lu et al., 17 Dec 2025).
  • Political Science (Voting Behaviors): County-level applications isolate both S-shaped nonlinearities in demographic predictors and spatially structured party preference patterns (e.g., South vs. Northeast) (Li, 1 May 2025).
  • Real Estate Modeling: In Seattle house price modeling, location exceeds all other features in explanatory value, with spatial interactions (e.g., house age premium in historic neighborhoods) directly mapped via GeoShapley outputs (Li, 2023).

Visualization strategies include mapping $\Phi_{\text{GEO}}$ (intrinsic context), partial dependence plots of $\Phi_j$ (global nonlinearities), and spatial coefficient surfaces via $\Phi_{(\text{GEO},j)}$ (Lu et al., 17 Dec 2025; Li, 2023).

6. Limitations, Implementation Considerations, and Future Directions

GeoShapley has several limitations and considerations:

  • Computational Intensity: Kernel SHAP estimates require hundreds of background samples and numerous coalition evaluations per location; computational cost scales with feature and geo-player block size (Li, 2023; Li et al., 16 Apr 2025).
  • Variance and Stability: Monte Carlo approximation and choice of background samples influence variance; large GEO blocks (e.g., Moran eigenvectors) amplify estimation noise unless coalition sampling is increased (Li et al., 16 Apr 2025).
  • Interpretation: Shapley values are marginal contributions, not regression slopes. Naively mapping $\Phi_{(\text{GEO},j)}$ can mislead without smoothing and robustness checks (Li, 1 May 2025).
  • Model Fidelity: Explanatory insight is conditional on the accuracy and fit of the underlying ML model; robust preprocessing and diagnostics are essential (Li, 1 May 2025).
  • Kernel Bandwidth: For spatial smoothing, bandwidth selection critically affects smoothness and pattern recovery (Liu, 2024).
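The bandwidth sensitivity can be illustrated with a generic Gaussian-kernel (Nadaraya-Watson) smoother applied to noisy point values. This is a stand-in for illustration, not GWR and not the smoothing used in the cited work; the synthetic surface, noise level, and candidate bandwidths are assumptions.

```python
import numpy as np


def gaussian_kernel_smooth(coords, values, bandwidth):
    """Nadaraya-Watson estimate of `values` at every observed location."""
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return weights @ values / weights.sum(axis=1)


rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1.0, size=(400, 2))
true_surface = np.sin(2.0 * np.pi * coords[:, 0])             # assumed pattern
noisy = true_surface + rng.normal(scale=0.5, size=len(coords))  # noisy attributions

errors = {}
for bandwidth in (0.005, 0.05, 0.5):
    smoothed = gaussian_kernel_smooth(coords, noisy, bandwidth)
    errors[bandwidth] = float(np.sqrt(np.mean((smoothed - true_surface) ** 2)))

for bandwidth, rmse in errors.items():
    print(f"bandwidth={bandwidth}: RMSE={rmse:.3f}")
```

A very small bandwidth leaves the noise largely untouched, while a very large one flattens the underlying pattern; the intermediate value recovers the surface best in this setup, which is why bandwidth is usually chosen by a data-driven criterion rather than fixed in advance.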

Proposed advances include causal Shapley extensions (direct/indirect effects), improved visualization/inference for high-dimensional spatial partial dependence, integration with bias detection and spatial fairness, and development of transferable spatial XAI models (Li, 1 May 2025; Liu, 2024).

7. Software Ecosystem and Workflow Recommendations

Implementation of GeoShapley is supported by open-source tools including the geoshapley Python package (pip install geoshapley), leveraging scikit-learn, xgboost, flaml (AutoML), mgwr (MGWR reference), pysal, geopandas, and matplotlib. Recommended workflow:

  1. Collect and preprocess georeferenced tabular data.
  2. Fit a flexible ML regressor (ensemble/tree/NN), validate via cross-validation and hyperparameter optimization.
  3. Compute GeoShapley values using geoshapley, specifying model, background set, and coordinate block.
  4. Assess uncertainty via bootstrap resampling and reporting of confidence intervals.
  5. Visualize intrinsic location and spatially varying coefficients via map plots and partial dependence curves.
  6. Compare outputs to MGWR or ESF for benchmarking.
  7. Document all code, data, and parameter choices for reproducibility (Li, 2023; Li, 1 May 2025).
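For the uncertainty step, one possible scheme is a percentile bootstrap over the background set. The sketch below resamples background rows and recomputes a single location contribution, $v(\{\text{GEO}\}) - v(\emptyset)$, for a toy model. Which quantity is resampled is an assumption made here for illustration (refitting the model on resampled training data is another option), and none of the names belong to the geoshapley package.

```python
import numpy as np


def toy_model(X):
    # Hypothetical stand-in for a fitted regressor; columns are (u, v, x1).
    u, v, x1 = X.T
    return np.sin(u) + 0.5 * v + (1.0 + u) * x1


def location_contribution(x, background, geo_cols=(0, 1)):
    """v({GEO}) - v(empty): change in mean prediction when only the
    coordinate columns are fixed to the instance's location."""
    data = background.copy()
    cols = list(geo_cols)
    data[:, cols] = x[cols]
    return float(toy_model(data).mean() - toy_model(background).mean())


def bootstrap_interval(x, data, n_boot=500, background_size=100,
                       alpha=0.05, seed=0):
    """Percentile interval from repeatedly resampled background sets."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, len(data), size=background_size)
        estimates[b] = location_contribution(x, data[rows])
    lower, upper = np.quantile(estimates, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(estimates.mean()), float(lower), float(upper)


rng = np.random.default_rng(42)
data = rng.normal(size=(1000, 3))
x = np.array([0.3, 0.7, 1.0])

mean, lower, upper = bootstrap_interval(x, data)
print(f"location contribution: {mean:.3f} (95% interval {lower:.3f} to {upper:.3f})")
```

The width of the interval shrinks as `background_size` grows, which gives a rough way to judge whether a background set in the 50–300 range is large enough for a given model.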

Usage and performance tips include exact enumeration for small feature sets, Monte Carlo sampling for larger sets, background selection via clustering, and ensemble aggregation for robust spatial coefficient estimation in XGeoML (Li, 2023; Liu, 2024).
