Getis-Ord Gi* Statistic
- Getis-Ord Gi* is a local statistic that measures the intensity of high- or low-value spatial clusters, expressed as a z-score for statistical inference.
- It can be computed with various spatial weights, such as binary contiguity, distance bands, and continuous kernels, optionally row-standardized, to tailor analyses to different scales and applications.
- Recent methodological advances have reduced the computational cost of its inference and linked it formally to global indices, and its use extends to fields such as epidemiology, urban planning, and machine learning.
The Getis-Ord Gi* statistic is a local statistic of spatial association designed to quantify the presence and intensity of high-value (“hotspot”) or low-value (“coldspot”) spatial clusters within a dataset. Introduced by Getis and Ord (1992) and given its standardized form by Ord and Getis (1995), Gi* is widely used in geostatistics, epidemiology, environmental science, crime analysis, and remote sensing to detect spatial heterogeneity, evaluate local spatial association, and map statistically significant clusters at fine spatial scales. Recent methodological work has clarified its interpretation, reduced the computational cost of inference, and connected it formally to global autocorrelation indices such as Moran’s I.
1. Mathematical Basis and Definition
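The Getis-Ord Gi* statistic evaluates, for each spatial unit $i$, whether the attribute values in its local neighborhood are significantly higher or lower than would be expected under global spatial randomness. The classical form of Gi*, standardized to a $z$-score, is:

$$
G_i^* = \frac{\sum_{j=1}^{n} w_{ij}\,x_j - \bar{x}\sum_{j=1}^{n} w_{ij}}{S\sqrt{\dfrac{n\sum_{j=1}^{n} w_{ij}^2 - \left(\sum_{j=1}^{n} w_{ij}\right)^2}{n-1}}}
$$

where:
- $x_j$ is the observed value at location $j$ (e.g., a count, rate, or intensity).
- $w_{ij}$ is the spatial weight encoding the strength of connection between $i$ and $j$; for Gi* the sums run over all $j$, including $j = i$.
- $n$ is the total number of spatial units.
- $\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j$ is the global mean.
- $S = \sqrt{\frac{1}{n}\sum_{j=1}^{n} x_j^2 - \bar{x}^2}$ is the global standard deviation of the observed values.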
Because it is standardized, Gi* can be referred to the standard normal distribution for approximate inference under the null hypothesis of spatial randomness. A large positive $G_i^*$ indicates a hotspot, while a large negative $G_i^*$ marks a coldspot (Kubuafor et al., 17 Aug 2025, Grigorev et al., 3 Jun 2025, Kashlak et al., 2020).
Including both the “self” ($j = i$) and “neighbor” contributions distinguishes Gi* from its exclusionary counterpart $G_i$, which omits location $i$ from its sums; both statistics can be rewritten in quadratic or matrix notation for large-scale or eigendecomposition-based analyses (Chen, 27 Aug 2025, Chen, 2018).
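For comparison, the exclusionary statistic can be written in the same notation. The form below is the standardized $G_i$ as it is usually given by Ord and Getis (1995), stated from general knowledge rather than taken from the sources cited in this article: sums run over $j \neq i$, and the mean and standard deviation are computed from the $n - 1$ values other than $x_i$.

$$
G_i = \frac{\sum_{j \neq i} w_{ij}\,x_j - \bar{x}_{(i)} \sum_{j \neq i} w_{ij}}{S_{(i)}\sqrt{\dfrac{(n-1)\sum_{j \neq i} w_{ij}^2 - \left(\sum_{j \neq i} w_{ij}\right)^2}{n-2}}},
\qquad
\bar{x}_{(i)} = \frac{1}{n-1}\sum_{j \neq i} x_j,
\qquad
S_{(i)}^2 = \frac{1}{n-1}\sum_{j \neq i} x_j^2 - \bar{x}_{(i)}^2 .
$$

In vector notation, the Gi* numerator is $\mathbf{w}_i^{\mathsf T}(\mathbf{x} - \bar{x}\mathbf{1})$, where $\mathbf{w}_i$ is row $i$ of the spatial weights matrix $W = [w_{ij}]$ and $\mathbf{1}$ is a vector of ones; stacking over all units gives $W(\mathbf{x} - \bar{x}\mathbf{1})$ in a single matrix product.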
2. Spatial Weights and Neighborhood Construction
The choice and construction of the spatial weights matrix $W = [w_{ij}]$ are central to Gi*’s local character. Different study domains employ different schemes:
- Binary contiguity: $w_{ij} = 1$ if units $i$ and $j$ are contiguous (e.g., Queen’s case: sharing a side or vertex on a political map or grid) and $w_{ij} = 0$ otherwise (Kubuafor et al., 17 Aug 2025, Grigorev et al., 3 Jun 2025). For Gi*, each unit is also treated as its own neighbor ($w_{ii} = 1$).
- Row-standardization: each row is rescaled so that $\sum_j w_{ij} = 1$, equalizing the influence of units with different numbers of neighbors (Kubuafor et al., 17 Aug 2025).
- Distance bands: $w_{ij} = 1$ if the distance $d_{ij} \le d$ for some fixed threshold $d$, and $w_{ij} = 0$ otherwise (Alsaleh et al., 18 Jan 2026).
- Continuous kernels: $w_{ij}$ decays with $d_{ij}$, for example as an inverse power $d_{ij}^{-b}$ or a negative exponential $e^{-d_{ij}/r}$; such kernels underpin gravity-model and interaction-potential reformulations (Chen, 27 Aug 2025, Chen, 2018).
All these choices affect sensitivity: wider neighborhoods (more or more distant neighbors) average over more units and dilute local extremes, whereas tight, sparse weighting accentuates highly localized clusters. Specialized domains (e.g., fine-grained urban grids, state-level epidemiological analyses) require weight construction consistent with the scale of the study and the underlying spatial process (Grigorev et al., 3 Jun 2025, Kubuafor et al., 17 Aug 2025).
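The weighting schemes above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not code from the cited studies: all function names are hypothetical, grid cells are indexed row-major, Queen contiguity includes the cell itself by default (the Gi* convention), and the exponential `bandwidth` parameter is an illustrative choice.

```python
import numpy as np


def queen_grid_weights(n_rows, n_cols, include_self=True):
    """Binary Queen contiguity for a regular grid; cells are indexed row-major."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            # Queen's case: the 8 surrounding cells share a side or a vertex.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if not (0 <= rr < n_rows and 0 <= cc < n_cols):
                        continue
                    if dr == 0 and dc == 0 and not include_self:
                        continue
                    W[i, rr * n_cols + cc] = 1.0
    return W


def row_standardize(W):
    """Rescale rows so that sum_j w_ij = 1; all-zero rows stay zero."""
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)


def pairwise_distances(coords):
    """Euclidean distance matrix d_ij for an (n, 2) array of coordinates."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def distance_band_weights(coords, threshold):
    """w_ij = 1 if d_ij <= threshold; d_ii = 0, so each unit includes itself."""
    return (pairwise_distances(coords) <= threshold).astype(float)


def exponential_kernel_weights(coords, bandwidth):
    """Continuous decay w_ij = exp(-d_ij / bandwidth), so w_ii = 1."""
    return np.exp(-pairwise_distances(coords) / bandwidth)
```

On a 3x3 grid with unit spacing, a distance band of 1.5 reproduces Queen contiguity with self-inclusion, because diagonal neighbors lie at $\sqrt{2} \approx 1.41$. A band of 1.0 keeps only the four side neighbors plus the cell itself.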
3. Computation, Standardization, and Inference
Computation Workflow
The procedure consists of:
- Select or compute the attribute $x$ (counts, rates, or other attributes); values are often standardized or normalized in advance, although rates are typically used directly.
- Construct the spatial weights using the chosen adjacency, contiguity, or distance-based scheme.
- Compute the global mean $\bar{x}$ and standard deviation $S$ once, over all $n$ units. Then, for each unit $i$: (a) compute the weighted sum $\sum_j w_{ij} x_j$ and subtract its expected value $\bar{x}\sum_j w_{ij}$; (b) divide by the denominator to obtain $G_i^*$.
- Assess significance by comparing $G_i^*$ against critical $z$-values for the desired confidence level.
Pseudocode implementations and full variable flow have been detailed for spatial point pattern analysis (Alsaleh et al., 18 Jan 2026), raster/grid analysis (Grigorev et al., 3 Jun 2025), and even integration within convolutional neural network layers (Deng et al., 2019).
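The workflow steps can be sketched end to end in Python with NumPy. This is a minimal illustration, not any cited author's implementation. The 6x6 grid, its planted block of high values, and the helper name `gi_star` are hypothetical. Queen contiguity with self-inclusion is built as "Chebyshev distance at most 1" between cell indices, and p-values use the normal approximation.

```python
import math

import numpy as np


def gi_star(x, W):
    """Standardized Getis-Ord Gi* z-scores (hypothetical helper).

    x : (n,) attribute values; W : (n, n) weights with w_ii > 0 (self-inclusion).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()                                 # global mean, computed once
    S = math.sqrt((x ** 2).mean() - x_bar ** 2)      # global SD (divisor n), computed once
    w_sum = W.sum(axis=1)                            # sum_j w_ij for every unit i
    numerator = W @ x - x_bar * w_sum
    denominator = S * np.sqrt((n * (W ** 2).sum(axis=1) - w_sum ** 2) / (n - 1))
    return numerator / denominator


# Step 1: attribute values on a hypothetical 6x6 grid with a planted high-value block.
grid = np.tile([2.0, 3.0, 4.0], (6, 2))
grid[1:3, 1:3] += 20.0
x = grid.ravel()                                     # row-major: cell (r, c) -> r * 6 + c

# Step 2: Queen contiguity including self, i.e. Chebyshev distance <= 1 between cells.
rows, cols = np.divmod(np.arange(x.size), 6)
cheb = np.maximum(np.abs(rows[:, None] - rows[None, :]),
                  np.abs(cols[:, None] - cols[None, :]))
W = (cheb <= 1).astype(float)

# Step 3: Gi* for every cell.
z = gi_star(x, W)

# Step 4: two-sided normal p-values, then flag hot/cold spots at alpha = 0.05.
p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
hot = z > 1.96
cold = z < -1.96
```

With this layout, only the four planted cells are flagged as hotspots ($G_i^* \approx 3.5$) and no cell is flagged as a coldspot. Note that $\bar{x}$ and $S$ are computed once, outside the per-unit calculation, which is what makes the vectorized form possible.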
Inference and Multiple Testing
The classic hypothesis-testing approach is analytic, interpreting $G_i^*$ as a $z$-score under approximate normality. Hotspots correspond to $G_i^*$ above a chosen critical value, for example $|G_i^*| > 1.96$ for a two-sided test at the 5% level (Kubuafor et al., 17 Aug 2025, Alsaleh et al., 18 Jan 2026). Alternative inferential strategies include:
- Permutation-based inference: randomly permute the observed values $x_j$ among locations to generate the null distribution of $G_i^*$ (Kashlak et al., 2020).
- Computation-free nonparametric testing: closed-form analytic bounds on $p$-values derived from Khintchine-type inequalities, requiring only low computational complexity and remaining applicable to non-Gaussian, small-$n$ scenarios (Kashlak et al., 2020).
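Permutation inference can be sketched as a Monte Carlo procedure. The sketch assumes full (unconditional) randomization, in which all $n$ values, including $x_i$, are shuffled across locations; conditional schemes that hold $x_i$ fixed are a common alternative. The function name and the 999-permutation default are illustrative choices. Because $\bar{x}$ and $S$ do not change under permutation, comparing the raw local sums $\sum_j w_{ij}x_j$ is equivalent to comparing the standardized $G_i^*$ values.

```python
import numpy as np


def gi_star_permutation_pvalues(x, W, n_perm=999, seed=None):
    """Two-sided Monte Carlo pseudo p-values for Gi* (hypothetical helper).

    Full randomization: the whole vector x, including x_i, is shuffled.
    x_bar and S are permutation-invariant, so each standardized Gi* is an
    increasing function of its local sum sum_j w_ij x_j; comparing local
    sums is therefore equivalent to comparing Gi* values.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    expected = W.sum(axis=1) * x.mean()          # null mean of each local sum
    observed_dev = np.abs(W @ x - expected)      # |observed - expected| per unit
    n_extreme = np.zeros(x.size)
    for _ in range(n_perm):
        permuted_sums = W @ rng.permutation(x)
        n_extreme += np.abs(permuted_sums - expected) >= observed_dev
    # The +1 terms count the observed arrangement as one of the permutations.
    return (n_extreme + 1) / (n_perm + 1)
```

Unlike the normal approximation, the smallest attainable pseudo p-value is $1/(n_{\text{perm}} + 1)$, which is 0.001 with the default of 999 permutations.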
Formal multiple-testing correction is often omitted in applied work; when results are reported for many regions or features, some detected clusters may therefore be spurious (Kubuafor et al., 17 Aug 2025, Grigorev et al., 3 Jun 2025).
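Where a correction is wanted, one widely used option is the Benjamini-Hochberg false discovery rate (FDR) procedure. It is sketched below as a swapped-in technique, not one used in the cited studies. Its formal guarantee assumes independent tests or a particular kind of positive dependence. Neighboring Gi* tests share observations, so treat BH as a pragmatic choice here. Bonferroni (reject when $p < \alpha / n$) is the more conservative alternative.

```python
import numpy as np


def benjamini_hochberg(pvalues, q=0.05):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                       # ranks p-values from smallest to largest
    thresholds = q * np.arange(1, m + 1) / m    # q * k / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()          # largest rank with p_(k) <= q * k / m
        reject[order[: k + 1]] = True           # reject that hypothesis and all smaller p
    return reject
```

Applied to a vector of per-unit Gi* p-values, the returned mask marks the units that remain significant at target FDR `q`.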
4. Theoretical Framework and Relationship to Global Indices
Recent mathematical work situates Gi* within a unified spatial statistics framework:
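- Quadratic form and potential theory: the global Getis-Ord index is the quadratic form $G = \mathbf{y}^{\mathsf T} W \mathbf{y}$, and the local values (in their unstandardized form) are the elements of the linear form $W\mathbf{y}$, where $\mathbf{y}$ is the normalized attribute vector (e.g., $\mathbf{y} = \mathbf{x}/\sum_j x_j$) (Chen, 27 Aug 2025, Chen, 2018).
- Gravity-model equivalence: with suitably defined distance-decay kernels $w_{ij}$ (e.g., $w_{ij} = d_{ij}^{-b}$ for $j \neq i$), local Gi* is directly proportional to the classical gravity-model potential at location $i$, $V_i = \sum_{j \neq i} x_j d_{ij}^{-b}$. Hence, Gi* can be interpreted as measuring local “spatial interaction intensity” (Chen, 27 Aug 2025, Chen, 2018).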
- Decomposition of Moran’s I: Moran’s I can be decomposed into the global Getis-Ord index, the sum of local Gi*, a size-correlation function, and the number of elements, reflecting both global structure and local clustering (Chen, 27 Aug 2025).
- Scatterplot diagnostics: Gi* values versus unitized variables can be visualized in scatterplots analogous to Moran’s I plots, partitioning observations into high-high, high-low, low-high, and low-low quadrants (Chen, 2018).
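The quadratic-form and gravity-model relations can be checked numerically. The sketch below makes several assumptions: the unitized vector is $\mathbf{y} = \mathbf{x}/\sum_j x_j$, the kernel is inverse distance $w_{ij} = d_{ij}^{-1}$ with a zero diagonal, and the city sizes and coordinates are made up. Chen's papers may normalize $W$ differently, which would only rescale the quantities. The check confirms two relations: the quadratic form $\mathbf{y}^{\mathsf T} W \mathbf{y}$ equals the $y$-weighted sum of the local values $(W\mathbf{y})_i$, and the gravity potential $\sum_{j \neq i} x_j / d_{ij}$ is a constant multiple of $(W\mathbf{y})_i$.

```python
import numpy as np

# Hypothetical city sizes and planar coordinates.
x = np.array([120.0, 80.0, 45.0, 30.0, 10.0])
coords = np.array([[0, 0], [1, 0], [0, 2], [3, 1], [4, 4]], dtype=float)

# Inverse-distance kernel w_ij = d_ij^(-b) with b = 1 and a zero diagonal (assumed).
b = 1.0
diff = coords[:, None, :] - coords[None, :, :]
d = np.sqrt((diff ** 2).sum(axis=-1))
off_diag = ~np.eye(len(x), dtype=bool)
W = np.zeros_like(d)
W[off_diag] = d[off_diag] ** -b

y = x / x.sum()              # unitized attribute vector, sums to 1
local = W @ y                # local values (W y)_i
global_G = y @ W @ y         # global quadratic form y^T W y

# Relation 1: the global index is the y-weighted sum of the local values.
print(np.isclose(global_G, np.sum(y * local)))        # True
# Relation 2: gravity potential V_i = sum_{j != i} x_j / d_ij^b equals sum(x) * (W y)_i.
potential = W @ x
print(np.allclose(potential, x.sum() * local))        # True
```

Both checks hold for any choice of sizes and distinct coordinates, since they are algebraic identities rather than properties of this particular data.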
This framework clarifies that local clustering (hotspots and coldspots) is not merely a secondary effect but a constituent part of the global spatial autocorrelation structure.
5. Applications Across Spatial Domains
Gi* is applicable in any setting with indexed spatial data and an interest in mapping local association. Notable domains and implementations include:
| Domain | Attribute (unit of analysis) | Spatial weights | Source |
|---|---|---|---|
| Chronic disease mapping | Mortality rates (states) | Queen’s contiguity, row-standardized | Kubuafor et al., 17 Aug 2025 |
| Urban accident analysis | Crash counts (grid cells) | Queen’s contiguity, binary | Grigorev et al., 3 Jun 2025 |
| Traffic collision severity | Collision events (points), severity-weighted | Fixed distance band, binary | Alsaleh et al., 18 Jan 2026 |
| Electoral spatial analysis | Vote share (precincts) | Adjacency graph, binary | Kashlak et al., 2020 |
| Urban systems science | Population (cities) | Distance-decay kernels | Chen, 27 Aug 2025; Chen, 2018 |
| Deep learning for remote sensing | Activations (CNN feature maps) | Local windows with distance-based weights | Deng et al., 2019 |
The Gi* statistic’s flexibility allows adjustment for varying data resolutions, attribute types (counts, rates, intensities), and neighbor definitions, facilitating both classical (e.g., epidemiological, urban planning) and novel (e.g., spatial pooling in neural nets) applications.
6. Interpretation, Visualization, and Caveats
Interpretation is standardized: a significant, large positive Gi* indicates a spatial “hotspot” (a local cluster of high values), while a significant, large negative Gi* marks a “coldspot” (a local cluster of low values). Maps typically use warm colors for hotspots, cool colors for coldspots, and a neutral color for non-significant areas (Kubuafor et al., 17 Aug 2025, Grigorev et al., 3 Jun 2025).
Caveats include:
- Scale sensitivity and the modifiable areal unit problem (MAUP): results depend on the spatial resolution and weight construction; changing the grid size or adjacency definition alters which hotspots are detected (Grigorev et al., 3 Jun 2025).
- Aggregation bias: coarse units may mask within-unit heterogeneity, so observed hotspots may be artefacts of zonation.
- Null distribution validity: analytic normality is only approximate, especially for skewed or heavy-tailed attributes or small $n$; permutation tests or computation-free bounds may be preferable (Kashlak et al., 2020).
- Multiple testing: running many parallel Gi* tests inflates the type-I error rate, and because neighborhoods overlap, the tests are not independent; the absence of systematic correction is a limitation (Kubuafor et al., 17 Aug 2025).
- Purely descriptive nature: Gi* detects spatial association but does not infer causality or adjust for confounders or covariates.
Mapping conventions, critical value selection, and cluster frequency tallies provide operational guidance for result reporting and policy-oriented spatial targeting (Kubuafor et al., 17 Aug 2025, Alsaleh et al., 18 Jan 2026).
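One such convention can be sketched as follows. The code bins $z$-scores at the two-sided 90%, 95%, and 99% normal critical values (1.645, 1.960, 2.576) into levels -3 to 3 and then tallies units per bin. The seven-bin scheme mirrors a common hotspot-mapping convention but is an assumption here, not the reporting rule of the cited studies; the function names are hypothetical.

```python
import numpy as np


def gi_bins(z):
    """Map Gi* z-scores to confidence bins -3..3 (two-sided 90%, 95%, 99%)."""
    z = np.asarray(z, dtype=float)
    bins = np.zeros(z.shape, dtype=int)
    # Ascending order, so a stricter level overwrites a looser one.
    for level, crit in ((1, 1.645), (2, 1.960), (3, 2.576)):
        bins[z > crit] = level
        bins[z < -crit] = -level
    return bins


def tally(bins):
    """Count units per bin, e.g. for reporting cluster frequencies."""
    values, counts = np.unique(bins, return_counts=True)
    return dict(zip(values.tolist(), counts.tolist()))
```

Positive bins can then be drawn in warm colors, negative bins in cool colors, and bin 0 in a neutral color.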
7. Extensions and Computational Developments
Extensions to the Gi* statistic have appeared in domains with complex spatial dependencies or high computational demands:
- Computation-free nonparametric inference: analytic bounds that substitute for Monte Carlo permutation preserve statistical validity at a far lower computational cost, enabling multi-scale or network-scale testing (Kashlak et al., 2020).
- Integration in machine learning: The Gi* statistic as a pooling mechanism in CNNs improves generalization for geospatial segmentation by enforcing spatially meaningful activation selection (Deng et al., 2019).
- Structural decomposition: Gi* serves as a building block for decomposing and interpreting global spatial autocorrelation indices (e.g., explicit mathematical linkages with Moran’s I), unifying local and global perspectives (Chen, 27 Aug 2025, Chen, 2018).
The methodological versatility and theoretical foundation of Getis-Ord Gi* support its continued use in spatial data science, spatial epidemiology, urban analytics, and spatially informed machine learning.