Spatially-Varying Convolution Operators
- Spatially-varying convolution is a linear operator with kernels that change with spatial position, enabling local adaptation in filtering tasks.
- Methodologies include basis kernel expansions, dynamic filtering, and graph-based approaches, reducing computational cost while enhancing expressiveness.
- Applications span imaging, spatial statistics, and geometric deep learning, providing improved deblurring, denoising, and model accuracy.
Spatially-varying convolution refers to a class of linear operators in which the convolution kernel is neither fixed nor translation-invariant but varies as a function of spatial position. In contrast to classical stationary convolutions, spatially-varying convolutional models flexibly capture location-dependent filtering, spatial heterogeneity, or local adaptation, and are foundational in numerous fields including computer vision, computational imaging, geometric deep learning, and spatial statistics.
1. Mathematical Formulations and Operator Classes
Let $f$ denote an input signal or image, and $k(x, u)$ the spatially-varying kernel that dictates the output at location $x$ based on the input at $u$. The canonical spatially-varying convolution is given by

$$y(x) = \int k(x, u)\, f(u)\, du,$$

where $k(x, u)$ encodes the point-spread function (PSF) mapping input position $u$ to output position $x$ (Chimitt et al., 2023). The dependence on both $x$ and $u$ allows the kernel to change arbitrarily across space, in contrast to the shift-invariant case $k(x, u) = k(x - u)$. In practice, both continuous and discrete variants are implemented, with discrete forms dominant in image processing and graph-based learning.
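A direct implementation makes the discrete operator concrete. The following NumPy sketch (function names and the growing-width Gaussian are illustrative choices, not from any cited work) evaluates a 1-D gathering-form spatially-varying convolution by looping over output positions:

```python
import numpy as np

def sv_convolve(f, kernel_fn, radius):
    """Direct spatially-varying convolution of a 1-D signal.

    kernel_fn(x) returns the local kernel of length 2*radius + 1 used to
    form the output at position x. Zero padding is applied at the borders.
    """
    n = len(f)
    padded = np.pad(f, radius)
    y = np.empty(n)
    for x in range(n):
        k = kernel_fn(x)                        # kernel may differ at every x
        y[x] = padded[x:x + 2 * radius + 1] @ k[::-1]   # local correlation with flipped kernel
    return y

# Illustrative spatially-varying kernel: a Gaussian whose width grows with x
def local_gaussian(x, radius=3, n=64):
    sigma = 0.5 + 2.0 * x / n                   # spatially-varying blur width
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()
```

Because the kernel is re-evaluated per output pixel, cost scales with signal size times kernel size, with no FFT shortcut available.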
Common parameterizations include:
- Gathering (output-indexed) expansion: $y(x) = \sum_m w_m(x)\,(\varphi_m * f)(x)$, where the $w_m$ are spatially-varying weights modulating fixed basis kernels $\varphi_m$ (Chimitt et al., 2023).
- Scattering (input-indexed) expansion: $y = \sum_m \varphi_m * (w_m \odot f)$, which is particularly natural for processes describing scattering or physical propagation.
- Adaptive basis expansion: Decompose the per-pixel kernel as a linear combination of global basis matrices with spatially-varying (possibly edge-specific in graphs) coefficients, e.g., in mesh-graph convolution (Zhou et al., 2020).
- Polynomial or mixture-based decomposition: $k(x, u)$ is represented as a sum of basis functions with polynomially-varying coefficients over space (Hartung et al., 2012, Risser et al., 2015).
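The gathering expansion above reduces to a handful of fixed convolutions followed by per-pixel weighting. A minimal 2-D sketch (the basis kernels and weight maps are illustrative choices):

```python
import numpy as np
from scipy.ndimage import convolve

def gathering_sv_conv(f, bases, weights):
    """Gathering form: y = sum_m w_m * (phi_m conv f).

    f       : (H, W) image
    bases   : list of M fixed kernels phi_m
    weights : (M, H, W) spatially-varying coefficients w_m(x)
    """
    y = np.zeros_like(f, dtype=float)
    for phi, w in zip(bases, weights):
        # One standard (shift-invariant) convolution per basis kernel,
        # then pointwise modulation by the spatially-varying weight map.
        y += w * convolve(f, phi, mode="constant")
    return y
```

Only M standard convolutions are needed regardless of image size, which is the source of the parameter and runtime savings over per-pixel kernels.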
2. Methodological Variants and Implementations
Approaches to efficiently implementing spatially-varying convolution span a broad methodological spectrum:
2.1 Basis Kernel Methods
The kernel is expanded in a dictionary of fixed basis functions, with spatially-varying weights determined either per-input pixel, per-output pixel, or per edge in a graph setting (Chimitt et al., 2023, Zhou et al., 2020). This reduces the parameter count and computational cost compared to defining independent kernels for every spatial location.
2.2 Locally-Connected and Gated Architectures
Locally-connected layers assign a separate convolution kernel to each output spatial location, but at significant parameter and memory cost. Lightweight variants (e.g., CoordGate) modulate a small set of basis convolutions with position-dependent gating functions, implemented with compact MLPs applied to spatial coordinates, achieving the expressivity of spatially-varying convolution with negligible overhead (Howard et al., 2024).
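A minimal sketch of coordinate-based gating in the spirit of CoordGate follows; the MLP shapes and the softmax gating are assumptions made for illustration, not the published architecture:

```python
import numpy as np
from scipy.ndimage import convolve

def coord_gated_conv(f, bases, W1, b1, W2, b2):
    """Gate M fixed basis convolutions with a tiny MLP over pixel coordinates.

    f     : (H, W) image; bases: list of M kernels
    W1, b1, W2, b2: weights of a 2 -> hidden -> M coordinate MLP
    """
    H, W = f.shape
    yy, xx = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
    coords = np.stack([yy, xx], axis=-1).reshape(-1, 2)       # (H*W, 2) pixel coords
    h = np.maximum(coords @ W1 + b1, 0.0)                     # ReLU hidden layer
    logits = h @ W2 + b2                                      # (H*W, M)
    gates = np.exp(logits - logits.max(axis=1, keepdims=True))
    gates /= gates.sum(axis=1, keepdims=True)                 # softmax over bases
    gates = gates.reshape(H, W, -1)
    y = np.zeros_like(f, dtype=float)
    for m, phi in enumerate(bases):
        y += gates[..., m] * convolve(f, phi, mode="constant")
    return y
```

The per-pixel cost is one small MLP evaluation plus M pointwise multiplies, on top of M shared convolutions.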
2.3 Dynamic and Guided Filtering
Several neural architectures instantiate spatially-varying convolution implicitly—features or kernels are dynamically modulated according to local cues, blur/PSF maps, or learned gating coefficients (examples: GSVNet's dynamic fusion of warped segmentations (Lee et al., 2021), or SVBR-Net's dual U-Net with blur-map conditioning (Karaali et al., 2022)).
2.4 Graph-based and Mesh Operators
On graphs and meshes, spatially-varying convolution is formulated via either template-free, edge-specific basis decompositions (Zhou et al., 2020) or variant-density pooling, generalizing locality to complex topologies beyond regular grids.
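An edge-specific basis decomposition in this spirit can be sketched as follows; the variable layout and looped implementation are illustrative, not the vcConv paper's implementation:

```python
import numpy as np

def graph_basis_conv(x, edges, coeffs, bases):
    """Edge-specific basis convolution on a graph (vcConv-style sketch).

    Each directed edge (i -> j) mixes shared basis matrices with its own
    coefficients, so the effective kernel varies over the graph.

    x      : (N, C_in) node features
    edges  : list of (src, dst) index pairs
    coeffs : (E, B) per-edge mixing coefficients
    bases  : (B, C_in, C_out) shared basis matrices
    """
    N = x.shape[0]
    C_out = bases.shape[2]
    y = np.zeros((N, C_out))
    for e, (i, j) in enumerate(edges):
        # Edge-specific weight matrix: linear combination of shared bases.
        W_ij = np.tensordot(coeffs[e], bases, axes=1)   # (C_in, C_out)
        y[j] += x[i] @ W_ij
    return y
```

Parameters scale with B basis matrices plus one B-vector per edge, rather than a full weight matrix per edge.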
2.5 Convolution-based Nonstationary Spatial Models
In spatial statistics, spatially-varying convolution underpins flexible nonstationary covariance construction for Gaussian processes, with kernels encoding spatially-varying anisotropy, directionality (e.g., wind-aligned covariance), or local scale (Fouedjio et al., 2014, Neto et al., 2012, Risser et al., 2015). The closed-form covariance structures facilitate efficient kriging and simulation on spatial data with nonhomogeneous dependencies.
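One widely used closed form arising from this kernel-convolution construction is the Paciorek–Schervish covariance, sketched here in one dimension with a location-dependent kernel scale (the particular scale function used in the example is an arbitrary choice):

```python
import numpy as np

def nonstationary_cov(xs, ell_fn, sigma2=1.0):
    """Convolution-based nonstationary covariance (Paciorek-Schervish, 1-D).

    Each location x carries its own Gaussian kernel scale ell_fn(x);
    convolving the local kernels yields a valid (positive semidefinite)
    covariance matrix in closed form.
    """
    xs = np.asarray(xs, dtype=float)
    ell = np.array([ell_fn(x) for x in xs])          # local kernel scales
    L_i = ell[:, None] ** 2
    L_j = ell[None, :] ** 2
    avg = 0.5 * (L_i + L_j)                          # averaged local variances
    d2 = (xs[:, None] - xs[None, :]) ** 2
    pref = (L_i * L_j) ** 0.25 / np.sqrt(avg)        # normalization prefactor
    return sigma2 * pref * np.exp(-d2 / avg)
```

With a constant scale the expression collapses to the stationary squared-exponential covariance, which is a useful sanity check.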
3. Computational Strategies and Acceleration
Spatial adaptation greatly complicates classical convolution acceleration schemes:
- No FFT efficiency: As the kernel is not shift-invariant, FFT-based acceleration is generally inapplicable; direct per-pixel computations are necessary (Hartung et al., 2012).
- Basis projection and block-diagonalization: By restricting to a suitable low-dimensional subspace (Fourier, steerable, representation-theoretic bases), the computation can be reduced to a manageable number of standard convolutions followed by spatially-varying combination (Mitchel et al., 2020).
- Sparse decomposition and kernel interpolation: Approaches such as the differentiable sparse kernel complex represent the spatially-varying operator as a cascade of sparse kernel applications with learnable, differentiable sampling and kernel-space interpolation for runtime efficiency (Wu et al., 4 Dec 2025).
- Massive parallelization: Spatially-varying convolution admits naturally parallel, per-pixel computation, making it suitable for GPU and multicore acceleration strategies (Hartung et al., 2012).
The table below summarizes representative approaches by parameterization and computational complexity:
| Method | Parameterization | Efficiency Features |
|---|---|---|
| Gathering/Scattering (Chimitt et al., 2023) | Basis expansion (output/input indexed) | Small number of standard convolutions plus per-pixel weighting |
| Mesh vcConv (Zhou et al., 2020) | Edge-specific basis + per-edge coefficients | Shared basis matrices, edgewise coefficients; parameter efficient |
| CoordGate (Howard et al., 2024) | Position-based gating over basis conv features | Per-pixel coordinate MLP, negligible overhead |
| Gauss/Mixture Nonstationary GP (Risser et al., 2015) | Basis kernels, mixture weights | Parallel local likelihood fits, globally valid nonstationary GP |
| Sparse kernel complex (Wu et al., 4 Dec 2025) | Multi-layer sparse kernel cascade, interpolation | Sparse per-pixel computation, differentiable, high fidelity |
4. Gathering vs. Scattering Forms and Physical Interpretation
A fundamental distinction arises between gathering and scattering forms of spatially-varying convolution (Chimitt et al., 2023, Mitchel et al., 2020):
- Gathering computes output values by gathering filtered images (using shift-invariant filters) and locally combining them with spatially-varying weights per output pixel; best for adaptive filtering, denoising, and general image-processing contexts.
- Scattering distributes input pixel values across the image using locally-chosen kernels for each source; naturally matches physical light-propagation, PSF modeling, and physical image formation.
- Equivalence: These forms coincide only in the trivial stationary case $k(x, u) = k(x - u)$; for genuinely spatially-varying kernels they yield distinct results and are not interchangeable (proof by matrix commutation, see (Chimitt et al., 2023)).
Choosing the appropriate form is critical to avoid model mismatch (e.g., edge bleeding when using scattering for denoising, or incorrect light propagation when misapplying gathering).
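The non-equivalence is easy to demonstrate numerically. In this 1-D sketch (signal, kernel, and weight map are arbitrary illustrative values), the two forms agree for constant weights but diverge as soon as the weights vary:

```python
import numpy as np

def gathering(f, phi, w):
    """y = w * (phi conv f): filter first, then weight each output pixel."""
    return w * np.convolve(f, phi, mode="same")

def scattering(f, phi, w):
    """y = phi conv (w * f): weight each source pixel, then spread it."""
    return np.convolve(w * f, phi, mode="same")

f = np.array([0., 0., 1., 0., 0., 0., 1., 0.])   # two point sources
phi = np.array([0.25, 0.5, 0.25])                # shared basis kernel
w = np.linspace(0.0, 1.0, len(f))                # spatially-varying weight
```

For constant `w` both expressions reduce to a scaled stationary convolution; for varying `w` the operators no longer commute, matching the matrix-commutation argument.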
5. Physical and Statistical Modeling in Applications
Spatially-varying convolution operators are foundational in:
- Imaging and Optics: Modeling sensor- and medium-dependent PSFs, defocus, scattering, and deblurring/denoising under spatially inhomogeneous blur (e.g., light-sheet microscopy (Toader et al., 2021), astronomical imaging (Hartung et al., 2012), defocus removal (Karaali et al., 2022)).
- Spatial Statistics: Nonstationary Gaussian process models for geostatistical data, allowing local anisotropy, heteroskedasticity, and directionally-varying dependencies (Fouedjio et al., 2014, Risser et al., 2015, Neto et al., 2012).
- Geometric Deep Learning: Mesh/graph convolution operators leveraging spatially-varying (non-template) kernels, facilitating localized, detail-preserving autoencoding and generation on surfaces, tetrahedral, or non-manifold structures (Zhou et al., 2020).
- Vision and Segmentation Pipelines: Video segmentation with dynamic, pixelwise spatially-varying filter generation for fusing multi-frame or multi-cue predictions (Lee et al., 2021).
Examples of the improvement enabled by spatially-varying convolution include superior deblurring accuracy (e.g., 27.8 dB PSNR with CoordGate-U-Net vs. ≤26.9 dB for fixed and coordinate-concat models (Howard et al., 2024)); nonstationary GP models with up to 22% RMSE improvement over stationary alternatives in geostatistical prediction (Fouedjio et al., 2014); and parameter savings up to 30-fold over per-location locally-connected layers in mesh autoencoders (Zhou et al., 2020).
6. Generalization and Practical Considerations
Spatially-varying convolution enables:
- Generalization across topologies and geometries: Through basis and local coefficient designs, spatially-varying operators can be applied to regular grids, arbitrary graphs, and meshes (surface, volumetric, non-manifold) without requiring a geometry template or manifoldness (Zhou et al., 2020).
- Parameter and runtime trade-offs: Efficient basis decompositions, coordinate-MLP-based spatial gating, and sparse kernel interpolation offer a favorable balance between expressivity and computational cost, enabling real-time applications on high-resolution data (e.g., sub-2 ms per 1080p frame for the sparse kernel complex (Wu et al., 4 Dec 2025)), in contrast to the prohibitive cost of unstructured, per-location kernels.
- Integration in neural architectures: Several frameworks (e.g., CoordGate, sparse kernel complex) are differentiable and amenable to end-to-end learning, allowing tight coupling of spatially-varying filtering with upstream or downstream tasks in deep architectures.
A plausible implication is that continued advances in the parameter-efficient, learnable, and generalizable spatially-varying convolution operators will further bridge the gap between physical modeling, statistical inference, and deep learning across spatially heterogeneous domains.
7. Limitations, Open Problems, and Future Directions
Current limitations include:
- Scalability to truly unstructured, sample-dependent topologies: Most efficient approaches rely on a fixed topology per dataset or precomputed neighborhood; real-time adaptation to dynamic graph structures remains challenging.
- Lack of universal, fast approaches for representing arbitrary (non-parametric, high-dimensional) kernel variability, especially in large, high-dimensional domains where neither gathering nor scattering basis projections suffice.
- Difficulties in interpretability and regularization: With high modeling flexibility comes risk of overfitting without careful architectural or regularization design, particularly in statistical and scientific imaging settings.
Ongoing research aims to unify representation-theoretic, kernel-interpolation, and neural-based methods to efficiently approximate complex spatially-varying operators, and to quantify their statistical and computational properties in real-world, data-driven scenarios.
References
- (Chimitt et al., 2023): Scattering and Gathering for Spatially Varying Blurs
- (Zhou et al., 2020): Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels
- (Howard et al., 2024): CoordGate: Efficiently Computing Spatially-Varying Convolutions in Convolutional Neural Networks
- (Wu et al., 4 Dec 2025): Efficient Spatially-Variant Convolution via Differentiable Sparse Kernel Complex
- (Risser et al., 2015): Local likelihood estimation for covariance functions with spatially-varying parameters: the convoSPAT package for R
- (Fouedjio et al., 2014): A Generalized Convolution Model and Estimation for Non-stationary Random Functions
- (Hartung et al., 2012): GPU Acceleration of Image Convolution using Spatially-varying Kernel
- (Mitchel et al., 2020): Efficient Spatially Adaptive Convolution and Correlation
- (Karaali et al., 2022): SVBR-NET: A Non-Blind Spatially Varying Defocus Blur Removal Network
- (Lee et al., 2021): GSVNet: Guided Spatially-Varying Convolution for Fast Semantic Segmentation on Video
- (Toader et al., 2021): Image reconstruction in light-sheet microscopy: spatially varying deconvolution and mixed noise
- (Neto et al., 2012): Accounting for spatially varying directional effects in spatial covariance structures
- (Henriques et al., 2016): Warped Convolutions: Efficient Invariance to Spatial Transformations