Spatio-Temporal 3D Gaussian Representation
- Spatio-Temporal 3-D Gaussian Representation is a framework that models evolving 3-D fields using explicit, parametric, and stochastic Gaussian formulations.
- It utilizes both parametric Gaussian primitives and non-separable stochastic kernels to capture dynamic field behavior in applications such as rendering, spatial statistics, and physical simulation.
- The approach enables efficient inference through state-space filtering, finite element methods, and neural optimization, supporting real-time, high-fidelity dynamic scene reconstructions.
A spatio-temporal 3-D Gaussian representation models physical, statistical, or perceptual fields over three dimensions where at least one is temporal (e.g., $(x, y, t)$ for 2D+time, or $(x, y, z, t)$ for full 3D+time) using Gaussian functions or processes. In the statistical literature, this can refer to kernels, random fields, or GMRFs; in machine vision and graphics it increasingly refers to explicit or neural 3D Gaussian “blobs” with temporally evolving parameters. Such representations are foundational in spatio-temporal PDEs, non-separable covariance modeling, Gaussian process regression, dynamic reconstruction, and neural rendering.
1. Mathematical Formulations
Spatio-temporal 3-D Gaussian representations are expressed both as explicit, parameterized Gaussian functions and as stochastic processes.
1.1. Parametric Gaussian Primitives
A generic time-varying 3D Gaussian (“blob”) is

$$G(x, t) = c \, \exp\!\left(-\tfrac{1}{2}\big(x - \mu(t)\big)^{\top} \Sigma(t)^{-1} \big(x - \mu(t)\big)\right),$$

where $x \in \mathbb{R}^3$, $t \in \mathbb{R}$, $\mu(t)$ is a time-dependent mean, $\Sigma(t)$ is a time-dependent positive-definite covariance (often decomposed $\Sigma(t) = R(t) S(t) S(t)^{\top} R(t)^{\top}$, with rotation $R(t)$ and diagonal scale $S(t)$), and $c$ is a normalization constant (Yao et al., 10 Jul 2025). Anisotropic spatio-temporal Gaussians can augment this with a temporal window,

$$G(x, t) = \exp\!\left(-\frac{(t - \mu_t)^2}{2\sigma_t^2}\right) \exp\!\left(-\tfrac{1}{2}\big(x - \mu_x\big)^{\top} \Sigma^{-1} \big(x - \mu_x\big)\right),$$

to support explicit temporal localization (Zhou et al., 29 Jun 2025).
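For concreteness, here is a minimal evaluation of such a time-varying blob; the linear motion model, the z-axis rotation for $R(t)$, and all parameter values are illustrative assumptions, not the parameterization of any cited system:

```python
import numpy as np

def rotation_z(theta):
    """Rotation matrix about the z-axis (illustrative choice of R(t))."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def gaussian_blob(x, t, mu0, velocity, scales, omega=0.5, alpha=1.0):
    """Evaluate G(x, t) = alpha * exp(-0.5 (x - mu(t))^T Sigma(t)^{-1} (x - mu(t))).

    Hypothetical parameterization: linear motion mu(t) = mu0 + v t and a
    rotating anisotropic covariance Sigma(t) = R(t) S S^T R(t)^T.
    """
    mu_t = mu0 + velocity * t
    R = rotation_z(omega * t)
    S = np.diag(scales)                    # diagonal scale matrix
    Sigma = R @ S @ S.T @ R.T              # positive-definite covariance
    d = x - mu_t
    return alpha * np.exp(-0.5 * d @ np.linalg.solve(Sigma, d))

# At the moving centre the (unnormalized) density equals alpha.
val = gaussian_blob(np.array([1.0, 0.0, 0.0]), t=1.0,
                    mu0=np.zeros(3), velocity=np.array([1.0, 0.0, 0.0]),
                    scales=np.array([0.3, 0.1, 0.1]))
print(val)  # 1.0
```

Because the quadratic form vanishes at $x = \mu(t)$, the value at the moving centre is exactly the opacity-like constant `alpha`.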
1.2. Stochastic Process Models
For large-scale or continuous fields, a Gaussian process model

$$u(s, t) \sim \mathcal{GP}\big(m(s, t),\, C\big((s, t), (s', t')\big)\big)$$

is used, with $C$ a suitable (possibly non-separable) covariance function. The classical separable form is $C\big((s,t),(s',t')\big) = C_s(s, s')\, C_t(t, t')$; non-separable spatio-temporal kernels (e.g. the Gneiting class, Whittle–Matérn) are widely used for realistic modeling (Ma et al., 2017, Azevedo et al., 2020, Lindgren et al., 2020).

SPDE-driven fields such as the diffusion-based, non-separable Matérn model satisfy

$$\left(\gamma_t \frac{\partial}{\partial t} + \left(\gamma_s^2 - \Delta\right)^{\alpha_s}\right)^{\alpha_t} u(s, t) = \mathcal{E}(s, t)$$

on domain $\Omega \times \mathbb{R}$, where $\Delta$ is the spatial Laplacian and $\mathcal{E}$ is Gaussian noise, white in time, with spatial precision $\left(\gamma_s^2 - \Delta\right)^{\alpha_e}$ (Lindgren et al., 2020).
2. Construction and Parameterization Strategies
Multiple disciplines employ Gaussian representations, each with characteristic construction, parameterization, and computational approaches.
| Domain | Representation | Parameterization/Inference |
|---|---|---|
| Dynamic rendering | Sets of moving 3D Gaussians | Deformation fields, anchor grids, masks, MLPs |
| Spatial statistics | Stochastic (G)MRFs or GPs | Covariance/precision matrices, SPDEs, copulas |
| Fluid mechanics | Weighted Gaussian basis for PDEs | Moment updates, operator splitting, projection |
2.1. Dynamic Scene and Neural Representation
Contemporary dynamic scene models leverage “neural” Gaussian representations. For example, “SD-GS” (Yao et al., 10 Jul 2025) utilizes a deformable anchor grid, grouping Gaussians into anchors whose parameters evolve via small MLPs conditioned on both anchor identity and temporal embeddings. The underlying scene model is highly memory efficient, as only anchor features are stored and Gaussians are generated on-the-fly per frame.
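The anchor-plus-MLP generation step can be sketched schematically; the sizes, sinusoidal time embedding, and two-layer network below are hypothetical stand-ins for illustration, not the SD-GS architecture itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: each anchor stores a feature vector; a small MLP maps
# (anchor feature, time embedding) to per-frame offsets for K child Gaussians.
N_ANCHORS, FEAT, T_EMB, K = 8, 16, 4, 5

anchor_pos = rng.standard_normal((N_ANCHORS, 3))
anchor_feat = rng.standard_normal((N_ANCHORS, FEAT))
W1 = rng.standard_normal((FEAT + T_EMB, 32)) * 0.1
W2 = rng.standard_normal((32, K * 3)) * 0.1

def time_embedding(t):
    """Sinusoidal embedding of the frame time (illustrative choice)."""
    freqs = 2.0 ** np.arange(T_EMB // 2)
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)])

def gaussians_at(t):
    """Generate K child Gaussian means per anchor, on the fly, for time t."""
    emb = np.tile(time_embedding(t), (N_ANCHORS, 1))
    h = np.tanh(np.concatenate([anchor_feat, emb], axis=1) @ W1)  # hidden layer
    offsets = (h @ W2).reshape(N_ANCHORS, K, 3)                   # per-child offsets
    return anchor_pos[:, None, :] + offsets                       # (N, K, 3) means

means = gaussians_at(0.25)
print(means.shape)  # (8, 5, 3)
```

Only the anchor positions, features, and MLP weights need to be stored; the per-frame Gaussians are regenerated by a forward pass, which is the source of the memory savings described above.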
Hierarchical gating, mask-based pruning, and coupled neural deformation fields are employed to model and compress spatio-temporal Gaussian splats (Li et al., 28 May 2025, Javed et al., 2024). Dynamic backgrounds are often distinguished from objects via clustering of appearance and motion features, sometimes with additional supervision from event-based sensors (see STD-GS (Zhou et al., 29 Jun 2025)).
2.2. Spatio-Temporal Random Fields
Statistical GMRF and GP frameworks for spatio-temporal processes construct Gaussian fields on regular lattices (or FEM meshes), commonly specifying the joint law via a sparse precision (inverse covariance) matrix reflecting local Markov structure (Azevedo et al., 2020). The diffusion-based Matérn extension of Lindgren et al. expresses fields as solutions to fractional SPDEs, yielding models with explicit parameters for spatial and temporal smoothness, ranges, variance, and degree of nonseparability (Lindgren et al., 2020). These models are naturally implemented via finite element methods and straightforwardly incorporated into Bayesian latent Gaussian frameworks (e.g. R-INLA).
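The sparse-precision construction can be illustrated with a toy Kronecker-sum precision assembled from first-order random-walk factors; this is a generic GMRF sketch (dense arrays standing in for the sparse storage used in practice), not the exact model of the cited papers:

```python
import numpy as np

def rw1_precision(n, kappa=1.0):
    """Precision of a first-order random walk: kappa * D^T D, a banded matrix."""
    D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)   # first-difference operator
    return kappa * D.T @ D

n_s, n_t = 12, 10
Qs = rw1_precision(n_s) + 1e-3 * np.eye(n_s)   # small diagonal for properness
Qt = rw1_precision(n_t) + 1e-3 * np.eye(n_t)

# Kronecker-sum precision: neighbours in space and in time interact, but the
# matrix stays sparse, reflecting the local Markov structure.
Q = np.kron(Qt, np.eye(n_s)) + np.kron(np.eye(n_t), Qs)
print(Q.shape)                        # (120, 120)
print(np.count_nonzero(Q) / Q.size)  # small fill fraction: sparsity preserved
```

In real applications the same structure is stored in sparse format and factored with fill-reducing orderings, which is what makes GMRF inference scale.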
Grid-free or operator-based methods, including explicit Gaussian mixture representations for evolutionary PDEs, employ time-varying weights, means, and covariances, coupled with time discretization, ODE integration, and projection or optimization for physics constraints (Xing et al., 2024, Zhang et al., 1 Dec 2025).
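As a minimal illustration of the grid-free idea, the sketch below transports the means of a small Gaussian mixture along the characteristics of a pure advection equation; the mixture, the rotational flow, and the RK2 step are illustrative assumptions, not the schemes of the cited papers:

```python
import numpy as np

# Hypothetical mixture: weights w_i, means mu_i, shared isotropic covariance.
rng = np.random.default_rng(1)
w = rng.random(6)
mu = rng.standard_normal((6, 3))
sigma2 = 0.05

def mixture(x, mu):
    """Evaluate the field u(x) = sum_i w_i exp(-|x - mu_i|^2 / (2 sigma2))."""
    d2 = ((x[None, :] - mu) ** 2).sum(axis=1)
    return (w * np.exp(-0.5 * d2 / sigma2)).sum()

def velocity(x):
    """Illustrative divergence-free flow: rigid rotation about the z-axis."""
    return np.stack([-x[:, 1], x[:, 0], np.zeros(len(x))], axis=1)

def advect(mu, dt):
    """RK2 transport of the means along characteristics: for pure advection
    u_t + v . grad(u) = 0 with a volume-preserving flow, moving the means
    while keeping weights and covariances fixed follows the exact solution."""
    k1 = velocity(mu)
    k2 = velocity(mu + 0.5 * dt * k1)
    return mu + dt * k2

mu_next = advect(mu, dt=0.01)
# A rigid rotation preserves each mean's distance from the z-axis.
print(np.allclose(np.linalg.norm(mu_next[:, :2], axis=1),
                  np.linalg.norm(mu[:, :2], axis=1), atol=1e-6))  # True
```

No background grid appears anywhere: the degrees of freedom are the mixture parameters themselves, which is the sense in which such solvers are grid-free.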
3. Covariance Structure, Non-Separability, and Invariance
Spatio-temporal Gaussian representations support a wide range of dependence structures; non-separability is critical in capturing realistic dynamics.
- Separable kernels: The form $C\big((s,t),(s',t')\big) = C_s(s, s')\, C_t(t, t')$ allows tensor (Kronecker) decomposition and efficient computation but cannot represent “tilted” correlations arising from advection or diffusion (Lindgren et al., 2020).
- Non-separable models: Classes such as the Gneiting kernel, Whittle–Matérn, or convolution-generated SPDEs allow arbitrary interactions, encoded via positive-definite covariance/precision matrices or differential operators with both space and time dependencies (Zhang et al., 1 Dec 2025, Ma et al., 2017).
- Invariant receptive fields: In computer vision, affine Gaussian derivative models (Lindeberg, 2023) form receptive field families whose scale- and affine-normalized derivatives are provably covariant under local affine, Galilean, and temporal transformations. This theoretical underpinning ensures robustness under geometric and photometric distortions in natural scenes.
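One common member of the Gneiting class can be written down directly; the parameter values below are illustrative, with `beta` interpolating between the separable case (`beta = 0`) and strong space-time interaction:

```python
import numpy as np

def gneiting(h, u, sigma2=1.0, a=1.0, c=1.0, alpha=1.0, gamma=0.5, beta=0.5, d=3):
    """A Gneiting-class non-separable space-time covariance.

    h: spatial lag norm, u: temporal lag; beta in [0, 1] controls the
    space-time interaction (beta = 0 recovers a separable kernel).
    Parameter values here are illustrative, not fitted to any data.
    """
    psi = (a * np.abs(u) ** (2 * alpha) + 1.0) ** beta
    return sigma2 / psi ** (d / 2) * np.exp(-c * h ** (2 * gamma) / psi ** gamma)

# With beta = 0 the kernel factorizes into purely spatial and purely temporal
# parts, so the product of the marginals reproduces the joint value.
h, u = 0.7, 0.3
sep = gneiting(h, u, beta=0.0)
print(np.isclose(sep, gneiting(h, 0.0, beta=0.0) * gneiting(0.0, u, beta=0.0)))  # True
```

For `beta > 0` the temporal lag enters the spatial decay rate through `psi`, which is precisely the space-time interaction that separable kernels cannot express.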
4. Inference, Learning, and Computational Implementations
The estimation and deployment of spatio-temporal 3-D Gaussian representations involve various computational strategies aligned with the target application.
- State-space filtering and KFs: For separable temporal processes with rational spectral densities, exact finite-dimensional state-space models can be built for efficient recursive filtering (Todescato et al., 2017). Kalman Representer theorems guarantee that filtered state vectors suffice for minimum variance prediction throughout the spatial domain.
- Bayesian sparse models: Additive decompositions (e.g., AAGP (Ma et al., 2017)) and transformed GMRFs (Azevedo et al., 2020) exploit low-rank, separable, and non-separable components, with efficient hierarchical Gibbs/Metropolis samplers or INLA approximations. Precisions are built via Kronecker sums of spatial, temporal, and joint adjacency graphs.
- Finite element and Kronecker representations: Solutions to stochastic PDEs over triangulated domains afford flexible modeling—marginal spatial covariance is Matérn, marginal temporal covariance is a hypergeometric function or Matérn in separable cases. Discretization yields sparse Kronecker-structured precision matrices for fast inference (Lindgren et al., 2020).
- Neural optimization and real-time rendering: Dynamic scene methods perform joint optimization over neural network weights, anchor features, deformation fields, and mask/attention parameters. Learnable quantization, trajectory keypoint reduction, and mask-based pruning yield compressed yet accurate models for low-latency applications (Javed et al., 2024, Oh et al., 19 May 2025).
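The state-space idea can be demonstrated on the simplest case: a Matérn-1/2 (Ornstein-Uhlenbeck) temporal process observed in noise and filtered with a scalar Kalman recursion. This is a generic sketch of recursive filtering under assumed parameter values, not the algorithm of the cited paper:

```python
import numpy as np

# Matern-1/2 (OU) process dx = -(1/ell) x dt + dW has an exact discrete-time
# state-space form, so a scalar Kalman filter performs exact inference.
ell, sigma2, r = 0.5, 1.0, 0.1           # length-scale, process var, obs noise
dt = 0.05
phi = np.exp(-dt / ell)                   # transition coefficient
q = sigma2 * (1.0 - phi ** 2)             # exact process-noise variance

rng = np.random.default_rng(2)
T = 200
truth = np.empty(T)
truth[0] = rng.standard_normal() * np.sqrt(sigma2)
for k in range(1, T):
    truth[k] = phi * truth[k - 1] + np.sqrt(q) * rng.standard_normal()
y = truth + np.sqrt(r) * rng.standard_normal(T)  # noisy observations

m, P = 0.0, sigma2                        # stationary prior mean / variance
est = np.empty(T)
for k in range(T):
    m, P = phi * m, phi ** 2 * P + q      # predict
    K = P / (P + r)                       # Kalman gain
    m, P = m + K * (y[k] - m), (1.0 - K) * P  # update with y[k]
    est[k] = m

print(f"obs MSE {np.mean((y - truth) ** 2):.3f}, "
      f"filtered MSE {np.mean((est - truth) ** 2):.3f}")
```

The cost is constant per time step regardless of the series length, which is the practical payoff of the finite-dimensional state-space construction.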
5. Applications and Empirical Performance
Spatio-temporal 3-D Gaussian models are widely adopted across various disciplines and tasks:
- Dynamic scene reconstruction: Real-time rendering frameworks (e.g., SD-GS, STDR, hybrid 3D-4DGS) deliver high-fidelity, temporally coherent synthesis of dynamic environments with substantial reductions in model size (60%+) and increases in rendering speed (100%+) compared to prior methods, through flexible allocation and adaptive pruning of Gaussian primitives (Yao et al., 10 Jul 2025, Oh et al., 19 May 2025, Javed et al., 2024).
- Environmental spatio-temporal statistics: SPDE-driven Matérn fields and AAGP models yield interpretable and computationally tractable spatial-temporal inference, supporting Bayesian spatial data analysis at global scale (Ma et al., 2017, Lindgren et al., 2020).
- Physics and simulation: Grid-free Gaussian mixtures efficiently solve PDEs for fluid mechanics and advection-diffusion equations, with enhanced preservation of critical structures (e.g., vorticity) and significant computational savings over Eulerian grids and implicit neural representations (Xing et al., 2024, Zhang et al., 1 Dec 2025).
- CT and medical imaging: Spatio-temporal Gaussian fields refined via hash-encoded neural networks and attention yield state-of-the-art reconstruction, especially under highly undersampled tomographic regimes (Zhong et al., 12 Feb 2026).
- Computer vision and receptive fields: Affine Gaussian models serve as the foundation for canonical, transformation-covariant spatio-temporal image analysis (Lindeberg, 2023).
6. Extensions, Theoretical Guarantees, and Future Directions
Spatio-temporal 3-D Gaussian representations offer several theoretical guarantees and avenues for extension:
- Support for curvature and nonstationarity: The diffusion-based Matérn SPDE framework accommodates curved manifolds (e.g., global climate modeling on the sphere) and non-stationary spatial operators, leveraging triangulated FE representations for inference (Lindgren et al., 2020).
- Parameter interpretability: Range, smoothness, variance, and non-separability parameters provide physical interpretability and facilitate prior specification, with explicit formulas mapping between operator parameters and marginal correlations.
- Model flexibility: Mix-and-match kernel compositions, such as additive, product-sum, or copula approaches, allow the integration of domain knowledge and heterogeneous dependencies (Ma et al., 2017, Azevedo et al., 2020).
- Software and practical deployment: Sparse Kronecker representations and FE approximations are now widely implemented in platforms such as R-INLA and PyTorch-based neural rendering engines.
A key future direction is the integration of physics-driven, non-stationary, and hierarchical models with neural or explicit Gaussian splatting pipelines, enabling real-time, interpretable, and physically plausible dynamic field reconstruction at scale.