
Neural Gaussians: Theory & Applications

Updated 30 January 2026
  • Neural Gaussians are a class of methods that integrate Gaussian primitives into neural networks to enable continuous, anisotropic scale-space analysis.
  • They offer explicit representations, such as 3D Gaussian splatting and mesh-anchored Gaussians, to improve signal reconstruction and scene rendering.
  • Drawing on infinite-width limits and Gaussian process theory, Neural Gaussians facilitate adaptive modeling in scientific computing and non-rigid reconstruction.

Neural Gaussians are a family of methodologies and theoretical constructs that integrate Gaussian functions, mixtures, or scale spaces into the parameterization, representation, or analysis of neural networks and neural fields. They span signal processing, scientific computing, probabilistic modeling, neural rendering, scene understanding, and infinite-width theory. The defining characteristic is the use of Gaussian primitives—kernels, splats, mixtures, processes, or nonlinearities—as essential units, either as explicit parametric representations or as limits of neural architectures.

1. Mathematical Foundations: Scale-Space Fields and Gaussian Modulation

The neural Gaussian scale-space field, as formalized in "Neural Gaussian Scale-Space Fields" (Mujkanovic et al., 2024), constructs an efficient function $F(x,\Sigma) : \mathbb{R}^{d_i} \times \mathbb{R}^{d_i \times d_i} \to \mathbb{R}^{d_o}$ such that querying at spatial position $x$ and covariance $\Sigma$ yields the Gaussian convolution $f_{\Sigma}(x) = (G_{\Sigma} * f)(x)$ of a signal $f$:

$$G_{\Sigma}(\tau) = (2\pi)^{-d_i/2}\, |\Sigma|^{-1/2} \exp\left(-\frac{1}{2} \tau^\top \Sigma^{-1} \tau\right)$$

No manual convolutions or hand-crafted scale pyramids are used. Instead, $F$ is realized by Fourier feature modulation paired with a globally Lipschitz-bounded MLP. For anisotropic scaling, a set of modulation weights $\lambda_i(\Psi) = \exp(-\sqrt{a_i^\top \Psi a_i})$ and frequency vectors $a_i$ induce a dampened positional encoding:

$$\gamma_{\mathrm{mod}}(x; \Psi)_{2i-1} = \lambda_i(\Psi) \cos(2\pi a_i^\top x), \qquad \gamma_{\mathrm{mod}}(x; \Psi)_{2i} = \lambda_i(\Psi) \sin(2\pi a_i^\top x)$$

This encoding is mapped through a fully connected, Lipschitz-bounded MLP whose weights are parameterized via skew-symmetric exponentials and singular-value constraints to ensure $\mathrm{Lip}(K) \leq 1$. Training is self-supervised: regressing $f(x)$ at random positions $x$ and random pseudo-covariances $\Psi$ enforces that $F(x,\Psi)$ behaves as a smoothed version $f_\Sigma(x)$ for arbitrary $\Sigma$.

Calibration aligns the pseudo-covariance $\Psi$ to the physical covariance $\Sigma$ post-training via geometric-mean regression over Monte Carlo sample comparisons. This approach supports fully continuous, anisotropic Gaussian scale-space queries for signals of arbitrary dimensionality, tested on modalities such as images ($d_i = 2$), SDFs ($d_i = 3$), and 4D light-stage data.
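As a concrete illustration, the dampened positional encoding above can be computed in a few lines; the number of frequencies, their random draw, and the query inputs below are illustrative choices, not the paper's settings:

```python
import numpy as np

def modulated_encoding(x, Psi, A):
    """Dampened Fourier features gamma_mod(x; Psi).

    x   : (d_i,) query position
    Psi : (d_i, d_i) symmetric PSD pseudo-covariance
    A   : (n, d_i) frequency vectors a_i
    Returns a (2n,) encoding with per-frequency damping
    lambda_i = exp(-sqrt(a_i^T Psi a_i)).
    """
    phase = 2.0 * np.pi * A @ x                                  # (n,)
    lam = np.exp(-np.sqrt(np.einsum("ni,ij,nj->n", A, Psi, A)))  # damping
    enc = np.empty(2 * len(A))
    enc[0::2] = lam * np.cos(phase)   # slots 2i-1 (cosine features)
    enc[1::2] = lam * np.sin(phase)   # slots 2i   (sine features)
    return enc

rng = np.random.default_rng(0)
A = rng.normal(size=(16, 2))                              # 16 random 2-D frequencies
x = np.array([0.3, -0.1])
enc_flat = modulated_encoding(x, np.zeros((2, 2)), A)     # Psi = 0: no damping
enc_blur = modulated_encoding(x, np.eye(2), A)            # larger Psi damps high freqs
```

Setting $\Psi = 0$ recovers an undamped Fourier-feature encoding; larger pseudo-covariances shrink the high-frequency components, which is what allows the downstream Lipschitz-bounded MLP to emulate a Gaussian blur.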

2. Probabilistic and Infinite-Width Connections: Gaussian Processes and Mixtures

Neural Gaussians appear as limiting distributions in random and quantum neural networks. For classical networks, the infinite-width limit under suitable i.i.d. initialization yields function distributions converging to Gaussian processes (GPs) with kernels induced by the architecture and activation (Guo, 2021). Formally, for deep networks with widths $N_l \to \infty$, network outputs $y_i(x)$ converge to $\mathrm{GP}(h^{[L+1]}, k^{[L+1]})$, where the kernel recursion is

$$k^{[l]}(x, x') = \sigma_w^{2\,[l]}\, \mathbb{E}_{z^{[l-1]}, b^{[l-1]}}\left[\phi\big(z^{[l-1]}(x) + b^{[l-1]}\big)\, \phi\big(z^{[l-1]}(x') + b^{[l-1]}\big)\right]$$

This underpins the Barron space interpretation, wherein the union of all RKHSs induced by two-layer nets equals the Barron space $\mathcal{B}_2(\Omega)$.
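The kernel recursion above can be estimated numerically by sampling the pre-activations at a pair of inputs jointly; the sketch below assumes a ReLU activation, illustrative variance parameters, and an inner-product kernel at the input layer:

```python
import numpy as np

def nngp_kernel(x1, x2, depth, sw2=2.0, sb2=0.1, n_mc=100_000, seed=0):
    """Monte Carlo estimate of the NNGP kernel recursion
    k^{[l]}(x, x') = sw2 * E[phi(z^{[l-1]}(x) + b) phi(z^{[l-1]}(x') + b)]
    for a ReLU activation. Variance parameters and the inner-product
    base case are illustrative choices, not from any specific paper.
    """
    rng = np.random.default_rng(seed)
    phi = lambda u: np.maximum(u, 0.0)                 # ReLU
    # base case: kernel of the (linear) input layer
    K = sw2 * np.array([[x1 @ x1, x1 @ x2],
                        [x2 @ x1, x2 @ x2]])
    for _ in range(depth):
        # z^{[l-1]} at (x, x') is jointly Gaussian with covariance K
        z = rng.multivariate_normal(np.zeros(2), K, size=n_mc)
        b = rng.normal(scale=np.sqrt(sb2), size=(n_mc, 1))
        a = phi(z + b)
        K = sw2 * (a.T @ a) / n_mc                     # next-layer kernel estimate
    return K[0, 1]
```

Each layer of the recursion replaces the previous kernel with the expectation of products of post-activations, which is exactly what the sample average computes.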

Quantum neural networks also converge to GPs in the limit of large Hilbert space dimension ($d \to \infty$), with covariance scaling set by the state overlap (García-Martín et al., 2023). This phenomenon persists after gradient-based training under the NTK regime, provided barren plateaus are avoided (Girardi et al., 2024).

If only the last hidden layer widens, deep feedforward networks converge to centered Gaussian mixtures, not simple Gaussians; further widening collapses the mixture randomness (Asao et al., 2022). This delineates the neural–GP connection as strictly infinite-width, with Gaussian mixtures as principal finite-width generalizations.

3. Explicit Parametric Representations: Splatting, Meshes, Deformations

Neural Gaussians are deployed as explicit parameterizations for scene and signal representations:

  • 3D Gaussian Splatting: Each primitive is an anisotropic ellipsoid with mean position $\mu$, covariance $\Sigma = R S S^\top R^\top$, color $c$, and opacity $\alpha$ (Jiang et al., 2023, Fan et al., 2024). Rendering involves projecting Gaussians onto the image plane and compositing colors via $\alpha$-blending, often with advanced shading (specular, diffuse, residual SH) and normal estimation from axis directions.
  • Super-Gaussians for segmentation and language: SuperGSeg aggregates dense neural Gaussians into structured, sparse "Super-Gaussians" enabling high-dimensional feature distillation and open-vocabulary segmentation without excessive GPU cost (Liang et al., 2024).
  • Mesh-Anchored 2D Neural Gaussians: Tessellation GS constrains Gaussians to mesh faces, using hierarchical neural features and adaptive subdivision optimized by photometric, geometric, and regularization losses, with all Gaussians deforming according to mesh motion induced by learned skinning fields (Tao et al., 8 Dec 2025).
  • Dynamic-Scene Decomposition: DriveSplat performs region-wise voxel initialization and dynamic-static decoupling, with non-rigid actors controlled via deformation networks acting on Gaussian primitives, further supervised by depth/normal priors (Wang et al., 21 Aug 2025).
  • Monocular non-rigid reconstruction: Neural Parametric Gaussians model deformation with a coarse parametric point cloud and local Gaussian splats, yielding high-fidelity view synthesis with explicit temporal regularization (Das et al., 2023).

Neural Gaussian primitives are commonly factorized for efficiency and high-frequency fidelity (e.g., explicit SH color, anisotropic scale along principal axes, local-frame anchoring).
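The covariance factorization $\Sigma = R S S^\top R^\top$ used by splatting methods is easy to reproduce: parameterize $R$ with a unit quaternion and $S$ with per-axis scales, which guarantees a valid symmetric PSD covariance throughout optimization. A minimal sketch:

```python
import numpy as np

def gaussian_covariance(quat, scale):
    """Build the splat covariance Sigma = R S S^T R^T from a unit
    quaternion (w, x, y, z) and per-axis scales -- the standard 3DGS
    factorization, which keeps Sigma symmetric positive semi-definite.
    """
    w, x, y, z = quat / np.linalg.norm(quat)  # normalize to a unit quaternion
    R = np.array([                            # quaternion -> rotation matrix
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(scale)                        # anisotropic per-axis scales
    return R @ S @ S.T @ R.T
```

The eigenvalues of the resulting $\Sigma$ are the squared scales, so optimizing quaternion and scales directly is equivalent to optimizing the ellipsoid's orientation and axis lengths.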

4. Gaussian Mixtures and Modules: Approximators and Nonlinearities

Gaussian Mixtures serve both as universal approximators and as trainable modules in neural architectures:

  • Gaussian Mixture Layers: As a parametric alternative to mean-field theory, networks can implement layers whose activations are Gaussian mixture expectations:

$$h_{\mu, \Sigma}(x) = \frac{1}{K}\sum_{k=1}^{K} \mathbb{E}_{(\omega, \beta) \sim \mathcal{N}(\mu_k, \Sigma_k)} \left[\omega\,\mathrm{ReLU}(\beta^\top x)\right]$$

These layers possess Wasserstein-gradient-flow training dynamics in the mixture parameter space, realized as standard Euclidean gradients over mixture means and covariances (Chewi et al., 6 Aug 2025). Compared to empirical mean-field limits, GM-layers exhibit non-lazy, feature-learning dynamics and competitive generalization on vision benchmarks.

  • Trainable Gaussian Mixture Modules (GMNM): Modern networks can replace conventional nonlinearities (ReLU, Sigmoid) by unconstrained, differentiable mixtures of Gaussian kernels. This involves stacking learnable projections and forming admixtures:

$$G(x) = \sum_{i=1}^m \pi_i \exp\left(-\frac{1}{2} \left[\alpha_i^\top\big(A_i(x-\mu_i) + b_i\big) + \beta_i\right]^2\right)$$

No probabilistic constraints are imposed; modules are plug-and-play drop-ins for MLPs, CNNs, attention, and LSTMs, enabling robust adaptation across modalities and improved function approximation (Lu et al., 8 Oct 2025). Empirical results demonstrate superior test accuracy across function regression, image, and timeseries tasks.
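As a sanity check on the module equation above, a direct (unvectorized) evaluation of $G(x)$ might look as follows; all parameter shapes are illustrative, and no normalization of the mixture weights is enforced, matching the unconstrained design:

```python
import numpy as np

def gmnm(x, pi, alpha, A, b, mu, beta):
    """Unconstrained Gaussian-mixture nonlinearity
    G(x) = sum_i pi_i * exp(-1/2 [alpha_i^T (A_i (x - mu_i) + b_i) + beta_i]^2).
    Parameters carry no simplex or PSD constraints; everything is a
    free, differentiable tensor in the actual trainable module.
    """
    out = 0.0
    for i in range(len(pi)):
        # scalar argument of the i-th squared-exponential component
        u = alpha[i] @ (A[i] @ (x - mu[i]) + b[i]) + beta[i]
        out += pi[i] * np.exp(-0.5 * u**2)
    return out
```

At $x = \mu_i$ with $b_i = 0$ and $\beta_i = 0$, component $i$ contributes exactly $\pi_i$, which gives a quick way to verify an implementation.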

5. Physics-Informed and Adaptive Meshes: Scientific Computing

Physics-Informed Gaussians (PIGs) employ adaptive Gaussian feature embeddings as parametric mesh representations for PDE solvers (Kang et al., 2024). Each basis function is a trainable Gaussian:

$$G_i(x) = \exp\left(-\frac{1}{2}(x-\mu_i)^\top \Sigma_i^{-1} (x-\mu_i)\right)$$

Amplitudes, centers, and covariances are fully learned, and their linear combinations are processed by compact MLPs. PINN-style losses enforce governing equations, boundary conditions, and regularity. Adaptive placement ensures the Gaussian mesh migrates toward regions with high residual error, consistently outperforming MLP-based PINNs and fixed-mesh methods, especially on problems exhibiting high-frequency or nonlinear features. Strengths include mesh-free adaptivity, parameter efficiency, and universal approximation guarantees. Limitations include per-iteration cost scaling with the number of Gaussians $N$ and a fixed $N$ per problem.
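The Gaussian feature evaluation at the heart of a PIG layer reduces to a batched quadratic form; a minimal sketch (centers, inverse covariances, and amplitudes as plain arrays, with the downstream MLP and PINN loss omitted):

```python
import numpy as np

def pig_features(x, mu, Sigma_inv, amp):
    """Linear combination of trainable Gaussian basis functions
    G_i(x) = exp(-1/2 (x - mu_i)^T Sigma_i^{-1} (x - mu_i)),
    the feature embedding a PIG feeds to its compact MLP.

    mu        : (N, d) centers
    Sigma_inv : (N, d, d) inverse covariances
    amp       : (N,) learnable amplitudes
    """
    diff = x - mu                                        # (N, d)
    quad = np.einsum("ni,nij,nj->n", diff, Sigma_inv, diff)
    return amp @ np.exp(-0.5 * quad)                     # weighted feature sum
```

In training, `mu`, `Sigma_inv`, and `amp` would all receive gradients from the PDE residual loss, which is what lets the Gaussian "mesh" migrate toward high-error regions.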

6. Gaussian Process Neurons and Correlated Stochastic Models

Neural Gaussians also appear as stochastic activation functions ("Gaussian Process Neurons" or GPNs), where each neuron's nonlinearity is modeled as a learnable GP (Urban et al., 2017):

$$f(a) \sim \mathcal{GP}(m(a), k(a, a'))$$

Variational Bayesian inference enables joint learning of activations and network weights, with closed-form priors and deterministic losses for mini-batch SGD. GPNs propagate mean and variance through layers, providing intrinsic uncertainty estimates and competitive performance versus deep GPs and dropout-regularized NNs. Full mean/covariance propagation is feasible in convolutional and recurrent architectures via patch-wise or time-step-wise computation.
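To build intuition for such a stochastic activation, one can draw a single random nonlinearity from a GP prior evaluated at a neuron's pre-activations; the zero mean and squared-exponential kernel below are illustrative choices, not the parameterization of the original paper:

```python
import numpy as np

def sample_gpn_activation(a, lengthscale=1.0, sigma=1.0, seed=0):
    """Draw one random activation function f ~ GP(0, k) evaluated at
    pre-activation values a, with a squared-exponential kernel k.
    A small jitter keeps the covariance numerically PSD.
    """
    rng = np.random.default_rng(seed)
    d = a[:, None] - a[None, :]                          # pairwise differences
    K = sigma**2 * np.exp(-0.5 * (d / lengthscale)**2)   # SE kernel matrix
    return rng.multivariate_normal(np.zeros(len(a)), K + 1e-8 * np.eye(len(a)))
```

Each draw is one plausible nonlinearity under the prior; the variational scheme in the paper instead learns a posterior over these functions jointly with the network weights.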

7. Neural Operators and Function-Valued Gaussian Processes

Infinite-width neural operators—networks acting between function spaces—converge to Gaussian processes over functionals $Z(f)(x)$ under Gaussian-distributed kernels and linear weights (Souza et al., 19 Oct 2025). This is formalized in both Fourier Neural Operators and Matérn-prior neural operators. Covariance functions in the operator-GP correspond to spectral overlap for FNOs:

$$c_{A_k}(f_1, f_2; z, z') = \sigma_k^2\, (2\pi)^{2 d_x} \sum_{|s| \leq B} \mathrm{FS}_{-s}[f_2]\, \mathrm{FS}_s[f_1]\, e^{-i s \cdot (z - z')}$$

Posterior inference and uncertainty quantification in regression tasks then use classical GP machinery over these functional covariances, efficiently handling PDE operators and general operator learning.
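For a 1D periodic input function, the spectral-overlap covariance above can be estimated directly from FFT coefficients; the grid, bandwidth $B$, and normalization conventions below are illustrative assumptions:

```python
import numpy as np

def operator_gp_cov(f1, f2, z, zp, B=4, sigma2=1.0):
    """Spectral-overlap covariance of an infinite-width FNO layer (1D sketch):
    c = sigma2 * (2*pi)^2 * sum_{|s| <= B} FS_{-s}[f2] FS_s[f1] e^{-i s (z - z')}.
    f1, f2 are samples on a uniform grid over [0, 2*pi); the Fourier-series
    coefficient FS_s is estimated as FFT[s] / n.
    """
    n = len(f1)
    c1 = np.fft.fft(f1) / n          # FS_s[f1], frequency s indexed mod n
    c2 = np.fft.fft(f2) / n
    total = 0.0 + 0.0j
    for s in range(-B, B + 1):
        total += c2[-s % n] * c1[s % n] * np.exp(-1j * s * (z - zp))
    return sigma2 * (2 * np.pi) ** 2 * total
```

With this covariance in hand, posterior inference over operator outputs proceeds with standard GP regression formulas, just with functions in place of finite-dimensional inputs.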


Neural Gaussians thus provide an integrative vocabulary for parametric signal representation, probabilistic analysis, nonlinearity design, adaptive scientific computing, and scalable operator learning. Their usage ranges from explicit splatting primitives and mesh-based field parameterizations to implicit stochastic activations and deep theoretical connections to infinite-width limits and function space kernels. The breadth of applications reflects their versatility and foundational role across current neural modeling paradigms.
