Differentiable Gaussian Splatting
- Differentiable Gaussian Splatting is a neural rendering approach that represents scenes with spatially organized, anisotropic Gaussian splats, all optimized via gradient descent.
- It enables real-time, geometry-aware synthesis and multimodal rendering by incorporating analytic gradients through the entire rendering pipeline.
- The method achieves superior photometric and geometric fidelity through techniques like weighted sum rendering, hardware acceleration, and advanced regression models.
Differentiable Gaussian Splatting defines a family of neural rendering algorithms in which a scene is represented by spatially organized, anisotropic Gaussian distributions (“splats”) and where both the forward rendering process and the underlying representation are optimized via backpropagation. This approach offers analytic gradients through the entire rendering pipeline and provides real-time, geometry-aware novel view and multimodal synthesis capabilities for complex scenes from sparsely observed 2D images. Recent advances extend the differentiability to new domains, including discontinuity-aware boundary handling, hardware-accelerated backprop, dynamic 4D field generation, and multimodal rendering based on geometric consistency and fusion of modalities.
1. Gaussian Splatting Representation and Differentiability
In its canonical formulation, a 3D scene is modeled as a set of splats, each parameterized by a mean $\mu_i \in \mathbb{R}^3$, a positive-definite covariance $\Sigma_i \in \mathbb{R}^{3\times 3}$, a learned opacity $\alpha_i$, and appearance attributes (e.g., color via spherical harmonic coefficients) (Hou et al., 2024, Xie et al., 14 Oct 2025). The density for each splat is:

$$G_i(x) = \exp\!\left(-\tfrac{1}{2}\,(x-\mu_i)^\top \Sigma_i^{-1} (x-\mu_i)\right),$$
and the image formation process involves projecting these splats into image space via camera projection and compositing their contributions using (often non-commutative) alpha-blending or, as in Weighted Sum Rendering (WSR), learned commutative weights (Hou et al., 2024).
Crucially, all parameters—including positions, covariance matrices, opacities, color coefficients, and sometimes wave parameters—are learned via gradient descent. The closed-form structure of the splatting kernel allows for analytic derivatives through projection, blending, and even geometric regularization (Daniels et al., 18 Nov 2025, Xie et al., 14 Oct 2025, Huang et al., 2024).
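The splat density above is simple enough to evaluate directly. The following NumPy sketch (function and parameter names are illustrative, not from any cited implementation) evaluates the unnormalized anisotropic Gaussian for an ellipsoidal splat:

```python
import numpy as np

def splat_density(x, mu, Sigma):
    """Unnormalized anisotropic Gaussian density exp(-0.5 d^T Sigma^{-1} d)."""
    d = np.asarray(x, float) - mu
    return float(np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d))

# An ellipsoidal splat elongated along the x-axis.
mu = np.zeros(3)
Sigma = np.diag([4.0, 1.0, 1.0])   # positive-definite covariance
center = splat_density(mu, mu, Sigma)                    # 1.0 at the mean
along_major = splat_density([2.0, 0.0, 0.0], mu, Sigma)  # exp(-0.5)
```

Because the density is a smooth closed-form expression in $\mu_i$ and $\Sigma_i$, its derivatives with respect to every parameter are available analytically, which is exactly what the backward pass exploits.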
2. Forward and Backward Passes: Differentiable Rendering Pipelines
The differentiable forward pass projects each Gaussian from world coordinates to screen space, obtaining a 2D mean and covariance via an affine or perspective-correct transform (e.g., $\mu'_i = \pi(\mu_i)$ and $\Sigma'_i = J W \Sigma_i W^\top J^\top$, with $W$ the world-to-camera rotation and $J$ the Jacobian of the projective map), after which rasterization is performed either by tiling, hardware rasterization, or analytic ray-splat intersection (Yuan et al., 24 May 2025, Gu et al., 2024).
Alpha blending for compositing can be performed front-to-back:

$$C = \sum_{i=1}^{N} c_i\,\alpha'_i \prod_{j<i} \left(1-\alpha'_j\right),$$

or, in WSR:

$$C = \frac{\sum_i w_i\, c_i}{\sum_i w_i},$$

where $w_i$ is a learned depth-dependent attenuation weight and all blending steps are differentiable (Hou et al., 2024).
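The order dependence of standard front-to-back alpha blending, which WSR removes, can be seen in a few lines (a minimal NumPy sketch, not taken from any cited implementation):

```python
import numpy as np

def alpha_composite(colors, alphas):
    """Front-to-back blending: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    C, T = np.zeros(3), 1.0          # accumulated color and transmittance
    for c, a in zip(colors, alphas):
        C += T * a * np.asarray(c, float)
        T *= 1.0 - a
    return C

front = alpha_composite([[1, 0, 0], [0, 1, 0]], [0.5, 0.5])  # red in front
back  = alpha_composite([[0, 1, 0], [1, 0, 0]], [0.5, 0.5])  # green in front
# front != back: alpha blending is non-commutative, hence the sorting step
```

Swapping the two splats changes the result, which is why standard 3DGS must depth-sort splats per view and why reorderings cause popping artifacts.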
The backward pass propagates loss gradients through all parameters. For hardware efficiency, differentiated rendering may use quad-level or subgroup atomic reductions, programmable fragment-shader interlocks, or mixed precision (Yuan et al., 24 May 2025). Gradients for positions and covariances follow chain-rule traversal:

$$\frac{\partial \mathcal{L}}{\partial \mu_i} = \frac{\partial \mathcal{L}}{\partial C}\,\frac{\partial C}{\partial \alpha'_i}\,\frac{\partial \alpha'_i}{\partial \mu'_i}\,\frac{\partial \mu'_i}{\partial \mu_i}$$

and

$$\frac{\partial \mathcal{L}}{\partial \Sigma_i} = \frac{\partial \mathcal{L}}{\partial C}\,\frac{\partial C}{\partial \alpha'_i}\,\frac{\partial \alpha'_i}{\partial \Sigma'_i}\,\frac{\partial \Sigma'_i}{\partial \Sigma_i},$$

with all blending steps supporting analytic differentiation.
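The blending derivative itself has a closed form. As a sketch (hand-derived here, not from a cited codebase): with $C = \sum_i c_i \alpha_i T_i$ and transmittance $T_i = \prod_{j<i}(1-\alpha_j)$, the gradient with respect to one opacity is $\partial C/\partial \alpha_k = c_k T_k - \sum_{i>k} c_i \alpha_i T_i / (1-\alpha_k)$, which can be verified against finite differences:

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha blending."""
    C, T = np.zeros(3), 1.0
    for c, a in zip(colors, alphas):
        C += T * a * np.asarray(c, float)
        T *= 1.0 - a
    return C

def grad_alpha(colors, alphas, k):
    """Analytic dC/da_k through front-to-back blending (chain rule)."""
    T = np.cumprod([1.0] + [1.0 - a for a in alphas[:-1]])  # T_i = prod_{j<i}(1-a_j)
    g = np.asarray(colors[k], float) * T[k]
    for i in range(k + 1, len(alphas)):
        # splats behind k see their transmittance scaled by (1 - a_k)
        g -= np.asarray(colors[i], float) * alphas[i] * T[i] / (1.0 - alphas[k])
    return g

colors = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
alphas = [0.3, 0.5, 0.7]
eps = 1e-6
ap = alphas.copy(); ap[1] += eps
am = alphas.copy(); am[1] -= eps
numeric = (composite(colors, ap) - composite(colors, am)) / (2 * eps)
analytic = grad_alpha(colors, alphas, 1)
```

Production rasterizers accumulate exactly these terms, typically in a back-to-front sweep so the trailing sum can be maintained incrementally.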
3. Extensions for Geometry, Boundaries, and Multimodal Rendering
Geometric Consistency and Multi-Modal Outputs
Extensions such as UniGS (Xie et al., 14 Oct 2025) and TriaGS (Tran et al., 6 Dec 2025) optimize not only RGB appearance but also depth, normals, and semantics. UniGS performs differentiable ray-ellipsoid intersection for accurate depth, then computes surface normals from depth via finite differences:

$$n(u,v) = \operatorname{normalize}\!\left(\frac{\partial P}{\partial u} \times \frac{\partial P}{\partial v}\right),$$

where $P(u,v)$ is the back-projected 3D point at pixel $(u,v)$, and backpropagates all losses (photometric, SSIM, geometric, segmentation, pruning) into splat parameters.
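Normals-from-depth via finite differences is straightforward to sketch. The following assumes a simple pinhole model with intrinsics `fx, fy, cx, cy` (illustrative, not UniGS's exact implementation):

```python
import numpy as np

def normals_from_depth(depth, fx, fy, cx, cy):
    """Estimate camera-space normals from a depth map via finite differences."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Back-project each pixel to a camera-space 3D point P(u, v).
    X = (u - cx) / fx * depth
    Y = (v - cy) / fy * depth
    P = np.stack([X, Y, depth], axis=-1)
    # Finite-difference tangents, then normal = normalized cross product.
    dPdu = np.gradient(P, axis=1)
    dPdv = np.gradient(P, axis=0)
    n = np.cross(dPdu, dPdv)
    return n / (np.linalg.norm(n, axis=-1, keepdims=True) + 1e-12)

# A fronto-parallel plane at depth 2 should yield normals along +z.
depth = np.full((8, 8), 2.0)
n = normals_from_depth(depth, 100.0, 100.0, 4.0, 4.0)
```

Since every operation here (back-projection, differencing, cross product, normalization) is differentiable almost everywhere, a normal-consistency loss defined on `n` backpropagates cleanly into the rendered depth and hence into splat parameters.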
Geometric consistency is enforced by triangulation-guided losses, as in TriaGS, penalizing the deviation between rendered world points and multi-view consensus points reconstructed by linear triangulation and SVD. The geometric loss

$$\mathcal{L}_{\mathrm{geo}} = \sum_{k} \left\| X_k^{\mathrm{rend}} - X_k^{\mathrm{tri}} \right\|$$

regularizes the optimization towards globally robust surfaces and suppresses "floater" artifacts.
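The consensus point $X_k^{\mathrm{tri}}$ comes from standard linear (DLT) triangulation: stack the cross-product constraints from each view and take the null vector via SVD. A minimal NumPy sketch (normalized cameras, illustrative variable names):

```python
import numpy as np

def triangulate(Ps, xs):
    """Linear (DLT) triangulation: homogeneous least squares solved by SVD.
    Ps: list of 3x4 projection matrices; xs: list of (u, v) observations."""
    A = []
    for P, (u, v) in zip(Ps, xs):
        A.append(u * P[2] - P[0])
        A.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(A))
    Xh = Vt[-1]              # right singular vector of the smallest singular value
    return Xh[:3] / Xh[3]    # dehomogenize

# Two normalized (K = I) cameras: one at the origin, one shifted 1 unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
x1 = (0.0, 0.0)
x2 = (-0.2, 0.0)   # projection of the point (0, 0, 5) in the second camera
X = triangulate([P1, P2], [x1, x2])
```

With noise-free correspondences the recovered point is exact; with noisy ones the SVD gives the algebraic least-squares consensus that the loss above pulls rendered points toward.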
Discontinuity-Aware Boundaries
DisC-GS introduces parameterized Bézier boundaries for each splat to enable sharp edge representation. For each pixel, the Gaussian is "scissored" by a polynomial indicator constructed from cubic Bézier control points and the implicit equation of the curve. Since the indicator is non-differentiable on the boundary, DisC-GS adopts a gradient approximation, solving for the minimal control point shift required to change the indicator's state for the pixel, propagating the update by finite difference (Qu et al., 2024).
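The full DisC-GS indicator requires implicitizing the cubic curve, but the two ingredients, Bézier evaluation and finite-difference sensitivity to a control-point shift, are easy to sketch (illustrative code, not the paper's implementation):

```python
import numpy as np

def bezier(t, P):
    """Point on a cubic Bezier curve; P is a (4, 2) array of control points."""
    b = np.array([(1 - t)**3, 3 * (1 - t)**2 * t, 3 * (1 - t) * t**2, t**3])
    return b @ np.asarray(P, float)

P = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 2.0], [3.0, 0.0]])
start, end = bezier(0.0, P), bezier(1.0, P)   # endpoints interpolate P0 and P3

# Finite-difference sensitivity of the curve midpoint to shifting control
# point P1 -- the same style of approximation DisC-GS uses where the
# boundary indicator itself is non-differentiable.
eps = 1e-5
Pp = P.copy(); Pp[1, 1] += eps
dmid = (bezier(0.5, Pp) - bezier(0.5, P)) / eps  # Bernstein weight 3*(0.5)^2*0.5
```

The curve responds linearly to its control points (with Bernstein-polynomial weights), which is what makes the "minimal control-point shift to flip the indicator" well defined and cheap to estimate by finite differences.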
4. Dynamism, Sonar Fusion, and Novel Modalities
Dynamic Gaussian Flow
GaussianFlow extends differentiable splatting to dynamic 4D fields, directly relating the motion of 3D Gaussian parameters to pixelwise optical flow. For Gaussian $i$, the flow generated at pixel $x$ between frames $t$ and $t+1$ is:

$$f_i(x) = \mu_{i,t+1} + B_{i,t+1} B_{i,t}^{-1}\,(x - \mu_{i,t}) - x,$$

where $\mu_{i,t}$ and $B_{i,t}$ are the projected 2D mean and a square-root factor of the 2D covariance at frame $t$, and the aggregated flow field is the blending-weighted sum:

$$F(x) = \sum_i w_i(x)\, f_i(x).$$
This mechanism enables direct flow-consistency supervision, substantially reducing "color-drift" artifacts in dynamic content synthesis (Gao et al., 2024).
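The idea is that each pixel's canonical coordinate within a Gaussian is tracked to the next frame. A NumPy sketch of this mechanism (simplified; names and the tuple layout are illustrative):

```python
import numpy as np

def per_gaussian_flow(x, mu_t, A_t, mu_t1, A_t1):
    """Flow at pixel x induced by one 2D Gaussian: hold the canonical
    coordinate u = A_t^{-1}(x - mu_t) fixed and re-express it at frame t+1.
    A_t, A_t1 are square-root factors of the 2D covariances."""
    u = np.linalg.solve(A_t, x - mu_t)
    x_next = mu_t1 + A_t1 @ u
    return x_next - x

def aggregate_flow(x, gaussians, weights):
    """Blend per-Gaussian flows with the per-pixel rendering weights."""
    w = np.asarray(weights, float)
    flows = [per_gaussian_flow(x, *g) for g in gaussians]
    return (w[:, None] * np.asarray(flows)).sum(0) / (w.sum() + 1e-12)

# Pure translation: identity covariance factors, mean shifted by (1, 2).
mu_t, mu_t1, A = np.array([3.0, 4.0]), np.array([4.0, 6.0]), np.eye(2)
f = per_gaussian_flow(np.array([3.5, 4.5]), mu_t, A, mu_t1, A)
F = aggregate_flow(np.array([3.5, 4.5]), [(mu_t, A, mu_t1, A)], [1.0])
```

For a purely translating Gaussian the induced flow at every covered pixel equals the mean shift, while anisotropic changes in `A` additionally encode local rotation and stretch, and the whole map stays differentiable in the splat parameters.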
Sonar-Camera Fusion
Z-Splat (Qu et al., 2024) demonstrates that incorporating transient data (e.g., sonar) resolves the "missing cone" problem in scenes where camera-only reconstructions fail in the depth axis. By mapping sonar measurements into Z-axis or YZ-plane Gaussian projections, and fusing camera loss with sonar loss terms, the model achieves significantly improved geometry and photometry, e.g., up to 5 dB PSNR gain and 60% Chamfer reduction.
5. Hardware Acceleration, Scalability, and Practical Implementation
Efficient Differentiable Hardware Rasterization (Yuan et al., 24 May 2025) advances the field by implementing the forward and backward passes with GPU hardware blending, reducing reliance on tile-based software rasterization and exploiting quad-level atomic reduction and fragment-shader interlock. Float16 and unorm16 render targets provide the best speed-accuracy tradeoff, yielding a substantial total speedup and a 37% memory reduction with negligible loss of image fidelity. This pipeline is essential for deployment on resource-constrained devices and large-scale scenes.
Weighted Sum Rendering (WSR) (Hou et al., 2024) further eliminates the expensive sorting step required by non-commutative alpha-blending, replacing it with commutative, learned weighted sums, and maintaining full differentiability for all parameters. WSR is faster and removes popping artifacts associated with order changes. Implementation leverages two-pass accumulation (color, weight), followed by normalization.
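The commutativity that lets WSR drop the sort is easy to demonstrate: the two accumulation passes (weighted color, then total weight) give the same result under any splat ordering. A minimal sketch, with the weights treated as already-computed per-splat scalars:

```python
import numpy as np

def weighted_sum_render(colors, weights):
    """Order-independent Weighted Sum Rendering: accumulate, then normalize."""
    colors = np.asarray(colors, float)
    w = np.asarray(weights, float)
    num = (w[:, None] * colors).sum(0)   # pass 1: weighted color accumulation
    den = w.sum()                        # pass 2: weight accumulation
    return num / (den + 1e-12)           # normalization

rng = np.random.default_rng(0)
colors, weights = rng.random((5, 3)), rng.random(5)
perm = rng.permutation(5)
a = weighted_sum_render(colors, weights)
b = weighted_sum_render(colors[perm], weights[perm])
# a == b for every permutation: no per-view sorting, no popping
```

Because both accumulators are plain sums, the result is invariant to splat order, which is what eliminates popping when the depth ordering changes between views.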
6. Advanced Splat Models and Regression Theory
Splat Regression Models (Daniels et al., 18 Nov 2025) provide a formal framework unifying Gaussian Splatting with mixtures of anisotropic bump functions, and ground the optimization in Wasserstein-Fisher-Rao gradient flows. The functional optimization supports both plain Euclidean and geometrically motivated updates; schematically, for a center $\mu$ and scale $\Sigma$:

$$\mu \leftarrow \mu - \eta\, G_\mu^{-1}\, \nabla_{\mu}\mathcal{L}$$

and

$$\Sigma \leftarrow \Sigma - \eta\, G_\Sigma^{-1}\, \nabla_{\Sigma}\mathcal{L},$$

where $G_\mu, G_\Sigma$ are metric preconditioners induced by the chosen geometry, with all steps implementable via automatic differentiation.
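As a toy instance of splat regression (a 1D sketch with hand-derived analytic gradients and plain Euclidean updates, i.e., $G = I$; not the paper's formulation), a single bump $f(x) = a\,e^{-\frac{1}{2}((x-\mu)/s)^2}$ can be fit to data by gradient descent on its center, scale, and amplitude:

```python
import numpy as np

def splat1d(x, mu, s, a):
    """A 1D anisotropic 'splat': amplitude-scaled Gaussian bump."""
    return a * np.exp(-0.5 * ((x - mu) / s) ** 2)

# Target bump and a deliberately mismatched initialization.
x = np.linspace(-3.0, 3.0, 200)
y = splat1d(x, 0.7, 0.5, 1.0)      # target: mu=0.7, s=0.5, a=1.0
mu, s, a = 0.0, 1.0, 0.5
lr = 0.1
for _ in range(4000):
    f = splat1d(x, mu, s, a)
    r = f - y                       # residual of the squared loss mean(r^2)
    d = (x - mu) / s
    mu -= lr * np.mean(2 * r * f * d / s)      # df/dmu = f * d / s
    s  -= lr * np.mean(2 * r * f * d**2 / s)   # df/ds  = f * d^2 / s
    a  -= lr * np.mean(2 * r * f / a)          # df/da  = f / a
```

The closed-form chain-rule gradients mirror what automatic differentiation would produce; in the full framework the same loop runs over a mixture of anisotropic splats with metric-preconditioned steps instead of raw gradients.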
7. Evaluation, Limitations, and Future Directions
Benchmarking across recent works demonstrates that differentiable Gaussian Splatting frameworks achieve state-of-the-art performance in photometric fidelity, geometric accuracy, and multimodal (e.g., semantic, dynamic) scene synthesis (Tran et al., 6 Dec 2025, Xie et al., 14 Oct 2025, Gu et al., 2024, Gao et al., 2024). For instance, TriaGS achieves a mean Chamfer Distance of 0.50 mm on DTU, outperforming explicit surface methods, and DisC-GS improves SSIM and PSNR over vanilla 3DGS.
Major limitations include computational cost for Monte Carlo sampling in inverse rendering (Gu et al., 2024), sensitivity to noisy or incomplete multi-view information, and unresolved challenges in long-term dynamism and multi-bounce light transport. Prospective advances are anticipated in learned importance sampling, hierarchical splat culling, multi-modal fusion, and further hardware specialization.
Overall, Differentiable Gaussian Splatting offers an extensible, analytically tractable paradigm for neural rendering, combining geometric flexibility, photorealistic synthesis, multimodal consistency, and compatibility with gradient-based learning for both small- and large-scale, static and dynamic scenes.