- The paper introduces an uncertainty-aware 3D Gaussian splatting method that learns per-splat appearance variance to enhance reliability in RGB-D SLAM.
- The methodology jointly optimizes 3D Gaussian parameters and camera poses while propagating per-pixel uncertainty to improve tracking, submap registration, and loop closure.
- Experimental results demonstrate state-of-the-art tracking accuracy with reduced drift and high reconstruction fidelity across synthetic and real-world datasets.
VarSplat: Uncertainty-Aware 3D Gaussian Splatting for Robust RGB-D SLAM
Introduction and Motivation
VarSplat introduces an uncertainty-aware formulation for dense RGB-D SLAM, leveraging 3D Gaussian Splatting (3DGS) to address the persistent challenge of measurement reliability in neural SLAM. The methodology targets scenarios where existing 3DGS-SLAM systems typically falter, namely areas with weak texture, reflective materials, or depth discontinuities. By learning per-splat appearance variance and propagating it via the law of total variance into a differentiable per-pixel uncertainty map, VarSplat provides pixel- and region-level reliability estimates that directly inform the tracking, submap registration, and loop closure modules. This design enables both stable tracking and high-fidelity map reconstruction in real environments, without reliance on auxiliary uncertainty predictors or handcrafted reliability heuristics.
Figure 1: VarSplat architecture—each 3D Gaussian learns position, appearance, and variance, with per-splat variances composited into per-pixel uncertainty used throughout the SLAM pipeline.
Methodological Framework
Uncertainty Quantification in 3DGS
VarSplat augments each 3D Gaussian with a learned appearance variance parameter σ², distinguished from the geometric covariance, to model uncertainty in the predicted color across viewpoints. During compositing, the per-splat variances are aggregated into a differentiable per-pixel variance V using the law of total variance. The derived formula incorporates both alpha compositing and second-order moments, ensuring that uncertainty is sensitive to local geometric and photometric instability, such as at occlusion boundaries and depth discontinuities.
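This summary does not reproduce the paper's exact derivation, but under the standard front-to-back compositing weights, a plausible instantiation of the law-of-total-variance composite (an assumption, not the paper's verbatim formula) is

$$
V \;=\; \underbrace{\sum_i w_i\,\sigma_i^2}_{\text{within-splat}} \;+\; \underbrace{\sum_i w_i\,c_i^2 \;-\; \Big(\sum_i w_i\,c_i\Big)^2}_{\text{between-splat}},
\qquad w_i = \alpha_i \prod_{j<i} (1-\alpha_j),
$$

where $c_i$ is the splat color. The between-splat term is the second-order moment minus the squared composited color, which is what makes $V$ spike where splats with conflicting colors overlap, e.g., at occlusion boundaries.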
This per-pixel uncertainty is rendered in a single forward pass of the rasterizer, preserving computational efficiency and enabling its use throughout the SLAM optimization pipeline.
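As an illustration of how color and uncertainty can share one compositing pass, the following PyTorch sketch mirrors the formula above (function and variable names are ours; the actual system uses a tiled CUDA rasterizer):

```python
import torch

def composite_color_and_variance(colors, sigma2, alphas):
    """Front-to-back compositing of color and appearance variance
    in a single pass (illustrative sketch, one ray at a time).
    colors: (N, 3) splat colors sorted near-to-far along the ray
    sigma2: (N,)   learned per-splat appearance variances
    alphas: (N,)   opacities after the 2D Gaussian falloff
    """
    # Compositing weights w_i = alpha_i * prod_{j<i} (1 - alpha_j)
    transmittance = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
    w = alphas * transmittance

    mean = (w[:, None] * colors).sum(dim=0)   # composited color C
    # Law of total variance: expected within-splat variance plus the
    # between-splat spread of splat colors around C (averaged over RGB).
    within = (w * sigma2).sum()
    between = (w[:, None] * (colors - mean) ** 2).sum(dim=0).mean()
    return mean, within + between             # per-pixel V
```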
Joint Optimization
VarSplat’s pipeline maintains a collection of submaps, each consisting of 3D Gaussians whose parameters (mean, scale, orientation, color, opacity, and variance) are optimized jointly with camera poses. The composited uncertainty V contributes to the mapping loss via a Gaussian negative log-likelihood, while also serving as a confidence weight in downstream pose refinement. Care is taken to ensure that variance parameters are updated only during the mapping stage, which prevents gradient conflicts in online operation.
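A minimal sketch of the variance-weighted mapping term, assuming a standard per-pixel Gaussian negative log-likelihood (the names and the numerical floor are ours, not the paper's exact implementation):

```python
import torch

def mapping_loss(rendered, target, var, eps=1e-6):
    """Gaussian NLL over per-pixel residuals: high-variance pixels get a
    down-weighted residual but pay a log-variance penalty, so the model
    cannot explain every error away as noise (sketch; eps is assumed)."""
    v = var.clamp_min(eps)
    residual2 = (rendered - target) ** 2
    return (residual2 / (2.0 * v) + 0.5 * torch.log(v)).mean()
```

During tracking, the rendered variance would be detached from the graph so pose gradients do not update the variance parameters, consistent with restricting variance updates to the mapping stage.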
Downstream SLAM Subsystems
Tracking and Registration with Uncertainty
Rendered per-pixel uncertainty maps V serve as reliability-aware weights when optimizing photometric residuals for frame-to-frame tracking, as well as in submap alignment during registration and loop closure. A normalization procedure based on median log-scaling ensures that high-variance (i.e., low-confidence) regions are down-weighted during pose estimation, improving convergence and reducing drift. The per-splat variance further serves as a summary statistic of submap reliability in loop detection, modulating submap similarity to filter out unreliable map overlaps.
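One plausible reading of the median log-scaling (our assumption; the paper's exact normalization may differ) is to anchor each frame's log-variance at its median, which makes the weights scale-free across frames:

```python
import torch

def reliability_weights(var, eps=1e-6):
    """Per-pixel confidence weights from rendered variance (sketch).
    exp(-(log V - median(log V))) equals median(V) / V, so pixels at or
    below the median variance get full weight and noisier pixels decay."""
    log_v = torch.log(var.clamp_min(eps))
    w = torch.exp(-(log_v - log_v.median()))
    return w.clamp(max=1.0)

# Hypothetical usage in pose refinement:
#   loss = (reliability_weights(V) * (I_rendered - I_observed) ** 2).mean()
```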
Figure 2: Ablation on ScanNet (scene0181)—uncertainty weighting eliminates tracking artifacts, drift, and ghosted submaps, yielding smooth, aligned trajectories with VarSplat.
Experimental Evaluation
Datasets and Baselines
Evaluation spans the synthetic Replica dataset and the real-world TUM-RGBD, ScanNet, and ScanNet++ datasets. VarSplat is compared against recent neural implicit and 3DGS-SLAM systems, including SplaTAM, MonoGS, Gaussian-SLAM, CG-SLAM, and LoopSplat, as well as uncertainty-aware variants such as Uni-SLAM.
Numerical Results
VarSplat consistently matches or exceeds state-of-the-art tracking accuracy (measured by keyframe ATE RMSE) across all benchmarks. For example, on ScanNet++ it yields an 18% reduction in ATE RMSE over the second-best method and avoids the catastrophic drift that causes other systems to fail. On TUM-RGBD, it maintains robust tracking in the presence of incomplete or noisy depth input.
Reconstruction quality, as assessed by depth L1 and mesh F1 score, is preserved at the level of state-of-the-art methods, despite the additional regularization imposed by the learned variance. Notably, rendering quality (PSNR/SSIM/LPIPS) is not compromised, with VarSplat achieving competitive or best-in-class scores for both input and novel view synthesis.
Figure 3: Spatial uncertainty adapts to scene structure—uncertainty is high on untextured walls and transparent regions, while map growth and alignment concentrate on reliable regions.
Figure 4: Per-pixel uncertainty with and without depth cues on TUM-RGBD—depth-aware uncertainty concentrates on unconstrained or ambiguous surfaces, filtering out overconfident predictions.
Ablation and Runtime
Ablation studies confirm the necessity of applying uncertainty at all three key stages (tracking, loop detection, registration) for optimal trajectory stability. The variance learning mechanism, using squared residuals in the loss, ensures consistent uncertainty estimation and prevents the overconfidence seen with alternative formulations. Runtime analysis shows that VarSplat maintains near real-time inference comparable to other 3DGS methods, with the variance-aware rendering introducing only modest overhead.
Theoretical and Practical Implications
VarSplat’s explicit appearance-based uncertainty quantification offers a parsimonious and principled approach to reliability estimation, avoiding the pitfalls of ad hoc or pretrained confidence predictors. This design enables robust online SLAM in unstructured, dynamic, and low-visibility environments. The methodology can be extended to joint geometric-appearance uncertainty learning or dynamic scene segmentation, and may even serve as a proxy for uncertainty calibration in autonomous robotics and AR/VR applications.
From a theoretical perspective, the integration of the law of total variance into differentiable rasterization within a neural SLAM framework closes the gap between probabilistic modeling and efficient, real-time visual localization and mapping.
Conclusion
VarSplat establishes a new paradigm for neural SLAM by integrating learned, differentiable uncertainty into the core of 3D Gaussian Splatting. By leveraging per-splat appearance variance and efficient rasterization, it achieves robust, drift-resistant camera tracking and registration without sacrificing rendering fidelity or real-time performance. The framework is generalizable and suggests pathways for richer uncertainty modeling in future SLAM systems, with immediate practical benefit in robotics and embodied AI.
(2603.09673)