Robust 3DGS: Delayed Gaussian Growth
- The delayed Gaussian growth strategy is a technique for robust 3D reconstruction that postpones splitting or cloning Gaussians until a reliable static scene representation has been established.
- It employs a principled densification schedule, using an explicit delay parameter and gradient-based criteria to target only consistently static regions.
- Empirical results demonstrate significant improvements in PSNR and SSIM by mitigating overfitting to transient artifacts and illumination variations.
The delayed Gaussian growth strategy is a core component of RobustSplat++, a system for robust 3D Gaussian Splatting (3DGS) in challenging, in-the-wild visual scenarios characterized by transient objects and illumination variations. It directly addresses failure modes observed in baseline 3DGS when densification (splitting or cloning Gaussians) is not sufficiently constrained. The strategy introduces a principled schedule: all densification is deferred until a static scene representation has been established, and afterward only regions consistently identified as static are refined. Empirical analysis across diverse datasets demonstrates that this strategy substantially mitigates overfitting to transients and illumination artifacts, yielding more stable and accurate reconstructions (Fu et al., 4 Dec 2025).
1. Motivation and Problem Setting
Standard 3DGS pipelines enable densification early in training (e.g., starting after 500 iterations, then every 100 iterations). While densification is critical for modeling fine geometric detail, enabling it prematurely on real-world data exacerbates two issues: (a) newly spawned Gaussians track transient content (moving objects, shadows, highlights), because the photometric loss is dominated by outliers before scene geometry stabilizes; (b) the model overfits to this content, producing "floater" artifacts and temporal/photometric instability. The delayed Gaussian growth schedule addresses this by ensuring that initial optimization recovers only static scene structure and appearance, so that no model capacity is spent on non-static effects early on.
2. Algorithmic Structure and Training Regime
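Delayed Gaussian growth modifies the densification schedule by introducing an explicit growth delay parameter $T_d$, measured in training iterations. No splitting or cloning of Gaussians is permitted before iteration $T_d$. Afterward, densification is triggered at a regular interval $\Delta$ (e.g., every 100 iterations, as in the standard 3DGS schedule), conditioned on additional criteria:
- For each Gaussian $G_i$ with center $\mu_i$, compute its accumulated position gradient over static-weighted rays:
$$g_i = \sum_{r} M(r)\,\left\lVert \frac{\partial \mathcal{L}(r)}{\partial \mu_i} \right\rVert,$$
where $M(r)$ is the mask value denoting static region confidence for ray $r$ and $\mathcal{L}(r)$ is the loss contribution of that ray.
- If $g_i > \tau_p$ (positional-gradient threshold, typically $0.1$–$0.2$), split or clone $G_i$.
- Optionally prune Gaussians with low opacity post-splitting.
This entire schedule is encapsulated by the indicator function
$$\mathbb{1}_{\mathrm{grow}}(t, i) = \mathbb{1}[t \ge T_d]\;\mathbb{1}[t \bmod \Delta = 0]\;\mathbb{1}[g_i > \tau_p],$$
ensuring densification only when the iteration index $t$ satisfies $t \ge T_d$.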
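The gating logic described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the RobustSplat++ implementation: the function names, the array layout (one row per ray, one column per Gaussian), and all numeric values are hypothetical, and real pipelines accumulate gradients inside the rasterizer rather than from a dense matrix.

```python
import numpy as np


def growth_allowed(step, growth_delay, interval):
    """True once the delay has elapsed, and then only on every `interval`-th step."""
    return step >= growth_delay and step % interval == 0


def static_weighted_gradient(grad_norms, static_mask):
    """Sum per-ray positional-gradient norms, weighted by static confidence.

    grad_norms:  (num_rays, num_gaussians) gradient norm of each ray's loss
                 with respect to each Gaussian center.
    static_mask: (num_rays,) confidence that a ray sees static content.
    """
    grad_norms = np.asarray(grad_norms, dtype=float)
    static_mask = np.asarray(static_mask, dtype=float)
    return (static_mask[:, None] * grad_norms).sum(axis=0)


def select_for_growth(step, grad_norms, static_mask, growth_delay, interval, tau_p):
    """Boolean flag per Gaussian: split/clone candidates at this step."""
    num_gaussians = np.asarray(grad_norms).shape[1]
    if not growth_allowed(step, growth_delay, interval):
        return np.zeros(num_gaussians, dtype=bool)
    return static_weighted_gradient(grad_norms, static_mask) > tau_p


# Toy data (illustrative values only): ray 0 is static, ray 1 hits a transient.
grad_norms = [[0.30, 0.05],
              [0.40, 0.90]]
static_mask = [1.0, 0.0]

early = select_for_growth(50, grad_norms, static_mask,
                          growth_delay=100, interval=10, tau_p=0.2)
late = select_for_growth(100, grad_norms, static_mask,
                         growth_delay=100, interval=10, tau_p=0.2)
```

In this toy case no Gaussian is selected at step 50 because the delay has not elapsed. At step 100 only the first Gaussian is selected: the second has a large gradient, but it comes entirely from the ray the mask treats as transient, so it does not count toward the threshold.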
3. Mathematical Formulation
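In 3DGS, each Gaussian $G_i$ is parameterized by center $\mu_i$, covariance $\Sigma_i$, opacity $\alpha_i$, and spherical harmonics coefficients $h_i$. Rendering employs ordered, compositional splatting across projected 2D Gaussians:
$$C(r) = \sum_{i=1}^{N} c_i\,\tilde{\alpha}_i \prod_{j=1}^{i-1} \left(1 - \tilde{\alpha}_j\right),$$
where $c_i$ is the view-dependent color decoded from $h_i$ and $\tilde{\alpha}_i$ is the opacity $\alpha_i$ modulated by the projected 2D Gaussian at the pixel of ray $r$,
with a reconstruction loss
$$\mathcal{L} = (1 - \lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}}.$$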
RobustSplat++ modifies the densification trigger mechanism to sum gradients only over rays that the mask $M$ marks as static. The mask is predicted by an MLP over DINOv2 features, and heavy static regularization is applied to $M$ before the growth delay has elapsed, so that $M \approx 1$ everywhere in early iterations.
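To make the ordered compositing step concrete, the sketch below blends a depth-sorted list of splats for a single pixel. It is a simplified illustration, assuming per-splat colors and effective opacities have already been evaluated at that pixel; the function name `composite` and the example values are hypothetical, and a real rasterizer does this per tile on the GPU.

```python
import numpy as np


def composite(colors, alphas):
    """Front-to-back blending: sum_i c_i * a_i * prod_{j<i} (1 - a_j).

    colors: (N, 3) splat colors, sorted near to far.
    alphas: (N,) effective opacities at this pixel, same order.
    Returns the blended color and the per-splat blending weights.
    """
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance before splat i: product of (1 - alpha_j) for all j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = alphas * transmittance
    return weights @ colors, weights


# Two half-transparent splats: red in front, green behind (illustrative values).
colors = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]
alphas = [0.5, 0.5]
pixel, weights = composite(colors, alphas)
```

Here the front splat receives weight 0.5 and the one behind it 0.25, so the pixel is `[0.5, 0.25, 0.0]`. The same weights scale the gradients that flow back to each Gaussian, which is why a transient object dominating a ray can drive the densification statistics unless that ray is down-weighted.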
4. Interaction with Static-Scene Optimization
For the first $T_d$ iterations (the growth delay), the model is restricted to the initial, sparse set of Gaussians recovered from SfM. During this phase:
- The model recovers low-frequency structure in geometry.
- Appearance embeddings (2D/3D) for illumination effects are learned in a smoothly varying, noise-reduced context.
- Mask-MLP is trained with strong regularization, encouraging masks to predict all regions as static.
Since densification is completely inhibited, no model capacity is allocated to transient or ill-posed regions. After static scene geometry is locked in, the densification process begins but only operates in spatial regions and directions strongly supported by multi-view consistency, as encoded in the learned mask.
5. Empirical Evaluations
Quantitative and qualitative analyses (Fu et al., 4 Dec 2025) provide evidence for the efficacy of delayed Gaussian growth:
- PSNR and SSIM degrade rapidly for standard 3DGS when densification fits moving people and shadows (cf. Fig. 2), while models with delayed growth maintain higher, more stable metrics.
- Ablations varying the growth delay (Fig. 4) show that later densification yields greater PSNR stability, both with and without mask supervision.
- On the "NeRF On-the-go" benchmark (Table III), adding delayed growth ("+DG") improves both low-occlusion PSNR and SSIM. The full model (mask + bootstrapping + DG) reaches a PSNR of $21.08$ and an SSIM of $0.719$.
- On "NeRF-OSR" (Table VI), removing delayed growth causes marked drops in both PSNR and SSIM.
- Qualitative results document clearer reconstructions of static structure, elimination of floaters, and suppression of transient artifacts.
6. Implementation Guidance and Hyperparameters
Key integration procedures and recommended values are as follows:
- Start from an existing 3DGS codebase (e.g., the official graphdeco-inria implementation).
- Set the maximum number of training iterations.
- Set the growth delay, and adjust it separately for noisy or large datasets and for clean ones.
- Positional threshold: keep the 3DGS default and adjust as needed.
- Mask bootstrapping: low-resolution (224×224) DINOv2 features early in training, high-resolution (504×504) features afterward.
- Mask-MLP: two linear layers, optimized with Adam, with two mask loss weights and a warm-up period.
- Appearance-MLP: three linear layers, optimized with Adam; 2D embedding dimension $48$, 3D embedding dimension $30$.
- DINOv2 ViT-S/14 provides mask-MLP input features.
Adopting delayed Gaussian growth requires minimal code changes (typically a few additional lines in the training loop) and is compatible with either static or appearance-augmented 3DGS pipelines.
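As a rough picture of how small the change to the training loop can be, the sketch below adds a single delay guard to a generic loop. Everything here is an assumption made for illustration: `GrowthSchedule`, `train`, and the two callbacks are hypothetical names rather than part of any 3DGS codebase, and the iteration counts in the usage example are toy values, not recommended settings.

```python
from dataclasses import dataclass


@dataclass
class GrowthSchedule:
    max_iterations: int          # total training steps (choose per experiment)
    growth_delay: int            # no split/clone before this step
    densify_interval: int = 100  # steps between densification passes


def train(schedule, optimize_step, densify):
    """Generic loop: optimize every step, densify only after the delay."""
    for step in range(1, schedule.max_iterations + 1):
        optimize_step(step)
        # The delay guard is the only addition relative to an undelayed loop.
        past_delay = step >= schedule.growth_delay
        if past_delay and step % schedule.densify_interval == 0:
            densify(step)


# Toy run: record the steps at which densification would fire.
densify_calls = []
train(GrowthSchedule(max_iterations=50, growth_delay=20, densify_interval=10),
      optimize_step=lambda step: None,
      densify=densify_calls.append)
```

With these toy settings densification fires at steps 20, 30, 40, and 50, and never at step 10. In a full pipeline the `densify` callback would apply the static-weighted gradient criterion, and `optimize_step` would update the Gaussians together with the mask and appearance networks.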
7. Significance and Implications
The delayed Gaussian growth strategy provides a generalizable approach for improving robustness in 3DGS-derived scene representations subject to dynamic real-world phenomena. By structurally preventing early overfitting to transients and local photometric outliers, it enables higher-fidelity reconstructions and more accurate geometry, especially in unconstrained environments. This suggests a broader principle: for geometric MLP-based models where dynamic and illumination confounders are present, initial static-structure optimization followed by capacity expansion prevents deleterious overadaptation and improves final model utility (Fu et al., 4 Dec 2025).