AdamW-GS Optimizer for 3D Gaussian Splatting
- AdamW-GS is a specialized optimizer for 3D Gaussian Splatting that decouples regularization components to address sparse update patterns.
- It integrates Sparse Adam, Re-State Regularization (RSR), and Decoupled Attribute Regularization (DAR) to enhance efficiency and improve reconstruction fidelity.
- Empirical results show gains in PSNR and SSIM along with reduced training time, making it practical for real-time view synthesis and scene optimization.
AdamW-GS denotes an optimizer constructed specifically for 3D Gaussian Splatting (3DGS) to address the peculiarities and inefficiencies of applying standard deep-network optimizers to explicit scene representations. Unlike classical Adam or AdamW, AdamW-GS combines three lightweight, largely independent modules, Sparse Adam, Re-State Regularization (RSR), and Decoupled Attribute Regularization (DAR), tailoring optimization to the rendering characteristics and sparse update patterns of 3DGS. AdamW-GS improves representation effectiveness, reduces model complexity, and accelerates training, outperforming conventional approaches in reconstruction fidelity and computational efficiency (Ding et al., 23 Jan 2026).
1. Optimization Framework in 3D Gaussian Splatting
3D Gaussian Splatting represents a scene using Gaussian primitives parameterized by their 3D positions $\mu_i \in \mathbb{R}^3$, covariances $\Sigma_i$ (often further decomposed into a rotation quaternion $q_i$ and a scale vector $s_i$), per-primitive opacities $\alpha_i$ (usually via a sigmoid activation, $\alpha_i = \sigma(o_i)$), and color coefficients $c_i$. For a given camera view, rendering is performed by alpha-blending the projected Gaussians front to back: $C = \sum_i c_i \alpha_i \prod_{j<i} (1 - \alpha_j)$.
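The alpha-blending step can be sketched in a few lines (a minimal NumPy illustration; projection, per-pixel Gaussian weighting, and spherical-harmonic color evaluation are omitted):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha blending: C = sum_i c_i * a_i * prod_{j<i} (1 - a_j)."""
    colors = np.asarray(colors, dtype=float)   # (N, 3) per-Gaussian RGB, depth-sorted
    alphas = np.asarray(alphas, dtype=float)   # (N,)  per-Gaussian effective opacity
    transmittance = 1.0
    pixel = np.zeros(3)
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * c         # this Gaussian's contribution
        transmittance *= 1.0 - a               # light remaining for those behind
    return pixel

# A mostly opaque red Gaussian in front of a blue one.
print(composite([[1, 0, 0], [0, 0, 1]], [0.8, 0.5]))  # [0.8, 0.0, 0.1]
```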
The optimization target comprises a photometric loss, $\mathcal{L}_{\text{photo}} = (1-\lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}}$, plus selective regularization such as L1 penalties on opacity and scale, $\mathcal{L}_{\text{reg}} = \lambda_o \sum_i |\alpha_i| + \lambda_s \sum_i \|s_i\|_1$. In variants utilizing MCMC, additional noise is injected into positions for exploration.
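A sketch of this objective under the standard 3DGS conventions (the D-SSIM term is stubbed out, and the weights `lam`, `lam_o`, `lam_s` are illustrative placeholders, not the paper's values):

```python
import numpy as np

def total_loss(pred, gt, opacities, scales, lam=0.2, lam_o=0.01, lam_s=0.01):
    """Photometric L1 term plus L1 regularization on per-primitive
    opacity and scale. The D-SSIM term is omitted for brevity."""
    l1 = np.abs(pred - gt).mean()
    photo = (1.0 - lam) * l1
    # A full implementation would add lam * d_ssim(pred, gt) here.
    reg = lam_o * np.abs(opacities).sum() + lam_s * np.abs(scales).sum()
    return photo + reg
```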
2. Decoupled Optimization Components
AdamW-GS structurally splits the optimization routine into three components designed to interact minimally:
A. Sparse Adam
Standard Adam synchronously updates every primitive at each step, even when its gradient vanishes ($g_{t,i} = 0$), resulting in unnecessary computation and parameter drift in non-visible scene regions. AdamW-GS introduces a visibility mask $M_i \in \{0,1\}$ and modifies the moment decay to $m_{t,i} = M_i\,[\beta_1 m_{t-1,i} + (1-\beta_1) g_{t,i}] + (1-M_i)\, m_{t-1,i}$ (and analogously for the second moment $v_{t,i}$), which "freezes" optimizer state for invisible primitives ($M_i = 0$), preserving it until the primitive is active again.
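A masked moment update of this kind might look as follows (a NumPy sketch; the visibility mask would come from the rasterizer's per-view culling, which is not modeled here):

```python
import numpy as np

def sparse_adam_moments(m, v, g, visible, beta1=0.9, beta2=0.999):
    """Update Adam moments only where `visible` is True; freeze the rest.

    m, v, g: (N, D) per-primitive moment and gradient arrays.
    visible: (N,) boolean mask from the current camera view.
    """
    mask = visible[:, None].astype(float)  # broadcast over attribute dimensions
    m_new = mask * (beta1 * m + (1 - beta1) * g) + (1 - mask) * m
    v_new = mask * (beta2 * v + (1 - beta2) * g ** 2) + (1 - mask) * v
    return m_new, v_new

m0 = np.ones((3, 2)); v0 = np.ones((3, 2)); g = np.full((3, 2), 2.0)
m1, v1 = sparse_adam_moments(m0, v0, g, np.array([True, False, True]))
print(m1[1])  # frozen (invisible) primitive keeps its state: [1. 1.]
```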
B. Re-State Regularization (RSR)
Sparse Adam forfeits Adam's beneficial "moment rescaling" (the implicit moment resets that occur when gradients disappear). RSR restores the amplified regularization effect by periodically sampling primitives and attenuating their moments, $m_i \leftarrow \rho_1 m_i$ and $v_i \leftarrow \rho_2 v_i$, with empirical scaling factors $\rho_1, \rho_2 \in (0,1)$. This process reactivates L1 regularization effects for dormant scene attributes.
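RSR's periodic attenuation can be sketched as below; the sampling ratio and the attenuation factors `rho_m`, `rho_v` are illustrative placeholders, not the paper's empirical values:

```python
import numpy as np

def restate_regularization(m, v, ratio=0.05, rho_m=0.5, rho_v=0.5, rng=None):
    """Sample a fraction of primitives and shrink their Adam moments.

    Shrinking v enlarges the effective step 1/(sqrt(v_hat)+eps), so the
    decoupled L1 pull is re-amplified for the sampled (often dormant) primitives.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = m.shape[0]
    idx = rng.choice(n, size=max(1, int(ratio * n)), replace=False)
    m, v = m.copy(), v.copy()
    m[idx] *= rho_m
    v[idx] *= rho_v
    return m, v

m1, v1 = restate_regularization(np.ones((100, 2)), np.ones((100, 2)),
                                rng=np.random.default_rng(0))
```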
C. Decoupled Attribute Regularization (DAR)
Adam and AdamW fold regularization gradients into the moment accumulation alongside the photometric loss, which makes the effective regularization strength variable. AdamW-GS keeps moment updates driven strictly by photometric gradients, while DAR applies attribute-wise penalties scaled by the inverse RMS of the second moment, clipped to bound the regularization step: $\Delta^{\mathrm{reg}}_{t,i} = \eta\,\tfrac{\lambda}{N} \min\!\big(\tfrac{1}{\sqrt{\hat v_{t,i}} + \epsilon},\, c\big)\,\mathrm{sign}(\theta_{t,i})$, where $c$ is a clipping constant and $N$ is the number of visible primitives. Regularization is thus adaptively strong for low-photometric-gradient (invisible) regions and weak for highly active ones.
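The decoupled, clipped penalty can be sketched as follows (NumPy; `lr`, `lam`, and `clip` are illustrative constants, with any normalization over visible primitives folded into `lam`):

```python
import numpy as np

def dar_step(theta, v_hat, lr=1e-3, lam=1e-2, clip=1.0, eps=1e-8):
    """Apply the decoupled L1 penalty: strong where the photometric second
    moment v_hat is small (dormant primitives), weak where it is large,
    with `clip` bounding the per-step shrinkage."""
    scale = np.minimum(1.0 / (np.sqrt(v_hat) + eps), clip)
    return theta - lr * lam * scale * np.sign(theta)

# A dormant parameter (v_hat = 0) is pulled at the full clipped rate;
# an active one (v_hat = 100) at only a tenth of it.
theta_new = dar_step(np.array([1.0, -1.0]), np.array([0.0, 100.0]))
```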
3. Full AdamW-GS Update Rule
For each attribute block (opacity, scale, color, position), the optimizer maintains decoupled moment statistics, applies optional RSR, and executes the update $\theta_{t,i} = \theta_{t-1,i} - \eta\,\tfrac{\hat m_{t,i}}{\sqrt{\hat v_{t,i}} + \epsilon} - \Delta^{\mathrm{reg}}_{t,i}$ for visible primitives, where $\hat m, \hat v$ are the bias-corrected moments. Special cases for opacity and scale incorporate their respective chain-rule L1 penalties.
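Putting the pieces together, one AdamW-GS step for a single attribute block might look like this (a NumPy sketch under the assumptions above; hyperparameter values are placeholders, and bias correction uses the global step count for simplicity):

```python
import numpy as np

def adamw_gs_step(theta, m, v, g, visible, t, lr=1e-3, beta1=0.9,
                  beta2=0.999, eps=1e-8, lam=1e-2, clip=1.0):
    """Sparse Adam moments plus a decoupled, inverse-RMS-scaled, clipped
    L1 penalty. Invisible primitives are left untouched (parameters and state)."""
    mask = visible[:, None].astype(float)
    m = mask * (beta1 * m + (1 - beta1) * g) + (1 - mask) * m
    v = mask * (beta2 * v + (1 - beta2) * g ** 2) + (1 - mask) * v
    m_hat = m / (1 - beta1 ** t)   # bias correction with global t (a simplification)
    v_hat = v / (1 - beta2 ** t)
    reg = lam * np.minimum(1.0 / (np.sqrt(v_hat) + eps), clip) * np.sign(theta)
    theta = theta - mask * lr * (m_hat / (np.sqrt(v_hat) + eps) + reg)
    return theta, m, v
```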
4. Key Differences Relative to Adam and AdamW
| Optimizer | Update Pattern | Regularization | Sparsity and Adaptivity |
|---|---|---|---|
| Adam | Synchronous (every primitive) | L2 and L1 coupled | No spatial sparsity |
| AdamW | Synchronous | Decoupled L2 | No attribute-wise adaptivity |
| AdamW-GS | Asynchronous (Sparse Adam) | Fully decoupled DAR | Visibility-driven, adaptive |
AdamW-GS combines asynchronous updates, explicit regularization moment resets, and per-attribute, per-primitive regularization scaling, in contrast to the synchronous and globally constant approaches of Adam and AdamW.
5. Hyperparameters and Scene-Adaptive Settings
AdamW-GS determines visibility by per-view camera testing and applies the following settings:
- RSR: primitives are resampled at a fixed iteration interval, with sampling ratios tuned per scene (lower for indoor, higher for outdoor) and empirically chosen moment-attenuation factors.
- DAR regularization: separate weights for opacity and scale (reflecting their disparate activation scales) and a clipping constant; regularization is applied only after the initial densification phase (≥3000 iterations).
- Base learning rates: set per attribute block (position/color, opacity, scale), together with the standard Adam constants $\beta_1$, $\beta_2$, and $\epsilon$.
6. Empirical Performance and Ablation Insights
AdamW-GS demonstrates quantifiable improvements and efficiency:
- On Mip-NeRF360 with 3DGS-MCMC, it improves PSNR, SSIM, and LPIPS over the baseline, reallocates primitives more effectively, and shortens training time.
- On vanilla 3DGS, it prunes a large fraction of active primitives (from roughly $48\%$ upward), exceeding the pruning rates of RePR and MaskGaussian, while increasing PSNR and SSIM and reducing training time.
Ablation results reveal that Sparse Adam alone worsens PSNR; adding only RSR recovers quality but not pruning efficacy; naive L1 coupling leads to unstable regularization; and the combination of DAR and RSR yields stable, scene-adaptive pruning with consistent gains. Pipelines such as MaskGaussian, Taming-3DGS, and Deformable Beta Splatting all inherit AdamW-GS, exhibiting improved metrics and reduced model complexity across the board.
Significance: AdamW-GS, by unifying physical scene constraints with attribute-aware, sparse, and decoupled optimization, outperforms conventional practices in speed, model compactness, and rendering accuracy. Its drop-in nature has immediate utility for real-time view synthesis and potentially broader explicit representation optimization in computer vision and graphics (Ding et al., 23 Jan 2026).