
AdamW-GS Optimizer for 3D Gaussian Splatting

Updated 26 January 2026
  • AdamW-GS is a specialized optimizer for 3D Gaussian Splatting that decouples regularization components to address sparse update patterns.
  • It integrates Sparse Adam, Re-State Regularization (RSR), and Decoupled Attribute Regularization (DAR) to enhance efficiency and improve reconstruction fidelity.
  • Empirical results show gains in PSNR and SSIM along with reduced training time, making it practical for real-time view synthesis and scene optimization.

AdamW-GS denotes an optimizer specifically constructed for 3D Gaussian Splatting (3DGS) to address the peculiarities and inefficiencies of applying standard deep neural network optimizers to explicit scene representations. Unlike classical Adam or AdamW, AdamW-GS introduces a lightweight integration of three decoupled modules—Sparse Adam, Re-State Regularization (RSR), and Decoupled Attribute Regularization (DAR)—tailoring optimization to the structure and sparse update patterns of 3DGS. AdamW-GS improves representation effectiveness, reduces model complexity, and accelerates training, outperforming conventional approaches in reconstruction fidelity and computational efficiency (Ding et al., 23 Jan 2026).

1. Optimization Framework in 3D Gaussian Splatting

3D Gaussian Splatting represents a scene using $n$ Gaussian primitives $\{\mathcal G_i\}_{i=1}^n$, each parameterized by a 3D position $\mu_i \in \mathbb{R}^3$, a covariance $\Sigma_i \in \mathbb{S}_+^{3\times 3}$ (often further decomposed into a rotation quaternion and a scale vector), a per-primitive opacity $o_i > 0$ (usually via a sigmoid activation, $o_i = \sigma(\tau_i)$), and color coefficients $c_i \in \mathbb{R}^k$. For a given camera view, rendering is performed by alpha-blending the projected Gaussians.
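As a concrete illustration, the per-primitive parameter blocks might be stored as below; the names, shapes, and initializations are our own sketch, not the paper's implementation:

```python
import torch

def init_gaussians(n: int, sh_dim: int = 48):
    """Illustrative parameter blocks for n Gaussian primitives.

    Opacity is stored as a logit tau with o = sigmoid(tau); scale as a
    log-scale kappa with s = exp(kappa), matching the activations above.
    """
    quat = torch.zeros(n, 4)
    quat[:, 0] = 1.0  # identity rotation
    return {
        "mu":    torch.randn(n, 3, requires_grad=True),       # 3D positions
        "quat":  quat.requires_grad_(),                       # covariance rotation
        "kappa": torch.zeros(n, 3, requires_grad=True),       # log scales
        "tau":   torch.zeros(n, 1, requires_grad=True),       # opacity logits
        "color": torch.zeros(n, sh_dim, requires_grad=True),  # SH color coefficients
    }
```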

The optimization target comprises a photometric loss
$$\mathcal L(\theta) = (1-\lambda_1)\,\mathcal L_{L1}(\theta) + \lambda_1\,\mathcal L_{\mathrm{DSSIM}}(\theta)$$
plus selective regularization:
$$\mathcal R(\theta) = \lambda_o \sum_i |o_i| + \lambda_s \sum_i \|s_i\|_1.$$
In variants utilizing MCMC, additional noise is injected into positions for exploration.
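A minimal sketch of this objective, assuming an external `ssim_fn` is available and using the common DSSIM = (1 − SSIM)/2 convention; photometric and regularization terms are returned separately because DAR (Section 2C) applies the latter outside the moment accumulation:

```python
import torch

def objective(pred, gt, opacity, scales, ssim_fn,
              lambda1=0.2, lambda_o=1e-3, lambda_s=1e-5):
    """Photometric loss plus the selective L1 regularization above.

    ssim_fn is assumed to return SSIM in [0, 1]; lambda values follow
    Section 5 (lambda1 here is illustrative).
    """
    l1 = (pred - gt).abs().mean()
    dssim = (1.0 - ssim_fn(pred, gt)) / 2.0
    photometric = (1.0 - lambda1) * l1 + lambda1 * dssim
    reg = lambda_o * opacity.abs().sum() + lambda_s * scales.abs().sum()
    return photometric, reg
```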

2. Decoupled Optimization Components

AdamW-GS structurally splits the optimization routine into three interaction-minimizing ingredients:

A. Sparse Adam

Standard Adam synchronously updates all primitives at each step, even when their gradient vanishes ($\nabla\ell(\theta_i) = 0$), resulting in unnecessary computation and parameter drift in non-visible scene regions. AdamW-GS introduces a visibility mask $V_i \in \{0, 1\}$ and modifies the moment decay:
$$\beta'_j = \beta_j V_i + (1 - V_i), \quad j \in \{1, 2\},$$
which "freezes" optimizer state for invisible primitives ($\beta'_j = 1$), so state evolves only while a primitive is active.
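In code, the masked moment decay can be written as the following sketch; the tensor shapes and broadcasting are our assumptions:

```python
import torch

def sparse_adam_moments(m, v, grad, visible, beta1=0.9, beta2=0.999):
    """Sparse Adam moment update: freeze state where visible == 0.

    beta'_j = beta_j * V + (1 - V) yields beta'_j = 1 for invisible
    primitives, so their moments do not move.
    """
    # Broadcast the (n,) visibility mask over attribute dimensions.
    V = visible.float().view(-1, *([1] * (grad.dim() - 1)))
    b1 = beta1 * V + (1.0 - V)   # 1.0 where invisible -> m unchanged
    b2 = beta2 * V + (1.0 - V)
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad
    return m, v
```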

B. Re-State Regularization (RSR)

Sparse Adam forfeits Adam's beneficial "moment rescaling" (implicit moment resets when gradients disappear). RSR restores amplified regularization by periodically sampling primitives and attenuating their moments:
$$m_t \leftarrow \alpha_1 m_t, \qquad v_t \leftarrow \alpha_2 v_t,$$
with empirical scaling $\alpha_1 \approx 0.2$ and $\alpha_2 \approx \alpha_1^2 = 0.04$. This process reactivates L1 regularization effects for dormant scene attributes.
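A sketch of the periodic attenuation; the sampling ratio here is illustrative, since the paper adjusts it per scene (see Section 5):

```python
import torch

def restate_regularization(m, v, step, interval=100, ratio=0.05,
                           alpha1=0.2, alpha2=0.04):
    """RSR: every `interval` steps, attenuate the moments of a random
    subset of primitives, restoring the effect of moment resets."""
    if step % interval == 0:
        n = m.shape[0]
        idx = torch.rand(n, device=m.device) < ratio  # random primitive sample
        m[idx] *= alpha1                              # m_t <- alpha1 * m_t
        v[idx] *= alpha2                              # v_t <- alpha2 * v_t = alpha1^2 * v_t
    return m, v
```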

C. Decoupled Attribute Regularization (DAR)

Adam and AdamW couple the regularization gradient $\nabla\mathcal R$ with the photometric gradient $\nabla\ell$ in the moment accumulation, leading to effectively variable regularization strength. AdamW-GS keeps the moment updates driven strictly by photometric gradients, while DAR applies attribute-wise penalties scaled by the inverse RMS, with clipping to bound the regularization step:
$$\Delta\theta = -\eta\,\frac{\hat m'_t}{\sqrt{\hat v'_t} + \varepsilon} - \eta\,\min\!\left(\lambda_\theta\,\frac{\nabla\mathcal R(\theta)/N_I}{\sqrt{\hat v'_t} + \varepsilon},\, C\right),$$
where $C$ is a clipping constant (e.g., $C = 10$) and $N_I$ is the number of visible primitives. Regularization is thus adaptively strong in low-photometric-gradient (rarely visible) regions and weak for highly active ones.

3. Full AdamW-GS Update Rule

For each attribute block (opacity, scale, color, position), the optimizer maintains decoupled moment statistics, applies optional RSR, and executes the update:
$$\theta_{t+1} = \theta_t - \eta\,\frac{\hat m'_t}{\sqrt{\hat v'_t} + \varepsilon} - \eta\,\min\!\left(\lambda_\theta\,\frac{\nabla\mathcal R(\theta)/N_I}{\sqrt{\hat v'_t} + \varepsilon},\, C\right).$$
Special cases for opacity $o = \sigma(\tau)$ and scale $s = \exp(\kappa)$ incorporate their respective chain-rule L1 penalties.
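Putting the pieces together, one AdamW-GS step for a single attribute block might look like the sketch below. Gating the entire update by the visibility mask and using a global bias correction are simplifying assumptions on our part:

```python
import torch

def adamw_gs_step(theta, grad, reg_grad, m, v, visible, step,
                  lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8,
                  lam=1e-3, clip=10.0):
    """One AdamW-GS step for one attribute block (illustrative sketch).

    Moments are driven only by the photometric gradient `grad`; the
    regularization gradient `reg_grad` is applied decoupled (DAR),
    scaled by the inverse RMS and clipped at `clip`. `step` counts from 1.
    """
    V = visible.float().view(-1, *([1] * (grad.dim() - 1)))
    b1 = beta1 * V + (1.0 - V)            # Sparse Adam: freeze invisible state
    b2 = beta2 * V + (1.0 - V)
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad
    m_hat = m / (1.0 - beta1 ** step)     # simplified global bias correction
    v_hat = v / (1.0 - beta2 ** step)
    denom = v_hat.sqrt() + eps
    n_vis = V.sum().clamp(min=1.0)        # N_I, number of visible primitives
    reg_step = torch.clamp(lam * (reg_grad / n_vis) / denom, max=clip)
    # Assumption: only visible primitives move, matching Sparse Adam's skip.
    theta = theta - lr * V * (m_hat / denom + reg_step)
    return theta, m, v
```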

4. Key Differences Relative to Adam and AdamW

| Optimizer | Update pattern | Regularization | Sparsity and adaptivity |
|---|---|---|---|
| Adam | Synchronous (every primitive) | L2 and L1 coupled | No spatial sparsity |
| AdamW | Synchronous | Decoupled L2 | No attribute-wise adaptivity |
| AdamW-GS | Asynchronous (Sparse Adam) | Fully decoupled (DAR) | Visibility-driven, adaptive |

AdamW-GS combines asynchronous updates, explicit regularization moment resets, and per-attribute, per-primitive regularization scaling, in contrast to the synchronous and globally constant approaches of Adam and AdamW.

5. Hyperparameters and Scene-Adaptive Settings

AdamW-GS derives its visibility-based updates from per-view camera visibility testing and applies the following settings (collected into a configuration sketch after this list):

  • RSR: sample interval $T = 100$ iterations; sampling ratios adjusted per scene (lower for indoor, higher for outdoor); scaling $\alpha_1 = 0.2$, $\alpha_2 = 0.04$.
  • DAR regularization: $\lambda_o = 10^{-3}$ for opacity and $\lambda_s = 10^{-5}$ for scale (reflecting their disparate activation scales); clipping $C = 10$; regularization applied after the initial densification phase (≥ 3000 iterations).
  • Base learning rates: $\eta = 10^{-2}$ for position/color, $\eta_\tau = 5\times 10^{-2}$ for opacity, $\eta_\kappa = 10^{-2}$ for scale; $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$.
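Gathered into a single configuration (key names are illustrative):

```python
# Illustrative configuration collecting the settings above.
ADAMW_GS_CONFIG = {
    # Moment estimation
    "beta1": 0.9,
    "beta2": 0.999,
    "eps": 1e-8,
    # Base learning rates per attribute block
    "lr_position": 1e-2,
    "lr_color": 1e-2,
    "lr_opacity": 5e-2,      # eta_tau, on the logit tau
    "lr_scale": 1e-2,        # eta_kappa, on the log-scale kappa
    # Re-State Regularization
    "rsr_interval": 100,     # T
    "rsr_alpha1": 0.2,
    "rsr_alpha2": 0.04,      # alpha1 ** 2
    # Decoupled Attribute Regularization
    "lambda_opacity": 1e-3,
    "lambda_scale": 1e-5,
    "reg_clip": 10.0,        # C
    "reg_start_iter": 3000,  # apply after initial densification
}
```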

6. Empirical Performance and Ablation Insights

AdamW-GS demonstrates quantifiable improvements and efficiency:

  • On Mip-NeRF360 / 3DGS-MCMC:
    • PSNR: 27.95 dB → 28.22 dB
    • SSIM: 0.833 → 0.840
    • LPIPS: 0.199 → 0.182
    • Primitive reallocation: +4.5%
    • Training time: −15%
  • On vanilla 3DGS:
    • Prunes 48–50% of active primitives (vs. 40% for RePR and 53% for MaskGaussian)
    • PSNR increase: +0.17 dB
    • SSIM increase: +0.007
    • Training time reduction: >40%

Ablation results reveal that Sparse Adam alone worsens PSNR; adding only RSR recovers quality but not pruning efficacy; naive L1 coupling leads to unstable regularization; and the combination of DAR and RSR yields stable, scene-adaptive pruning and consistent gains. Pipelines such as MaskGaussian, Taming-3DGS, and Deformable Beta Splatting all inherit AdamW-GS, exhibiting consistently improved metrics and reduced complexity.

Significance: AdamW-GS, by unifying physical scene constraints with attribute-aware, sparse, and decoupled optimization, outperforms conventional practices in speed, model compactness, and rendering accuracy. Its drop-in nature has immediate utility for real-time view synthesis and potentially broader explicit representation optimization in computer vision and graphics (Ding et al., 23 Jan 2026).
