
AdamW-GS Optimizer for 3D Gaussian Splatting

Updated 26 January 2026
  • AdamW-GS is a specialized optimizer for 3D Gaussian Splatting that decouples regularization components to address sparse update patterns.
  • It integrates Sparse Adam, Re-State Regularization (RSR), and Decoupled Attribute Regularization (DAR) to enhance efficiency and improve reconstruction fidelity.
  • Empirical results show gains in PSNR and SSIM along with reduced training time, making it practical for real-time view synthesis and scene optimization.

AdamW-GS denotes an optimizer specifically constructed for 3D Gaussian Splatting (3DGS) to address the peculiarities and inefficiencies of applying standard deep neural network optimizers to explicit scene representations. Unlike classical Adam or AdamW, AdamW-GS introduces a lightweight integration of three decoupled modules—Sparse Adam, Re-State Regularization (RSR), and Decoupled Attribute Regularization (DAR)—tailoring optimization to the structure and sparse update patterns of 3DGS. AdamW-GS improves representation effectiveness, reduces model complexity, and accelerates training, outperforming conventional approaches in reconstruction fidelity and computational efficiency (Ding et al., 23 Jan 2026).

1. Optimization Framework in 3D Gaussian Splatting

3D Gaussian Splatting represents a scene using $n$ Gaussian primitives $\{\mathcal G_i\}_{i=1}^n$, each parameterized by a 3D position $\mu_i \in \mathbb{R}^3$, a covariance $\Sigma_i \in \mathbb{S}_+^{3\times 3}$ (often further decomposed into a rotation quaternion and a scale vector), a per-primitive opacity $o_i > 0$ (usually via a sigmoid activation, $o_i = \sigma(\tau_i)$), and color coefficients $c_i \in \mathbb{R}^k$. For a given camera view, rendering is performed by alpha-blending the projected Gaussians.
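As a concrete illustration, the per-primitive parameter blocks might be stored as below; the names, shapes, and initializations are our own sketch, not the paper's implementation:

```python
import torch

def init_gaussians(n: int, sh_dim: int = 48):
    """Illustrative parameter blocks for n Gaussian primitives.

    Opacity is stored as a logit tau with o = sigmoid(tau); scale as a
    log-scale kappa with s = exp(kappa), matching the activations above.
    """
    quat = torch.zeros(n, 4)
    quat[:, 0] = 1.0  # identity rotation
    return {
        "mu":    torch.randn(n, 3, requires_grad=True),       # 3D positions
        "quat":  quat.requires_grad_(),                       # covariance rotation
        "kappa": torch.zeros(n, 3, requires_grad=True),       # log scales
        "tau":   torch.zeros(n, 1, requires_grad=True),       # opacity logits
        "color": torch.zeros(n, sh_dim, requires_grad=True),  # SH color coefficients
    }
```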

The optimization target comprises a photometric loss
$$\mathcal L(\theta) = (1-\lambda_1)\,\mathcal L_{L1}(\theta) + \lambda_1\,\mathcal L_{\mathrm{DSSIM}}(\theta)$$
plus selective regularization:
$$\mathcal R(\theta) = \lambda_o \sum_i |o_i| + \lambda_s \sum_i \|s_i\|_1.$$
In variants utilizing MCMC, additional noise is injected into positions for exploration.
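A minimal sketch of this objective, assuming an external `ssim_fn` is available and using the common DSSIM = (1 − SSIM)/2 convention; photometric and regularization terms are returned separately because DAR (Section 2C) applies the latter outside the moment accumulation:

```python
import torch

def objective(pred, gt, opacity, scales, ssim_fn,
              lambda1=0.2, lambda_o=1e-3, lambda_s=1e-5):
    """Photometric loss plus the selective L1 regularization above.

    ssim_fn is assumed to return SSIM in [0, 1]; lambda values follow
    Section 5 (lambda1 here is illustrative).
    """
    l1 = (pred - gt).abs().mean()
    dssim = (1.0 - ssim_fn(pred, gt)) / 2.0
    photometric = (1.0 - lambda1) * l1 + lambda1 * dssim
    reg = lambda_o * opacity.abs().sum() + lambda_s * scales.abs().sum()
    return photometric, reg
```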

2. Decoupled Optimization Components

AdamW-GS structurally splits the optimization routine into three interaction-minimizing ingredients:

A. Sparse Adam

Standard Adam synchronously updates all primitives at each step, even when their gradient vanishes ($\nabla\ell(\theta_i) = 0$), resulting in unnecessary computation and parameter drift in non-visible scene regions. AdamW-GS introduces a visibility mask $V_i \in \{0, 1\}$ and modifies the moment decay:
$$\beta'_j = \beta_j V_i + (1 - V_i), \quad j \in \{1, 2\},$$
which "freezes" optimizer state for invisible primitives ($\beta'_j = 1$), so state evolves only while a primitive is active.
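In code, the masked moment decay can be written as the following sketch; the tensor shapes and broadcasting are our assumptions:

```python
import torch

def sparse_adam_moments(m, v, grad, visible, beta1=0.9, beta2=0.999):
    """Sparse Adam moment update: freeze state where visible == 0.

    beta'_j = beta_j * V + (1 - V) yields beta'_j = 1 for invisible
    primitives, so their moments do not move.
    """
    # Broadcast the (n,) visibility mask over attribute dimensions.
    V = visible.float().view(-1, *([1] * (grad.dim() - 1)))
    b1 = beta1 * V + (1.0 - V)   # 1.0 where invisible -> m unchanged
    b2 = beta2 * V + (1.0 - V)
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad
    return m, v
```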

B. Re-State Regularization (RSR)

Sparse Adam forfeits Adam's beneficial "moment rescaling" (implicit moment resets when gradients disappear). RSR restores amplified regularization by periodically sampling primitives and attenuating their moments:
$$m_t \leftarrow \alpha_1 m_t, \qquad v_t \leftarrow \alpha_2 v_t,$$
with empirical scaling $\alpha_1 \approx 0.2$ and $\alpha_2 \approx \alpha_1^2 = 0.04$. This process reactivates L1 regularization effects for dormant scene attributes.
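A sketch of the periodic attenuation; the sampling ratio here is illustrative, since the paper adjusts it per scene (see Section 5):

```python
import torch

def restate_regularization(m, v, step, interval=100, ratio=0.05,
                           alpha1=0.2, alpha2=0.04):
    """RSR: every `interval` steps, attenuate the moments of a random
    subset of primitives, restoring the effect of moment resets."""
    if step % interval == 0:
        n = m.shape[0]
        idx = torch.rand(n, device=m.device) < ratio  # random primitive sample
        m[idx] *= alpha1                              # m_t <- alpha1 * m_t
        v[idx] *= alpha2                              # v_t <- alpha2 * v_t = alpha1^2 * v_t
    return m, v
```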

C. Decoupled Attribute Regularization (DAR)

Adam and AdamW couple the regularization gradient $\nabla\mathcal R$ with the photometric gradient $\nabla\ell$ in the moment accumulation, leading to effectively variable regularization strength. AdamW-GS keeps the moment updates driven strictly by photometric gradients, while DAR applies attribute-wise penalties scaled by the inverse RMS, with clipping to bound the regularization step:
$$\Delta\theta = -\eta\,\frac{\hat m'_t}{\sqrt{\hat v'_t} + \varepsilon} - \eta\,\min\!\left(\lambda_\theta\,\frac{\nabla\mathcal R(\theta)/N_I}{\sqrt{\hat v'_t} + \varepsilon},\, C\right),$$
where $C$ is a clipping constant (e.g., $C = 10$) and $N_I$ is the number of visible primitives. Regularization is thus adaptively strong in low-photometric-gradient (rarely visible) regions and weak for highly active ones.

3. Full AdamW-GS Update Rule

For each attribute block (opacity, scale, color, position), the optimizer maintains decoupled moment statistics, applies optional RSR, and executes the update:
$$\theta_{t+1} = \theta_t - \eta\,\frac{\hat m'_t}{\sqrt{\hat v'_t} + \varepsilon} - \eta\,\min\!\left(\lambda_\theta\,\frac{\nabla\mathcal R(\theta)/N_I}{\sqrt{\hat v'_t} + \varepsilon},\, C\right).$$
Special cases for opacity $o = \sigma(\tau)$ and scale $s = \exp(\kappa)$ incorporate their respective chain-rule L1 penalties.
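Putting the pieces together, one AdamW-GS step for a single attribute block might look like the sketch below. Gating the entire update by the visibility mask and using a global bias correction are simplifying assumptions on our part:

```python
import torch

def adamw_gs_step(theta, grad, reg_grad, m, v, visible, step,
                  lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8,
                  lam=1e-3, clip=10.0):
    """One AdamW-GS step for one attribute block (illustrative sketch).

    Moments are driven only by the photometric gradient `grad`; the
    regularization gradient `reg_grad` is applied decoupled (DAR),
    scaled by the inverse RMS and clipped at `clip`. `step` counts from 1.
    """
    V = visible.float().view(-1, *([1] * (grad.dim() - 1)))
    b1 = beta1 * V + (1.0 - V)            # Sparse Adam: freeze invisible state
    b2 = beta2 * V + (1.0 - V)
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad
    m_hat = m / (1.0 - beta1 ** step)     # simplified global bias correction
    v_hat = v / (1.0 - beta2 ** step)
    denom = v_hat.sqrt() + eps
    n_vis = V.sum().clamp(min=1.0)        # N_I, number of visible primitives
    reg_step = torch.clamp(lam * (reg_grad / n_vis) / denom, max=clip)
    # Assumption: only visible primitives move, matching Sparse Adam's skip.
    theta = theta - lr * V * (m_hat / denom + reg_step)
    return theta, m, v
```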

4. Key Differences Relative to Adam and AdamW

| Optimizer | Update pattern | Regularization | Sparsity and adaptivity |
|---|---|---|---|
| Adam | Synchronous (every primitive) | L2 and L1 coupled | No spatial sparsity |
| AdamW | Synchronous | Decoupled L2 | No attribute-wise adaptivity |
| AdamW-GS | Asynchronous (Sparse Adam) | Fully decoupled (DAR) | Visibility-driven, adaptive |

AdamW-GS combines asynchronous updates, explicit regularization moment resets, and per-attribute, per-primitive regularization scaling, in contrast to the synchronous and globally constant approaches of Adam and AdamW.

5. Hyperparameters and Scene-Adaptive Settings

AdamW-GS derives its visibility-based updates from per-view camera visibility testing and applies the following settings (collected into a configuration sketch after this list):

  • RSR: sample interval $T = 100$ iterations; sampling ratios adjusted per scene (lower for indoor, higher for outdoor); scaling $\alpha_1 = 0.2$, $\alpha_2 = 0.04$.
  • DAR regularization: $\lambda_o = 10^{-3}$ for opacity and $\lambda_s = 10^{-5}$ for scale (reflecting their disparate activation scales); clipping $C = 10$; regularization applied after the initial densification phase (≥ 3000 iterations).
  • Base learning rates: $\eta = 10^{-2}$ for position/color, $\eta_\tau = 5\times 10^{-2}$ for opacity, $\eta_\kappa = 10^{-2}$ for scale; $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\varepsilon = 10^{-8}$.
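Gathered into a single configuration (key names are illustrative):

```python
# Illustrative configuration collecting the settings above.
ADAMW_GS_CONFIG = {
    # Moment estimation
    "beta1": 0.9,
    "beta2": 0.999,
    "eps": 1e-8,
    # Base learning rates per attribute block
    "lr_position": 1e-2,
    "lr_color": 1e-2,
    "lr_opacity": 5e-2,      # eta_tau, on the logit tau
    "lr_scale": 1e-2,        # eta_kappa, on the log-scale kappa
    # Re-State Regularization
    "rsr_interval": 100,     # T
    "rsr_alpha1": 0.2,
    "rsr_alpha2": 0.04,      # alpha1 ** 2
    # Decoupled Attribute Regularization
    "lambda_opacity": 1e-3,
    "lambda_scale": 1e-5,
    "reg_clip": 10.0,        # C
    "reg_start_iter": 3000,  # apply after initial densification
}
```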

6. Empirical Performance and Ablation Insights

AdamW-GS demonstrates quantifiable improvements and efficiency:

  • On Mip-NeRF360 / 3DGS-MCMC:
    • PSNR: 27.95 dB → 28.22 dB
    • SSIM: 0.833 → 0.840
    • LPIPS: 0.199 → 0.182
    • Primitive reallocation: +4.5%
    • Training time: −15%
  • On vanilla 3DGS:
    • Prunes 48–50% of active primitives (vs. 40% for RePR and 53% for MaskGaussian)
    • PSNR increase: +0.17 dB
    • SSIM increase: +0.007
    • Training time reduction: >40%

Ablation results reveal that Sparse Adam alone worsens PSNR; adding only RSR recovers quality but not pruning efficacy; naive L1 coupling leads to unstable regularization; and the combination of DAR and RSR yields stable, scene-adaptive pruning and consistent gains. Pipelines such as MaskGaussian, Taming-3DGS, and Deformable Beta Splatting all inherit AdamW-GS, exhibiting consistently improved metrics and reduced complexity.

Significance: AdamW-GS, by unifying physical scene constraints with attribute-aware, sparse, and decoupled optimization, outperforms conventional practices in speed, model compactness, and rendering accuracy. Its drop-in nature has immediate utility for real-time view synthesis and potentially broader explicit representation optimization in computer vision and graphics (Ding et al., 23 Jan 2026).
