On the Flexibility of Regularization Hyperparameters in 3D Gaussian Splatting Under Adaptive Optimizers

Interactive companion to the research paper

cs.CV (Computer Vision) · 3D Gaussian Splatting · Adam Optimizer

Abstract

3DGS pipelines weight their regularization losses with scalar hyperparameters (λ), and practitioners typically assume that λ provides proportional control over regularization strength. We show that under the Adam optimizer this assumption fails: the per-parameter adaptive denominator absorbs changes in gradient magnitude, producing a sub-linear effective regularization response (ERR) to λ, cross-coupling between loss terms, and heterogeneous effective strength across parameter types.

Key finding: a 500× increase in λ yields only a ~142× increase in effective regularization strength under Adam, versus the full 500× under SGD. An adaptive λ-scheduler reduces ERR variance by 43.8%.

1. Background

The total 3DGS training loss is:

L = L_recon + Σ_k λ_k · L_reg^(k)
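
To make the sum concrete, here is a minimal sketch of how such a loss is typically assembled, assuming PyTorch; the regularizer list and its callables are illustrative placeholders (3DGS itself uses L1 plus D-SSIM for the reconstruction term [1]):

```python
import torch
import torch.nn.functional as F

def total_loss(pred, target, params, reg_terms, lambdas):
    """L = L_recon + sum_k lambda_k * L_reg^(k).

    `reg_terms` is a list of callables and `lambdas` the matching
    scalar weights; both are illustrative placeholders.
    """
    loss = F.l1_loss(pred, target)          # L_recon (3DGS also adds D-SSIM)
    for lam, reg in zip(lambdas, reg_terms):
        loss = loss + lam * reg(params)     # lambda_k * L_reg^(k)
    return loss
```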

Under SGD the parameter update is linear in λ, so doubling λ doubles the regularization influence. Under Adam the update is:

Δθ = −η · m̂ / (√v̂ + ε)

where m̂ and v̂ are the bias-corrected estimates of the gradient's first and second moments [4].

The second-moment estimate v̂ absorbs the total gradient magnitude, coupling all loss terms and distorting the λ-to-strength mapping.
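
The absorption is visible in a closed-form toy model: treat the reconstruction gradient as zero-mean noise with standard deviation σ and the regularization gradient as a constant λ·c. In steady state v̂ tracks E[g²] = σ² + (λc)², so the average drift toward the regularizer saturates instead of growing linearly in λ. A small sketch with toy numbers (not the paper's measurements):

```python
import numpy as np

# Toy steady-state model: recon gradient ~ noise with std sigma,
# reg gradient = lam * c (constant). Adam's v-hat tracks
# E[g^2] = sigma^2 + (lam*c)^2, so the mean drift toward the
# regularizer is lam*c / sqrt(sigma^2 + (lam*c)^2): sub-linear.
sigma, c = 1.0, 1e-2
for lam in [1, 10, 100, 500]:
    sgd_drift = lam * c                                      # linear in lam
    adam_drift = lam * c / np.sqrt(sigma**2 + (lam * c)**2)  # saturates
    print(f"lam={lam:4d}  SGD {sgd_drift:8.4f}  Adam {adam_drift:6.4f}")
```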

2. Interactive: ERR vs. Lambda (SGD vs. Adam)

Adjust the gradient magnitude ratio and observe how ERR responds to λ under each optimizer.

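The widget itself is not reproduced here, but its qualitative behavior can be simulated offline. A sketch, assuming ERR is measured as the regularizer-attributable update drift relative to a λ = 0 baseline (a stand-in metric, not the paper's exact definition), with toy gradient statistics:

```python
import numpy as np

def drift(lam, adam=True, steps=20000, seed=0,
          beta1=0.9, beta2=0.999, eps=1e-8):
    """Mean update for recon noise plus a constant reg gradient lam*1e-2."""
    rng = np.random.default_rng(seed)
    m = v = total = 0.0
    for t in range(1, steps + 1):
        g = rng.normal(0.0, 1.0) + lam * 1e-2
        if not adam:
            total += -g                      # plain SGD step (unit lr)
            continue
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        total += -(m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)
    return total / steps

lams = np.array([1.0, 5.0, 25.0, 125.0, 500.0])
for adam in (False, True):
    # ERR stand-in: drift attributable to the regularizer, using
    # common random numbers so the lam=0 baseline cancels the noise.
    err = np.abs([drift(l, adam) - drift(0.0, adam) for l in lams])
    slope = np.polyfit(np.log(lams), np.log(err), 1)[0]
    print("Adam" if adam else "SGD ", "log-log slope ~", round(slope, 3))
```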

3. Interactive: Adaptive Lambda Scheduling

Compare fixed λ vs. the adaptive controller that targets a specified ERR value.


4. Summary of Results

Metric | Value
Log-log slope (Adam ERR vs. λ) | 0.853
Log-log slope (SGD reference) | 0.852
500× λ → ERR ratio | 142×
Cross-coupling ratio (off-/on-diagonal) | 0.034
ERR heterogeneity (max/min across parameter types) | 14.2×
Fixed-λ ERR (mean ± std) | 0.166 ± 0.038
Adaptive-λ ERR (mean ± std) | 0.224 ± 0.021
Adaptive variance reduction | 43.8%

5. Three Distortion Mechanisms

Mechanism 1: Denominator Absorption

Increasing λ inflates the regularization gradient, which inflates the second-moment estimate v̂, which inflates the Adam denominator, partially canceling the intended effect. This creates the sub-linear ERR-vs-λ response.

Mechanism 2: Shared Second Moments

Every loss term's gradient feeds the same per-parameter v̂ estimate, so changing one λ perturbs the second moment seen by all the other terms, coupling the regularizers to one another. The measured cross-coupling ratio (off-diagonal over on-diagonal response) is 0.034.
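
A toy demonstration of the coupling (same noise model as the earlier sketches; magnitudes are illustrative): hold λ_B fixed and watch term B's update share shrink as λ_A grows, purely because both terms feed the same v̂.

```python
import numpy as np

def drift(lam_A, lam_B, steps=50000, seed=0,
          beta1=0.9, beta2=0.999, eps=1e-8):
    """Mean Adam update for recon noise plus two constant reg gradients."""
    rng = np.random.default_rng(seed)
    m = v = total = 0.0
    for t in range(1, steps + 1):
        g = rng.normal(0.0, 1.0) + (lam_A + lam_B) * 1e-2
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        total += (m / (1 - beta1 ** t)) / (np.sqrt(v / (1 - beta2 ** t)) + eps)
    return total / steps

for lam_A in [0, 50, 200, 500]:
    # Term B's contribution, isolated with common random numbers:
    b_share = drift(lam_A, 1.0) - drift(lam_A, 0.0)
    print(f"lam_A={lam_A:3d}  effective strength of term B ~ {b_share:.5f}")
```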

Mechanism 3: Gradient Magnitude Heterogeneity

Different 3DGS parameter types (position, scale, opacity, color) have different gradient-magnitude profiles, so a single λ produces ERR values that differ by up to 14.2× across types. Per-parameter-type λ values or decoupled optimizers are needed.
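
To make the heterogeneity concrete, the same steady-state toy model applied per parameter type, with made-up reconstruction-gradient scales (illustrative only, not measured 3DGS statistics):

```python
import numpy as np

# Hypothetical per-type recon-gradient scales; one shared lambda.
recon_scale = {"position": 1.0, "scale": 0.3, "opacity": 0.05, "color": 0.5}
lam, c = 10.0, 1e-2  # shared weight and reg-gradient scale (toy values)

err = {t: lam * c / np.sqrt(s**2 + (lam * c)**2) for t, s in recon_scale.items()}
for t, e in err.items():
    print(f"{t:8s} ERR ~ {e:.3f}")
print("max/min:", round(max(err.values()) / min(err.values()), 1))  # ~9x here
```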

6. Proposed Solutions

Adaptive λ-scheduling: Adjusts λ online via a negative-feedback controller in log-space: log λₜ₊₁ = log λₜ − η · (ERRₜ − ERR*). Reduces ERR variance by 43.8%.
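
A minimal sketch of one controller step, assuming ERRₜ is measured elsewhere during training; the gain and clamping bounds are illustrative choices, not the paper's:

```python
import math

def lambda_step(lam, err_measured, err_target, eta=0.1,
                lam_min=1e-6, lam_max=1e3):
    """log lam_{t+1} = log lam_t - eta * (ERR_t - ERR*), clamped."""
    log_lam = math.log(lam) - eta * (err_measured - err_target)
    return min(max(math.exp(log_lam), lam_min), lam_max)

# ERR above target -> lambda is nudged down; below target -> up.
print(lambda_step(0.1, err_measured=0.35, err_target=0.20))  # < 0.1
```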

Decoupled optimization (Ding et al., 2026): Separates reconstruction and regularization into independent optimizer channels, eliminating the second-moment coupling entirely. Restores linear λ control.
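
One way to realize this (a sketch of the general idea, not necessarily Ding et al.'s exact algorithm): give each channel its own Adam state and apply λ as the regularization channel's learning rate. Because Adam's normalized step is invariant to gradient scale, moving λ outside the normalization makes its effect exactly linear.

```python
import torch

params = [torch.randn(5, requires_grad=True)]      # placeholder 3DGS params
lam = 0.05
opt_recon = torch.optim.Adam(params, lr=1e-3)      # independent moments
opt_reg = torch.optim.Adam(params, lr=lam * 1e-3)  # lam scales the *step*

def train_step(recon_loss_fn, reg_loss_fn):
    opt_recon.zero_grad()
    recon_loss_fn().backward()
    opt_recon.step()                 # reconstruction channel

    opt_reg.zero_grad()
    reg_loss_fn().backward()         # unweighted regularizer gradient
    opt_reg.step()                   # normalized step already carries lam

train_step(lambda: (params[0] ** 2).sum(),   # placeholder losses
           lambda: params[0].abs().sum())
```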

Decoupled weight decay (AdamW): Applies regularization as a weight-decay step outside the Adam update, removing it from the gradient computation entirely.
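
For the special case where the regularizer is an L2 penalty, this is available off the shelf; a sketch using torch.optim.AdamW, where weight_decay plays the role of λ:

```python
import torch

params = [torch.randn(5, requires_grad=True)]
opt = torch.optim.AdamW(params, lr=1e-3, weight_decay=1e-2)

loss = (params[0] ** 2).sum()   # reconstruction-only loss (placeholder)
loss.backward()
opt.step()  # L2-style shrinkage applied outside the Adam update,
            # so it never enters m-hat or v-hat
```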

References

[1] Kerbl et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." ACM TOG 42(4), 2023.
[2] Ding et al. "A Step to Decouple Optimization in 3DGS." arXiv:2601.16736, 2026.
[3] Loshchilov & Hutter. "Decoupled Weight Decay Regularization." ICLR, 2019.
[4] Kingma & Ba. "Adam: A Method for Stochastic Optimization." ICLR, 2015.
[5] Pezeshki et al. "Gradient Starvation." NeurIPS, 2021.
[6] Chen et al. "GradNorm: Gradient Normalization for Adaptive Loss Balancing." ICML, 2018.
[7] Kendall et al. "Multi-task Learning Using Uncertainty to Weigh Losses." CVPR, 2018.
[8] Huang et al. "2D Gaussian Splatting for Geometrically Accurate Radiance Fields." SIGGRAPH, 2024.