Abstract
3DGS pipelines use scalar hyperparameters (λ) to weight regularization losses. Practitioners assume λ provides proportional control. We show that under the Adam optimizer this assumption fails: the per-parameter adaptive denominator absorbs gradient magnitude changes, creating sub-linear ERR response, cross-coupling between terms, and heterogeneous effective strength across parameter types.
1. Background
The total 3DGS training loss is:
Under SGD the parameter update is linear in λ, so doubling λ doubles the regularization influence. Under Adam the update is:
The second-moment estimate v̂ absorbs the total gradient magnitude, coupling all loss terms and distorting the λ-to-strength mapping.
2. Interactive: ERR vs. Lambda (SGD vs. Adam)
Adjust the gradient magnitude ratio and observe how ERR responds to λ under each optimizer.
3. Interactive: Adaptive Lambda Scheduling
Compare fixed λ vs. the adaptive controller that targets a specified ERR value.
4. Summary of Results
| Metric | Value |
|---|---|
| Log-log slope (Adam ERR vs. λ) | 0.853 |
| Log-log slope (SGD reference) | 0.852 |
| 500× λ → ERR ratio | 142× |
| Cross-coupling ratio (off/on-diag) | 0.034 |
| ERR heterogeneity (max/min across types) | 14.2× |
| Fixed λ ERR mean ± std | 0.166 ± 0.038 |
| Adaptive λ ERR mean ± std | 0.224 ± 0.021 |
| Adaptive variance reduction | 43.8% |
5. Three Distortion Mechanisms
Mechanism 1: Denominator Absorption
Increasing λ inflates the regularization gradient, which inflates the second-moment estimate v̂, which inflates the Adam denominator, partially canceling the intended effect. This creates the sub-linear ERR-vs-λ response.
Mechanism 2: Shared Second Moments
All gradient components contribute to a single v̂ estimate. Changing one λ perturbs the second moments for all gradient components, coupling regularization terms. Cross-coupling ratio measured at 0.034.
Mechanism 3: Gradient Magnitude Heterogeneity
Different 3DGS parameter types (position, scale, opacity, color) have different gradient magnitude profiles. A single λ produces up to 14.2× different ERR values across types. Per-parameter-type λ or decoupled optimizers are needed.
6. Proposed Solutions
Decoupled optimization (Ding et al., 2026): Separates reconstruction and regularization into independent optimizer channels, eliminating the second-moment coupling entirely. Restores linear λ control.
Decoupled weight decay (AdamW): Applies regularization as a weight-decay step outside the Adam update, removing it from the gradient computation entirely.
References
[1] Kerbl et al. "3D Gaussian Splatting for Real-Time Radiance Field Rendering." ACM TOG 42(4), 2023.
[2] Ding et al. "A Step to Decouple Optimization in 3DGS." arXiv:2601.16736, 2026.
[3] Loshchilov & Hutter. "Decoupled Weight Decay Regularization." ICLR, 2019.
[4] Kingma & Ba. "Adam: A Method for Stochastic Optimization." ICLR, 2015.
[5] Pezeshki et al. "Gradient Starvation." NeurIPS, 2021.
[6] Chen et al. "GradNorm: Gradient Normalization for Adaptive Loss Balancing." ICML, 2018.
[7] Kendall et al. "Multi-task Learning Using Uncertainty to Weigh Losses." CVPR, 2018.
[8] Huang et al. "2D Gaussian Splatting for Geometrically Accurate Radiance Fields." SIGGRAPH, 2024.