Empirical Indicator of Gradient Noise for Scale Adaptation

GSNR predicts whether parameters can adapt scale under weight decay

Gradient Signal-to-Noise Ratio (GSNR)

GSNR = |E[g]|^2 / E[|g - E[g]|^2]

High GSNR: gradient signal dominates noise, enabling scale adaptation. Low GSNR: noise dominates, parameter trapped in WD equilibrium.

Interactive Configuration

0.20
0.010
Matrix GSNR:
Vector GSNR:
Scalar GSNR:

GSNR by Parameter Shape

Scale Adaptation vs Noise

GSNR Threshold Classification