GSNR (gradient signal-to-noise ratio) predicts whether a parameter can adapt its scale under weight decay
High GSNR: the gradient signal (mean gradient) dominates the noise (its variance across examples), so the parameter can adapt its scale.
Low GSNR: noise dominates, so the parameter stays trapped at its weight-decay (WD) equilibrium.
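The note gives no formula for GSNR. The sketch below assumes a common definition: for each parameter j, the squared mean of its per-example gradients divided by their variance. It takes per-example gradient vectors as plain lists, computes one ratio per parameter, and then sorts each ratio into one of the two regimes above. The function names `gsnr` and `regime` and the cutoff of 1.0 are hypothetical. The note does not state a numeric threshold between "high" and "low".

```python
from statistics import fmean, pvariance


def gsnr(per_sample_grads):
    """Per-parameter gradient signal-to-noise ratio (assumed definition).

    per_sample_grads: one gradient vector (list of floats) per example,
    all of the same length. For each parameter j, returns
    mean_i(g_ij) ** 2 / var_i(g_ij), using the population variance.
    """
    n_params = len(per_sample_grads[0])
    ratios = []
    for j in range(n_params):
        column = [g[j] for g in per_sample_grads]
        signal = fmean(column) ** 2   # squared mean gradient
        noise = pvariance(column)     # spread of per-example gradients
        if noise > 0:
            ratios.append(signal / noise)
        else:
            # No noise at all: pure signal if the mean is nonzero, else nothing.
            ratios.append(float("inf") if signal > 0 else 0.0)
    return ratios


def regime(ratio, threshold=1.0):
    # Hypothetical cutoff; the note only distinguishes "high" from "low".
    return "adapts scale" if ratio > threshold else "trapped at WD equilibrium"


if __name__ == "__main__":
    # Parameter 0 gets consistent gradients; parameter 1's gradients cancel out.
    grads = [[1.0, 2.0], [1.2, -2.0], [0.8, 2.0], [1.0, -2.0]]
    for j, r in enumerate(gsnr(grads)):
        print(j, round(r, 2), regime(r))  # parameter 0: GSNR ≈ 50; parameter 1: 0.0
```

In this toy batch, parameter 0 has a large mean relative to its spread, so it lands in the scale-adapting regime. Parameter 1 has large gradients that average to zero, so its GSNR is 0 and it is classed as trapped.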