Minimal Two-Regime Training Model

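Model: $L(W, \gamma) = \lVert \gamma\, W x - y \rVert^2$, with gradient signal-to-noise ratios $\mathrm{SNR}_W \sim \sqrt{B}/d^2$ and $\mathrm{SNR}_\gamma \sim \sqrt{B}/d$, where $B$ is the batch size and $d$ the dimension.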
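The gradients behind these two SNRs can be written out directly. The sketch below is one way to compute them and to estimate each parameter's gradient SNR empirically. It rests on several assumptions that the original does not specify: $W$ is $d \times d$, $x, y \in \mathbb{R}^d$, the loss is averaged over the batch, SNR is taken as $\lVert \mathbb{E}[g] \rVert / \sqrt{\mathbb{E}\lVert g - \mathbb{E}[g] \rVert^2}$, and `sample_batch` is a hypothetical user-supplied data sampler. Treat it as an illustration, not as the authors' measurement procedure.

```python
import numpy as np


def loss_and_grads(W, gamma, X, Y):
    """Batch-mean loss L = mean_i ||gamma * W x_i - y_i||^2 and its gradients.

    Assumed shapes: W is (d, d); X and Y are (B, d), one example per row.
    """
    B = X.shape[0]
    Wx = X @ W.T                               # row i is (W x_i)^T
    resid = gamma * Wx - Y                     # residuals r_i, shape (B, d)
    loss = np.sum(resid ** 2) / B
    grad_W = 2.0 * gamma * resid.T @ X / B     # (2/B) sum_i gamma r_i x_i^T
    grad_gamma = 2.0 * np.sum(resid * Wx) / B  # (2/B) sum_i r_i . (W x_i)
    return loss, grad_W, grad_gamma


def gradient_snr(W, gamma, sample_batch, B, n_batches=200):
    """Empirical SNR ||E[g]|| / sqrt(E||g - E[g]||^2) over fresh minibatches.

    `sample_batch(B)` is a hypothetical sampler returning (X, Y) of shape (B, d).
    Returns (snr_W, snr_gamma).
    """
    gWs, ggs = [], []
    for _ in range(n_batches):
        X, Y = sample_batch(B)
        _, gW, gg = loss_and_grads(W, gamma, X, Y)
        gWs.append(gW)
        ggs.append(gg)

    def snr(samples):
        flat = np.asarray(samples).reshape(len(samples), -1)  # (n_batches, n_params)
        mean = flat.mean(axis=0)
        noise = np.sqrt(np.mean(np.sum((flat - mean) ** 2, axis=1)))
        return np.linalg.norm(mean) / noise

    return snr(gWs), snr(ggs)


# Usage: random W, inputs and targets drawn independently (an arbitrary toy setup).
rng = np.random.default_rng(0)
d = 8
W = rng.normal(size=(d, d)) / np.sqrt(d)
sample = lambda b: (rng.normal(size=(b, d)), rng.normal(size=(b, d)))
for B in (8, 128):
    print(B, gradient_snr(W, 1.0, sample, B))
```

Sweeping `d` and `B` in the usage loop is a quick way to check whether a given setup shows the $\sqrt{B}/d^2$ and $\sqrt{B}/d$ trends; the constants will depend on the data distribution chosen.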

SNR gap (gamma/W): >5x
Noise scaling: d^2 vs d
Weight decay: required
Gap growth in d: linear
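Dividing the two SNR scalings from the model line makes the linear gap growth explicit: the batch size cancels, leaving a ratio proportional to $d$ (constant prefactors, which the original does not give, are dropped):

```latex
\frac{\mathrm{SNR}_\gamma}{\mathrm{SNR}_W}
  \sim \frac{\sqrt{B}/d}{\sqrt{B}/d^{2}}
  = d
```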

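Matrix W settles at an equilibrium between gradient noise and weight decay; scalar gamma tracks the signal freely.
The SNR gap grows linearly with dimension d, because SNR_W falls as 1/d^2 while SNR_gamma falls only as 1/d (so SNR_gamma/SNR_W ~ d).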
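Larger batch sizes reduce the regime separation by raising both SNRs (each scales as sqrt(B)); the ratio SNR_gamma/SNR_W ~ d does not depend on B, so the separation fades as SNR_W rises out of the noise-dominated range.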
Weight decay is necessary for the two-regime behavior; without it both parameters grow freely
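A minimal training sketch for experimenting with the weight-decay finding, assuming plain minibatch SGD with an L2 weight-decay term applied to both W and gamma, a hypothetical Gaussian teacher `W_star`, and Gaussian label noise. None of these choices, nor the hyperparameters, come from the original, and the sketch is not expected to reproduce its >5x figure; it only makes it easy to compare parameter norms with `wd=0` against `wd>0`.

```python
import numpy as np


def sgd_wd_step(W, gamma, X, Y, lr=1e-2, wd=1e-3):
    """One minibatch SGD step on L = mean_i ||gamma * W x_i - y_i||^2.

    Weight decay is an L2 pull toward zero on both W and gamma
    (an assumed choice; the original does not specify the optimizer).
    """
    B = X.shape[0]
    Wx = X @ W.T                               # row i is (W x_i)^T
    resid = gamma * Wx - Y                     # (B, d)
    grad_W = 2.0 * gamma * resid.T @ X / B     # dL/dW
    grad_gamma = 2.0 * np.sum(resid * Wx) / B  # dL/dgamma
    # With wd = 0 nothing pulls the parameters back toward zero.
    W_new = W - lr * (grad_W + wd * W)
    gamma_new = gamma - lr * (grad_gamma + wd * gamma)
    return W_new, gamma_new


def train(d=16, B=32, steps=2000, lr=1e-2, wd=1e-3, label_noise=1.0, seed=0):
    """Run SGD against a hypothetical linear teacher y = W_star x + noise."""
    rng = np.random.default_rng(seed)
    W_star = rng.normal(size=(d, d)) / np.sqrt(d)  # hypothetical teacher
    W = rng.normal(size=(d, d)) / np.sqrt(d)
    gamma = 1.0
    for _ in range(steps):
        X = rng.normal(size=(B, d))
        Y = X @ W_star.T + label_noise * rng.normal(size=(B, d))
        W, gamma = sgd_wd_step(W, gamma, X, Y, lr, wd)
    return W, gamma


# Compare final parameter norms with and without weight decay.
for wd in (0.0, 1e-2):
    W, gamma = train(wd=wd)
    print(f"wd={wd}: ||W||_F={np.linalg.norm(W):.3f}, gamma={gamma:.3f}")
```

Because only the product gamma * W enters the loss, the split of scale between the two parameters is not fixed by the data; the weight-decay term is what pins it down, which is why toggling `wd` is the natural knob to inspect here.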
