Minimal Model for Two Training Regimes

Noise-dominated (matrix) vs. signal-dominated (scalar) parameter dynamics in LM training

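Loss: L(W, gamma) = |gamma * (Wx) - y|^2
SNR of the matrix parameter W: SNR_W ~ sqrt(B)/d^2
SNR of the scalar parameter gamma: SNR_gamma ~ sqrt(B)/d
(B is the batch size, d the dimension.)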
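
As a rough illustration of how the two SNRs for the loss above could be measured, the sketch below computes per-sample gradients with respect to W and gamma and compares the norm of the mean gradient to the noise in a size-B batch average. Everything beyond the loss itself is an assumption made for this example, not something stated in the source: the synthetic teacher (W_true, gamma_true), the noise level, the initialization, and the specific SNR definition (norm of the mean gradient over the root of total per-sample variance divided by B). It is not expected to reproduce the quoted d^2 vs. d scaling exactly.

```python
import numpy as np


def per_sample_grads(W, gamma, X, Y):
    """Per-sample gradients of L = ||gamma * (W x) - y||^2.

    X, Y have shape (N, d). Returns the W-gradients flattened to (N, d*d)
    and the gamma-gradients as (N, 1).
    """
    Wx = X @ W.T                       # (N, d), row n is W x_n
    resid = gamma * Wx - Y             # (N, d)
    # dL/dW[i, j] = 2 * gamma * resid[i] * x[j]
    gW = 2.0 * gamma * resid[:, :, None] * X[:, None, :]
    # dL/dgamma = 2 * <resid, W x>
    g_gamma = 2.0 * np.sum(resid * Wx, axis=1)
    return gW.reshape(len(X), -1), g_gamma[:, None]


def gradient_snr(grads, batch_size):
    """Assumed SNR definition: ||mean gradient|| / sqrt(total variance / B)."""
    mean = grads.mean(axis=0)
    total_var = grads.var(axis=0).sum()
    return np.linalg.norm(mean) / np.sqrt(total_var / batch_size)


# Hypothetical synthetic setup (all values are illustrative assumptions).
rng = np.random.default_rng(0)
d, N = 8, 4096
W_true = rng.normal(size=(d, d)) / np.sqrt(d)
gamma_true = 1.5
X = rng.normal(size=(N, d))
Y = gamma_true * (X @ W_true.T) + 0.1 * rng.normal(size=(N, d))

W = rng.normal(size=(d, d)) / np.sqrt(d)   # random initialization
gamma = 1.0

gW, g_gamma = per_sample_grads(W, gamma, X, Y)
for B in (16, 64, 256):
    print(B, gradient_snr(gW, B), gradient_snr(g_gamma, B))
```

Under this definition the sqrt(B) dependence holds by construction; the dependence on d would have to be checked by repeating the measurement over several values of d.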
SNR Gap (gamma/W): >5x
Noise Scaling: d^2 vs d
Weight Decay: Required
Gap Growth in d: Linear

Key Findings

[Figure panels: Parameter Norms During Training; Signal-to-Noise Ratios; Batch Size Effect on SNR; Dimension Scaling]