Identifying the maximum scale at which learning-rate predictions remain accurate for large-scale pre-training.
cs.AI Scaling Laws muTransferLog-scale prediction error as a function of extrapolation ratio.
Fitting predictions diverge from true loss beyond the boundary.
| Metric | Fitting | Transfer |
|---|---|---|
| Boundary Ratio | 32x | 128x |
| Boundary Scale | 8B params | 32B params |
| Error at 2x | 0.3% | 0.1% |
| Error at 32x | 4.8% | 1.2% |
| Error at 128x | 16.1% | 3.8% |
| Failure Mode | Phase transition | Gradual drift |