Scaling Laws for Alignment Pretraining

Power-law relationships governing alignment loss as a function of model size, data volume, and compute
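The parameter names reported in the recovery table (E, A, α, B, β) are consistent with an additive power law in model size N and data tokens D. The report does not state the functional form, so the following is an assumed reading rather than the authors' definition:

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```

Under this reading, E is the irreducible loss, A and α set the model-size term, and B and β set the data term.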

Model Size Fit: R² = 0.951
Data Scaling Fit: R² = 1.000
Joint Fit: R² = 1.000
E Recovery Error: 0.93%

[Figure: Alignment Loss vs Model Size (fixed D=1M tokens)]

[Figure: Alignment Loss vs Data Tokens (fixed N=6.9B)]

[Figure: Compute-Optimal Frontier: Optimal Loss vs Compute]

[Figure: Fixed Mixture: Relative Improvement by Model Size]
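A compute-optimal frontier like the one named in the panels above could be traced roughly as follows. This is a minimal sketch under two assumptions that the report does not state: the loss has the additive form L(N, D) = E + A/N^α + B/D^β, and training compute is approximated by C ≈ 6·N·D. The constants are the fitted values from the Parameter Recovery table; `loss` and `compute_optimal` are hypothetical helper names.

```python
# Fitted values copied from the Parameter Recovery table.
E, A, ALPHA, B, BETA = 0.0216, 1.832, 0.3263, 2.533, 0.3421


def loss(n_params, n_tokens):
    """Assumed additive power law: E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(compute, flops_per_param_token=6.0):
    """Minimise loss(N, D) subject to compute = k * N * D.

    k = 6 is an assumption, not a value taken from the report.
    Setting d/dN [A N^-alpha + B (K/N)^-beta] = 0 with K = N * D gives
    N_opt = (alpha * A / (beta * B))^(1 / (alpha + beta)) * K^(beta / (alpha + beta)).
    """
    budget = compute / flops_per_param_token  # K = N * D
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * budget ** (BETA / (ALPHA + BETA))
    d_opt = budget / n_opt
    return n_opt, d_opt, loss(n_opt, d_opt)


# Trace a few points on the frontier.
for c in (1e15, 1e18, 1e21):
    n_opt, d_opt, l_opt = compute_optimal(c)
    print(f"C={c:.0e}: N={n_opt:.3e}, D={d_opt:.3e}, loss={l_opt:.4f}")
```

Because α and β are close in the fitted values, this allocation grows N and D at nearly the same rate as compute increases.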

Parameter Recovery: Fitted vs True

Parameter          True     Fitted   Rel. Error
E (irreducible)    0.0214   0.0216   0.93%
A (model coeff)    3.174    1.832    42.3%
α (model exp)      0.3524   0.3263   7.4%
B (data coeff)     2.487    2.533    1.9%
β (data exp)       0.3401   0.3421   0.6%
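The relative-error column can be sanity-checked from the displayed values. The sketch below assumes relative error means |fitted − true| / |true| and that the model-size term has the form A / N^α; the report states neither explicitly, and the helper names are hypothetical.

```python
# (name, true, fitted, reported relative error in %) copied from the table above.
ROWS = [
    ("E", 0.0214, 0.0216, 0.93),
    ("A", 3.174, 1.832, 42.3),
    ("alpha", 0.3524, 0.3263, 7.4),
    ("B", 2.487, 2.533, 1.9),
    ("beta", 0.3401, 0.3421, 0.6),
]


def rel_error_pct(true, fitted):
    """Relative error |fitted - true| / |true|, in percent (assumed definition)."""
    return 100.0 * abs(fitted - true) / abs(true)


def model_term(coeff, exponent, n_params):
    """Model-size term A / N^alpha of the assumed additive power law."""
    return coeff / n_params**exponent


for name, true, fitted, reported in ROWS:
    print(f"{name:>5}: recomputed {rel_error_pct(true, fitted):.2f}%, reported {reported}%")

# Compare the model-size term at N = 6.9B, the model size named in the data-scaling panel.
n = 6.9e9
true_term = model_term(3.174, 0.3524, n)
fitted_term = model_term(1.832, 0.3263, n)
print(f"model term at N=6.9B: true {true_term:.6f}, fitted {fitted_term:.6f}")
```

The recomputed errors agree with the reported column to within the rounding of the displayed digits. The last two lines illustrate why a 42.3% error in A can coexist with a near-perfect joint fit: the lower fitted α largely offsets the lower fitted A, so the model-size term itself differs by only a few percent at this N.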