Extrapolating OPSD Gains Beyond 8 Billion Parameters

A Multi-Model Scaling Analysis with Uncertainty Quantification for On-Policy Self-Distillation

Based on Zhao et al. (arXiv: 2601.18734, Jan 2026)

Predicted Gain at 70B

19.6 pp
± 11.3 pp (model-averaged)

Bootstrap 95% CI

[10.5, 32.6] pp
1,000 bootstrap resamples

Best Model

Power Law
Akaike weight: 0.338

Optimal Next Experiment

140B
Maximum model disagreement

Key Findings

Finding 1: All five candidate scaling laws agree that OPSD gains continue to grow beyond 8B parameters, but the magnitude at 70B is highly uncertain (9.1 to 32.9 pp depending on the law).
Finding 2: The distribution-match component dominates at large scale, growing as N^0.95, while dark knowledge saturates around 11.5B parameters.
Finding 3: Model averaging provides the most robust extrapolation strategy across all synthetic ground-truth validation scenarios.
Finding 4: Information-theoretic analysis identifies 140B as the most informative model size for future experiments to discriminate between scaling regimes.

Interactive Scaling Law Explorer

OPSD Gain vs. Model Size

Model Selection Results

Akaike Weights

Predictions at 70B

| Model | Parameters (k) | Chi-squared | AIC | BIC | Weight | Pred. at 70B (pp) |
|---|---|---|---|---|---|---|
| Power Law | 2 | 0.55 | 4.55 | 3.76 | 0.338 | 32.9 ± 6.0 |
| Saturating | 2 | 0.73 | 4.73 | 3.95 | 0.309 | 9.1 ± 1.2 |
| Sigmoid | 3 | 0.02 | 6.02 | 4.85 | 0.162 | 15.7 ± 9.7 |
| Sqrt-Log | 3 | 0.06 | 6.06 | 4.88 | 0.159 | 17.2 ± 2.8 |
| Logarithmic | 2 | 5.27 | 9.27 | 8.49 | 0.032 | 12.2 ± 0.9 |
| Model Averaged | — | — | — | — | 1.000 | 19.6 ± 11.3 |
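The Akaike weights in the table follow directly from the AIC column. A minimal sketch of the standard computation (w_i proportional to exp(-Δ_i/2), where Δ_i is each model's AIC minus the minimum AIC), using the AIC values above:

```python
import math

# AIC values from the model selection table above.
aic = {
    "Power Law": 4.55,
    "Saturating": 4.73,
    "Sigmoid": 6.02,
    "Sqrt-Log": 6.06,
    "Logarithmic": 9.27,
}

# Akaike weight: w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2),
# where Delta_i = AIC_i - min(AIC).
min_aic = min(aic.values())
rel = {m: math.exp(-0.5 * (a - min_aic)) for m, a in aic.items()}
total = sum(rel.values())
weights = {m: r / total for m, r in rel.items()}

for m, w in weights.items():
    print(f"{m}: {w:.3f}")
```

Running this reproduces the Weight column (0.338, 0.309, 0.162, 0.159, 0.032), confirming the table is internally consistent.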

Theoretical Gain Decomposition

The OPSD gain is decomposed into three mechanistic components: distribution match (on-policy advantage), dark knowledge transfer, and implicit regularization.

Component Contributions

Fitted Parameters

| Component | Parameter | Value | Interpretation |
|---|---|---|---|
| Distribution Match | α | 0.232 | Scaling coefficient |
| Distribution Match | β | 0.950 | Nearly linear growth |
| Dark Knowledge | γ | 0.100 | Max gain (pp) |
| Dark Knowledge | N_char | 11.5B | Saturation scale |
| Regularization | δ | 3.080 | Max regularization benefit |
| Regularization | η | 0.833 | Growth rate |

Uncertainty Quantification

Bootstrap analysis (1,000 resamples) quantifies the combined uncertainty from data noise, model parameters, and model selection.

Bootstrap 95% Confidence Intervals at 70B
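A bootstrap CI of this kind can be sketched as follows. The measured per-size gains from Zhao et al. are not reproduced in this summary, so the data points below are hypothetical placeholders, and only a single power-law predictor is refit per resample (the full analysis also resamples over model selection):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-size measurements (pp); placeholders, not the paper's data.
sizes = np.array([0.5, 1.0, 3.0, 8.0])   # billions of parameters
gains = np.array([1.2, 2.1, 3.6, 5.0])   # measured OPSD gain (pp)

def predict_70b(x, y):
    # Power-law fit in log-log space: log g = log a + b * log N.
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_a) * 70.0 ** b

preds = []
for _ in range(1000):
    idx = rng.integers(0, len(sizes), size=len(sizes))
    if np.unique(sizes[idx]).size < 2:   # degenerate resample: cannot fit
        continue
    preds.append(predict_70b(sizes[idx], gains[idx]))

lo, hi = np.percentile(preds, [2.5, 97.5])
print(f"bootstrap 95% CI at 70B: [{lo:.1f}, {hi:.1f}] pp")
```

Resampling the (size, gain) pairs with replacement and refitting each time propagates data noise into the 70B prediction; the percentile interval of the resulting predictions is the reported CI.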

Optimal Experiment Design

Information-theoretic analysis identifies which model sizes would provide the most discriminating evidence between scaling regimes.

Information Value by Candidate Size
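A simple proxy for this criterion is to score each candidate size by how much the competing scaling laws disagree there. The parameterizations below are illustrative, not the fitted models from the paper (whose analysis, with its noise model and Akaike weighting, selects 140B); the point is the disagreement-scoring mechanic:

```python
import numpy as np

# Illustrative (not fitted) parameterizations of three competing laws.
models = {
    "power law":  lambda n: 0.6 * n ** 0.95,
    "saturating": lambda n: 10.0 * n / (n + 8.0),
    "log":        lambda n: 2.9 * np.log(n),
}

candidates = np.array([20.0, 70.0, 140.0, 400.0])  # billions of parameters

# Predictions: one row per model, one column per candidate size.
preds = np.array([[f(n) for n in candidates] for f in models.values()])

# Disagreement score: spread of model predictions at each size. The most
# informative next experiment is where the candidate laws diverge most.
scores = preds.std(axis=0)
best = candidates[np.argmax(scores)]
print(f"most discriminating size: {best:.0f}B")
```

With these toy curves the raw spread grows monotonically with size; the full information-theoretic treatment also discounts by measurement cost and prediction uncertainty, which is how a finite optimum like 140B emerges.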