Extrapolation Boundaries: Fitting vs Transfer

Fitting vs Transfer: Error Comparison

Figure: log-scale prediction error as a function of extrapolation ratio. Fitting predictions diverge from the true loss beyond the boundary.

Fitting boundary at 32x: Power-law fitting keeps prediction error below 5% up to a 32x extrapolation ratio.
Transfer is more resilient: muTransfer degrades smoothly and stays usable at higher extrapolation ratios.
Sharp vs smooth: Fitting error rises abruptly past the boundary, like a phase transition; Transfer error grows gradually.
Practical rule: For fitting, source experiments should be at least 1/32 of the target scale.
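
The practical rule can be sketched as a small check in Python. This is a minimal illustration, not the method behind the figure: the function names, the pure power-law form loss = a * scale^(-b) with no irreducible-loss term, the choice to measure the ratio from the largest source run, and the synthetic data are all assumptions made for the example.

```python
import math

MAX_FIT_RATIO = 32.0  # fitting boundary quoted in the takeaways above


def extrapolation_ratio(source_scale, target_scale):
    """Target scale divided by source scale (assumed definition)."""
    return target_scale / source_scale


def within_fitting_boundary(source_scale, target_scale, max_ratio=MAX_FIT_RATIO):
    """True if the target is at most max_ratio times the source scale."""
    return extrapolation_ratio(source_scale, target_scale) <= max_ratio


def fit_power_law(scales, losses):
    """Least-squares fit of loss = a * scale**(-b) in log-log space."""
    xs = [math.log(s) for s in scales]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def predict(a, b, scale):
    """Evaluate the fitted power law at a given scale."""
    return a * scale ** (-b)


# Synthetic data for illustration only: an exact power law with a=10, b=0.3.
scales = [1e6, 2e6, 4e6, 8e6]
losses = [10.0 * s ** -0.3 for s in scales]
a, b = fit_power_law(scales, losses)

target = 2.56e8  # exactly 32x the largest source run
if within_fitting_boundary(max(scales), target):
    print("fit prediction at target:", predict(a, b, target))
else:
    print("target is beyond the fitting boundary; consider transfer instead")
```

Because the synthetic losses follow an exact power law, the fit recovers the generating parameters almost perfectly. Real loss curves will not, which is why the ratio check is applied before trusting the extrapolated value.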