Critical Pre-Training Fraction to Prevent Catastrophic Forgetting

A phase-transition framework identifying the minimum fraction alpha_c of pre-training data that must be mixed into fine-tuning to prevent catastrophic forgetting.

Analytical alpha_c: 0.947
NN alpha_c range: 0.55-0.83
Domain divergences: 5
Model architectures: 11
Mixing fractions: 14
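
The mixing fraction alpha is the share of pre-training data blended into the fine-tuning stream. As a rough illustration (not the exact experimental protocol), a mixed batch could be assembled as in the Python sketch below; sample_pretrain and sample_finetune are hypothetical stand-ins for the two data sources.

```python
# Minimal sketch (not the exact protocol used here): build a fine-tuning
# batch in which a fraction alpha of examples comes from the pre-training
# task and the rest from the new task.
import numpy as np

rng = np.random.default_rng(0)

def sample_pretrain(n, dim=8):
    # Placeholder for drawing n examples from the pre-training distribution.
    return rng.normal(loc=0.0, size=(n, dim)), np.zeros(n, dtype=int)

def sample_finetune(n, dim=8):
    # Placeholder for drawing n examples from the fine-tuning distribution.
    return rng.normal(loc=1.0, size=(n, dim)), np.ones(n, dtype=int)

def mixed_batch(batch_size, alpha):
    """One fine-tuning batch with mixing fraction alpha of pre-training data."""
    n_pre = int(round(alpha * batch_size))
    x_pre, y_pre = sample_pretrain(n_pre)
    x_new, y_new = sample_finetune(batch_size - n_pre)
    x = np.concatenate([x_pre, x_new])
    y = np.concatenate([y_pre, y_new])
    perm = rng.permutation(batch_size)  # shuffle so the two sources interleave
    return x[perm], y[perm]

x, y = mixed_batch(batch_size=64, alpha=0.4)
print(x.shape, float(y.mean()))  # (64, 8); mean label ~0.6 = share of new-task examples
```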

[Figure panels: Forgetting vs. Mixing Fraction (alpha); Adaptation vs. Mixing Fraction (alpha); Critical alpha vs. Domain Divergence (Analytical); Critical alpha vs. Model Size]

Neural Network Forgetting Across Domain Similarities

| cos_sim | alpha=0.0 | alpha=0.2 | alpha=0.4 | alpha=0.6 | alpha=0.8 | alpha=1.0 |
|---------|-----------|-----------|-----------|-----------|-----------|-----------|
| 0.9     | 0.067     | 0.039     | 0.016     | 0.000     | 0.000     | 0.000     |
| 0.7     | 0.236     | 0.149     | 0.080     | 0.031     | 0.000     | 0.000     |
| 0.5     | 0.411     | 0.265     | 0.147     | 0.063     | 0.005     | 0.000     |
| 0.3     | 0.590     | 0.384     | 0.216     | 0.095     | 0.014     | 0.000     |
| 0.1     | 0.771     | 0.504     | 0.286     | 0.129     | 0.023     | 0.000     |
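
One simple way to turn a forgetting curve like the rows above into a critical mixing fraction is to locate where forgetting first falls below a tolerance. The sketch below does this by linear interpolation on the cos_sim = 0.5 row; the 0.05 tolerance is an assumption for illustration, not necessarily the definition of alpha_c behind the headline numbers.

```python
# Sketch: estimate a critical mixing fraction as the interpolated point where
# forgetting first drops below a chosen tolerance (tolerance is an assumption).
import numpy as np

alphas = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
forgetting = np.array([0.411, 0.265, 0.147, 0.063, 0.005, 0.000])  # cos_sim = 0.5 row

def alpha_at_tolerance(alphas, forgetting, tol=0.05):
    """Smallest alpha (linear interpolation) at which forgetting <= tol."""
    if forgetting[0] <= tol:
        return float(alphas[0])
    for a0, a1, f0, f1 in zip(alphas[:-1], alphas[1:], forgetting[:-1], forgetting[1:]):
        if f0 > tol >= f1:
            # Interpolate the crossing point between the two grid values.
            return float(a0 + (a1 - a0) * (f0 - tol) / (f0 - f1))
    return float("nan")  # forgetting never drops below the tolerance

print(round(alpha_at_tolerance(alphas, forgetting), 3))  # ~0.645 for this row
```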

Model Size Scaling (cos_sim=0.5)

| Architecture | Params | alpha_c | Sharpness |
|--------------|--------|---------|-----------|
| [16]         | 353    | 0.546   | 24.2      |
| [32]         | 705    | 0.687   | 43.3      |
| [64]         | 1,409  | 0.781   | 89.3      |
| [128]        | 2,817  | 0.765   | 137.3     |
| [64,64]      | 5,569  | 0.781   | 124.2     |
| [128,128]    | 19,329 | 0.828   | 238.9     |
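
The alpha_c and Sharpness columns suggest fitting a sigmoidal transition to each architecture's forgetting curve and reading off the midpoint and slope. The sketch below shows that kind of fit on synthetic data; the logistic form, the parameter names, and the ground-truth values are assumptions for illustration, not necessarily the definitions used for this table.

```python
# Sketch: fit an assumed logistic transition
#   F(alpha) = F0 / (1 + exp(s * (alpha - alpha_c)))
# to a forgetting-vs-alpha curve and report the midpoint (alpha_c) and the
# slope s as a sharpness measure. Data here is synthetic, not from the study.
import numpy as np
from scipy.optimize import curve_fit

def logistic_decay(alpha, f0, s, alpha_c):
    """Assumed transition shape: forgetting decays around alpha_c with slope s."""
    return f0 / (1.0 + np.exp(s * (alpha - alpha_c)))

rng = np.random.default_rng(1)
alphas = np.linspace(0.0, 1.0, 14)            # 14 mixing fractions, as in the setup
true_f0, true_s, true_ac = 0.5, 40.0, 0.78    # hypothetical ground truth
forgetting = logistic_decay(alphas, true_f0, true_s, true_ac)
forgetting += rng.normal(scale=0.005, size=alphas.shape)  # small measurement noise

params, _ = curve_fit(logistic_decay, alphas, forgetting,
                      p0=[0.5, 20.0, 0.5], maxfev=20000)
f0_hat, s_hat, ac_hat = params
print(f"alpha_c ~ {ac_hat:.3f}, sharpness ~ {s_hat:.1f}")
```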