Synthetic Curriculum & SnapPO for Low-Resource Languages

The Solar methodology is evaluated across 10 languages spanning five resource tiers, from English (5000T tokens) to Dzongkha (0.02T tokens).

High-Resource Gain: +20.6
Mid-High Gain: +16.9
Low-Resource Gain: +10.0
Very-Low Gain: +10.0
Min Transfer Ratio: 0.406

Performance by Resource Tier

[Figure: Full Pipeline Gain by Resource Tier]
[Figure: Component Contribution by Tier]
[Figure: Full Pipeline NLU Scores by Language]
[Figure: Transfer Ratio (vs Korean) - General NLU]

Detailed Results

Full Pipeline Scores by Language

Language    Tier      Corpus (T tokens)  NLU   Gen Quality  Reasoning
English     High      5000               99.4  97.1         94.4
Korean      Mid-High  320                82.3  77.5         76.4
Turkish     Mid       85                 75.1  70.7         68.0
Vietnamese  Mid       78                 72.4  70.4         65.1
Swahili     Low       4.5                53.7  50.9         47.4
Yoruba      Low       1.2                45.1  42.6         39.4
Quechua     Very-Low  0.15               35.2  32.0         30.1
Guarani     Very-Low  0.08               30.6  28.7         26.2
Bambara     Very-Low  0.04               26.1  25.8         21.1
Dzongkha    Very-Low  0.02               22.7  21.7         17.6
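
The per-language scores above can be rolled up to the tier level to see how steeply NLU falls as corpus size shrinks. The following is a minimal sketch, not part of the original evaluation: the `ROWS` list and the `mean_nlu_by_tier` helper are hypothetical names introduced here, and the values are copied by hand from the table. It uses a plain unweighted mean per tier, which is an assumption; the report does not say how tier-level figures were aggregated.

```python
from collections import defaultdict

# (language, tier, NLU score) copied from the "Full Pipeline Scores by Language" table.
ROWS = [
    ("English", "High", 99.4),
    ("Korean", "Mid-High", 82.3),
    ("Turkish", "Mid", 75.1),
    ("Vietnamese", "Mid", 72.4),
    ("Swahili", "Low", 53.7),
    ("Yoruba", "Low", 45.1),
    ("Quechua", "Very-Low", 35.2),
    ("Guarani", "Very-Low", 30.6),
    ("Bambara", "Very-Low", 26.1),
    ("Dzongkha", "Very-Low", 22.7),
]


def mean_nlu_by_tier(rows):
    """Group NLU scores by tier and return the unweighted mean per tier."""
    buckets = defaultdict(list)
    for _language, tier, nlu in rows:
        buckets[tier].append(nlu)
    return {tier: sum(scores) / len(scores) for tier, scores in buckets.items()}


# Assumption: an unweighted mean is a reasonable tier summary for illustration.
TIER_MEANS = mean_nlu_by_tier(ROWS)

for tier, mean in TIER_MEANS.items():
    print(f"{tier:<9} {mean:.2f}")
```

Under this averaging, the Mid tier sits near 73.75 NLU and the Very-Low tier near 28.65, so the four lowest-resourced languages score well under half of the Mid-tier level.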

Gains by Resource Tier

Tier      SynCurr  SnapPO  Full Pipeline
High      +14.16   +9.39   +20.61
Mid-High  +11.60   +6.99   +16.94
Mid       +10.08   +5.90   +15.30
Low       +6.79    +5.06   +10.04
Very-Low  +7.11    +4.66   +10.01
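
One way to read the gains table is to compare the sum of the two component gains with the full-pipeline gain in each tier. The sketch below does that arithmetic. It assumes that the SynCurr and SnapPO columns report the gain from each component applied on its own, measured on the same scale as the full pipeline; the report does not state this explicitly. `GAINS` and `component_overlap` are hypothetical names introduced for illustration.

```python
# tier -> (SynCurr gain, SnapPO gain, full-pipeline gain), copied from the table above.
GAINS = {
    "High": (14.16, 9.39, 20.61),
    "Mid-High": (11.60, 6.99, 16.94),
    "Mid": (10.08, 5.90, 15.30),
    "Low": (6.79, 5.06, 10.04),
    "Very-Low": (7.11, 4.66, 10.01),
}


def component_overlap(syn_curr, snap_po, full):
    """Return how far the sum of the separate gains exceeds the combined gain.

    A positive value means the components are sub-additive: part of what
    one contributes is also contributed by the other.
    """
    return syn_curr + snap_po - full


# Assumption: each component column is a standalone gain (not stated in the source).
OVERLAP = {tier: component_overlap(*values) for tier, values in GAINS.items()}

for tier, value in OVERLAP.items():
    print(f"{tier:<9} overlap = {value:+.2f}")
```

If that reading of the columns is right, every tier shows a positive overlap (for example, 14.16 + 9.39 - 20.61 = 2.94 in the High tier), meaning the two components are partly redundant rather than fully additive.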