The Solar methodology is evaluated across 10 languages spanning five resource tiers, from English (5000T tokens) to Dzongkha (0.02T tokens).

| Language | Tier | Corpus (T tokens) | NLU | Gen Quality | Reasoning |
|---|---|---|---|---|---|
| English | High | 5000 | 99.4 | 97.1 | 94.4 |
| Korean | Mid-High | 320 | 82.3 | 77.5 | 76.4 |
| Turkish | Mid | 85 | 75.1 | 70.7 | 68.0 |
| Vietnamese | Mid | 78 | 72.4 | 70.4 | 65.1 |
| Swahili | Low | 4.5 | 53.7 | 50.9 | 47.4 |
| Yoruba | Low | 1.2 | 45.1 | 42.6 | 39.4 |
| Quechua | Very-Low | 0.15 | 35.2 | 32.0 | 30.1 |
| Guarani | Very-Low | 0.08 | 30.6 | 28.7 | 26.2 |
| Bambara | Very-Low | 0.04 | 26.1 | 25.8 | 21.1 |
| Dzongkha | Very-Low | 0.02 | 22.7 | 21.7 | 17.6 |
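
To make the tier-level trend in the table above easier to read, the per-language scores can be averaged within each resource tier. The following is a minimal sketch, not part of the Solar methodology itself: the names `ROWS`, `tier_means`, and `MEANS` are hypothetical, the scores are copied from the table, and an unweighted mean over languages is an assumption.

```python
from collections import defaultdict

# (language, tier, NLU, Gen Quality, Reasoning), copied from the table above.
ROWS = [
    ("English", "High", 99.4, 97.1, 94.4),
    ("Korean", "Mid-High", 82.3, 77.5, 76.4),
    ("Turkish", "Mid", 75.1, 70.7, 68.0),
    ("Vietnamese", "Mid", 72.4, 70.4, 65.1),
    ("Swahili", "Low", 53.7, 50.9, 47.4),
    ("Yoruba", "Low", 45.1, 42.6, 39.4),
    ("Quechua", "Very-Low", 35.2, 32.0, 30.1),
    ("Guarani", "Very-Low", 30.6, 28.7, 26.2),
    ("Bambara", "Very-Low", 26.1, 25.8, 21.1),
    ("Dzongkha", "Very-Low", 22.7, 21.7, 17.6),
]


def tier_means(rows):
    """Return {tier: (mean NLU, mean Gen Quality, mean Reasoning)}.

    Assumption: an unweighted mean over the languages in each tier,
    rounded to two decimals.
    """
    grouped = defaultdict(list)
    for _language, tier, *scores in rows:
        grouped[tier].append(scores)
    return {
        tier: tuple(round(sum(col) / len(col), 2) for col in zip(*vals))
        for tier, vals in grouped.items()
    }


MEANS = tier_means(ROWS)

if __name__ == "__main__":
    for tier, (nlu, gen, reasoning) in MEANS.items():
        print(f"{tier:9s} NLU={nlu:6.2f} Gen={gen:6.2f} Reasoning={reasoning:6.2f}")
```

Grouping by the `Tier` column rather than by corpus size keeps the sketch tied to the tier labels as the table gives them.
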

Per-tier gains for SynCurr, SnapPO, and the full pipeline:

| Tier | SynCurr | SnapPO | Full Pipeline |
|---|---|---|---|
| High | +14.16 | +9.39 | +20.61 |
| Mid-High | +11.60 | +6.99 | +16.94 |
| Mid | +10.08 | +5.90 | +15.30 |
| Low | +6.79 | +5.06 | +10.04 |
| Very-Low | +7.11 | +4.66 | +10.01 |
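
One way to read the gains table is to compare the full-pipeline gain in each tier with the sum of the two component gains. The sketch below assumes that the SynCurr and SnapPO columns report each component applied on its own, and that all three columns share the same unit and baseline; the table does not state either. The names `GAINS`, `component_overlap`, and `OVERLAP` are hypothetical.

```python
# tier -> (SynCurr gain, SnapPO gain, Full Pipeline gain), copied from the table above.
GAINS = {
    "High": (14.16, 9.39, 20.61),
    "Mid-High": (11.60, 6.99, 16.94),
    "Mid": (10.08, 5.90, 15.30),
    "Low": (6.79, 5.06, 10.04),
    "Very-Low": (7.11, 4.66, 10.01),
}


def component_overlap(gains):
    """Return {tier: (SynCurr + SnapPO) - Full Pipeline}, rounded to two decimals.

    A positive value means the full-pipeline gain is smaller than the sum
    of the two component gains (under the assumptions stated above).
    """
    return {
        tier: round(syn + snap - full, 2)
        for tier, (syn, snap, full) in gains.items()
    }


OVERLAP = component_overlap(GAINS)

if __name__ == "__main__":
    for tier, diff in OVERLAP.items():
        print(f"{tier:9s} component sum exceeds full pipeline by {diff:.2f}")
```

Under these assumptions the difference is positive in every tier, which would mean the two components' gains are not simply additive.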