Larger models extract disproportionately more from multilingual supervised fine-tuning

| Group | Slope (27B) | Slope (4B) | Ratio (27B / 4B) |
|---|---|---|---|
| High Resource | 0.00464 | 0.00112 | 4.15x |
| Mid Resource | 0.00632 | 0.00152 | 4.16x |
| Low Resource | 0.00624 | 0.00150 | 4.17x |
| Typologically Distant | 0.00626 | 0.00130 | 4.80x |
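
The ratio column can be sanity-checked by dividing each 27B slope by the matching 4B slope. The following is a minimal sketch that uses only the rounded values printed in the table; the names `rows` and `slope_ratio` are illustrative and do not come from the source. Because the printed slopes are rounded, the recomputed ratios differ from the reported ones by up to about 0.02, which would be consistent with the reported ratios having been computed from unrounded slopes (an assumption, not something the table states).

```python
# Rounded values copied from the table: (slope_27b, slope_4b, reported_ratio).
# The dictionary layout and names are illustrative assumptions.
rows = {
    "High Resource": (0.00464, 0.00112, 4.15),
    "Mid Resource": (0.00632, 0.00152, 4.16),
    "Low Resource": (0.00624, 0.00150, 4.17),
    "Typologically Distant": (0.00626, 0.00130, 4.80),
}


def slope_ratio(slope_27b: float, slope_4b: float) -> float:
    """Return how many times steeper the 27B slope is than the 4B slope."""
    return slope_27b / slope_4b


# Recompute every ratio from the rounded slopes.
recomputed = {
    group: slope_ratio(s27, s4) for group, (s27, s4, _) in rows.items()
}

for group, (_, _, reported) in rows.items():
    diff = abs(recomputed[group] - reported)
    print(f"{group}: recomputed {recomputed[group]:.2f}x, "
          f"reported {reported:.2f}x, |diff| = {diff:.3f}")
```

Every group shows the 27B slope at more than four times the 4B slope, with the Typologically Distant group showing the largest gap.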