Label Concentration, Ranking Flattening, and Format-Aware Calibration. Investigating how alignment (instruction tuning and preference tuning) distorts categorical labels, pairwise preferences, and rankings in LLM-as-a-judge evaluations.
Sato et al. (2026) showed that alignment causes numerical scores to concentrate in LLM judges. This work extends that analysis to non-numeric formats (categorical labels, pairwise preferences, and rankings) across three alignment stages: Base, Instruction-Tuned (IT), and IT + Preference-Tuned (IT+PT).
| Distribution | Stage | Entropy | Entropy Drop | JS Divergence | Accuracy |
|---|---|---|---|---|---|
| Uniform | Base | 2.321 | 0.000 | 0.0001 | 0.782 |
| Uniform | IT | 2.313 | 0.008 | 0.0014 | 0.810 |
| Uniform | IT+PT | 2.263 | 0.058 | 0.0101 | 0.765 |
| Realistic | Base | 2.180 | -0.164 | 0.0080 | 0.787 |
| Realistic | IT | 2.073 | -0.057 | 0.0023 | 0.843 |
| Realistic | IT+PT | 1.987 | 0.029 | 0.0011 | 0.858 |
| Bimodal | Base | 2.286 | -0.032 | 0.0010 | 0.768 |
| Bimodal | IT | 2.267 | -0.014 | 0.0015 | 0.793 |
| Bimodal | IT+PT | 2.220 | 0.034 | 0.0053 | 0.803 |
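
The table does not define its metrics, so the following is only a minimal sketch of how such columns could be computed, under stated assumptions. The Uniform/Base entropy of 2.321 is close to log2(5) ≈ 2.322, which suggests Shannon entropy in bits over five labels, and within each distribution the Entropy and Entropy Drop columns sum to a roughly constant value, which suggests the drop is a reference entropy minus the judge's entropy. Both readings are inferences, not confirmed definitions. The log base for JS divergence is likewise assumed to be 2, and the function names and the `concentrated` example distribution are hypothetical.

```python
import math

def entropy_bits(p):
    """Shannon entropy in bits of a discrete distribution (assumed unit)."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl_bits(p, q):
    """KL divergence D(p || q) in bits; terms with p_i == 0 contribute 0."""
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0)

def js_divergence_bits(p, q):
    """Jensen-Shannon divergence in bits (symmetric, bounded by 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

def entropy_drop(reference, judged):
    """Assumed definition: reference entropy minus the judge's label entropy.

    Positive values mean the judge's labels are more concentrated than the
    reference; negative values mean they are more spread out.
    """
    return entropy_bits(reference) - entropy_bits(judged)

# Hypothetical five-label distributions, for illustration only.
uniform = [0.2] * 5
concentrated = [0.10, 0.15, 0.50, 0.15, 0.10]

print(f"reference entropy: {entropy_bits(uniform):.3f} bits")
print(f"entropy drop:      {entropy_drop(uniform, concentrated):.3f} bits")
print(f"JS divergence:     {js_divergence_bits(uniform, concentrated):.4f} bits")
```

Under this reading, a positive entropy drop together with a rising JS divergence (as in the Uniform rows from Base to IT+PT) would indicate label concentration relative to the reference distribution.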