Reliable Disagreement Resolution in Multi-Agent Systems

Comparing aggregation mechanisms for evidence-backed consensus in multi-agent LLM systems

Best MAE (Calibrated Bayesian): 0.200
Best Error Amplification: 0.646
Mechanisms Compared: 4
Problems per Condition: 500

[Figure panels: MAE vs Agent Count; Error Amplification vs Correlation; MAE vs Evidence Quality; Summary Comparison]

Detailed Results Table

Mechanism | Mean MAE | Best MAE | Mean Amplification | Worst Amplification