Reliable Disagreement Resolution in Multi-Agent Systems
Comparing aggregation mechanisms for evidence-backed consensus in multi-agent LLM systems
Best MAE (Calibrated Bayesian): 0.200
Best Error Amplification: 0.646
Mechanisms Compared: 4
Problems per Condition: 500
[Figures: MAE vs Agent Count; Error Amplification vs Correlation; MAE vs Evidence Quality; Summary Comparison]
Detailed Results Table
Mechanism | Mean MAE | Best MAE | Mean Amplification | Worst Amplification