Calibrated Stop/Continue Criteria Under Distribution Shift
Comparing stopping strategies across retrievers, corpora, and LLM backbones
Best ECE: 0.103 (Bayesian)
Best Accuracy: 0.481 (Fixed-5)
Configurations: 36
Figure: ECE Comparison Across Criteria
Figure: Calibration Diagram (criterion options shown: Bayesian Uncertainty, Confidence 0.7)
Figure: Noise Sensitivity: ECE
Figure: Accuracy by Hop Depth