Scaling BAPO

Boundary-Aware Policy Optimization (BAPO) keeps an F1 reliability advantage of roughly 0.09 to 0.11 over the best baseline (DAPO at every scale) from 1.5B to 72B parameters on multi-hop QA benchmarks.

BAPO F1 @ 72B: 0.703
F1 Scaling R-squared: 0.997
F1 Scaling Slope: 0.0744
Lowest Calibration Error: 0.2745
F1 Gap vs Best Baseline @ 72B: +0.108

Scaling Curves

[Figures: F1 Reliability vs Model Scale; F1 Reliability Gap (BAPO - Best Baseline); Calibration Error by Method; F1 Scaling Law Parameters]

Detailed Results

F1 Reliability Persistence Across Scales

Scale   BAPO F1   Best Baseline F1   Gap
1.5B    0.5791    0.4869 (DAPO)      +0.0922
3B      0.6013    0.5043 (DAPO)      +0.0970
7B      0.6313    0.5373 (DAPO)      +0.0941
14B     0.6569    0.5511 (DAPO)      +0.1058
32B     0.6791    0.5857 (DAPO)      +0.0933
72B     0.7030    0.5951 (DAPO)      +0.1079

Scaling Law Parameters

Method   Acc Slope   Acc R-squared   F1 Slope   F1 R-squared
SFT      0.0662      0.920           0.0484     0.975
GRPO     0.0848      0.901           0.0666     0.977
PPO      0.0716      0.907           0.0549     0.928
DAPO     0.1028      0.975           0.0678     0.985
BAPO     0.0898      0.990           0.0744     0.997
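
The page does not say how the slopes are fitted. As a hedged sketch, the BAPO F1 figures are consistent with an ordinary least-squares fit of F1 against log10 of the parameter count in billions. That functional form is an inference from the numbers, not something the source states. The helper name `fit_log10` is hypothetical; the data points are copied from the F1 Reliability Persistence Across Scales table.

```python
import math

# Model sizes (billions of parameters) and BAPO F1 scores,
# copied from the "F1 Reliability Persistence Across Scales" table.
params_b = [1.5, 3, 7, 14, 32, 72]
bapo_f1 = [0.5791, 0.6013, 0.6313, 0.6569, 0.6791, 0.7030]


def fit_log10(sizes, scores):
    """OLS fit of score = intercept + slope * log10(size).

    Assumption: the dashboard's "slope" is per decade of parameters.
    """
    xs = [math.log10(s) for s in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in scores)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_log10(params_b, bapo_f1)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.3f}")
```

Under this assumption the fit lands close to the reported BAPO F1 slope of 0.0744 and R-squared of 0.997. Read that way, each tenfold increase in parameters adds about 0.07 F1.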

Boundary Awareness Analysis

Method   IDK Rate   Error Rate   Cal. Error   IDK-Err Corr.
SFT      0.0238     0.5181       0.4943       0.0758
GRPO     0.0379     0.4491       0.4113       0.1511
PPO      0.0342     0.4685       0.4343       0.4888
DAPO     0.0502     0.4279       0.3777       0.2149
BAPO     0.1203     0.3948       0.2745       0.2344
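
The source does not define Cal. Error. In every row, though, the reported value matches the gap between Error Rate and IDK Rate to within rounding. The sketch below checks that reading. The definition inside `calibration_gap` (a hypothetical helper name) is an assumption inferred from the table, not the authors' stated formula.

```python
# (IDK rate, error rate, reported calibration error) per method,
# copied from the Boundary Awareness Analysis table.
rows = {
    "SFT": (0.0238, 0.5181, 0.4943),
    "GRPO": (0.0379, 0.4491, 0.4113),
    "PPO": (0.0342, 0.4685, 0.4343),
    "DAPO": (0.0502, 0.4279, 0.3777),
    "BAPO": (0.1203, 0.3948, 0.2745),
}


def calibration_gap(idk_rate, error_rate):
    """Assumed definition: absolute gap between error and IDK rates."""
    return abs(error_rate - idk_rate)


# Largest disagreement between the assumed formula and the reported column.
max_deviation = 0.0
for method, (idk, err, reported) in rows.items():
    gap = calibration_gap(idk, err)
    max_deviation = max(max_deviation, abs(gap - reported))
    print(f"{method}: computed={gap:.4f} reported={reported:.4f}")

# Method with the smallest reported calibration error.
best_method = min(rows, key=lambda m: rows[m][2])
print(f"lowest calibration error: {best_method}")
```

On this reading, a model scores well when it abstains about as often as it would otherwise answer wrongly. BAPO has both the highest IDK rate and the lowest error rate in the table, which is why its gap is the smallest.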