Coupling Planning with Tool-Grounded Checks
Evaluating scoring functions and termination criteria for tool-integrated planning
0.993
Best Success Rate
+96.0pp
vs Baseline
p < 10⁻⁶
ANOVA Significance
Configuration Comparison
Tool Reliability Impact
Scoring Functions
Termination Criteria