Coupling Planning with Tool-Grounded Checks

Evaluating scoring functions and termination criteria for tool-integrated planning

Best Success Rate: 0.993
vs Baseline: +96.0pp
ANOVA Significance: p < 10⁻⁶

[Figure panels: Configuration Comparison; Tool Reliability Impact; Scoring Functions; Termination Criteria]