Single-Agent Long-Horizon Problem Solving

Visual-Grounding Decomposition (VGD) for extended multi-step tasks

VGD Success (H=3): 47.8%
Strategies Compared: 6
Tasks per Horizon: 500
Horizon Range: 3-20

[Charts not reproduced here: Success Rate vs Task Horizon; Average Compute Cost vs Horizon; Success Rate at Horizon = 3; Strategy Capabilities]

Full Results Table

Strategy             H=3     H=5     H=8     H=12    H=16    H=20
Flat                 22.2%   6.8%    1.2%    0.0%    0.0%    0.0%
Fixed Decomp.        4.0%    0.8%    0.0%    0.0%    0.0%    0.0%
Adaptive Decomp.     3.4%    0.2%    0.0%    0.0%    0.0%    0.0%
Verify & Backtrack   18.0%   5.4%    0.6%    0.0%    0.0%    0.0%
Curriculum-Guided    21.4%   9.6%    1.8%    0.4%    0.0%    0.0%
VGD (ours)           47.8%   28.4%   12.6%   3.6%    1.2%    0.4%
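
To make the table easier to read, the following sketch computes how far VGD sits above the strongest baseline at each horizon. The numbers are copied from the table above; the helper names (`best_baseline`, `margins`, `solved_count`) are hypothetical and introduced only for illustration. The conversion to task counts assumes that each percentage is taken over the 500 tasks per horizon stated in the summary, which the page does not say explicitly.

```python
# Success rates (%) copied from the results table, one value per horizon.
HORIZONS = [3, 5, 8, 12, 16, 20]
RESULTS = {
    "Flat": [22.2, 6.8, 1.2, 0.0, 0.0, 0.0],
    "Fixed Decomp.": [4.0, 0.8, 0.0, 0.0, 0.0, 0.0],
    "Adaptive Decomp.": [3.4, 0.2, 0.0, 0.0, 0.0, 0.0],
    "Verify & Backtrack": [18.0, 5.4, 0.6, 0.0, 0.0, 0.0],
    "Curriculum-Guided": [21.4, 9.6, 1.8, 0.4, 0.0, 0.0],
    "VGD (ours)": [47.8, 28.4, 12.6, 3.6, 1.2, 0.4],
}
OURS = "VGD (ours)"
TASKS_PER_HORIZON = 500  # assumption: every rate is measured over 500 tasks


def best_baseline(col):
    """Return (name, rate) of the strongest non-VGD strategy in column col.

    Ties are resolved in table order, so the first listed strategy wins.
    """
    baselines = [(name, rates[col]) for name, rates in RESULTS.items() if name != OURS]
    return max(baselines, key=lambda item: item[1])


def margins():
    """Map each horizon to (best baseline, VGD margin in percentage points)."""
    out = {}
    for col, horizon in enumerate(HORIZONS):
        name, rate = best_baseline(col)
        out[horizon] = (name, round(RESULTS[OURS][col] - rate, 1))
    return out


def solved_count(rate_percent):
    """Convert a success rate into a task count under the 500-task assumption."""
    return round(rate_percent / 100 * TASKS_PER_HORIZON)


if __name__ == "__main__":
    for horizon, (name, margin) in margins().items():
        print(f"H={horizon}: VGD leads {name} by {margin} points")
```

Under these assumptions, the strongest baseline is Flat at H=3 and Curriculum-Guided at H=5 through H=12; at H=16 and H=20 every baseline is at 0.0%, so the tie falls to the first row in table order.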