Visual-Grounding Decomposition (VGD) for extended multi-step tasks

| Strategy | H=3 | H=5 | H=8 | H=12 | H=16 | H=20 |
|---|---|---|---|---|---|---|
| Flat | 22.2% | 6.8% | 1.2% | 0.0% | 0.0% | 0.0% |
| Fixed Decomp. | 4.0% | 0.8% | 0.0% | 0.0% | 0.0% | 0.0% |
| Adaptive Decomp. | 3.4% | 0.2% | 0.0% | 0.0% | 0.0% | 0.0% |
| Verify & Backtrack | 18.0% | 5.4% | 0.6% | 0.0% | 0.0% | 0.0% |
| Curriculum-Guided | 21.4% | 9.6% | 1.8% | 0.4% | 0.0% | 0.0% |
| VGD (ours) | 47.8% | 28.4% | 12.6% | 3.6% | 1.2% | 0.4% |