Video Pointing Prompt Strategy Evaluation

Best Baseline F1 (GPT-5): 0.599
Molmo2-7B F1: 0.777
Ablation Improvement: +167%
Best Strategy: Hybrid
Best Format F1: 0.442

[Figure panels: F1 Score (Model x Prompt Strategy); Component Ablation; Output Format Sensitivity; Model Performance Range]

Strategy Comparison Table (F1 by model)

Strategy	GPT-5	Gem-3	Gem-2.5	Qwen3	Molmo2
direct point	0.392	0.354	0.271	0.327	0.639
bounding box	0.429	0.396	0.313	0.368	0.673
cot spatial	0.523	0.491	0.405	0.465	0.731
structured json	0.476	0.441	0.354	0.414	0.704
frame indexed	0.447	0.412	0.329	0.384	0.685
hybrid anchor	0.599	0.565	0.480	0.539	0.777
temporal chain	0.504	0.470	0.384	0.444	0.722
multi scale	0.540	0.508	0.420	0.482	0.746
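
The headline figures can be cross-checked against the table with a few lines of arithmetic. The sketch below is illustrative only: the scores are copied from the table, the helper names (`best_strategy_per_model`, `relative_gain`) are hypothetical, and it assumes that every column other than Molmo2 counts as a baseline, which the summary labels suggest but do not state outright.

```python
# F1 scores copied from the strategy comparison table, one column per model.
MODELS = ["GPT-5", "Gem-3", "Gem-2.5", "Qwen3", "Molmo2"]
SCORES = {
    "direct point":    [0.392, 0.354, 0.271, 0.327, 0.639],
    "bounding box":    [0.429, 0.396, 0.313, 0.368, 0.673],
    "cot spatial":     [0.523, 0.491, 0.405, 0.465, 0.731],
    "structured json": [0.476, 0.441, 0.354, 0.414, 0.704],
    "frame indexed":   [0.447, 0.412, 0.329, 0.384, 0.685],
    "hybrid anchor":   [0.599, 0.565, 0.480, 0.539, 0.777],
    "temporal chain":  [0.504, 0.470, 0.384, 0.444, 0.722],
    "multi scale":     [0.540, 0.508, 0.420, 0.482, 0.746],
}


def best_strategy_per_model(scores, models):
    """Return {model: (strategy, f1)} holding the highest F1 for each model."""
    best = {}
    for col, model in enumerate(models):
        strategy = max(scores, key=lambda s: scores[s][col])
        best[model] = (strategy, scores[strategy][col])
    return best


def relative_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


best = best_strategy_per_model(SCORES, MODELS)

# Assumption: every model except Molmo2 is treated as a baseline.
baselines = {m: v for m, v in best.items() if m != "Molmo2"}
top_baseline = max(baselines, key=lambda m: baselines[m][1])
gain = relative_gain(best["Molmo2"][1], baselines[top_baseline][1])

for model, (strategy, f1) in best.items():
    print(f"{model:8s} best = {strategy} ({f1:.3f})")
print(f"Molmo2 vs best baseline ({top_baseline}): +{gain:.1f}% relative F1")
```

Under these assumptions, hybrid anchor is the top strategy in every column, and Molmo2 at 0.777 is roughly 30% above the best baseline score of 0.599 in relative terms. The +167% ablation figure comes from the component ablation panel and cannot be reproduced from this table.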