Discovering uncaptured failure modes in VLM agents operating without textual feedback
When textual feedback is removed from VLM agents, overall success drops from 52.7% (with feedback, F+) to 28.3% (without feedback, F-), yet named failure categories such as action looping and state mismanagement decrease. This paradox suggests that failure modes not captured by existing taxonomies are taking their place. DTA resolves it by identifying 3 novel failure modes that account for 49.8% of no-feedback failures.
Figure: Failure mode rates across feedback availability levels, from 1.0 (full feedback) to 0.0 (no feedback). Known modes decrease monotonically while novel/residual failures increase, indicating that new failure modes replace the known ones rather than the agent improving.
| Metric | Value |
|---|---|
| F- failure episodes | 430 |
| Residual (classifier) | 133 (30.9%) |
| Anomalies (Isolation Forest) | 109 (25.3%) |
| Union (residual + anomaly) | 200 (46.5%) |
| Ground-truth novel failures | 214 |
| Precision | 1.000 |
| Recall | 0.621 |
| F1 Score | 0.767 |
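The precision, recall, and F1 values can be checked directly from confusion counts. The sketch below assumes 133 flagged episodes, all of them truly novel (TP = 133, FP = 0, FN = 214 - 133 = 81). These counts reproduce the reported 1.000, 0.621, and 0.767 exactly. The table does not say which detector set the metrics describe. The count of 133 matches the residual row rather than the 200-episode union: if all 200 union episodes were novel, recall would be 200/214 ≈ 0.935. Inclusion-exclusion also puts the overlap between residual and anomaly detections at 133 + 109 - 200 = 42 episodes. `DetectionCounts`, `precision_recall_f1`, and `union_overlap` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    tp: int  # flagged episodes that are ground-truth novel failures
    fp: int  # flagged episodes that are not novel failures
    fn: int  # ground-truth novel failures that were not flagged


def precision_recall_f1(c: DetectionCounts) -> tuple[float, float, float]:
    """Standard precision, recall, and F1 from confusion counts."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 = 2PR / (P + R), written in count form to avoid a 0/0 case.
    f1 = 2 * c.tp / (2 * c.tp + c.fp + c.fn) if c.tp else 0.0
    return precision, recall, f1


def union_overlap(n_a: int, n_b: int, n_union: int) -> int:
    """Intersection size via inclusion-exclusion: |A & B| = |A| + |B| - |A | B|."""
    return n_a + n_b - n_union


GROUND_TRUTH_NOVEL = 214
flagged_correct = 133  # assumption: 133 flagged episodes, all truly novel
counts = DetectionCounts(tp=flagged_correct, fp=0, fn=GROUND_TRUTH_NOVEL - flagged_correct)

p, r, f1 = precision_recall_f1(counts)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")  # precision=1.000 recall=0.621 f1=0.767
print("residual/anomaly overlap:", union_overlap(133, 109, 200))  # residual/anomaly overlap: 42
```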
Novel mode 1: Low action entropy (0.461), moderate repetition (0.346), and strongly negative reward (-5.15). The agent acts confidently from a restricted set of actions, as if it were receiving progress signals, yet achieves poor outcomes. F+ rate: 4.4%; F- rate: 15.2%.
Novel mode 2: High action entropy (0.935), high observation diversity (0.527), and near-zero reward (-0.07). The agent explores broadly without converging on a plan; with no feedback to confirm progress, the exploration never terminates. F+ rate: 1.5%; F- rate: 21.8%.
Novel mode 3: Moderate action entropy (0.694), short episodes (0.129), and moderately negative reward (-1.52). The agent abandons multi-step planning entirely and reacts to each observation independently. F+ rate: 1.1%; F- rate: 12.3%.
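The three mode descriptions rely on per-episode behavioral features, but the text does not define them. The sketch below assumes one plausible set of definitions. Action entropy is the Shannon entropy of the action distribution, normalized by the log of the number of action types so it lies in [0, 1]. Repetition is the fraction of steps that repeat the previous action. The sketch then assigns an episode to the nearest mode using only the two features reported for all three modes, action entropy and reward. The helper names, the repetition definition, and `REWARD_SCALE` are hypothetical, and the modes are numbered in the order they are described above.

```python
import math
from collections import Counter


def normalized_action_entropy(actions: list[str], n_action_types: int) -> float:
    """Shannon entropy of the action distribution divided by log(n_action_types), in [0, 1]."""
    if not actions or n_action_types < 2:
        return 0.0
    total = len(actions)
    h = -sum((c / total) * math.log(c / total) for c in Counter(actions).values())
    return h / math.log(n_action_types)


def repetition_rate(actions: list[str]) -> float:
    """Fraction of steps that repeat the immediately preceding action (hypothetical definition)."""
    if len(actions) < 2:
        return 0.0
    repeats = sum(a == b for a, b in zip(actions, actions[1:]))
    return repeats / (len(actions) - 1)


# (action entropy, reward) centroids taken from the three mode descriptions above.
MODE_CENTROIDS = {
    "mode_1": (0.461, -5.15),
    "mode_2": (0.935, -0.07),
    "mode_3": (0.694, -1.52),
}
REWARD_SCALE = 5.0  # assumption: shrinks reward to roughly the same range as entropy


def nearest_mode(entropy: float, reward: float) -> str:
    """Return the mode whose centroid is closest in scaled (entropy, reward) space."""
    def dist(centroid: tuple[float, float]) -> float:
        ce, cr = centroid
        return math.hypot(entropy - ce, (reward - cr) / REWARD_SCALE)

    return min(MODE_CENTROIDS, key=lambda m: dist(MODE_CENTROIDS[m]))


actions = ["left", "left", "left", "up", "left", "left"]
print(normalized_action_entropy(actions, n_action_types=4), repetition_rate(actions))
print(nearest_mode(0.325, -4.8))
```

In this example, the action sequence has a normalized entropy of about 0.325 over 4 action types and a repetition rate of 0.6. `nearest_mode(0.325, -4.8)` returns `"mode_1"`: low entropy combined with strongly negative reward places the episode closest to the first mode. A real pipeline would more likely cluster on the full feature vector. Any nearest-centroid assignment is sensitive to the feature scaling chosen here.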
| Failure Mode | F+ Rate | F- Rate | Change (F- minus F+) | Direction |
|---|---|---|---|---|
| Action Looping | 0.275 | 0.100 | -0.175 | DECREASE |
| State Mismanagement | 0.292 | 0.065 | -0.227 | DECREASE |
| Early Termination | 0.211 | 0.188 | -0.023 | decrease |
| Visual/Spatial Failure | 0.162 | 0.149 | -0.013 | decrease |
| Novel / Uncaptured | 0.060 | 0.498 | +0.438 | INCREASE |
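Each rate column in the table sums to 1.0, which suggests the rates are shares of failed episodes within each condition. The table does not say so, so this reading is an assumption. Under it, multiplying each share by the condition's failure rate converts it into a fraction of all episodes. The failure rates are 1 - 0.527 = 0.473 with feedback and 1 - 0.283 = 0.717 without. The sketch below performs this conversion; the names `SUCCESS`, `SHARES`, and `per_episode_rate` are illustrative.

```python
# Overall success rates reported for the two feedback conditions.
SUCCESS = {"F+": 0.527, "F-": 0.283}

# Share of failed episodes per mode as (F+, F-); each column sums to 1.0.
SHARES = {
    "Action Looping": (0.275, 0.100),
    "State Mismanagement": (0.292, 0.065),
    "Early Termination": (0.211, 0.188),
    "Visual/Spatial Failure": (0.162, 0.149),
    "Novel / Uncaptured": (0.060, 0.498),
}


def per_episode_rate(share: float, success_rate: float) -> float:
    """Convert a share of failed episodes into a fraction of all episodes."""
    return share * (1.0 - success_rate)


for mode, (s_pos, s_neg) in SHARES.items():
    pos = per_episode_rate(s_pos, SUCCESS["F+"])
    neg = per_episode_rate(s_neg, SUCCESS["F-"])
    print(f"{mode:24s} F+ {pos:6.1%}  F- {neg:6.1%}")
```

Under this reading, action looping falls from about 13.0% to 7.2% of all episodes, and state mismanagement falls from about 13.8% to 4.7%. Novel failures rise from about 2.8% to 35.7%. Early termination (10.0% to 13.5%) and visual/spatial failures (7.7% to 10.7%) shrink as a share of failures but grow per episode, because the no-feedback condition fails more often overall.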