01
Trajectory Collection
Generate 200 agentic reasoning trajectories across multi-step tasks with horizons of 10 to 100 actions. Each trajectory records its token-generation, tool-call, skill-selection, and memory operations, and each recorded action carries a ground-truth credit annotation.
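One way to picture a recorded trajectory is as a list of typed steps, each with its credit annotation. The sketch below is illustrative only: the names `ActionType`, `Step`, `Trajectory`, and `in_horizon_range` are hypothetical and are not taken from the project.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ActionType(Enum):
    """The four kinds of operation a trajectory records."""
    TOKEN = "token"
    TOOL_CALL = "tool_call"
    SKILL_SELECTION = "skill_selection"
    MEMORY_OP = "memory_op"


@dataclass
class Step:
    index: int               # position of the action in the trajectory
    action_type: ActionType
    payload: str             # hypothetical field: raw content of the action
    true_credit: float       # ground-truth credit annotation


@dataclass
class Trajectory:
    task_id: str
    steps: List[Step] = field(default_factory=list)
    outcome: float = 0.0     # assumed scalar task outcome

    def horizon(self) -> int:
        """Number of actions in the trajectory."""
        return len(self.steps)


def in_horizon_range(traj: Trajectory, lo: int = 10, hi: int = 100) -> bool:
    """Check that the trajectory falls in the 10 to 100 action range."""
    return lo <= traj.horizon() <= hi
```

A collection step could use `in_horizon_range` to reject trajectories that are too short or too long before they are annotated.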
02
Hierarchical Decomposition
Decompose each trajectory into three levels: macro-level (skill selection), meso-level (tool calls and memory ops), and micro-level (token generation). Credit flows top-down through the hierarchy via hindsight analysis.
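The three-level split can be sketched as a lookup from action type to level. This is a minimal illustration that assumes each step is an `(index, action_type)` pair with string action types; `LEVEL_OF` and `decompose` are hypothetical names.

```python
# Mapping from action type to hierarchy level, following the description above.
LEVEL_OF = {
    "skill_selection": "macro",
    "tool_call": "meso",
    "memory_op": "meso",
    "token": "micro",
}


def decompose(steps):
    """Group step indices by level.

    steps: iterable of (index, action_type) pairs.
    Returns a dict with keys "macro", "meso", "micro", each holding
    the step indices assigned to that level, in trajectory order.
    """
    levels = {"macro": [], "meso": [], "micro": []}
    for index, action_type in steps:
        levels[LEVEL_OF[action_type]].append(index)
    return levels
```

For example, `decompose([(0, "skill_selection"), (1, "tool_call"), (2, "token")])` places step 0 at the macro level, step 1 at the meso level, and step 2 at the micro level.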
03
Hindsight Credit Propagation
After the outcome is observed, propagate credit backwards through the hierarchy. Each level receives credit in proportion to its counterfactual contribution, and a temporal-difference (TD) decomposition distributes that credit among the steps within each level.
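The two ingredients named above can be sketched separately. The code assumes that each level's counterfactual contribution is already available as a number (for instance, the outcome with that level's actions minus the outcome with them ablated) and that a value estimate exists for every state; neither assumption comes from the source, and `level_shares` and `td_credits` are hypothetical names. How the project combines the two quantities is not stated, so the sketch stops short of that.

```python
def level_shares(contributions):
    """Turn per-level counterfactual contributions into signed shares.

    contributions: dict mapping level name -> counterfactual contribution.
    Shares are normalized by the sum of absolute contributions, so their
    absolute values sum to 1 (an assumed normalization).
    """
    total = sum(abs(c) for c in contributions.values())
    if total == 0:
        # No level changed the outcome: assign zero credit everywhere.
        return {level: 0.0 for level in contributions}
    return {level: c / total for level, c in contributions.items()}


def td_credits(values, rewards, gamma=1.0):
    """Per-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    values: state-value estimates V(s_0), ..., V(s_T), one more than rewards.
    rewards: rewards r_0, ..., r_{T-1}.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]
```

With `gamma=1.0` the TD errors telescope: their sum equals the total reward plus `V(s_T) - V(s_0)`, so the per-step credits account for the full gap between the initial estimate and the observed outcome.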
04
Evaluation and Transfer
Measure credit accuracy as the Pearson and Spearman correlation between assigned credit and the ground-truth annotations. For every method, also test robustness to horizon length, sensitivity to action type, cross-task transfer, and computational scalability.
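The two correlation metrics can be computed without external libraries. The sketch below is a plain-Python illustration, not the project's evaluation code; it assumes assigned and ground-truth credits arrive as equal-length lists of floats, and it uses average ranks for ties in the Spearman case.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))
```

Pearson rewards assigned credit that matches the ground truth linearly, while Spearman only checks that steps are ordered the same way, which is why reporting both is informative.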