Causal Role of Reasoning Bonds in Long CoT Learning
Why Imitation-Based Distillation Fails to Induce Chain-of-Thought Structure
Deep-Reasoning ACE: 0.652
Self-Reflection ACE: 0.549
Self-Exploration ACE: 0.449
SFT Weight Error: 0.032
Figure panels: Average Causal Effects; Learning Curves; Learned Weights vs True Causal Strength; Structural Similarity Index (SSI)
Key Findings
All three bonds are causally significant: Deep-Reasoning (ACE=0.652), Self-Reflection (0.549), Self-Exploration (0.449).
SFT with authentic bonds recovers the causal structure with a weight error of only 0.032.
Imitation distillation fails because it captures only 15-25% of the deep bond structure, which drives the learned weights to saturation.
Random ICL distillation further corrupts signals, yielding the worst structural alignment (SSI=0.949).
The surface-marker vs. causal-structure distinction explains why copying bond keywords does not replicate reasoning ability.
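
The findings above rest on two quantities, the average causal effect (ACE) of each bond and the weight error between learned weights and true causal strength, and the source does not say how either is computed. The following is a minimal sketch under stated assumptions: ACE is taken as the mean difference in outcome when a bond is forced on versus forced off, averaged over all settings of the other bonds, and weight error is taken as a mean absolute difference. The names `average_causal_effect`, `weight_error`, and `toy_outcome`, and every number in the toy model, are hypothetical and are not the reported results.

```python
from itertools import product

# Bond names follow the document; the identifiers themselves are illustrative.
BONDS = ("deep_reasoning", "self_reflection", "self_exploration")


def average_causal_effect(outcome, bond, bonds=BONDS):
    """Assumed ACE: mean of outcome(do(bond=1)) - outcome(do(bond=0)),
    averaged over every on/off setting of the remaining bonds."""
    others = [b for b in bonds if b != bond]
    diffs = []
    for values in product((0, 1), repeat=len(others)):
        context = dict(zip(others, values))
        on = outcome({**context, bond: 1})   # intervene: bond present
        off = outcome({**context, bond: 0})  # intervene: bond absent
        diffs.append(on - off)
    return sum(diffs) / len(diffs)


def weight_error(learned, true):
    """Assumed metric: mean absolute gap between learned and true weights."""
    return sum(abs(learned[b] - true[b]) for b in true) / len(true)


def toy_outcome(setting):
    """Made-up linear outcome model; coefficients are NOT the reported ACEs."""
    return (0.5 * setting["deep_reasoning"]
            + 0.25 * setting["self_reflection"]
            + 0.125 * setting["self_exploration"])


# For a linear model, each bond's ACE equals its coefficient.
aces = {b: average_causal_effect(toy_outcome, b) for b in BONDS}

# Hypothetical learned weights that overshoot on the last bond.
learned = {"deep_reasoning": 0.5, "self_reflection": 0.25, "self_exploration": 0.25}
error = weight_error(learned, aces)
```

In this toy setting a learner that matches two of the three strengths and overshoots the third gets a small but nonzero error. A learner whose weights saturate, as the findings describe for imitation distillation, would score a larger error under the same assumed metric.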