CharToM-QA: Context Length vs ToM Difficulty

Factorial analysis disentangling long-context processing from theory-of-mind reasoning demands.

cs.AI Theory of Mind Benchmarks
ToM Variance: 74.9%
Context Variance: 19.4%
Interaction: 1.0%
Dominant Factor: ToM

Variance Decomposition

Theory-of-mind order explains 74.9% of the variance versus 19.4% for context length, nearly 4x as much.

Context x ToM Interaction

Higher-order ToM questions show steeper context-length degradation.
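
One way to read "steeper degradation" is as a more negative slope of accuracy against context length for higher ToM orders. The sketch below fits a least-squares line per ToM order and compares the slopes. The function name `context_slopes`, the length units, and all numbers are hypothetical illustrations, not CharToM-QA results or the analysis code behind this page.

```python
import numpy as np

def context_slopes(lengths, acc_by_order):
    """Least-squares slope of accuracy vs. context length, one per ToM order."""
    x = np.asarray(lengths, dtype=float)
    return {
        order: float(np.polyfit(x, np.asarray(acc, dtype=float), 1)[0])
        for order, acc in acc_by_order.items()
    }

# Made-up accuracies at three context lengths (arbitrary units), for illustration only.
lengths = [1.0, 2.0, 3.0]
toy_acc = {
    1: [0.80, 0.75, 0.70],  # 1st-order ToM
    2: [0.60, 0.50, 0.40],  # 2nd-order ToM
}

slopes = context_slopes(lengths, toy_acc)
# A more negative slope for order 2 than for order 1 indicates an interaction.
print(slopes)
```

If the slopes were equal across ToM orders, the two factors would combine additively and the interaction term would be zero.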

Main Effects

Marginal accuracy by context length (left) and ToM order (right).

Results Summary

Factor            % Variance    Std Dev
ToM Order         74.9%         1.5%
Context Length    19.4%         0.4%
Interaction       1.0%          1.1%
Residual          4.7%
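
For readers who want to see how shares of this kind can be obtained, here is a minimal sketch of a balanced two-way sum-of-squares decomposition over a context x ToM grid with replicates. It assumes a balanced design and an accuracy array of shape (context levels, ToM levels, replicates); the function name `variance_shares` and the toy data are hypothetical, and the page does not state that this exact procedure, or how the Std Dev column was computed, matches the original analysis.

```python
import numpy as np

def variance_shares(acc):
    """Share of total sum of squares per factor for a balanced 2-way design.

    acc: array of shape (n_context, n_tom, n_replicates).
    """
    acc = np.asarray(acc, dtype=float)
    n_ctx, n_tom, n_rep = acc.shape
    grand = acc.mean()
    cell = acc.mean(axis=2)          # mean per (context, ToM) cell
    ctx_mean = cell.mean(axis=1)     # marginal mean per context level
    tom_mean = cell.mean(axis=0)     # marginal mean per ToM order

    ss_ctx = n_tom * n_rep * np.sum((ctx_mean - grand) ** 2)
    ss_tom = n_ctx * n_rep * np.sum((tom_mean - grand) ** 2)
    additive = ctx_mean[:, None] + tom_mean[None, :] - grand
    ss_int = n_rep * np.sum((cell - additive) ** 2)
    ss_res = np.sum((acc - cell[:, :, None]) ** 2)

    total = ss_ctx + ss_tom + ss_int + ss_res
    parts = {"context": ss_ctx, "tom": ss_tom,
             "interaction": ss_int, "residual": ss_res}
    return {name: float(ss / total) for name, ss in parts.items()}

# Made-up, purely additive grid: 3 context levels x 3 ToM orders x 2 replicates.
i = np.arange(3)[:, None, None]
j = np.arange(3)[None, :, None]
toy = 0.9 - 0.05 * i - 0.2 * j + np.zeros((3, 3, 2))

shares = variance_shares(toy)
print(shares)
```

In a balanced design the four sums of squares add up to the total, so the shares sum to 1, just as the four rows of the table sum to 100%.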

Key Findings

  • ToM is primary: ToM order accounts for 74.9% of variance, against 19.4% for context length.
  • Context is secondary: Context length contributes 19.4% of variance, a meaningful but not dominant share.
  • Interaction exists but is small: It accounts for 1.0% of variance; longer contexts amplify 2nd-order ToM difficulty specifically.
  • Robust finding: The pattern is consistent across 5 model capability levels.