LLM Capability Translation to Human-Like Decisions

Investigating how the reasoning and generative capabilities of LLMs translate into judgments and decisions intended to resemble human choices.

cs.AI · Behavioral Economics · Decision Science
Optimal Reasoning Level: 0.50
Best JSD (lower = better): 0.065
Peak Decision Consistency: 80.9%
Reasoning-JSD Correlation: 0.605
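
The alignment score behind the "Best JSD" figure above is the Jensen-Shannon divergence between the LLM's choice distribution and the human baseline distribution on the same task. A minimal sketch of how such a score could be computed, assuming choices are recorded as option indices over a shared choice set (function and variable names are illustrative, not from the original experiments):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def choice_jsd(llm_choices, human_choices, n_options):
    """Jensen-Shannon divergence between LLM and human choice distributions.

    Both inputs are lists of chosen option indices; lower JSD means the LLM's
    choice frequencies sit closer to the human baseline.
    """
    # Empirical choice frequencies with add-one smoothing to avoid empty bins.
    llm_p = np.bincount(llm_choices, minlength=n_options) + 1.0
    human_p = np.bincount(human_choices, minlength=n_options) + 1.0
    llm_p /= llm_p.sum()
    human_p /= human_p.sum()
    # SciPy returns the JS *distance* (the square root of the divergence).
    return jensenshannon(llm_p, human_p, base=2) ** 2

# Example: a binary task with eight paired responses.
print(round(choice_jsd([0, 0, 1, 0, 1, 0, 0, 1],
                       [0, 1, 1, 0, 1, 1, 0, 1], n_options=2), 3))
```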

Reasoning Depth vs. Alignment (JSD)

Non-monotonic, inverted-U relationship: alignment peaks at intermediate reasoning depth and degrades at both extremes.
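
One way to check the inverted-U claim quantitatively is to fit a quadratic in reasoning level to the JSD values and inspect the curvature and the location of the vertex. The sketch below uses only the three JSD values reported in the summary statistics table (r = 0.1, 0.5, 1.0), so the fit is purely illustrative:

```python
import numpy as np

# The three (reasoning level, JSD) points reported in the summary statistics table.
r = np.array([0.1, 0.5, 1.0])
jsd = np.array([0.147, 0.065, 0.111])

# Fit jsd ~ a*r^2 + b*r + c. a > 0 means JSD is U-shaped in reasoning level,
# i.e. alignment (low JSD) peaks at an intermediate level rather than an extreme.
a, b, c = np.polyfit(r, jsd, deg=2)
vertex = -b / (2 * a)  # reasoning level minimizing the fitted JSD
print(f"curvature a = {a:.2f} (> 0), fitted minimum near r = {vertex:.2f}")
```

With only three points the vertex is a rough estimate; the full sweep places the minimum at r = 0.5, as reported in the table below.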

Generative Fluency vs. Alignment (JSD)

Weak positive relationship: higher generative fluency slightly increases divergence (JSD) from human decision patterns.

Per-Task Alignment Profiles

Different decision tasks show varying sensitivity to reasoning depth.

Summary Statistics

Key numerical results from the capability sweep experiments.

Metric                         Value
Best Reasoning Level (r*)      0.50
JSD at r*                      0.065
JSD at r = 0.1 (low)           0.147
JSD at r = 1.0 (high)          0.111
Peak Decision Consistency      0.809
Reasoning-JSD Correlation      0.605
Fluency-JSD Correlation        0.512
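
Given per-level sweep outputs, the table's entries reduce to an argmin over JSD plus correlations between capability level and JSD. The sketch below shows that reduction on illustrative arrays; the specific values, the grid of levels, and the choice of Spearman rank correlation are assumptions, since the source does not state which correlation was used:

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative sweep outputs, not the experiment's raw data: one mean JSD per setting.
reasoning_levels = np.array([0.1, 0.3, 0.5, 0.7, 1.0])
jsd_by_reasoning = np.array([0.147, 0.094, 0.065, 0.083, 0.111])
fluency_levels   = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
jsd_by_fluency   = np.array([0.070, 0.074, 0.072, 0.079, 0.084])

# Best reasoning level r* and the JSD achieved there.
best = int(np.argmin(jsd_by_reasoning))
r_star, jsd_at_r_star = reasoning_levels[best], jsd_by_reasoning[best]

# Rank correlations between capability level and divergence from human choices.
reasoning_corr, _ = spearmanr(reasoning_levels, jsd_by_reasoning)
fluency_corr, _ = spearmanr(fluency_levels, jsd_by_fluency)

print(f"r* = {r_star}, JSD at r* = {jsd_at_r_star:.3f}")
print(f"reasoning-JSD corr = {reasoning_corr:.2f}, fluency-JSD corr = {fluency_corr:.2f}")
```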

Decision Consistency across Reasoning

Fraction of binary decisions on which the LLM matches the human baseline, tracked across reasoning levels.
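
Concretely, this metric can be computed as the share of paired items on which the two binary choices agree; a minimal sketch under that reading (names are illustrative):

```python
import numpy as np

def decision_consistency(llm_choices, human_choices):
    """Fraction of paired binary decisions on which the LLM matches the human baseline."""
    llm, human = np.asarray(llm_choices), np.asarray(human_choices)
    return float(np.mean(llm == human))

# Example: agreement on 6 of 8 paired decisions -> 0.75.
print(decision_consistency([0, 1, 1, 0, 1, 0, 0, 1],
                           [0, 1, 1, 0, 1, 1, 0, 0]))
```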

Key Findings

Main conclusions from the computational analysis.

  • Non-monotonic alignment: Reasoning and human-likeness follow an inverted-U curve.
  • Optimal intermediate reasoning: Best alignment at r=0.5, not at maximum capability.
  • Weak fluency effect: Generative quality barely impacts decision fidelity.
  • Task heterogeneity: Framing is most sensitive; anchoring is least sensitive.
  • Competing objectives: Behavioral fidelity and reasoning capability partially conflict.