A framework for quantifying how variation in user communication styles -- formality, verbosity, politeness, dialect, cultural context, and domain expertise -- affects LLM agent task success.
Adjust the style dimensions to see how they affect predicted task success. The model uses exponential decay: P(correct) = alpha * exp(-beta * weighted_distance).
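To make the decay model concrete, here is a minimal Python sketch. The source gives only the form P(correct) = alpha * exp(-beta * weighted_distance), so several details are assumptions for illustration: the weighted Euclidean metric, the [0, 1] scaling of each axis, the `BASELINE` and `WEIGHTS` vectors, and the default `alpha` and `beta` values.

```python
import math

# Axis order for the six style dimensions named in the text.
AXES = ("formality", "verbosity", "politeness", "dialect", "culture", "expertise")

# Hypothetical values: each axis scaled to [0, 1], baseline at the midpoint,
# and equal axis weights. None of these numbers come from the source.
BASELINE = (0.5, 0.5, 0.5, 0.5, 0.5, 0.5)
WEIGHTS = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0)


def weighted_distance(style, baseline=BASELINE, weights=WEIGHTS):
    """Weighted Euclidean distance from the baseline style (assumed metric)."""
    return math.sqrt(
        sum(w * (s - b) ** 2 for s, b, w in zip(style, baseline, weights))
    )


def p_correct(style, baseline=BASELINE, weights=WEIGHTS, alpha=0.9, beta=1.0):
    """Predicted success: alpha * exp(-beta * weighted_distance)."""
    return alpha * math.exp(-beta * weighted_distance(style, baseline, weights))


# Shift only the dialect axis away from the baseline.
shifted = list(BASELINE)
shifted[AXES.index("dialect")] = 1.0
print(p_correct(BASELINE), p_correct(shifted))
```

At the baseline the distance is zero, so the prediction equals `alpha`. Any movement along a positively weighted axis lowers it, and `beta` controls how quickly.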
Spearman correlation (rho) between each style dimension and task success. Negative values indicate that higher values on that axis degrade performance.
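As an illustration of how a per-axis rho could be computed, the sketch below implements Spearman correlation in plain Python, as the Pearson correlation of average ranks. The sample values are invented for the example and are not results from this framework.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example: dialect distance per interaction and binary task success.
dialect_distance = [0.0, 0.2, 0.5, 0.9]
success = [1, 1, 0, 0]
print(spearman_rho(dialect_distance, success))
```

A negative result here means that larger dialect distance goes with lower success, which matches the reading of negative values described above.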
We decompose the total entropy of task success into three components, H(S) = I(S; Content) + I(S; Style | Content) + H(S | Content, Style):
| Component | Range |
|---|---|
| I(S; Content) | 0.006--0.009 bits |
| I(S; Style | Content) | 0.015--0.072 bits |
| H(S | Content, Style) | 0.845--0.971 bits |
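The three components can be estimated from logged interactions with plug-in (empirical frequency) entropies, as in the sketch below. The record layout `(content, style, success)` and the toy data are assumptions for illustration. Plug-in estimates are biased on small samples, so this is a sketch rather than the estimator behind the ranges in the table.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy in bits of a list of hashable outcomes."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def decompose(records):
    """Split H(S) for records of (content, style, success) tuples.

    Uses H(S|X) = H(S, X) - H(X) to obtain the conditional entropies.
    """
    h_s = entropy([s for _, _, s in records])
    h_s_given_c = entropy([(c, s) for c, _, s in records]) - entropy(
        [c for c, _, _ in records]
    )
    h_s_given_cy = entropy(list(records)) - entropy([(c, y) for c, y, _ in records])
    return {
        "H(S)": h_s,
        "I(S;Content)": h_s - h_s_given_c,
        "I(S;Style|Content)": h_s_given_c - h_s_given_cy,
        "H(S|Content,Style)": h_s_given_cy,
    }


# Toy data in which success depends only on style.
records = [("a", "x", 1), ("a", "y", 0), ("b", "x", 1), ("b", "y", 0)]
parts = decompose(records)
print(parts)
```

By construction the three components sum to H(S), which gives a quick sanity check on any dataset.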
| Agent | Chi-squared | p-value (chi-squared) | Sig. | Kruskal-Wallis H | p-value (K-W) | Sig. |
|---|---|---|---|---|---|---|
| Low Sens. | 87.6 | 7.08e-19 | *** | 140.0 | 1.60e-24 | *** |
| Mod. Sens. | 271.7 | 1.36e-58 | *** | 374.8 | 1.37e-73 | *** |
| High Sens. | 462.6 | 6.15e-100 | *** | 566.4 | 2.18e-114 | *** |
| Dialect Vuln. | 457.3 | 8.48e-99 | *** | 931.2 | 1.21e-192 | *** |
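For readers unfamiliar with the chi-squared column, the sketch below computes the Pearson chi-squared statistic for a style-by-outcome contingency table in plain Python. The table layout (one row per style group, columns for success and failure counts) and the counts are assumptions for illustration. The sketch returns the statistic and degrees of freedom but not a p-value, which would need the chi-squared survival function from a statistics library.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared test of independence for a table of counts.

    `table` is a list of rows of observed counts. Returns the statistic
    and the degrees of freedom, (rows - 1) * (columns - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if style group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented counts: rows are style groups, columns are (success, failure).
observed = [[80, 20], [60, 40], [45, 55]]
stat, dof = chi_squared_statistic(observed)
print(stat, dof)
```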
6-dimensional parameterization grounded in sociolinguistic theory: formality, verbosity, politeness, dialect distance, cultural context, domain expertise.
12 canonical profiles spanning the style space, from baseline to AAVE, L2, high-context, elderly, teen, expert, and corporate communicators.
Exponential decay: P(correct|s) = alpha * exp(-beta * d_w(s)). Four agent configurations varying in sensitivity and axis weighting.
CDSI = 1 - mean(SR_non-standard) / SR_baseline. Scalar summary of agent robustness with per-axis and equity decomposition.
Mutual information analysis separating content (what) from style (how) contributions to task outcome uncertainty.
Chi-squared test (style vs. success) and Kruskal-Wallis H test (demographic group differences). Both tests are significant for all four agent configurations at p < 10^-18.
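The CDSI formula given above, CDSI = 1 - mean(SR_non-standard) / SR_baseline, can be sketched in a few lines of Python. The function name, the profile names, and the success rates below are hypothetical. The per-axis and equity decompositions are not shown, because the source does not define them.

```python
def cdsi(baseline_sr, nonstandard_srs):
    """CDSI = 1 - mean(SR_non-standard) / SR_baseline.

    0 means non-standard profiles succeed as often as the baseline on
    average; values closer to 1 mean a larger relative drop.
    """
    if baseline_sr <= 0:
        raise ValueError("baseline success rate must be positive")
    if not nonstandard_srs:
        raise ValueError("need at least one non-standard profile")
    mean_sr = sum(nonstandard_srs) / len(nonstandard_srs)
    return 1 - mean_sr / baseline_sr


# Hypothetical per-profile success rates (not results from the framework).
success_rates = {"l2_speaker": 0.62, "high_context": 0.70, "teen": 0.66}
print(cdsi(0.80, list(success_rates.values())))
```

Because the index is a ratio to the baseline, agents with different baseline success rates can be compared on relative degradation rather than on raw success.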