Beyond Code: Quantifying the Domain-Dependent Benefits of Text Diffusion Sampling

Investigating whether text diffusion advantages extend beyond code to mathematical reasoning, structured text, general prose, and translation, through bidirectionality analysis, augmentation estimation, and simulated decoding comparison.

Problem Statement

Stable-DiffCoder demonstrated that diffusion-based LLMs outperform autoregressive baselines on code generation. The open question: do these benefits extend to other domains? This work answers affirmatively through three complementary analyses across five domains.

  • Translation: +10.1% accuracy gain
  • General Text: +7.5% accuracy gain
  • 4/5 domains favor diffusion at 50% masking
  • Oracle gap: +1.4% to +8.8% at k=8

Bidirectionality Index by Domain

[Figure: bidirectionality index (beta) per domain, with a companion panel of domain structural properties.]

Values near 1.0 indicate symmetric forward/backward dependencies. General text and translation show perfect symmetry (beta=1.0), code and math show slight forward dominance, and structured text shows the strongest asymmetry (beta=0.926).
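
The report does not spell out how beta is computed. As a minimal sketch of one plausible construction, the probe below predicts each token from its immediate left versus right neighbor using bigram statistics and takes the min/max ratio of the two accuracies; the bigram probe and the ratio definition are illustrative assumptions, not the underlying method.

    from collections import Counter, defaultdict

    def directional_accuracy(corpus, reverse=False):
        # Most-frequent-continuation bigram probe: predict each token
        # from its immediate neighbor in the chosen direction.
        seqs = [seq[::-1] for seq in corpus] if reverse else corpus
        table = defaultdict(Counter)
        for seq in seqs:
            for prev, nxt in zip(seq, seq[1:]):
                table[prev][nxt] += 1
        hits = total = 0
        for seq in seqs:
            for prev, nxt in zip(seq, seq[1:]):
                if table[prev]:
                    hits += table[prev].most_common(1)[0][0] == nxt
                    total += 1
        return hits / max(total, 1)

    def bidirectionality_index(corpus):
        # 1.0 when forward and backward predictability match; lower
        # values indicate directional asymmetry.
        fwd = directional_accuracy(corpus)
        bwd = directional_accuracy(corpus, reverse=True)
        hi = max(fwd, bwd)
        return min(fwd, bwd) / hi if hi else 1.0

    toy = [["def", "f", "(", "x", ")", ":"], ["x", "=", "x", "+", "1"]]
    print(round(bidirectionality_index(toy), 3))

By construction the index is 1.0 when both directions predict equally well, matching the reading of the figure above.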

Diffusion Augmentation Factor

[Figure: constraint density by domain and effective augmentation multiplier per domain (log scale).]

Domain            Mean Length   Unique Tokens   Constraint Density   Eff. Multiplier
Code              24.3          124             0.104                177,169x
Math Reasoning    18.1          162             0.086                5,156x
Structured Text   14.4          160             0.089                562x
General Text      14.4          195             0.010                487x
Translation       11.9          153             0.034                99x
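
The estimation procedure behind these multipliers is not reproduced here. As a rough illustration of the idea, the sketch below counts distinct mask patterns over positions not locked by constraints, summed over a few assumed mask fractions; the mask_fracs choice and the rounding are assumptions, so the outputs will not match the table.

    from math import comb

    def effective_views(mean_len, constraint_density, mask_fracs=(0.25, 0.5, 0.75)):
        # Count distinct mask patterns over positions not locked by
        # constraints, summed over a few mask fractions. Illustrative
        # only: not the formula behind the table above.
        n = round(mean_len)
        free = max(1, round(n * (1 - constraint_density)))
        return sum(comb(free, max(1, round(free * f))) for f in mask_fracs)

    for name, length, density in [("Code", 24.3, 0.104), ("Translation", 11.9, 0.034)]:
        print(name, effective_views(length, density))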

Decoding Accuracy: Diffusion vs. Autoregressive

[Figure: per-domain accuracy comparison and accuracy gap (diffusion - AR).]

Accuracy Gap Across Mask Fractions

[Figure: diffusion advantage by mask fraction.]
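
As a minimal sketch of what a simulated decoding comparison of this shape could look like: both decoders fill the same masked positions, but the AR-style pass conditions only on left context in left-to-right order, while the diffusion-style pass commits the most confident positions first using both neighbors across several denoising steps. The bigram scorer and the confidence rule are stand-in assumptions, not the report's models.

    from collections import Counter, defaultdict

    def bigram_tables(corpus):
        # fwd[tok] counts right neighbors; bwd[tok] counts left neighbors.
        fwd, bwd = defaultdict(Counter), defaultdict(Counter)
        for seq in corpus:
            for a, b in zip(seq, seq[1:]):
                fwd[a][b] += 1
                bwd[b][a] += 1
        return fwd, bwd

    def ar_fill(seq, masked, fwd):
        # Left-to-right: each masked slot sees only its (possibly
        # predicted) left neighbor.
        out = list(seq)
        for i in sorted(masked):
            left = out[i - 1] if i > 0 else None
            out[i] = fwd[left].most_common(1)[0][0] if fwd[left] else "?"
        return out

    def diffusion_fill(seq, masked, fwd, bwd, steps=5):
        # Confidence-first: commit the highest-vote positions each step,
        # conditioning on both neighbors, including earlier commits.
        out = [t if i not in masked else None for i, t in enumerate(seq)]
        remaining = set(masked)
        per_step = max(1, -(-len(remaining) // steps))  # ceil division
        while remaining:
            scored = []
            for i in remaining:
                votes = Counter()
                if i > 0 and out[i - 1] is not None:
                    votes.update(fwd[out[i - 1]])
                if i + 1 < len(out) and out[i + 1] is not None:
                    votes.update(bwd[out[i + 1]])
                tok, conf = votes.most_common(1)[0] if votes else ("?", 0)
                scored.append((conf, i, tok))
            scored.sort(reverse=True)
            for _, i, tok in scored[:per_step]:
                out[i] = tok
                remaining.discard(i)
        return out

    seq = ["x", "=", "x", "+", "1"]
    fwd, bwd = bigram_tables([seq])
    masked = {1, 3}  # hide "=" and "+"
    print(ar_fill(seq, masked, fwd))
    print(diffusion_fill(seq, masked, fwd, bwd))

In this toy run the left-only pass misses one slot while the bidirectional pass recovers both, which is the mechanism behind the mask-fraction gaps plotted above.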

Sample Diversity & Oracle Accuracy (k=8, 50% mask)

[Figure: pairwise diversity and best-of-k oracle accuracy per domain.]
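
Under assumed definitions (pairwise diversity as mean token-level disagreement between equal-length samples, and oracle accuracy as the best exact-token match among the k samples; neither definition is confirmed by the report), a minimal sketch of the two metrics:

    from itertools import combinations

    def pairwise_diversity(samples):
        # Mean fraction of positions where two samples disagree,
        # averaged over all k*(k-1)/2 pairs.
        pairs = list(combinations(samples, 2))
        def disagreement(a, b):
            return sum(x != y for x, y in zip(a, b)) / max(len(a), 1)
        return sum(disagreement(a, b) for a, b in pairs) / max(len(pairs), 1)

    def oracle_accuracy(samples, reference):
        # Best-of-k: score every sample against the reference, keep the max.
        def accuracy(s):
            return sum(x == y for x, y in zip(s, reference)) / max(len(reference), 1)
        return max(accuracy(s) for s in samples)

    k_samples = [["a", "b", "c"], ["a", "x", "c"], ["a", "b", "y"]]
    print(pairwise_diversity(k_samples))                # higher = more diverse
    print(oracle_accuracy(k_samples, ["a", "b", "c"]))  # 1.0: one sample is exact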

Denoising Steps Sensitivity (50% mask)

[Figure: diffusion accuracy vs. number of denoising steps.]
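
The steps knob trades parallelism for conditioning: few steps commit many tokens at once with little mutual context, while more steps let later predictions see earlier commits. A small sketch of a near-even unmask schedule, assuming equal-sized chunks per step (the scheduling rule is an assumption):

    def unmask_schedule(n_masked, steps):
        # Split n_masked positions across denoising steps as evenly
        # as possible; each step commits its chunk, then re-predicts
        # the rest with the newly revealed context.
        base, extra = divmod(n_masked, steps)
        return [base + (i < extra) for i in range(steps)]

    print(unmask_schedule(12, 1))  # [12]: everything at once, no re-conditioning
    print(unmask_schedule(12, 6))  # [2, 2, 2, 2, 2, 2]: six rounds of refinement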

Key Findings

  • Diffusion benefits extend beyond code -- 4/5 domains show gains at 50% masking.
  • Largest gains in translation (+10.1%) and general text (+7.5%), where local context is weakest.
  • Diversity advantage: 25-33% more diverse samples across all domains.
  • Oracle gap of +1.4% to +8.8% at k=8 favors diffusion universally.
  • Moderate denoising steps (5-8) suffice -- diminishing returns beyond that.
  • Code benefits most from additional steps (+5.2% relative improvement).
  • Pearson r=0.530 between bidirectionality and accuracy gap (see the sketch after this list).
  • Composite ranking: General Text > Translation > Code > Math > Structured Text.
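
For the bidirectionality-gap correlation in the findings, a minimal sketch of the computation; the per-domain arrays below are placeholders for illustration only, not the report's measured values.

    def pearson_r(xs, ys):
        # Standard sample Pearson correlation coefficient.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy)

    # Placeholder per-domain values, not from the report:
    beta = [0.90, 0.95, 0.92, 1.0, 1.0]
    gap = [2.0, 1.0, -1.0, 7.0, 10.0]
    print(round(pearson_r(beta, gap), 3))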