Principled Mitigation of Spurious Linguistic Artifacts in SDFT

Explore how counterfactual token weighting compares to heuristic masking for preventing student models from inheriting teacher-conditioned artifacts.
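The contrast between the two strategies can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method behind the figures. It assumes Mask-k means zeroing the distillation loss on the first k response tokens, and that counterfactual weighting scores each teacher token by how much its log-probability rises when the teacher sees its extra conditioning context compared with when it does not. The function names, the exponential down-weighting, and the `temperature` parameter are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def mask_k_weights(num_tokens: int, k: int) -> List[float]:
    """Heuristic masking (assumed form): ignore the first k response tokens."""
    # Positions 0..k-1 get weight 0; every later position keeps weight 1.
    return [0.0 if i < k else 1.0 for i in range(num_tokens)]


def counterfactual_weights(
    logp_with_context: Sequence[float],
    logp_without_context: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Hypothetical counterfactual weighting.

    For each token, delta = log p(token | context) - log p(token | no context).
    A large positive delta means the token is mostly explained by the teacher's
    conditioning, so its weight decays toward 0; tokens the unconditioned
    teacher predicts at least as well (delta <= 0) keep weight 1.
    """
    if len(logp_with_context) != len(logp_without_context):
        raise ValueError("log-prob sequences must have equal length")
    weights = []
    for lp_ctx, lp_plain in zip(logp_with_context, logp_without_context):
        delta = max(0.0, lp_ctx - lp_plain)
        weights.append(math.exp(-delta / temperature))
    return weights


def weighted_nll(token_logps: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean negative log-likelihood of the student on teacher tokens."""
    total = sum(weights)
    if total == 0:
        return 0.0  # every token masked out: no training signal
    return -sum(w * lp for w, lp in zip(weights, token_logps)) / total
```

On a toy five-token response where only the third token depends on the context (log-probability -0.3 with context versus -2.3 without), `counterfactual_weights` assigns that token weight exp(-2) ≈ 0.135 and leaves the others at 1.0, whereas `mask_k_weights(5, 2)` zeroes the first two tokens and keeps the context-dependent one at full weight. That is the design difference being compared: masking is positional, while counterfactual weighting is token-specific.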

[Interactive figure: summary metrics (Naive Artifact Rate, Mask-k Artifact Rate, CF Weighting Artifact Rate, CF Task Performance) and four panels: Artifact Adoption Rate by Method; Task Performance by Method; Position-Specific Artifact Probability; Artifact-Performance Tradeoff (Pareto).]
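Reading the Pareto panel amounts to finding the methods that no other method beats on both axes at once. A minimal sketch, assuming each method is summarized as a pair (artifact rate, task performance) where a lower artifact rate and higher performance are better; the function name is hypothetical and the example values below are toy numbers, not results from this study.

```python
from typing import Dict, List, Tuple


def pareto_front(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of methods not dominated by any other method.

    Each point is (artifact_rate, task_performance). Method X dominates Y if X is
    no worse on both axes and strictly better on at least one.
    """
    front = []
    for name, (art, perf) in points.items():
        dominated = any(
            a2 <= art and p2 >= perf and (a2 < art or p2 > perf)
            for other, (a2, p2) in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

For toy points A = (0.30, 0.80), B = (0.10, 0.70), C = (0.05, 0.78), the front is ["A", "C"]: C dominates B, while A trades a higher artifact rate for higher performance. The pairwise check is quadratic, which is fine for a handful of methods.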