Comparison of strategies for preventing inflated LLM evaluation metrics caused by training-data overlap (contamination)
| Strategy | F1 (approx.) | Effectiveness | Relative Cost |
|---|---|---|---|
| No Mitigation | 0.008 | 0.0% | 1.00x |
| N-gram Deduplication | 0.745 | 58.6% | 1.15x |
| Embedding Deduplication | 0.918 | 83.5% | 1.45x |
| Dynamic Regeneration | 0.977 | 93.5% | 2.10x |
| Score Adjustment | 0.900 | 80.5% | 1.05x |
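As a rough illustration of the n-gram deduplication row, the sketch below flags evaluation examples whose token n-gram overlap with the training corpus exceeds a threshold and drops them before metrics are computed. All function names, the n-gram size, and the 0.5 threshold are illustrative assumptions, not a specific tool's implementation.

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in a text (hypothetical tokenizer: whitespace split)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(example: str, train_ngrams: set, n: int = 8, threshold: float = 0.5) -> bool:
    """Flag an eval example if the fraction of its n-grams also seen in training meets the threshold."""
    example_ngrams = ngrams(example, n)
    if not example_ngrams:
        return False  # too short to measure overlap; keep by default
    overlap = len(example_ngrams & train_ngrams) / len(example_ngrams)
    return overlap >= threshold

def dedup_eval_set(eval_set: list, train_docs: list, n: int = 8, threshold: float = 0.5) -> list:
    """Keep only eval examples that do not overlap heavily with the training corpus."""
    train_ngrams = set()
    for doc in train_docs:
        train_ngrams |= ngrams(doc, n)
    return [e for e in eval_set if not is_contaminated(e, train_ngrams, n, threshold)]
```

The exact-match set intersection here is what keeps this method cheap (the ~1.15x cost row) but also why it misses paraphrased leakage, which embedding-based deduplication catches at higher cost.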