A unified metric framework for evaluating interpretability decompositions that balances Sparsity, Fidelity, and Mechanistic Completeness by combining them into a single SFC-Score via a weighted harmonic mean.
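
A sketch of the score, assuming the standard weighted harmonic mean with non-negative weights $w_S, w_F, w_C$ (the S:F:C ratios used by the weight profiles) over component scores $S, F, C \in (0, 1]$:

$$
\mathrm{SFC} = \frac{w_S + w_F + w_C}{\dfrac{w_S}{S} + \dfrac{w_F}{F} + \dfrac{w_C}{C}},
\qquad
\mathrm{SFC}_{1:1:1} = \frac{3}{\dfrac{1}{S} + \dfrac{1}{F} + \dfrac{1}{C}}.
$$

For example, the equal-weight row at sparsity 0.85 gives $3 / (1/0.85 + 1/0.94 + 1/0.93) \approx 3 / 3.316 \approx 0.905$, matching the reported SFC-Score. Because a harmonic mean is pulled toward its smallest argument, a decomposition cannot score well by maximizing one component while neglecting another.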

Equal-weight (1:1:1) SFC-Score across sparsity levels:

| Sparsity Level | Sparsity Score (S) | Fidelity Score (F) | Completeness Score (C) | SFC-Score |
|---|---|---|---|---|
| 0.50 | 0.500 | 0.995 | 0.980 | 0.746 |
| 0.60 | 0.600 | 0.990 | 0.960 | 0.803 |
| 0.70 | 0.700 | 0.980 | 0.940 | 0.848 |
| 0.80 | 0.800 | 0.960 | 0.910 | 0.884 |
| 0.85 | 0.850 | 0.940 | 0.930 | 0.905 |
| 0.90 | 0.900 | 0.900 | 0.870 | 0.890 |
| 0.95 | 0.950 | 0.820 | 0.780 | 0.847 |
| 0.99 | 0.990 | 0.650 | 0.600 | 0.727 |
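
The equal-weight column can be recomputed from the three component scores. The sketch below assumes the SFC-Score is the weighted harmonic mean given above; the names `sfc_score` and `SWEEP` are illustrative, not part of any published API.

```python
from typing import Sequence, Tuple

# (sparsity score S, fidelity score F, completeness score C) for each row
# of the equal-weight sweep table
SWEEP: Sequence[Tuple[float, float, float]] = [
    (0.50, 0.995, 0.980),
    (0.60, 0.990, 0.960),
    (0.70, 0.980, 0.940),
    (0.80, 0.960, 0.910),
    (0.85, 0.940, 0.930),
    (0.90, 0.900, 0.870),
    (0.95, 0.820, 0.780),
    (0.99, 0.650, 0.600),
]


def sfc_score(s: float, f: float, c: float,
              weights: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Weighted harmonic mean of S, F and C (assumed form of the SFC-Score)."""
    if min(s, f, c) <= 0.0:
        raise ValueError("component scores must be strictly positive")
    w_s, w_f, w_c = weights
    return (w_s + w_f + w_c) / (w_s / s + w_f / f + w_c / c)


if __name__ == "__main__":
    for s, f, c in SWEEP:
        print(f"sparsity={s:.2f}  SFC={sfc_score(s, f, c):.3f}")
```

Under this assumption the 0.85 and 0.90 rows reproduce to three decimals (about 0.905 and 0.890), while some other rows (for example 0.60 and 0.99) come out slightly different, so the reported column may use a weighting or normalization that this sketch does not capture.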

Best sparsity level and resulting SFC-Score under each weight profile:

| Profile | Weights (S:F:C) | Best Sparsity | SFC-Score |
|---|---|---|---|
| Equal | 1:1:1 | 0.85 | 0.905 |
| Sparsity-Heavy | 5:1:1 | 0.95 | 0.917 |
| Fidelity-Heavy | 1:5:1 | 0.70 | 0.911 |
| Completeness-Heavy | 1:1:5 | 0.80 | 0.892 |
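
Choosing the best sparsity for a profile amounts to maximizing the weighted score over the sweep. The sketch below assumes the weights enter the weighted harmonic mean directly and searches only the eight sweep rows; `weighted_harmonic`, `best_sparsity`, and `PROFILES` are hypothetical names.

```python
from typing import Dict, Sequence, Tuple

Row = Tuple[float, float, float]  # (S, F, C) for one sparsity level

# Rows copied from the equal-weight sweep table (S equals the sparsity level there)
SWEEP: Sequence[Row] = [
    (0.50, 0.995, 0.980), (0.60, 0.990, 0.960), (0.70, 0.980, 0.940),
    (0.80, 0.960, 0.910), (0.85, 0.940, 0.930), (0.90, 0.900, 0.870),
    (0.95, 0.820, 0.780), (0.99, 0.650, 0.600),
]

# Weight ratios (S:F:C) from the profile table
PROFILES: Dict[str, Tuple[float, float, float]] = {
    "Equal": (1.0, 1.0, 1.0),
    "Sparsity-Heavy": (5.0, 1.0, 1.0),
    "Fidelity-Heavy": (1.0, 5.0, 1.0),
    "Completeness-Heavy": (1.0, 1.0, 5.0),
}


def weighted_harmonic(row: Row, weights: Tuple[float, float, float]) -> float:
    """Assumed SFC-Score: weighted harmonic mean of the three components."""
    return sum(weights) / sum(w / x for w, x in zip(weights, row))


def best_sparsity(rows: Sequence[Row],
                  weights: Tuple[float, float, float]) -> Tuple[float, float]:
    """Return (sparsity level, score) of the row with the highest weighted score."""
    best = max(rows, key=lambda row: weighted_harmonic(row, weights))
    return best[0], weighted_harmonic(best, weights)


if __name__ == "__main__":
    for name, w in PROFILES.items():
        level, score = best_sparsity(SWEEP, w)
        print(f"{name:<20} best sparsity={level:.2f}  SFC={score:.3f}")
```

Under this assumption the selection agrees with the Equal (0.85) and Sparsity-Heavy (0.95) rows, but not with the other two profiles, and the non-equal scores differ from the reported ones, so the reported profile results may rely on a different weighting scheme or a finer sweep.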
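
Pareto hypervolume (HV) for each model and circuit configuration (values prefixed with ~ are approximate):

| Config | Hidden Dim | Circuit Size | Pareto HV |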
|---|---|---|---|
| Standard | 64 | 8 | 0.874 |
| Large | 128 | 16 | ~0.85 |
| Dense Circuit | 64 | 24 | ~0.82 |
| Sparse Circuit | 64 | 4 | ~0.90 |
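
The table does not state which objectives or reference point the hypervolume uses. The sketch below assumes HV is the volume dominated by a set of (S, F, C) points, all maximized, relative to the origin, and computes it exactly by slicing along C; `hypervolume_2d` and `hypervolume_3d` are hypothetical helpers, and nothing here is claimed to reproduce the reported values.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (S, F, C), all to be maximized


def hypervolume_2d(points: Sequence[Tuple[float, float]]) -> float:
    """Area dominated by 2-D points (maximization) w.r.t. reference point (0, 0)."""
    area, max_y = 0.0, 0.0
    # Sweep from the largest x down; each point adds the strip where its y
    # exceeds everything seen so far.
    for x, y in sorted(points, reverse=True):
        if y > max_y:
            area += x * (y - max_y)
            max_y = y
    return area


def hypervolume_3d(points: Sequence[Point]) -> float:
    """Volume dominated by 3-D points w.r.t. the origin, via slicing along C."""
    pts: List[Point] = sorted((p for p in points if min(p) > 0.0),
                              key=lambda p: p[2], reverse=True)
    volume = 0.0
    for i, (_, _, z) in enumerate(pts):
        z_next = pts[i + 1][2] if i + 1 < len(pts) else 0.0
        # Between z_next and z, the dominated region is the 2-D front of pts[: i + 1].
        volume += (z - z_next) * hypervolume_2d([(p[0], p[1]) for p in pts[: i + 1]])
    return volume


if __name__ == "__main__":
    sweep = [
        (0.50, 0.995, 0.980), (0.60, 0.990, 0.960), (0.70, 0.980, 0.940),
        (0.80, 0.960, 0.910), (0.85, 0.940, 0.930), (0.90, 0.900, 0.870),
        (0.95, 0.820, 0.780), (0.99, 0.650, 0.600),
    ]
    print(f"HV of the equal-weight sweep points: {hypervolume_3d(sweep):.3f}")
```

Slicing keeps the computation exact for a handful of points; for larger fronts or more objectives a dedicated hypervolume library is the more practical choice.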