Quantifying Knowledge-Dependent Overfitting on ARC-AGI
Interactive decomposition of genuine ability vs. contamination effects
Overall Accuracy: 42.5%
Genuine Ability: 20.9%
Overfitting Fraction: 50.9%
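The three headline figures appear to be related by simple arithmetic. The sketch below assumes (the page does not state this) that the overfitting fraction is the share of overall accuracy not accounted for by genuine ability, i.e. (overall - genuine) / overall. The function name `overfitting_fraction` is hypothetical and used for illustration only.

```python
def overfitting_fraction(overall_acc: float, genuine_acc: float) -> float:
    """Share of overall accuracy not explained by genuine ability.

    ASSUMPTION: this definition is inferred from the displayed numbers,
    not taken from the source.
    """
    if overall_acc <= 0:
        raise ValueError("overall accuracy must be positive")
    return (overall_acc - genuine_acc) / overall_acc


# Values as displayed on the page (percentages, rounded to one decimal).
overall = 42.5
genuine = 20.9

fraction = overfitting_fraction(overall, genuine)
print(f"Overfitting fraction: {fraction:.1%}")
```

With the rounded inputs shown, this evaluates to roughly 50.8%, close to the displayed 50.9%. The small gap is consistent with the page computing the fraction from unrounded values, though that is also an assumption.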
Performance Decomposition
Contamination Boost: 0.91
Novelty Gap Analysis
Multi-Model Comparison
ARC-AGI-1 vs ARC-AGI-2