Quantifying Knowledge-Dependent Overfitting on ARC-AGI

Interactive decomposition of genuine ability vs. contamination effects

Overall Accuracy: 42.5%
Genuine Ability: 20.9%
Overfitting Fraction: 50.9%
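
The page does not state how the three headline figures relate to one another. One plausible reading, offered here only as an assumption, is that the overfitting fraction is the share of overall accuracy not accounted for by genuine ability: (overall - genuine) / overall. The sketch below checks that reading against the displayed numbers. The function name `overfitting_fraction` is hypothetical and is not taken from the source.

```python
def overfitting_fraction(overall_accuracy: float, genuine_ability: float) -> float:
    """Share of overall accuracy not explained by genuine ability.

    ASSUMPTION: this definition is inferred from the headline numbers,
    not stated by the source page.
    """
    if overall_accuracy <= 0:
        raise ValueError("overall_accuracy must be positive")
    return (overall_accuracy - genuine_ability) / overall_accuracy


# Headline figures as displayed on the page (percentage points).
overall = 42.5
genuine = 20.9

fraction = overfitting_fraction(overall, genuine)
print(f"Overfitting fraction under the assumed definition: {fraction:.1%}")
```

With the rounded inputs this gives about 50.8%, close to the displayed 50.9%. The small gap would be consistent with the page computing from unrounded values, but that is a guess rather than something the source confirms.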

Panels: Performance Decomposition; Novelty Gap Analysis; Multi-Model Comparison; ARC-AGI-1 vs ARC-AGI-2