Ascertain whether ARC-AGI overfitting is accidental or intentional

Determine whether the knowledge-dependent overfitting observed on ARC-AGI-1 and ARC-AGI-2 arose accidentally or intentionally. Resolving this would clarify the source of the apparent contamination and its implications for benchmark validity.

Background

The paper argues that AI reasoning systems can exhibit a form of knowledge-dependent overfitting on ARC-AGI, enabled by substantial exposure to related public data. This phenomenon can make even well-designed benchmarks susceptible to indirect contamination when public training and private test distributions are too similar.

As evidence, the authors note that models implicitly use ARC-specific color mappings in their reasoning even though the prompts do not explicitly mention them, which suggests that ARC-like data are well represented in model pretraining corpora. However, they explicitly state that they cannot determine whether this overfitting or contamination is accidental or intentional.

References

"We assert that this phenomenon is now occurring with ARC-AGI-1 and ARC-AGI-2 – accidentally or intentionally, although we cannot determine which."

ARC Prize 2025: Technical Report (2601.10904 - Chollet et al., 15 Jan 2026) in Section: Knowledge Overfitting