ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving

Published 14 Nov 2025 in cs.AI | (2511.11079v2)

Abstract: We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks in the Abstraction and Reasoning Corpus (ARC). While ARC has inspired extensive research on abstract reasoning, most existing approaches rely on static input--output supervision, which limits insight into how reasoning unfolds over time. ARCTraj addresses this gap by recording temporally ordered, object-level actions that capture how humans iteratively transform inputs into outputs, revealing intermediate reasoning steps that conventional datasets overlook. Collected via the O2ARC web interface, it contains around 10,000 trajectories annotated with task identifiers, timestamps, and success labels across 400 training tasks from the ARC-AGI-1 benchmark. It further defines a unified reasoning pipeline encompassing data collection, action abstraction, Markov decision process (MDP) formulation, and downstream learning, enabling integration with reinforcement learning, generative modeling, and sequence modeling methods such as PPO, World Models, GFlowNets, Diffusion agents, and Decision Transformers. Analyses of spatial selection, color attribution, and strategic convergence highlight the structure and diversity of human reasoning. Together, these contributions position ARCTraj as a structured and interpretable foundation for studying human-like reasoning, advancing explainability, alignment, and generalizable intelligence.

Abstract PDF Upgrade to Chat

Summary

The paper introduces the ARCTraj dataset that captures temporally ordered human reasoning trajectories and object manipulations in abstract tasks.
The paper presents a novel object-centric methodology with MDP transformation and action abstraction to enhance reinforcement learning and sequence modeling.
The paper reveals that structured human reasoning, through observable selection biases and convergent strategies, improves policy stability and model performance.

ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving

Introduction

ARCTraj addresses a long-standing deficiency in research on the Abstraction and Reasoning Corpus (ARC): the lack of temporally-resolved, object-centric human reasoning data. While ARC has catalyzed work on systematizing abstract reasoning, prior efforts have largely depended on static input-output supervision, which fails to elucidate intermediates of human-like reasoning. ARCTraj systematically records temporally-ordered sequences of object-level manipulations as humans solve ARC-AGI-1 training tasks, thus enabling explicit modeling of intermediate cognitive processes.

The dataset comprises approximately 10,000 human trajectories over 400 ARC-AGI-1 tasks, each annotated with metadata (user, task, timestamps, and success labels). These trajectories reflect symbolic reasoning steps such as spatial selection, object manipulation, and compositional strategy execution—providing structured supervision channels absent from prior data. ARCTraj includes a methodological framework for downstream integration: action abstraction, MDP formalism, and preprocessing for reinforcement learning (RL), sequence modeling, and generative modeling.

Figure 1: Overview of the ARCTraj data collection process in which user interactions with the O2ARC platform are captured as temporally-ordered, semantically-rich object-level reasoning trajectories.

Dataset Structure and Object-centric Encoding

ARCTraj logs each problem-solving attempt as a temporally-ordered sequence of categorically-labeled actions. Each action triplet $\langle$ category, object, operation $\rangle$ is tied to a grid state and timestamp, aligning each reasoning step to a symbolic context (e.g., Selection, Coloring, Object-Oriented Move/Rotate/Flip). User-driven perceptual grouping ensures semantic consistency in object delineation, outperforming low-level pixel editing logs.

The dataset supports full replay, state comparison, and granular extraction for learning scenarios demanding Markovian state-action or symbolic sequence inputs.

Figure 2: An example ARCTraj trajectory, where each state–action entry logs category, operation, grid, and object details, facilitating structured state-space analysis.

Comparative Analysis with Prior Human ARC Datasets

Relative to H-ARC and ARC-Interactive-History, ARCTraj achieves stronger abstraction by encoding object-centric operations and explicitly segmenting selection actions. While H-ARC records merged pixel actions, ARCTraj’s structure reflects strategy formation, intention, and composition at the object level (37.7% object-related actions when selection is excluded versus H-ARC’s 0.9%).

Notably, ARCTraj demonstrates greater cross-trajectory convergence (43.7% shared grids vs. 11.4% in H-ARC)—implying that human solvers, when provided with semantically meaningful actions, converge on similar intermediate representations. Despite fewer participants, ARCTraj achieves deeper and denser trajectory coverage per task.

Extracted Cognitive Regularities

ARCTraj provides a basis for analyzing human attention biases, color source attribution, and shared intentions during abstract problem solving:

Selection Biases: Human attention clusters on local, compact, regular regions ( $1\times1$ to $3\times3$ ), with a pronounced preference for spatial and perceptual salience.

Figure 3: Distributional analyses reveal that humans preferentially select square- and bar-like regions of limited extent, emphasizing the dominance of local, salient reasoning foci.

Color Attribution: Outputs derive colors predominantly from the test input (66.5% of tasks), with the remainder mixing input and demonstration output colors. No task demands color transfer from demonstration input only—a constraint mirrored in human trajectories.
Shared Intentions: Across participants, key strategy templates involve short exploratory selection sequences followed by decisive transformation, with 90% of operations following one to four selection steps. Tasks with structured transformation objectives (e.g., compositional or selective cropping) exhibit low trajectory uniqueness, evidencing stable strategy convergence.
Figure 4: Human reasoning trajectories display substantial overlap on the majority of tasks, highlighting a convergence in intermediate cognitive strategies.

Unified Reasoning Pipeline for Downstream Learning

ARCTraj enables a generalized pipeline: task-to-MDP conversion, symbolic action abstraction, and downstream mapping for RL (e.g., ARCLE), generative models (e.g., LDCQ, GFlowNet), and sequence learners (e.g., Decision Transformers). Object-centric structural abstraction facilitates Markovian modeling, while intention and selection cues can be injected as auxiliary knowledge, driving explainable and sample-efficient learning models.

Figure 5: Demonstrates how ARCTraj preprocessing generates state-action sequences compatible with various downstream learning architectures.

Empirical Implications for ARC Solvers and General AI

ARCTraj enhances RL sample efficiency and policy stability via imitation learning and reward shaping, as evidenced by PPO and World Model approaches leveraging the data. Latent diffusion models (LDCQ) utilize the compactness and semantic structuring of ARCTraj-encoded actions, enabling robust trajectory generation. GFlowNets benefit from ARCTraj by balancing exploitation of high-quality trajectories and diverse exploration of the solution space. For Decision Transformers, intention and object annotations facilitate higher-level policy learning, improving generalization and trajectory coherence.

Strong empirical results are observed: Decision Transformers with intention cues obtain 6–8 percentage point accuracy gains, and GFlowNet models trained with selection priors yield increased diversity and coverage of structurally valid solutions.

Theoretical and Practical Directions

ARCTraj provides actionable priors for extrapolating systematic cognitive regularities into AI system design, such as attention scheduling, compositional abstraction, and intention alignment. Explicit modeling of selection and transformation regularities bridges System 2-inspired AI with human-inductive bias, addressing implementation gaps in program synthesis and abstract visual reasoning. The dataset’s object-centric MDP framework is extensible to other high-level reasoning domains (robotics, symbolic manipulation, and transformation learning).

Future research can formalize shared reasoning grammars, annotate intention clusters for few-shot analogical induction, and extend color/element lineage tracking. Predictive intention modeling and phase-based attention dynamics offer new avenues for adaptive curriculum design and collaborative human–AI agents.

Conclusion

ARCTraj sets a new standard for dynamic, interpretable datasets in symbolic reasoning and abstract problem solving. Its structured, temporally-ordered object-level action logs provide direct, actionable supervision for cognitive AI models and enable empirical investigation of the mechanistic basis of human generalization on ARC-like tasks (2511.11079).

Figure 6: Two representative ARC tasks: (left) size-based hollow square filling, and (right) object selection and cropping—tasks that exemplify the compositional reasoning ARCTraj trajectories enable.

The dataset is foundational for human-aligned abstract reasoning research, supporting both behavioral analysis and next-generation solver development. Its methodological pipeline—MDP transformation, action abstraction, and trajectory-based supervision—sets a precedent for data collection and modeling across high-level cognitive domains.

Markdown Report Issue