ARC-NCA: Towards Developmental Solutions to the Abstraction and Reasoning Corpus

Published 13 May 2025 in cs.AI and cs.NE | (2505.08778v1)

Abstract: The Abstraction and Reasoning Corpus (ARC), later renamed ARC-AGI, poses a fundamental challenge in artificial general intelligence (AGI), requiring solutions that exhibit robust abstraction and reasoning capabilities across diverse tasks, while only a few correct examples (a median of three) are presented. While ARC-AGI remains very challenging for artificial intelligence systems, it is rather easy for humans. This paper introduces ARC-NCA, a developmental approach leveraging standard Neural Cellular Automata (NCA) and NCA enhanced with hidden memories (EngramNCA) to tackle the ARC-AGI benchmark. NCAs are employed for their inherent ability to simulate complex dynamics and emergent patterns, mimicking developmental processes observed in biological systems. Developmental solutions may offer a promising avenue for enhancing AI's problem-solving capabilities beyond mere training data extrapolation. ARC-NCA demonstrates how integrating developmental principles into computational models can foster adaptive reasoning and abstraction. We show that our ARC-NCA proof-of-concept results may be comparable to, and sometimes surpass, those of ChatGPT 4.5, at a fraction of the cost.


Summary

The paper demonstrates a developmental approach using standard NCA and EngramNCA with memory mechanisms to tackle abstraction and reasoning in ARC tasks.
It introduces modular architectures, GeneCA and GenePropCA, for encoding genetic primitives and propagating critical information across spatial lattices.
Experimental results reveal that ARC-NCA models can match or exceed LLM performance on ARC puzzles while significantly reducing computational cost.

Developmental Neural Cellular Automata for ARC Reasoning: A Technical Essay on ARC-NCA

Introduction

ARC-NCA presents a distinctly developmental approach to addressing the Abstraction and Reasoning Corpus (ARC), a benchmark designed to evaluate artificial general intelligence with stringent requirements for abstraction, generalization, and reasoning under few-shot supervision. This framework explores the capabilities of standard Neural Cellular Automata (NCA) and an enhanced variant, EngramNCA, which incorporates hidden memory mechanisms, for solving ARC tasks. The central motivation is to harness the emergent, self-organizing dynamics of NCAs to emulate cognitive processes observed during biological development, thereby advancing program synthesis and reasoning in computational models.

Figure 1: Example ARC task, demonstrating the complexity and variety of visual reasoning required.

Neural Cellular Automata Architectures

The ARC-NCA method integrates two major NCA classes:

Standard NCA: Employs differentiable, local update rules via neural networks, typically using convolutional layers to evolve cellular states across a spatial domain.
EngramNCA: Introduces dual-state architecture with public (interaction-based) and private (memory-based) states, enabling more complex behavior such as information retention, propagation, and selective activation.
Figure 2: Growing NCA update pass displaying local information flow and emergent spatial representation.
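
As a rough illustration of the standard NCA update described above, the following NumPy sketch performs one step: each cell perceives its 3x3 neighborhood through fixed identity and Sobel filters, a small per-cell network proposes a state change, and a random mask applies that change to a subset of cells. The filter choice, layer sizes, `fire_rate`, and all function names are assumptions made for illustration, not the paper's exact configuration.

```python
import numpy as np

# Fixed perception kernels (assumed here; the paper also evaluates learnable ones).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64) / 8.0
SOBEL_Y = SOBEL_X.T
IDENTITY = np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=np.float64)


def conv3x3_per_channel(state, kernel):
    """Apply one 3x3 kernel to every channel of an (H, W, C) state, zero-padded."""
    h, w, _ = state.shape
    padded = np.pad(state, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(state)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w, :]
    return out


def nca_step(state, w1, b1, w2, rng, fire_rate=0.5):
    """One residual update: perceive, run a per-cell two-layer network, masked add."""
    perception = np.concatenate(
        [conv3x3_per_channel(state, k) for k in (IDENTITY, SOBEL_X, SOBEL_Y)],
        axis=-1,
    )  # shape (H, W, 3 * C)
    hidden = np.maximum(perception @ w1 + b1, 0.0)  # ReLU
    delta = hidden @ w2  # shape (H, W, C)
    # Stochastic mask: only a random subset of cells updates on each step.
    mask = rng.random(state.shape[:2] + (1,)) < fire_rate
    return state + delta * mask
```

Because the same weights are applied at every cell, the rule is purely local; any global structure has to emerge from repeated application of `nca_step`.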

EngramNCA is structured as an ensemble comprising GeneCA and GenePropCA:

GeneCA: Responsible for encoding genetic primitives and driving morphological growth (Figure 3).
GenePropCA: Manages activation and propagation of genetic information across the spatial lattice, supporting effective memory transfer and regulation (Figure 4).
Figure 3: EngramNCA GeneCA diagram highlighting genetic primitive encoding.

Figure 4: EngramNCA GenePropCA diagram illustrating genetic information propagation and activation.
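
The public/private split can be pictured with a small sketch. It assumes that a cell's state is a single vector whose leading channels are public and whose trailing channels are private, and that neighbors can read only the public part. The channel counts and function names are hypothetical, and the sketch deliberately leaves out how GeneCA and GenePropCA use these inputs.

```python
import numpy as np

N_PUBLIC = 4   # hypothetical number of public (visible) channels
N_PRIVATE = 3  # hypothetical number of private (memory) channels


def neighbor_sum_public(state):
    """Sum the public channels of each cell's 8 neighbors (zero-padded borders)."""
    public = state[..., :N_PUBLIC]
    h, w, _ = public.shape
    padded = np.pad(public, ((1, 1), (1, 1), (0, 0)))
    total = np.zeros_like(public)
    for dy in range(3):
        for dx in range(3):
            if (dy, dx) != (1, 1):  # skip the cell itself
                total += padded[dy:dy + h, dx:dx + w, :]
    return total


def cell_inputs(state):
    """Per-cell input: the cell's own full state plus its neighbors' public channels."""
    return np.concatenate([state, neighbor_sum_public(state)], axis=-1)
```

Under this assumption, writing to a private channel changes only the input of the cell that owns it, whereas writing to a public channel also changes what every neighbor perceives.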

Multiple augmented variants of EngramNCA were evaluated (e.g., v2-v4), introducing mechanisms such as learnable sensing filters (replacing the biologically inspired Sobel and Laplacian filters), attention-driven local-global processing, and toroidal/non-toroidal lattice behavior to accommodate ARC’s task variability.
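
To make the toroidal/non-toroidal distinction concrete, the sketch below applies a fixed Laplacian kernel to a grid with either wrap-around or zero-padded borders. The kernel weights and the `perceive` helper are assumptions for illustration; the variants described above may replace such fixed filters with learned ones.

```python
import numpy as np

# A common 3x3 Laplacian stencil; its weights sum to zero.
LAPLACIAN = np.array([[1, 2, 1], [2, -12, 2], [1, 2, 1]], dtype=np.float64) / 16.0


def perceive(grid, kernel, toroidal):
    """Filter a 2D grid with a 3x3 kernel under the chosen boundary behavior."""
    h, w = grid.shape
    # "wrap" joins opposite edges (torus); "constant" pads with zeros.
    padded = np.pad(grid, 1, mode="wrap" if toroidal else "constant")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

On a uniform grid the toroidal response is zero everywhere, while the zero-padded response is non-zero along the border. That border signal is what lets a non-toroidal lattice sense where the grid ends.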

Methodology and Data Representation

ARC tasks (visual transformation problems using grids of color-coded integers) are mapped onto NCAs by converting integer grids into real-valued RGB-$\alpha$ lattices via HSL-based color quantization, with an extension for binary channel encoding. This preprocessing enables seamless integration with NCA frameworks, which operate inherently on image-like, continuously valued tensors.
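
One way to picture this mapping is the sketch below, which assigns each of the ten ARC color indices an evenly spaced hue, converts it to RGB with full alpha, and decodes a real-valued lattice back to integers by nearest palette color. The specific hue spacing, lightness, saturation, and function names are assumptions; the paper's exact quantization may differ.

```python
import colorsys

import numpy as np

NUM_COLORS = 10  # ARC grids use the integers 0..9


def build_palette(num_colors=NUM_COLORS):
    """Return a (num_colors, 4) RGBA palette with evenly spaced hues (assumed)."""
    palette = np.zeros((num_colors, 4))
    for i in range(num_colors):
        r, g, b = colorsys.hls_to_rgb(i / num_colors, 0.5, 1.0)
        palette[i] = (r, g, b, 1.0)
    return palette


def encode_grid(grid, palette):
    """Map an integer (H, W) grid to a real-valued (H, W, 4) RGBA lattice."""
    return palette[np.asarray(grid)]


def decode_lattice(lattice, palette):
    """Map an (H, W, 4) lattice back to integers by nearest RGB palette entry."""
    diffs = lattice[..., None, :3] - palette[None, None, :, :3]
    return np.linalg.norm(diffs, axis=-1).argmin(axis=-1)
```

Decoding by nearest color means small deviations in the continuous NCA output still resolve to a valid ARC color.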

A key challenge is the variable input/output grid size within ARC. The solution employs either maximal grid padding (up to 30x30, with unique padding tokens) or grid exclusion for non-conforming examples, maintaining compatibility with NCA constraints.
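
Both strategies can be sketched briefly. The padding helper places a grid in the top-left corner of a 30x30 canvas filled with a reserved token, and the filter keeps only example pairs whose input and output shapes match. The token value, the top-left placement, and the reading of "non-conforming" as "input and output sizes differ" are all assumptions.

```python
import numpy as np

MAX_SIZE = 30   # largest grid side length mentioned in the text
PAD_TOKEN = 10  # hypothetical token outside the 0..9 color range


def pad_grid(grid, size=MAX_SIZE, pad_token=PAD_TOKEN):
    """Embed a grid in the top-left of a size x size canvas of padding tokens."""
    grid = np.asarray(grid)
    h, w = grid.shape
    if h > size or w > size:
        raise ValueError(f"grid of shape {grid.shape} exceeds {size}x{size}")
    canvas = np.full((size, size), pad_token, dtype=grid.dtype)
    canvas[:h, :w] = grid
    return canvas


def keep_same_size_pairs(pairs):
    """Assumed exclusion rule: drop pairs whose input and output shapes differ."""
    return [
        (inp, out)
        for inp, out in pairs
        if np.asarray(inp).shape == np.asarray(out).shape
    ]
```

Padding keeps every example usable at the cost of a larger lattice, while exclusion keeps lattices small but discards some examples.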

Figure 5: Backpropagation step in EngramNCA training, demonstrating gradient flow through GeneCA and GenePropCA for a single ARC task.

Each ARC problem is treated as a distinct training episode, with individual NCAs initialized and fine-tuned per problem ("test-time training") using available examples, mirroring the program synthesis paradigm.
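
The per-task training loop can be sketched independently of the model. In the example below the NCA is replaced by a deliberately trivial stand-in (a per-cell linear map over one-hot colors, trained by gradient descent on cross-entropy) so that the code stays short and runnable; only the control flow is meant to mirror the description: start from fresh parameters for each task, fit them on that task's examples alone, then predict the test output. All names, the learning rate, and the step count are assumptions.

```python
import numpy as np

NUM_COLORS = 10


def one_hot(grid):
    """Encode an integer (H, W) grid as an (H, W, NUM_COLORS) one-hot array."""
    return np.eye(NUM_COLORS)[np.asarray(grid)]


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fit_task(train_pairs, steps=200, lr=1.0):
    """Fit fresh parameters on ONE task's examples (stand-in model, not an NCA)."""
    weights = np.zeros((NUM_COLORS, NUM_COLORS))  # re-initialized for every task
    for _ in range(steps):
        grad = np.zeros_like(weights)
        count = 0
        for inp, out in train_pairs:  # assumes input and output shapes match
            x = one_hot(inp).reshape(-1, NUM_COLORS)
            y = one_hot(out).reshape(-1, NUM_COLORS)
            grad += x.T @ (softmax(x @ weights) - y)  # cross-entropy gradient
            count += x.shape[0]
        weights -= lr * grad / count
    return weights


def predict(weights, grid):
    """Apply the fitted per-cell map to a new input grid."""
    return (one_hot(grid) @ weights).argmax(axis=-1)


# Toy task (not from ARC): recolor every 1 to 2, leave 0 unchanged.
toy_train = [([[0, 1], [1, 0]], [[0, 2], [2, 0]]), ([[1, 1]], [[2, 2]])]
toy_weights = fit_task(toy_train)
toy_prediction = predict(toy_weights, [[1, 1, 0]])
```

A real ARC-NCA run would substitute an NCA and its rollout for `fit_task` and `predict`; the stand-in has no neighborhood access and so cannot express spatial rules.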

Experimental Results

Across several ARC-NCA variants and unions thereof, quantitative metrics included pixel-wise log-loss and direct problem solve rate. The best models (notably EngramNCA v3) achieved solve rates of up to 12.9% under strict criteria, and union strategies raised this to 17.6%. These rates are comparable to those reported for a state-of-the-art LLM (ChatGPT 4.5, at about 10.3%), at roughly three orders of magnitude lower computational cost per task.
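
The relationship between individual and union solve rates is simple set arithmetic, sketched below: a task counts as solved by the union if at least one model solves it. The helper names are hypothetical and the task identifiers in the usage note are toy values, not results from the paper.

```python
import numpy as np


def is_solved(predicted, target):
    """Strict criterion: the predicted grid must match the target exactly."""
    predicted, target = np.asarray(predicted), np.asarray(target)
    return predicted.shape == target.shape and bool(np.array_equal(predicted, target))


def solve_rate(solved_ids, num_tasks):
    """Fraction of tasks solved."""
    return len(solved_ids) / num_tasks


def union_of(*solved_sets):
    """Tasks solved by at least one model."""
    return set().union(*solved_sets)
```

For example, if one model solves tasks `{"t1", "t2"}` and another solves `{"t2", "t3"}` out of ten tasks, each has a solve rate of 0.2 while the union reaches 0.3. The union gains only from tasks that the models do not share.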

Qualitative analysis revealed NCAs’ capacity for incremental, interpretable developments over the solution lattice, consistent with developmental growth strategies, yielding both exact and "almost solved" outputs.

Figure 6: Example ARC solution generated by the standard NCA, demonstrating successful generalization to novel spatial locations.

Figure 7: Example solution from EngramNCA v1, displaying intrinsic boundary-aware color filling and region differentiation.

Figure 8: Example solution from EngramNCA v3, illustrating line growth and adaptive error correction at boundaries.

Figure 9: EngramNCA v4 solution, showing robust diagonal and horizontal line formation across grid sizes.

When loss thresholds are relaxed (to allow near-perfect solutions), solve rates increase: individual models approach 16–17%, while unions reach 24%. Furthermore, increasing hidden state dimensionality or employing maximal padding enables even higher coverage (up to 27%).
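
The effect of relaxing the threshold can be illustrated as follows. The text does not spell out the loss formula, so this sketch assumes the log-loss is the base-10 logarithm of the mean squared pixel error, and the thresholds used in the usage note are hypothetical values chosen only to show the mechanism.

```python
import numpy as np


def log_mse(predicted, target, eps=1e-12):
    """Assumed loss: log10 of the mean squared pixel error (eps avoids log(0))."""
    predicted = np.asarray(predicted, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    return float(np.log10(np.mean((predicted - target) ** 2) + eps))


def solved_under_threshold(predicted, target, threshold):
    """Count an output as solved when its log-loss is at or below the threshold."""
    return log_mse(predicted, target) <= threshold
```

With a 10x10 target and a prediction that differs by 1.0 in a single cell, the mean squared error is 0.01 and the log-loss is about -2. A strict threshold such as -3 rejects this output, while a relaxed threshold such as -1.5 accepts it, which is how "almost solved" grids enter the count.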

Visual inspection of partial and failed solutions exposes reasoning failure modes, such as incomplete generalization or local color misassignment, providing diagnostic insights for architectural refinement.

Figure 10: Input visual for analysis of partial solution failure modes.

Figure 11: Input demonstrating edge case reasoning pitfalls in EngramNCA v4 related to spatial patterning.

Figure 12: Input grid highlighting asynchronous NCA development and single-pixel loss errors.

Discussion and Implications

ARC-NCA demonstrates that developmental computation paradigms, specifically Neural Cellular Automata with embedded memory and regulatory mechanisms, may serve as viable contenders to transformer-based architectures for abstraction and visual reasoning tasks requiring rapid adaptation from limited examples. The program synthesis approach (per-task NCA fine-tuning) offers flexibility and interpretability, with potential synergies when combined with LLM-driven architectural search or error correction.

Strong claims substantiated by the study:

ARC-NCA models can match or exceed the performance of major LLMs (e.g., ChatGPT 4.5) on ARC at roughly 1000x lower computational cost.
Partial solutions suggest even higher attainable accuracy, highlighting the potential for robust refinement and ensemble strategies.
Developmental modularity (GeneCA/GenePropCA) facilitates the separation of low-level morphogenesis and high-level reasoning, a promising direction for symbolic and compositional learning tasks.

Practical implications encompass scalable, energy-efficient program synthesis for complex reasoning benchmarks, while theoretical implications pertain to the role of self-organized developmental computation in general intelligence.

With the advent of ARC-AGI-2, which evaluates advanced facets such as symbolic interpretation and compositional reasoning, ARC-NCA’s developmental principles may extend toward more challenging domains. The evidence supports further investigation into criticality pre-training, NCA/LLM integration, and latent-space developmental architectures.

Future Directions

Pre-training strategies for developmental models: Criticality-based or latent abstraction pre-training to facilitate transfer and generalization.
LLM/NCA hybrid reasoning: Hierarchical interaction where LLMs guide NCA architecture and hyperparameter selection or correct near-complete outputs.
Latent representation NCAs: Tasking EngramNCA or similar models with abstraction at latent levels, enabling compositional and symbolic manipulation.
Stability and reproducibility analyses: Multi-run statistics, robust union ensembles, and official ARC-AGI leaderboard submission for broad validation.

Conclusion

ARC-NCA introduces a principled, computationally efficient developmental framework for ARC reasoning, leveraging the local interaction and emergent pattern formation of Neural Cellular Automata. Comparative results indicate that it is competitive with LLMs on ARC at a far lower computational cost. This study motivates further inquiry into developmental computation for artificial general intelligence and highlights its value for visual reasoning benchmarks at the intersection of artificial life and deep learning.
