
SuperARC: An Agnostic Test for Narrow, General, and Super Intelligence Based On the Principles of Recursive Compression and Algorithmic Probability

Published 20 Mar 2025 in cs.AI and cs.IT | (2503.16743v4)

Abstract: We introduce an open-ended test grounded in algorithmic probability that can avoid benchmark contamination in the quantitative evaluation of frontier models in the context of their AGI and Superintelligence (ASI) claims. Unlike other tests, this test does not rely on statistical compression methods (such as GZIP or LZW), which are more closely related to Shannon entropy than to Kolmogorov complexity and are not able to test beyond simple pattern matching. The test challenges aspects of AI, in particular LLMs, related to features of intelligence of fundamental nature such as synthesis and model creation in the context of inverse problems (generating new knowledge from observation). We argue that metrics based on model abstraction and abduction ("optimal Bayesian inference") for "predictive planning" can provide a robust framework for testing intelligence, including natural intelligence (human and animal), narrow AI, AGI, and ASI. We found that LLM model versions tend to be fragile and incremental as a result of memorisation only with progress likely driven by the size of training data. The results were compared with a hybrid neurosymbolic approach that theoretically guarantees universal intelligence based on the principles of algorithmic probability and Kolmogorov complexity. The method outperforms LLMs in a proof-of-concept on short binary sequences. We prove that compression is equivalent and directly proportional to a system's predictive power and vice versa. That is, if a system can better predict it can better compress, and if it can better compress, then it can better predict. Our findings strengthen the suspicion regarding the fundamental limitations of LLMs, exposing them as systems optimised for the perception of mastery over human language.

Summary

  • The paper presents a novel intelligence test that leverages recursive compression and algorithmic probability to evaluate various forms of AI.
  • It critiques traditional LLM evaluations by emphasizing synthesis and causal inference rather than mere memorization.
  • The framework employs methods like the Block Decomposition Method, offering a robust approach for assessing AGI and ASI capabilities.

Introduction

The paper "SuperARC: An Agnostic Test for Narrow, General, and Super Intelligence Based On the Principles of Recursive Compression and Algorithmic Probability" (2503.16743) introduces a test grounded in algorithmic principles to evaluate artificial intelligence models, including those claiming AGI and ASI capabilities. This work critically assesses current AI models like LLMs by exploring their core characteristics related to intelligence through algorithmic complexity. A specific focus is placed on the fragile and incremental nature of LLMs driven by memorization rather than comprehension or synthesis.

The test relies on non-statistical measures like Kolmogorov-Chaitin complexity and algorithmic probability, diverging from traditional entropy measures. It evaluates machine intelligence across various forms, from narrow AI to ASI, emphasizing synthesis and model creation capabilities. The test aims to offer a robust framework for understanding the intelligence and capabilities of AI models beyond pattern matching and memorization.
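The gap between entropy-based and algorithmic measures can be illustrated with a short sketch (our illustration, not from the paper; the helper name is ours). A perfectly periodic string has maximal Shannon entropy over its symbol frequencies, yet it is generated by a trivially short program, which is exactly the regularity Kolmogorov-Chaitin complexity captures and symbol-level entropy misses:

```python
import math
from collections import Counter

def shannon_entropy_per_symbol(s: str) -> float:
    """Empirical Shannon entropy (bits per symbol) of the symbol frequencies."""
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())

# A perfectly periodic string: maximal symbol-level entropy (1 bit/symbol),
# yet producible by a tiny program ("print '01' 500 times"), i.e. it has low
# Kolmogorov-Chaitin complexity. Frequency-based entropy cannot see that.
periodic = "01" * 500
print(shannon_entropy_per_symbol(periodic))  # 1.0
```

A genuinely random 1000-bit string would score roughly the same entropy, so this statistic alone cannot separate structured from random data, which is the paper's motivation for non-statistical measures.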

Intelligence and Compression

The paper frames LLMs as systems that compress and reproduce data through a high-dimensional probability distribution, establishing a connection between compression and comprehension. As models compress representations of real-world phenomena, they gain insight into the phenomena's underlying complexity. Recursive compression implies an accurate model of the target sequences, which in turn yields enhanced predictive capability.

Compression enables models to abstract the main features of data, supporting predictive modeling and planning. Such modeling benefits from a recursive approach that goes beyond mere statistical pattern matching to generate causality-driven hypotheses. A system capable of decompression, reconstructing the original data from its compressed model, demonstrates a level of comprehension indicative of intelligence. This theoretical foundation for prediction and planning draws on algorithmic randomness, underscoring the robustness of such a framework for evaluating AI.
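The compression/prediction duality stated in the abstract (better compression implies better prediction, and vice versa) can be sketched with a toy sequential model. Everything below is our illustration, not the paper's method; a simple order-k context model stands in for algorithmic-probability induction. The same conditional probabilities determine both the ideal code length and the next-symbol guess, so the two quantities improve together:

```python
import math
from collections import defaultdict

def codelength_and_prediction(seq: str, k: int = 2):
    """Order-k binary context model with Laplace smoothing.
    Returns (total ideal code length in bits, predicted next symbol).
    The same conditional probabilities drive both numbers, which is the
    compression/prediction duality: short codes and good guesses coincide."""
    counts = defaultdict(lambda: {"0": 0, "1": 0})
    bits = 0.0
    for i, sym in enumerate(seq):
        ctx = seq[max(0, i - k):i]
        c = counts[ctx]
        p = (c[sym] + 1) / (c["0"] + c["1"] + 2)   # Laplace-smoothed estimate
        bits += -math.log2(p)                      # ideal code length of sym
        c[sym] += 1
    c = counts[seq[len(seq) - k:]]
    prediction = "0" if c["0"] >= c["1"] else "1"
    return bits, prediction

# The 128-symbol periodic sequence codes into far fewer than 128 bits,
# and the very same model correctly predicts its next symbol ("0").
bits, nxt = codelength_and_prediction("01" * 64)
```

An incompressible random sequence would cost close to 1 bit per symbol under the same model, and its next symbol would be unpredictable: compression failure and prediction failure coincide.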

Assessing LLM Capabilities

The paper surveys current LLM benchmarks and critiques their limitations as tests of intelligence. These systems rely predominantly on memorization and statistical regularities inherited from their training sets rather than demonstrating deeper understanding or comprehension. The paper identifies statistical dependencies and a lack of reasoning capability as the key barriers to genuine intelligence evaluation in existing LLMs.

The proposed framework offers a more objective characterization of comprehension and intelligence beyond training-set-specific answers, emphasizing compression via algorithmic complexity over statistical pattern recognition. The study suggests combining complexity-based tests with non-binary, non-fixed datasets to prevent cheating and contamination from fixed benchmarks. This approach anticipates such challenges and improves on the limitations of existing LLM evaluations, showing how algorithmic information theory principles can move the field toward AGI/ASI.

Framework Description

The paper introduces the SuperARC testing framework, designed to capture model abstraction and planning by evaluating recursive compression ability and predictive power. It demonstrates the application using the Block Decomposition Method (BDM) for coding and synthesizing models. These methods integrate classical and algorithmic information theory principles to provide a neurosymbolic approach that combines statistical and symbolic methods effectively.

BDM provides a more robust estimate than statistical compressors such as GZIP or LZW by drawing on algorithmic probability rather than repetition statistics alone. The paper advocates the Coding Theorem Method (CTM) and BDM as a hybrid neurosymbolic route: BDM combines local CTM estimates of algorithmic complexity with Shannon entropy for longer sequences, yielding a practical and scalable approximation.
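A minimal sketch of the block-decomposition idea follows (our illustration; the CTM values below are hypothetical stand-ins, whereas real BDM tables come from exhaustive enumerations of small Turing machines). The string is split into fixed-size blocks, and each distinct block contributes its CTM complexity plus the log of its multiplicity:

```python
import math
from collections import Counter

# Hypothetical stand-in CTM values (bits) for 2-bit blocks; real BDM uses
# precomputed tables derived from exhaustive Turing-machine enumeration.
CTM_TABLE = {"00": 2.5, "11": 2.5, "01": 3.0, "10": 3.0}

def bdm(s: str, block: int = 2) -> float:
    """Block Decomposition Method sketch: partition s into blocks, then sum
    the CTM complexity of each distinct block plus log2 of its multiplicity."""
    blocks = [s[i:i + block] for i in range(0, len(s) - block + 1, block)]
    counts = Counter(b for b in blocks if len(b) == block)
    return sum(CTM_TABLE[b] + math.log2(n) for b, n in counts.items())

# A string built from one repeated block scores lower (simpler) than one
# mixing several distinct blocks, as repetition is cheap under BDM.
print(bdm("01010101"))  # 5.0: one distinct block ("01") repeated 4 times
print(bdm("01101001"))  # 8.0: two distinct blocks, each appearing twice
```

The design choice is that repetition adds only logarithmic cost, so structured strings score far below strings whose blocks are all distinct, approximating the behavior of Kolmogorov complexity at a scale where exact CTM values are unavailable.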

Conclusion

The study makes a case for revising intelligence tests for AI models around algorithmic principles rather than statistical methods alone, pushing toward AGI and ASI. It casts doubt on the apparent capabilities of current LLMs while identifying recursive compression as a path along which models could genuinely advance. SuperARC's framework, centered on optimal induction and algorithmic information theory, promises improved planning and prediction paradigms aimed at more intelligent AI systems that do not rely heavily on memorization.

The paper encourages embracing algorithmic information theory as the foundation for developing authentic intelligence tests to bridge gaps between current capabilities and targeted superintelligence goals, addressing key limitations in LLM training and adaptation. Future AI developments using SuperARC principles could lead to impactful advancements across diverse cognitive and computational tasks.
