
Epiplexity: Computational Information Theory

Updated 7 January 2026
  • Epiplexity is a formal framework that quantifies data’s structural content using time-bounded computation, distinguishing learnable patterns from random noise.
  • It integrates explicit computation limits into information theory to resolve paradoxes related to deterministic transformations and data order.
  • The framework provides actionable insights for data selection and curriculum design, optimizing training processes under fixed compute budgets.

Epiplexity is a formalization of informational content that quantifies the structural knowledge computationally bounded learners can extract from data, separating it from unstructured unpredictability, a distinction conventional information theory cannot draw. Unlike Shannon entropy or Kolmogorov complexity, which measure information under the assumption of unbounded computation, epiplexity introduces explicit time constraints, thus aligning information content with the actual capabilities of learning systems. This framework resolves longstanding paradoxes in information theory, guides principled data selection, and provides a rigorous basis for analyzing the relationship between data structure, computational constraints, and learnability (Finzi et al., 6 Jan 2026).

1. Motivation and Conceptual Foundations

Traditional information-theoretic approaches, including Shannon entropy $H(X)$ and Kolmogorov complexity $K(x)$, assume an observer with unbounded computational capacity. This leads to three paradoxes in modern learning applications:

  • Paradox 1: Information cannot be increased by deterministic transformations: The data-processing inequality $H(f(X)) \leq H(X)$ and its algorithmic counterpart $K(f(x)) \leq K(x) + K(f) + O(1)$ suggest that deterministic pipelines cannot introduce new structure, yet practical learning often extracts useful patterns from synthetic procedures, pseudorandom generation, and emergent phenomena in deterministic dynamical systems.
  • Paradox 2: Information is independent of data order and factorization: Entropy is invariant to the order in which variables are revealed (by the chain rule), and Kolmogorov complexity satisfies the same symmetry up to logarithmic terms, whereas neural models, cryptographic constructions, and sequential data exhibit direction-sensitive learnability.
  • Paradox 3: Likelihood modeling is merely distribution matching: The cross-entropy objective $\min_P \mathbb{E}_{X \sim Q}[-\log P(X)]$ is minimized at $P = Q$, which casts model learning as trivial matching, at odds with the empirical emergence of powerful inductive shortcuts and representations.
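The two classical facts behind Paradoxes 1 and 3 are easy to check numerically. In the sketch below, the source distribution `Q` over 2-bit strings and the parity map are arbitrary illustrative choices, not taken from the source: a deterministic map can only merge outcomes, so the entropy of its output cannot exceed that of its input, and the cross-entropy against a fixed source is smallest when the model equals the source (Gibbs' inequality).

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy in bits of a dict {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def pushforward(dist, f):
    """Distribution of f(X) when X ~ dist and f is deterministic."""
    out = defaultdict(float)
    for x, p in dist.items():
        out[f(x)] += p  # outcomes with the same image merge
    return dict(out)


def cross_entropy(q, p):
    """E_{X~q}[-log2 p(X)] in bits; assumes p(x) > 0 wherever q(x) > 0."""
    return -sum(qx * math.log2(p[x]) for x, qx in q.items() if qx > 0)


def parity(x):
    """Illustrative deterministic map: parity of a bit string."""
    return x.count("1") % 2


# Illustrative source over 2-bit strings (an assumption for this sketch).
Q = {"00": 0.4, "01": 0.3, "10": 0.2, "11": 0.1}

h_x = entropy(Q)
h_fx = entropy(pushforward(Q, parity))
print(f"H(X) = {h_x:.3f} bits, H(f(X)) = {h_fx:.3f} bits")

# Cross-entropy against Q is minimized by P = Q.
P_uniform = {x: 0.25 for x in Q}
print(f"CE(Q, Q) = {cross_entropy(Q, Q):.3f} bits, "
      f"CE(Q, uniform) = {cross_entropy(Q, P_uniform):.3f} bits")
```

Both statements hold for any source and any deterministic map; the paradoxes arise because neither says anything about what a bounded learner can actually extract.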

All three paradoxes stem from neglecting computational bounds: an unbounded observer treats all decodable structure as equally accessible, even when extracting it would require infeasible computation.

2. Formal Definition of Epiplexity

Epiplexity explicitly incorporates computation time in measuring information content. Given a universal prefix Turing machine $\mathcal{U}$ and a time-constructible bound $T(n)$, define $\mathcal{P}_T$ as the set of all programs $\mathrm{P}$ that, in at most $T(n)$ steps, can evaluate probabilities and sample outputs for binary strings of length $n$.

The time-bounded two-part code minimizer is

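$$\mathrm{P}^\star = \arg\min_{\mathrm{P} \in \mathcal{P}_T} \Big\{ |\mathrm{P}| + \mathbb{E}\big[\log 1/\mathrm{P}(X)\big] \Big\},$$

where $|\mathrm{P}|$ is the length of the description of $\mathrm{P}$.

Define the epiplexity and the time-bounded entropy of $X$ as

$$S_T(X) = |\mathrm{P}^\star|, \qquad H_T(X) = \mathbb{E}\big[\log 1/\mathrm{P}^\star(X)\big].$$

$S_T(X)$ quantifies the minimal program description length (structural content) a $T$-bounded learner must absorb to model $X$. $H_T(X)$ measures the residual unpredictability under this best model. Raising the available computation $T$ can change both quantities: structure that looks like noise to a weaker learner may become recoverable, lowering $H_T(X)$ and changing how much is stored in $S_T(X)$.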
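To make the definition concrete, the sketch below runs the two-part-code minimization over a tiny hand-built candidate family instead of over all programs on a universal machine. Each candidate carries an assumed description length in bits and an assumed step cost standing in for its running time; the candidate names, lengths, and costs are hypothetical. The chosen candidate's description length plays the role of epiplexity $S_T$, and its mean code length on the samples plays the role of time-bounded entropy $H_T$.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """Hypothetical stand-in for a program P in P_T."""
    name: str
    desc_bits: float                 # assumed description length |P| in bits
    steps: int                       # assumed running time, compared against T
    codelen: Callable[[str], float]  # -log2 P(x) in bits


def uniform_codelen(x: str) -> float:
    return float(len(x))  # every n-bit string has probability 2^-n


def alternating_codelen(x: str, eps: float = 0.05) -> float:
    """-log2 P(x) for 'expect 0101..., each bit flipped independently with prob eps'."""
    bits = 0.0
    for i, b in enumerate(x):
        expected = "0" if i % 2 == 0 else "1"
        bits -= math.log2(1 - eps) if b == expected else math.log2(eps)
    return bits


def two_part_code(samples: List[str], cands: List[Candidate], T: int) -> Tuple[str, float, float]:
    """Minimize |P| + mean code length over candidates whose step cost fits in T.

    Returns (name of the minimizer, its |P| as S_T, its mean code length as H_T).
    """
    feasible = [c for c in cands if c.steps <= T]
    best = min(
        feasible,
        key=lambda c: c.desc_bits + sum(c.codelen(x) for x in samples) / len(samples),
    )
    h = sum(best.codelen(x) for x in samples) / len(samples)
    return best.name, best.desc_bits, h


def flip(s: str, i: int) -> str:
    return s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]


base = "01" * 32  # n = 64
samples = [base, flip(base, 5), flip(base, 40), base]
candidates = [
    Candidate("uniform", desc_bits=2, steps=1, codelen=uniform_codelen),
    Candidate("alternating", desc_bits=16, steps=10, codelen=alternating_codelen),
]

for T in (1, 10):
    name, s_t, h_t = two_part_code(samples, candidates, T)
    print(f"T={T}: P*={name}, S_T={s_t:.1f}, H_T={h_t:.2f}, total={s_t + h_t:.2f}")
```

In this toy run the larger budget admits the alternating-pattern model, so $S_T$ rises, $H_T$ falls, and the total drops; the specific numbers depend entirely on the assumed lengths and costs.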

3. Key Properties and Paradox Resolution

Epiplexity exhibits several properties that resolve the paradoxes noted in conventional theory:

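  • Nonnegativity and boundedness: $S_T(X) \geq 0$ and $H_T(X) \geq 0$; moreover, $S_T(X) + H_T(X)$ is at most $n$ bits plus a small overhead, since a short program for the uniform distribution over $\{0,1\}^n$ lies in $\mathcal{P}_T$ for any bound that allows linear time.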
  • Monotonicity in compute: If $T' \geq T$, then $\mathcal{P}_T \subseteq \mathcal{P}_{T'}$, so $S_{T'}(X) + H_{T'}(X) \leq S_T(X) + H_T(X)$.
  • Deterministic transformations may increase time-bounded information: For a cryptographically secure PRG $G$ that maps a uniformly random $k$-bit seed $U_k$ to $n > k$ bits, and any polynomial time bound $T$:

$$H(G(U_k)) \leq k, \qquad H_T(G(U_k)) \approx n, \qquad S_T(G(U_k)) = O(1).$$

Thus, the PRG output looks random to any polynomial-time observer: the deterministic map raises time-bounded entropy from about $k$ to about $n$ bits without adding structural content.

  • Order-dependence: For a one-way permutation $f$, modeling the pair in the order $(x, f(x))$ versus $(f(x), x)$ yields very different $S_T$ and $H_T$ values. Likewise, predicting chess boards from moves is easy, whereas the inverse direction (moves from the board) raises $S_T$, in line with observed model performance.
  • Computational structure creation: Models trained via maximum likelihood under finite compute can invent algorithms and inductive shortcuts not required by the true data generator.
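The PRG argument can be illustrated with a stand-in generator. The sketch below assumes SHA-256 of the seed as a heuristic substitute for a cryptographically secure PRG (an assumption for illustration, not a construction from the source) and stretches a $k = 12$-bit seed to $n = 256$ bits. Enumerating every seed shows the output distribution has at most $2^k$ outcomes, so its Shannon entropy is at most $k$ bits, while a bit-frequency check, one cheap test a bounded observer might run, sees output indistinguishable from fair coin flips. Passing one statistical test is far weaker than cryptographic security.

```python
import hashlib
import math
from collections import Counter

K_BITS = 12   # seed length k (illustrative)
N_BYTES = 32  # SHA-256 digest size: n = 256 output bits


def prg(seed: int) -> bytes:
    """Hypothetical stand-in PRG: SHA-256 of the 2-byte seed (heuristic, not proven secure)."""
    return hashlib.sha256(seed.to_bytes(2, "big")).digest()


# Enumerate G(U_k): every seed is equally likely with probability 2^-k.
outputs = [prg(s) for s in range(2 ** K_BITS)]

# Shannon entropy of the output distribution; collisions could only lower it.
counts = Counter(outputs)
total = len(outputs)
h_output = -sum((c / total) * math.log2(c / total) for c in counts.values())

# A cheap statistical test: overall fraction of 1 bits across all outputs.
ones = sum(bin(byte).count("1") for out in outputs for byte in out)
frac_ones = ones / (total * N_BYTES * 8)

print(f"distinct outputs: {len(counts)}, H(G(U_k)) = {h_output:.2f} bits (k = {K_BITS})")
print(f"n = {N_BYTES * 8} output bits per sample, fraction of ones = {frac_ones:.4f}")
```

An unbounded observer could enumerate the seeds exactly as this script does; the point of the time-bounded view is that for realistic seed lengths no polynomial-time observer can.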

4. Illustrative Examples

Multiple synthetic domains demonstrate epiplexity’s discriminative power:

Setting | $S_T$ (epiplexity) | $H_T$ (time-bounded entropy)
ECA Rule 15 (periodic) | low | …
ECA Rule 30 (chaotic) | low | high
ECA Rule 54 (emergent structure) | … (…) | …
Game of Life, one-step evolution | … | -
Game of Life, multi-step (…) | …, depending on structures | -
Masked Markov chain (easy induction) | …, peaks for … | -
Masked Rule 30 (hard induction) | … | converges to …

Periodic and trivial evolutions yield low epiplexity, while chaotic processes look like noise to a bounded learner: structureless randomness with high $H_T$. Emergent or “inductive” domains require programmatic structure to model efficiently, which is reflected in growing $S_T$.
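The elementary cellular automata in the table can be simulated in a few lines. The sketch below assumes a periodic boundary, a random initial row, and Wolfram rule numbering; it uses `zlib` compressed size only as a crude compressibility proxy from a fixed, fast compressor. That proxy is not an estimate of epiplexity or time-bounded entropy, which the source obtains from trained models.

```python
import random
import zlib


def eca_step(row, rule):
    """One update of an elementary CA with periodic boundary (Wolfram rule numbering)."""
    w = len(row)
    return [
        # neighborhood (left, center, right) indexes bit 4*l + 2*c + r of the rule number
        (rule >> (4 * row[(i - 1) % w] + 2 * row[i] + row[(i + 1) % w])) & 1
        for i in range(w)
    ]


def eca_run(rule, width=64, steps=64, seed=0):
    """Evolve from a random initial row; returns all rows, including the initial one."""
    rng = random.Random(seed)
    row = [rng.randint(0, 1) for _ in range(width)]
    rows = [row]
    for _ in range(steps - 1):
        row = eca_step(row, rule)
        rows.append(row)
    return rows


def zlib_size(rows):
    """Compressed size in bytes of the space-time diagram, one '0'/'1' character per cell."""
    text = "".join("".join(map(str, r)) for r in rows).encode()
    return len(zlib.compress(text, 9))


for rule in (15, 30, 54):
    print(f"Rule {rule}: zlib size = {zlib_size(eca_run(rule))} bytes")
```

Rule 15 negates the left neighbor, so every second row is a shifted copy and the diagram compresses well, whereas Rule 30 resists a general-purpose compressor. A single compressed size lumps structure and noise together, which is the distinction the $S_T$/$H_T$ split is meant to draw.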

5. Practical Estimation Schemes

Direct optimization over all $T$-bounded programs is infeasible. In practice, $S_T$ and $H_T$ are estimated with parametric model families trained under fixed compute budgets:

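  • Prequential Coding (AUC heuristic): Train a model sequentially on data $x_1, \ldots, x_M$, track the per-step log-loss $\ell_i$ incurred on $x_i$ before the model trains on it, then compute

$$\hat S_T = \sum_{i=1}^{M} \left(\ell_i - \ell_M\right), \qquad \hat H_T = M\,\ell_M,$$

where $\ell_M$ is the final loss: $\hat S_T$ is the area of the loss curve above its final value, and $\hat H_T$ is the code length under the final model. Optimizing the training configuration (model size, number of tokens) under a time constraint traces out the compute-optimal two-part code.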

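  • Requential Coding (Teacher–Student KL): Maintain a sequence of “teacher” models $P^{\mathrm{t}}_1, \ldots, P^{\mathrm{t}}_M$ and train a “student” on synthetic teacher samples. With $P^{\mathrm{s}}_i$ denoting the student at step $i$, the code cost of each token is $\mathrm{KL}(P^{\mathrm{t}}_i \,\|\, P^{\mathrm{s}}_i)$, and summing these costs yields an estimate of $S_T$.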
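The prequential AUC heuristic can be sketched with a deliberately tiny learner. Here an online Bernoulli model with Laplace (add-one) smoothing stands in for the trained network (an assumption for illustration; the source trains neural networks), and the helper names are hypothetical. Each bit is coded with the model fitted on the bits before it; the area of the loss curve above the final loss approximates epiplexity, and the final model's total loss approximates time-bounded entropy. Taking the final loss as the final model's average loss on the same stream is one of several reasonable choices.

```python
import math
import random


def prequential_estimates(losses, final_loss):
    """AUC heuristic: S ~ area of the loss curve above the final loss, H ~ M * final loss."""
    s_hat = sum(loss - final_loss for loss in losses)
    h_hat = len(losses) * final_loss
    return s_hat, h_hat


def laplace_prequential(bits):
    """Code each bit with a Laplace-smoothed Bernoulli model fitted on the bits before it.

    Returns the per-step losses (nats) and the final model's average loss on the stream.
    """
    ones = 0
    losses = []
    for t, b in enumerate(bits):
        p_one = (ones + 1) / (t + 2)  # model trained on bits[:t]
        losses.append(-math.log(p_one if b else 1 - p_one))
        ones += b
    p_final = (ones + 1) / (len(bits) + 2)
    final_loss = -sum(math.log(p_final if b else 1 - p_final) for b in bits) / len(bits)
    return losses, final_loss


rng = random.Random(0)
bits = [1 if rng.random() < 0.2 else 0 for _ in range(5000)]  # synthetic biased coin
losses, final_loss = laplace_prequential(bits)
s_hat, h_hat = prequential_estimates(losses, final_loss)
print(f"S_hat = {s_hat:.2f} nats, H_hat = {h_hat:.2f} nats, "
      f"prequential total = {sum(losses):.2f} nats")
```

By construction the two estimates add up to the prequential code length. For a one-parameter model like this, the area term stays small (it grows roughly logarithmically with the stream length), while nearly all of the code length is residual noise.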

Prequential estimates are computationally cheaper, while requential coding provides a tighter upper bound. The compute-optimal tradeoff is found by sweeping the training configuration and taking the lower convex hull of the resulting estimates.
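The per-token requential cost can be sketched with categorical next-token distributions. The teacher and student probability vectors below are hypothetical placeholders for model outputs at successive steps, and the function names are illustrative. The sketch only shows that each token costs the KL divergence from the teacher's distribution to the student's (the expected extra code length of coding teacher samples with the student) and that these costs are summed. It assumes the student gives nonzero probability wherever the teacher does.

```python
import math
from typing import List, Sequence


def kl_nats(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two categorical distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def requential_cost(teacher: List[Sequence[float]], student: List[Sequence[float]]) -> float:
    """Sum of per-token KL(teacher_i || student_i): total cost of coding the teacher's samples."""
    if len(teacher) != len(student):
        raise ValueError("need one student distribution per teacher distribution")
    return sum(kl_nats(p, q) for p, q in zip(teacher, student))


# Hypothetical next-token distributions over a 3-token vocabulary at three steps.
teacher = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
student = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]

print(f"requential cost = {requential_cost(teacher, student):.4f} nats")
```

Where the student already matches the teacher (the third step above), the token costs nothing; the summed cost reflects only what the student still has to absorb.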

6. Empirical Characterization Across Domains

Empirical results under fixed compute budgets (… FLOPs, up to 5 billion tokens) reveal:

  • OpenWebText (language): … nats, … nats.
  • Chess PGN: … nats.
  • CIFAR-5M (pixels): … nats; almost all content is unpredictable noise.

Scaling to budgets of … FLOPs and 1 trillion tokens, language retains the highest epiplexity, with image and video data trailing significantly.

Epiplexity correlates with practical performance. For instance, reordering chess data (board-to-moves) yields higher $S_T$ and better zero-shot transfer. Adaptive Data Optimization for LLM pretraining (Jiang et al., 2025) increases prequential epiplexity, yielding superior out-of-distribution generalization on multiple benchmarks.

7. Implications for Data Selection and Learning

Epiplexity inverts the model-centric view typical of Minimum Description Length and related criteria. Rather than minimizing model code for a fixed dataset, it asks which data (under a fixed compute budget) induces the largest reusable structure in a learner:

  • Data with higher $S_T$ contains richer, reusable “circuits” (Editor’s term), fostering transfer and generalization.
  • Relying solely on in-distribution loss may select data that is merely high-entropy or redundant.
  • Maximizing $S_T$ suggests new strategies for curriculum design, synthetic data generation, and curation, tailored to the concrete computational limits of a learning system.
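The contrast in the list above between loss-based and epiplexity-based selection can be sketched with three loss curves. The sources and per-step losses below are synthetic, hypothetical numbers chosen only to show the selection logic (they are not results from the source), and epiplexity is approximated by the prequential AUC heuristic: the area of each loss curve above its final value.

```python
def auc_epiplexity(losses):
    """Prequential AUC heuristic: area of the loss curve above its final value."""
    final = losses[-1]
    return sum(loss - final for loss in losses)


# Hypothetical per-step losses (nats per token) for three candidate data sources.
curves = {
    "noisy": [5.0, 4.9, 4.9, 4.9, 4.9],       # high loss, barely learnable
    "redundant": [1.0, 0.2, 0.1, 0.1, 0.1],   # learned almost immediately
    "structured": [4.0, 3.0, 2.2, 1.8, 1.6],  # loss keeps falling as structure is absorbed
}

lowest_loss = min(curves, key=lambda name: curves[name][-1])
highest_loss = max(curves, key=lambda name: curves[name][-1])
by_epiplexity = max(curves, key=lambda name: auc_epiplexity(curves[name]))

print(f"lowest final loss: {lowest_loss}; highest final loss: {highest_loss}; "
      f"highest estimated epiplexity: {by_epiplexity}")
```

Ranking by final loss alone picks either the redundant or the noisy source, depending on the direction; only the area term singles out the source whose structure the learner is still absorbing.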

A plausible implication is that epiplexity quantifies “learning potential” under budget constraints and gives a principled metric for evaluating and selecting training corpora in large-scale machine learning.


Epiplexity and its associated time-bounded entropy provide a comprehensive framework for measuring information as a resource relative to computational constraints, resolving longstanding limitations of classical theory and aligning data-centric learning with the realities of modern AI system design (Finzi et al., 6 Jan 2026).
