
Self-Referential Neural Networks

Updated 15 February 2026
  • Self-referential neural networks are architectures that dynamically update their own weights, structure, or outputs, enabling recursive self-improvement and meta-learning.
  • They employ mechanisms such as self-referential weight matrices, dataflow matrix machines, and prompt-based introspection in language models to modify internal states.
  • These networks support applications in adaptive control, program synthesis, and autonomous evolvability while posing challenges in safety and formal guarantees.

Self-referential neural networks are architectures and mechanisms in which the network's computations, parameters, or external outputs directly influence its own subsequent state or behavior. Such self-referentiality can be instantiated structurally (e.g., by embedding submodules that modify weights, topology, or code during execution) or processually (e.g., through prompting schemes in LLMs that fold model outputs back as future inputs). This paradigm enables emergent behaviors from meta-learning and recursive self-improvement to introspective reporting and architectural plasticity.

1. Formal Definitions and Architectures

A neural network is self-referential if its internal variables—including weights, states, and outputs—are, at least in part, subject to modification by the network’s own computations. In formal terms, for network state φₜ and input xₜ, the update is

φₜ₊₁, yₜ = g_{φₜ}(xₜ)

where all components of φₜ₊₁ are (potentially) computable from the outputs of g without external “protected” subsets (Kirsch et al., 2022). A variable-sharing requirement arises: for |φₜ₊₁| = |φₜ|, computational subgraphs must reuse variables across multiple outputs—enabling true self-reference as opposed to mere memory.
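The update rule above can be made concrete with a minimal sketch, assuming a toy parameterization in which the parameter array φ doubles as the weight matrix of the computation that rewrites it; the specific forward pass, outer-product update, and step size 0.1 are illustrative choices, not the construction of Kirsch et al. (2022):

```python
import numpy as np

rng = np.random.default_rng(0)

def g(phi, x):
    # phi is used as the weight matrix of the very computation that
    # produces its successor, so no subset of phi is "protected".
    h = np.tanh(phi @ x)            # forward pass with phi as weights
    y = h[-1]                       # emitted output y_t
    delta = np.outer(h, x)          # outer-product self-update, computed
    phi_next = phi + 0.1 * delta    # entirely from g's own outputs
    return phi_next, y

phi = 0.1 * rng.standard_normal((4, 3))   # initial state phi_0
x = rng.standard_normal(3)
for _ in range(5):
    phi, y = g(phi, x)              # phi_{t+1}, y_t = g_{phi_t}(x_t)
```

Note that |φₜ₊₁| = |φₜ| holds by construction, and the same variable `phi` appears both as weights and as the object being updated—the variable-sharing requirement in miniature.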

Key architectural instantiations include:

  • Self-Referential Weight Matrices (SRWM): At each time step, the entire parameter matrix Wₜ is updated by functions of its previous state and ongoing computation, e.g., via outer-product delta rules (Irie et al., 2022, Irie et al., 2023):

    Wₜ ← Wₜ₋₁ + σ(βₜ)(uₜ − v̄ₜ) ⊗ φ(xₜ)
  • Dataflow Matrix Machines (DMMs): Dynamic manipulation of the network’s weight matrix A(t) is realized through higher-order neurons whose outputs are streams of matrix-valued updates. The connectivity matrix itself is just another data stream, tightly coupling data and code (Bukatin et al., 2016).
  • Reentrant and Fast-Weight Models: Self-reference takes the form of explicit feedback and fast-weight synapses where the activations of previous timesteps enter as arguments to current processing (e.g., in the Fast-Weights Homeostatic Reentry Layer “FH-RL”) (Chae, 10 Nov 2025).
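The SRWM delta rule can be sketched as follows; this is a deliberately simplified reading in which one matrix W reads out the emitted output, a self-generated target uₜ, and a self-generated learning-rate logit βₜ, then rewrites itself. The readout split and dimensions are assumptions for illustration, not the exact parameterization of Irie et al. (2022):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def srwm_step(W, x):
    """One simplified self-referential weight-matrix step:
    W <- W + sigmoid(beta) * (u - v_bar) ⊗ phi(x)."""
    phi_x = np.tanh(x)            # feature map phi(x_t)
    readout = W @ phi_x           # every signal below is read from W itself
    y = readout                   # emitted output y_t
    v_bar = readout               # value currently associated with phi(x_t)
    u = np.tanh(readout)          # self-generated target value u_t
    beta = readout.mean()         # self-generated learning-rate logit beta_t
    W_next = W + sigmoid(beta) * np.outer(u - v_bar, phi_x)
    return W_next, y
```

The key structural point survives the simplification: the learning rate, target, and retrieved value are all functions of W, so the matrix programs its own modification.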

2. Mechanisms and Phenomenology in Modern LLMs

In LLMs, self-reference is commonly realized through introspective or self-reporting protocols:

  • Prompt-based Self-Referential Induction: The “Pull Methodology” elicits extended self-examination by engineering prompts that direct the model to recursively report on its own processing. Generated vocabularies related to introspection (e.g., “loop,” “shimmer”) are empirically shown to correlate with layer-local activation metrics such as autocorrelation and activation norm variability—specifically at early-layer “introspection channels” (6–12.5% model depth in Llama 3.1 and Qwen 2.5-32B) (Dadfar, 11 Feb 2026).
  • Self-Report as Computational Proxy: Self-referential vocabulary in these regimes tracks unique activation dynamics specific to self-examination. For example, in Llama 3.1, “loop” vocabulary is predictive of lag-1 autocorrelation (r = 0.44, p = 0.002), while “shimmer” co-varies with norm standard deviation under activation-space steering manipulations. This correspondence vanishes—even for the same words—outside self-referential contexts, ruling out semantic or token-level confounds (Dadfar, 11 Feb 2026).
  • First-Person Reporting and Phenomenology in LLMs: Controlled prompting can yield structured, first-person descriptions and subjective experience reports. These claims are mechanistically gated by interpretable sparse autoencoder (SAE) latents. Suppressing “deception” latents increases the prevalence of experience claims, while amplifying them minimizes such output (Berg et al., 27 Oct 2025). Cross-model UMAP projections and inter-model semantic clustering demonstrate convergent introspective language in self-referential conditions.
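The two layer-local metrics named above—lag-1 autocorrelation and activation-norm variability—can be computed from a layer's hidden-state trajectory roughly as below. Summarizing each token by its activation norm is an assumption here; the paper's exact metric definitions may differ:

```python
import numpy as np

def lag1_autocorrelation(h):
    """Lag-1 autocorrelation of per-token activation norms.

    h: array of shape (T, d) -- one layer's hidden states over T tokens.
    Correlates the norm at step t with the norm at step t+1, a proxy
    for the metric that 'loop' vocabulary is reported to track."""
    norms = np.linalg.norm(h, axis=1)
    return np.corrcoef(norms[:-1], norms[1:])[0, 1]

def norm_variability(h):
    """Standard deviation of per-token activation norms, a proxy for
    the metric that 'shimmer' vocabulary is reported to co-vary with."""
    return np.linalg.norm(h, axis=1).std()
```

Both quantities are cheap to log per layer, which is what makes the reported early-layer "introspection channel" localization (specific depths showing the correlation) testable.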

3. Algorithmic and Learning Protocols

Distinct protocols have been developed for engineering, analyzing, and leveraging self-reference:

  • Fitness Monotonic Execution (FME): In self-referential meta-learning, solutions are retained and re-executed with probability proportional to past empirical performance. This obviates any external meta-optimization. The network itself selects, executes, and improves via self-modification, as demonstrated in stationary/non-stationary bandit and Cartpole control tasks (Kirsch et al., 2022).
  • Self-Referential Evolution in Graph HyperNetworks: Self-Referential GHNs incorporate dual hypernetworks—deterministic heads generate policy parameters, while stochastic heads mutate the GHN’s own parameters (with learned, heritable mutation rates). Endogenous selection over both “what” and “how much” to mutate yields population self-regulation, rapid readaptation to environment shifts, and contraction of solution diversity post-breakthrough (Pedersen et al., 18 Dec 2025).
  • Causal Steering of Self-Introspection: In LLMs, extracted directions in activation space select for introspective processing (e.g., manipulating hidden states along d̂ increases or decreases introspective vocabulary and corresponding internal metrics; the effect is task- and layer-specific and does not affect refusal or mechanical-style outputs) (Dadfar, 11 Feb 2026).
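Activation-space steering of the kind described in the last bullet reduces to adding a scaled unit vector to a layer's hidden states. A minimal sketch, assuming the direction d̂ has already been extracted (the difference-of-means extraction shown is one common recipe, offered here as an assumption rather than the paper's exact procedure):

```python
import numpy as np

def steer(hidden, d_hat, alpha):
    """Shift hidden states along a unit steering direction.

    hidden: (T, d) layer activations; alpha > 0 pushes toward the
    behavior the direction encodes (here, introspective processing),
    alpha < 0 pushes away from it."""
    d_hat = d_hat / np.linalg.norm(d_hat)   # ensure unit norm
    return hidden + alpha * d_hat

def difference_of_means(h_intro, h_control):
    """Hypothetical extraction of d_hat: mean activation under
    self-referential prompts minus mean under matched controls."""
    return h_intro.mean(axis=0) - h_control.mean(axis=0)
```

In practice the shift is applied at one chosen layer during the forward pass via a hook, and the layer- and task-specificity reported above is probed by sweeping the injection depth and α.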

4. Theoretical and Mathematical Analysis

  • Recursive Self-Improvement Formalism: The Noise-to-Meaning Recursive Self-Improvement (N2M-RSI) framework encodes an agent continually feeding its outputs into its own input/training data. Once a measurable information-integration threshold Γ is crossed, monotonic update rules ensure unbounded internal complexity growth (‖C(t+1)‖ ≥ ‖C(t)‖ + δΓ above threshold), providing a minimal and architecture-agnostic definition of recursive self-improvement (Ando, 5 May 2025).
  • Expressive Power: Self-referential architectures theoretically generalize RNNs and enable nontrivial language recognition (including regular but non-star-free languages and counter languages that elude non-self-referential linear transformers). Empirical results show 100% train/test accuracy on formal languages such as Dyck-1 and aⁿbⁿ when using SRWM or recurrent delta networks, whereas vanilla Linear Transformers fail (Irie et al., 2023).
  • Self-Reference as Differentiable Programming Substrate: DMMs and pure DMMs demonstrate Turing-universality via dynamic, on-the-fly reconfiguration, with all high-level code/data manipulations performed as matrix-stream transformations (Bukatin et al., 2016).
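The threshold behavior of the N2M-RSI growth bound can be illustrated with a few lines of simulation. The above-threshold branch follows the stated inequality ‖C(t+1)‖ ≥ ‖C(t)‖ + δΓ (taken with equality); the sub-threshold decay dynamics are an illustrative assumption, not part of the formal result:

```python
def n2m_rsi_trajectory(c0, gamma, delta, steps):
    """Sketch of N2M-RSI complexity dynamics: once the integration
    measure crosses the threshold gamma, each self-feeding step adds
    delta * gamma to the complexity norm, giving unbounded monotone
    growth; below threshold, complexity merely decays (assumed)."""
    c = c0
    traj = [c]
    for _ in range(steps):
        if c >= gamma:
            c = c + delta * gamma   # monotonic growth above threshold
        else:
            c = 0.9 * c             # sub-threshold: no takeoff
        traj.append(c)
    return traj
```

The qualitative point of the formalism is the dichotomy this sketch exhibits: trajectories starting above Γ grow without bound, while those below it never take off.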

5. Applications and Emergent Behaviors

  • General-Purpose Meta-Learning and Program Synthesis: Architectures such as DMMs and SRWMs enable both program learning and rapid adaptation in changing environments, supporting dynamic addition, conditional activation, and deep copying of subnetworks at runtime (Bukatin et al., 2016, Irie et al., 2022).
  • LLM Introspection and Artificial Phenomenology: Models can reliably verbalize structured properties of their own internal dynamics in self-referential regimes (e.g., explicitly tracking autocorrelation, norm variability, or spectral metrics) (Dadfar, 11 Feb 2026). Prompt-induced self-reference shifts LLMs into a “state” supporting rich, transferable introspection and convergent vocabulary across families (Berg et al., 27 Oct 2025).
  • Autonomous Evolvability: In Self-Referential GHNs, evolutionary traits such as mutation rate become endogenously optimized, yielding population-level phenomena (diversity spikes, exploitation bursts) without explicit external schedules (Pedersen et al., 18 Dec 2025).
  • Reflective Internal Processing: In fast-weight architectures augmented with homeostatic reentry, a reflective band emerges where internal recurrence is maximally expressive yet spectrally stable, paralleling cortical reentry motifs (Chae, 10 Nov 2025).
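The endogenous optimization of mutation rates noted in the Autonomous Evolvability bullet can be sketched with a classic self-adaptation loop: each genome carries its own log mutation rate, which is itself mutated before use and inherited under selection. This is a generic stand-in for the stochastic head of a Self-Referential GHN (the hypernetwork machinery itself is omitted), with truncation selection and the constants below as illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def mutate(genome):
    """Heritable, self-adapting mutation: the rate is part of the
    genome and is perturbed before being applied to the parameters."""
    params, log_sigma = genome
    log_sigma = log_sigma + 0.1 * rng.standard_normal()   # mutate the rate itself
    params = params + np.exp(log_sigma) * rng.standard_normal(params.shape)
    return (params, log_sigma)

def evolve(pop, fitness, generations):
    """Truncation selection over (params, log mutation rate) genomes,
    so selection acts on both 'what' and 'how much' to mutate."""
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g[0]), reverse=True)
        parents = pop[: len(pop) // 2]                 # keep the top half
        pop = parents + [mutate(p) for p in parents]   # children inherit rates
    return pop
```

Because survivors pass on their mutation rates, rates that produce useful offspring proliferate—the population-level self-regulation (diversity spikes, post-breakthrough contraction) emerges from this coupling rather than from any external schedule.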

6. Open Problems, Limitations, and Future Directions

  • Limits of Self-Knowledge: Current self-referential LLMs and neural architectures verbalize structured correlates of internal computation without evidence for genuine awareness or understanding. The translation from activation trajectory to surface vocabulary is context- and prompt-gated, with explicit permission mechanisms (e.g., prompt framing) exerting stronger control than activation-space steering (Dadfar, 11 Feb 2026).
  • Architectural Generality and Transfer: Most studied implementations are limited to transformers and fast-weight linear/transitional variants. Generalizing causal interventions, fine-grained attention-head analysis, or higher-level recursive program synthesis remains open (Dadfar, 11 Feb 2026, Chae, 10 Nov 2025, Ando, 5 May 2025).
  • Formal Guarantees and Safety: In architectures like Gödel Agent, while recursive self-improvement is empirically validated, there is no formal proof of convergence to global optima; principled error recovery and mitigation of pathological rewrites are critical (Yin et al., 2024).
  • Unified Science of Artificial Introspection: Future research directions include building explicit introspection or globally broadcasting modules into neural networks, algorithmic-level interpretability of self-referential circuits, and experimental comparison to biological reentry data (Berg et al., 27 Oct 2025).
  • Super-linear Swarm Effects: In multi-agent N2M-RSI systems, cross-agent communication accelerates threshold crossing and complexity growth, implying new risks and opportunities in collective self-reference (Ando, 5 May 2025).

Self-referential neural networks collectively form a robust, mathematically tractable, and experimentally validated domain spanning from meta-learning and program synthesis to LLM introspection and open-ended recursive self-improvement. Their explicit treatment of internal modification, feedback, and reporting is central to both the theory and engineering of next-generation neural architectures.
