
Brain-Aligned Representations

Updated 8 February 2026
  • Brain-Aligned Representations are activation patterns in ANNs that mirror human neural responses to similar stimuli.
  • They are quantified using metrics like CKA, RSA, and encoding models, linking model dynamics to brain activity.
  • These representations enhance model interpretability and are applied across modalities to inform neural decoding and BCI development.

Brain-aligned representations are internal activation patterns within artificial neural networks (ANNs) that closely mirror the distributed patterns of neural responses measured in the human brain as it processes analogous stimuli. This correspondence is established through explicit comparison of the geometry or similarity structure of model activations and neural activity patterns, typically in response to the same input set. Brain alignment is now a critical concept at the intersection of AI, computational neuroscience, and cognitive science, and is foundational for both mechanistic understanding of natural intelligence and for the design of more capable, interpretable artificial systems (Shen et al., 18 Jun 2025).

1. Definition and Measurement of Brain Alignment

Brain–AI alignment is formally defined as the similarity between the representational structure of neural activity in the brain and that of internal features within an ANN when presented with corresponding stimuli. The prevailing metric for quantifying this alignment is Centered Kernel Alignment (CKA), which is robust to differences in dimensionality and invariant to orthogonal transformations of the data (Shen et al., 18 Jun 2025). Let $X \in \mathbb{R}^{n \times p}$ be the model activation matrix ($n$ stimuli × $p$ units), and $Y \in \mathbb{R}^{n \times q}$ the corresponding fMRI activity matrix ($n$ stimuli × $q$ voxels). The representational similarity matrices (RSMs) are computed with RBF kernels

$$K_{ij} = \exp\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\Big), \qquad L_{ij} = \exp\Big(-\frac{\|y_i - y_j\|^2}{2\sigma^2}\Big)$$

CKA is then

$$\mathrm{CKA}(K, L) = \frac{\mathrm{HSIC}(K, L)}{\sqrt{\mathrm{HSIC}(K, K) \cdot \mathrm{HSIC}(L, L)}}$$

where $\mathrm{HSIC}$ denotes the Hilbert–Schmidt Independence Criterion. In other paradigms, representational similarity analysis (RSA) computes the rank correlation between RSMs derived from model and neural data (Doerig et al., 2022, Pepino et al., 20 Nov 2025).
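Under the definitions above, CKA can be sketched in a few lines of NumPy. This is an illustrative implementation assuming the biased empirical HSIC estimator with centering matrix $H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top$; the function names are not from the cited work:

```python
import numpy as np

def rbf_kernel(Z, sigma):
    """Pairwise RBF kernel: K_ij = exp(-||z_i - z_j||^2 / (2 sigma^2))."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * Z @ Z.T, 0.0)
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(K, L):
    """Biased empirical HSIC: tr(KHLH) / (n - 1)^2, with H the centering matrix."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def cka(X, Y, sigma=1.0):
    """CKA between model activations X (n x p) and brain responses Y (n x q)."""
    K, L = rbf_kernel(X, sigma), rbf_kernel(Y, sigma)
    return hsic(K, L) / np.sqrt(hsic(K, K) * hsic(L, L))
```

Because CKA operates only on the $n \times n$ kernel matrices, $p$ and $q$ may differ freely, which is what makes the metric usable across models and voxel sets of different sizes.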

Alternative encoding models employ regression (e.g., ridge regression) to map from model features to brain responses, measuring the Pearson correlation between predicted and observed voxel activity (Raugel et al., 1 Dec 2025, Pepino et al., 20 Nov 2025). Alignment is sometimes reported as the correlation between CKA/RSA/encoding-model fit and behavioral or task-performance metrics across many model variants (Shen et al., 18 Jun 2025, Pepino et al., 20 Nov 2025).
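A minimal encoding-model sketch under these conventions, using closed-form ridge regression and per-voxel Pearson correlation on held-out stimuli (the function name and train/test split are illustrative, not the cited papers' exact pipeline):

```python
import numpy as np

def ridge_encoding_scores(F_train, Y_train, F_test, Y_test, alpha=1.0):
    """Map model features F (stimuli x features) to voxel responses Y
    (stimuli x voxels) with ridge regression, then score each voxel by the
    Pearson correlation between predicted and observed test activity."""
    p = F_train.shape[1]
    # Closed-form ridge solution: W = (F'F + alpha I)^{-1} F'Y
    W = np.linalg.solve(F_train.T @ F_train + alpha * np.eye(p),
                        F_train.T @ Y_train)
    Y_pred = F_test @ W
    yp = Y_pred - Y_pred.mean(axis=0)
    yo = Y_test - Y_test.mean(axis=0)
    num = (yp * yo).sum(axis=0)
    den = np.sqrt((yp ** 2).sum(axis=0) * (yo ** 2).sum(axis=0))
    return num / den  # one Pearson r per voxel
```

In practice the regularization strength `alpha` is typically tuned per voxel by cross-validation rather than fixed.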

2. Empirical Evidence Across Modalities and Architectures

Large-scale studies of brain alignment encompass language, vision, audio, and multimodal models, spanning parameter scales from 10⁶ to 10¹¹ (Shen et al., 18 Jun 2025, Pepino et al., 20 Nov 2025, Tang et al., 2023). In the language domain, alignment scores track composite LLM benchmarks (Pearson $r = 0.89$, $p < 7.5 \times 10^{-13}$) (Shen et al., 18 Jun 2025). Vision models display positive, though generally lower, correlation with performance (vision: $r = 0.53$, $p < 2.0 \times 10^{-44}$). The relationship between brain alignment and performance exhibits logarithmic “diminishing returns” dynamics, with gains in alignment preceding increases in benchmark accuracy during training (Shen et al., 18 Jun 2025).

Longitudinal training analyses demonstrate that alignment increases quickly in the early phases of optimization, reaching 85% of final values after 20% of data exposure for MixNet vision models, well before corresponding gains in ImageNet Top-1 accuracy; similar effects are observed across diverse families and domains (Shen et al., 18 Jun 2025, Pepino et al., 20 Nov 2025). In the auditory domain, self-supervised audio models with stronger performance on music, speech, and environmental benchmarks are more predictive of auditory cortex activity, with $r > 0.7$ for the relationship between task performance and brain alignment (Pepino et al., 20 Nov 2025).

Importantly, this convergence is not strictly dependent on a particular architecture: transformer LLMs, recurrent nets, and state-space models all display similar alignment trajectories as scale and context depth increase (Raugel et al., 1 Dec 2025). Residual connections and self-attention are noted as inductive biases favoring stronger brain similarity (Shen et al., 18 Jun 2025).

3. Spatial, Hierarchical, and Temporal Patterns of Alignment

Spatial alignment occurs at multiple scales and cortical regions:

  • Vision models: Shallow layers align best with early visual cortices (V1/V2/V3); as layers deepen, peak alignment shifts monotonically up the hierarchy, ultimately favoring higher-order visual parcels.
  • LLMs: Maximal alignment is observed in mid-depth layers (normalized depth 3/8–5/8), especially with limbic, default-mode, and integrative semantic regions.
  • Multi-scale analysis: Small kernel scales ($\sigma = 28$) maximize alignment in primary sensory areas, while large scales ($\sigma = 68$) enhance similarity to association/limbic networks, indicative of a posterior-to-anterior functional gradient across cortex (Shen et al., 18 Jun 2025).
  • Temporal alignment: In the auditory and language domains, initial model layers best align with early processing intervals (MEG $\approx 0.4$ s), whereas deeper layers correspond to late-stage comprehension ($0.8$–$1$ s post-onset), with layer–latency correlation $r_{\mathrm{temp}} = 0.99$ (Raugel et al., 1 Dec 2025).
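The depth patterns in the list above can be computed by scoring every layer against a target region and locating the best-aligned depth. A sketch using linear CKA for brevity (the cited studies use RBF-kernel CKA; function names are illustrative):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between activation matrices (n stimuli x features),
    invariant to orthogonal transforms and isotropic scaling."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(X.T @ Y, 'fro') ** 2
    den = np.linalg.norm(X.T @ X, 'fro') * np.linalg.norm(Y.T @ Y, 'fro')
    return num / den

def alignment_profile(layer_acts, brain):
    """Score each layer against one brain region; return the per-layer CKA
    scores and the normalized depth (0..1) of the best-aligned layer."""
    scores = np.array([linear_cka(A, brain) for A in layer_acts])
    depth = int(np.argmax(scores)) / max(len(layer_acts) - 1, 1)
    return scores, depth
```

Repeating this per ROI yields the kind of depth-versus-region map the bullets summarize: early sensory areas peak at shallow normalized depths, integrative regions at intermediate ones.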

A consistent finding is that alignment peaks in intermediate model layers, those that perform the greatest “meaning abstraction” as measured by intrinsic dimension or semantic probe tasks (Cheng et al., 3 Feb 2026). This peak in intrinsic dimension precedes and predicts optimal alignment with fMRI/ECoG data, independently of how small the model's output prediction error is (Cheng et al., 3 Feb 2026).
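Intrinsic dimension in such analyses is often estimated with the TwoNN method, which fits the ratios of each stimulus representation's two nearest-neighbor distances; whether the cited work uses exactly this estimator is an assumption, and the brute-force distance computation below is for illustration only:

```python
import numpy as np

def twonn_id(X):
    """TwoNN intrinsic-dimension estimate of a point cloud X (n x d).
    With mu_i = r2_i / r1_i (second- over first-nearest-neighbor distance),
    the maximum-likelihood estimate is d_hat = n / sum_i log(mu_i)."""
    diffs = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)          # ignore self-distances
    D.sort(axis=1)
    mu = D[:, 1] / D[:, 0]               # assumes no duplicate points
    return len(X) / np.log(mu).sum()
```

Applied layer by layer, such an estimator traces the rise and fall of representational complexity across depth that the text describes.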

4. Mechanisms and Drivers: Semantic Abstraction, Convergent Evolution, and Shared Structure

The emergence of brain-aligned representations is attributed to convergent evolution: distinct artificial and biological systems, optimized for behavioral or communicative tasks, gravitate toward equivalent computational solutions despite fundamentally different substrates. Alignment consistently appears in the absence of any explicit neurobiological regularization and arises from pure performance-driven optimization (Shen et al., 18 Jun 2025).

Semantic abstraction—defined as the construction of higher-order, context-dependent features—drives alignment rather than output-layer predictive coding per se. Intrinsic dimension analysis reveals that layers with the maximal complexity and semantic content show the strongest neural predictivity (Cheng et al., 3 Feb 2026). This is observed both over the training trajectory and under direct brain-supervised finetuning, where increasing neural alignment causally raises intrinsic dimension and semantic representational richness (Cheng et al., 3 Feb 2026).

Alignment also depends on scale (model size) and memory (context length): only sufficiently large models processing long, realistic contexts approach the sequential decomposition of information reflected in the brain (Raugel et al., 1 Dec 2025). Shared semantic structure can be explicitly transferred across modalities and models, as demonstrated by multimodal transformers and alignment-tuned diffusion models (Tang et al., 2023, Zangos et al., 3 May 2025).

5. Methodological Innovations in Brain Alignment

Recent advances extend the scope and precision of alignment:

  • Brain tuning / multi-brain tuning: Fine-tuning models to jointly predict fMRI/EEG responses from multiple subjects leads to participant-agnostic, robust representations that improve both alignment scores (up to +50%) and data efficiency (requiring only ~20% of the data for maximal performance) (Moussa et al., 24 Oct 2025).
  • Brain-aligned semantic space learning: Fine-tuning pretrained semantic vectors (CLIP, GloVe, etc.) with an explicit loss to match their pairwise geometry to measured fMRI RSMs yields representations that dramatically increase zero-shot neural decoding accuracy (up to +40%) across fMRI, MEG, and ECoG (Vafaei et al., 2024).
  • Multi-modal fusion and cross-subject alignment: Entropic-regularized optimal transport (FUGW) aligns cortical representations across subjects, which in turn improves transfer decoding accuracy by up to 75% (Thual et al., 2023). Lightweight subject-specific adapters tuned to a reference brain yield a universal “common brain” space, supporting subject- and dataset-agnostic reconstruction (Zangos et al., 3 May 2025).
  • Supervised brain-aligned vision models: Multi-layer encoding heads attached to standard DCNNs (e.g., CORnet-S) and trained with fMRI or EEG targets produce internal representations with broader, more human-like correspondence to ventral stream stages and object category structure (Lu et al., 2024, Lu et al., 2024).
  • Self-supervised, biologically plausible objectives: Glimpse Prediction Networks, trained to anticipate the next eye-fixated region’s features, learn scene embeddings that outperform object-classification or captioning objectives in predicting mid/high-level visual cortex fMRI patterns (Thorat et al., 16 Nov 2025).
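As a concrete illustration of the geometry-matching idea behind brain-aligned semantic space learning, the objective can be sketched as a loss comparing a model RSM to a measured fMRI RSM. Names are illustrative, Pearson correlation over RSM entries stands in for the papers' exact loss, and in actual fine-tuning this would be minimized over the pretrained vectors with an autodiff framework:

```python
import numpy as np

def rsm(Z):
    """Correlation-based representational similarity matrix of Z
    (n stimuli x features): entry (i, j) is corr(z_i, z_j)."""
    Zc = Z - Z.mean(axis=1, keepdims=True)
    Zc = Zc / np.linalg.norm(Zc, axis=1, keepdims=True)
    return Zc @ Zc.T

def rsa_loss(embeddings, brain_rsm):
    """Alignment loss: 1 - Pearson correlation between the off-diagonal
    entries of the model RSM and the measured brain RSM (lower = better)."""
    iu = np.triu_indices(len(embeddings), k=1)
    m = rsm(embeddings)[iu]
    b = brain_rsm[iu]
    return 1.0 - np.corrcoef(m, b)[0, 1]
```

Driving this loss toward zero pulls the pairwise geometry of the embedding space toward that of the neural data, which is the mechanism the decoding-accuracy gains above are attributed to.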

6. Theoretical and Functional Implications

The observed convergence between ANNs and brain representations suggests fundamental principles governing intelligent information processing:

  • Emergent convergence: Behavioral task optimization drives the development of neural-like representational scaffolding, potentially as a necessary precursor to higher cognitive capability (Shen et al., 18 Jun 2025).
  • Representational geometry: Shared geometric structure enables robust linear decoding and cross-modal/exemplar generalization, supporting the hypothesis that the cortex encodes modality-invariant semantic “axes” (e.g., sociality, animacy, abstraction) (Tang et al., 2023, Ryskina et al., 15 Aug 2025).
  • Dynamic and hierarchical coding: The brain–ANN mapping is hierarchical and temporally ordered, with explicit correspondence between processing stages, model depths, and chronological development of features (Raugel et al., 1 Dec 2025, Cheng et al., 3 Feb 2026).
  • Interpretability and utility: Brain-aligned representations provide a powerful substrate for generative reconstruction, zero-shot BCI, robust clinical decoding, and explanation of neuropsychological behavior, with demonstrated benefits for behavioral alignment and high-fidelity brain–AI interfaces (Feng et al., 6 Nov 2025, Rajabi et al., 5 Feb 2025, Kayser et al., 21 Dec 2025, Li et al., 13 Jul 2025).

A plausible implication is that future models for neuro-AI may benefit from training objectives and architectures explicitly favoring high-dimensional, context-dependent, semantically rich representations that closely match the representational geometry of neural populations.

7. Limitations, Open Challenges, and Future Directions

Despite rapid progress, several technical and theoretical challenges remain:

  • Specificity of alignment metrics: While CKA, RSA, and encoding model scores are established, the search for metrics most predictive of functional equivalence continues (Raugel et al., 1 Dec 2025).
  • Spatial and temporal resolution: Disentangling fine-grained neuroanatomical mapping and rapid temporal dynamics (e.g., via intracranial ECoG or concurrent EEG/MEG-fMRI) is an active area of investigation (Feng et al., 6 Nov 2025, Raugel et al., 1 Dec 2025).
  • Cross-modal and developmental generalization: Probing alignment across sensory modalities, developmental timelines, and under systematic perturbation (lesioning/ablation) will further clarify the limits and causality of convergence (Raugel et al., 1 Dec 2025, Cheng et al., 3 Feb 2026).
  • Individual variability and generalizability: Establishing models and alignment procedures that are robust to inter-individual differences, limited data regimes, and clinical neurodiversity remains a chief concern (Thual et al., 2023, Zangos et al., 3 May 2025, Feng et al., 6 Nov 2025).
  • Causal interpretation: Distinguishing whether alignment is a direct driver of intelligence or a byproduct of shared statistical structure and inductive bias remains an open problem (Cheng et al., 3 Feb 2026).

Continued integration of large-scale multi-modal neural data, advanced alignment frameworks, and task-theoretic analyses is expected to deepen mechanistic understanding of intelligence and support neurophysiologically interpretable artificial systems.
