Brain-Trained Foundation Models
- Brain-trained foundation models are large neural networks pre-trained with self-supervised objectives on heterogeneous brain data, capturing intrinsic patterns of brain structure and function.
- They employ diverse architectures such as graph transformers, 4D encoders, and multimodal fusion models to improve representation learning and transfer across disorders and tasks.
- These models demonstrate strong downstream performance with efficient fine-tuning, prompt-based adaptation, and robust neuroimaging benchmarks.
Brain-trained foundation models are large-scale neural networks trained with self-supervised or weakly supervised objectives directly on heterogeneous neural, neuroimaging, or connectomic datasets. They are designed to capture and generalize intrinsic patterns of brain structure and function, providing a unified foundation for diverse neuroscience and clinical applications. These models leverage recent advances in contrastive learning, masked modeling, graph representations, multi-modal fusion, and prompt-based adaptation to address the high dimensionality, variability, and limited labeling typical of brain data, substantially improving representation learning and transfer across brain atlases, imaging protocols, disorders, and tasks (Zhou et al., 1 Mar 2025, Ghamizi et al., 16 Jun 2025, Wei et al., 31 May 2025, Wang et al., 26 Dec 2025).
1. Definition, Taxonomy, and Foundational Principles
Brain foundation models (BFMs) are formally defined as large neural architectures trained on massive, unlabeled neural or neuroimaging datasets by optimizing a large-scale self-supervised objective $\theta^* = \arg\min_\theta \mathcal{L}_{\mathrm{SSL}}(\theta)$, where $\mathcal{L}_{\mathrm{SSL}}$ aggregates masked signal modeling, contrastive learning, and, where relevant, multi-modal alignment or autoregressive prediction (Zhou et al., 1 Mar 2025). The resultant neural representations are designed to be directly usable for a wide range of downstream inference tasks with only minimal task-specific supervision (linear probing, parameter-efficient tuning, or few-shot/prompting adaptation).
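The aggregated objective can be illustrated with a minimal NumPy sketch that combines a masked-reconstruction term with an InfoNCE contrastive term. The weights `lam`/`mu` and all function names are illustrative, not any specific model's implementation:

```python
import numpy as np

def masked_mse(x, x_hat, mask):
    # Reconstructive term: error measured only over masked (hidden) positions.
    return float(np.mean((x[mask] - x_hat[mask]) ** 2))

def info_nce(z1, z2, tau=0.1):
    # Contrastive term: matched rows of z1/z2 are positives, all others negatives.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # positives on the diagonal

def ssl_objective(x, x_hat, mask, z1, z2, lam=1.0, mu=1.0):
    # Aggregate self-supervised objective: weighted sum of both terms.
    return lam * masked_mse(x, x_hat, mask) + mu * info_nce(z1, z2)
```

In practice each term operates on model outputs (reconstructed signals, view embeddings); the sketch only shows how the terms compose.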
BFMs can be categorized by both architectural design (CNNs, Transformers, GNNs, Mixture-of-Experts, hybrid) and data modality (fMRI, sMRI, EEG, dMRI, MEG), and by their adaptation strategy:
- Pretrained-only: Zero-shot or prompt-based inference
- Pretrained + fine-tuned: Model weights fully or partially updated for task-specific objectives
- Pretrained + interpretability: Embeddings analyzed or projected for neuroscientific discovery (Zhou et al., 1 Mar 2025, Ghamizi et al., 16 Jun 2025)
Key inductive biases include anatomical prior integration (e.g., atlas-based parcellations, functional network aggregation), multi-scale spatiotemporal modeling, and modality-specific normalization or adaptation (e.g., spectral targets for EEG, topology-aware embeddings).
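As a concrete example of atlas-based parcellation as an inductive bias, the following sketch (assuming a precomputed voxel-to-ROI label map) averages voxel time series within each ROI and builds the ROI correlation matrix used as graph edges; names are illustrative:

```python
import numpy as np

def parcellate(voxel_ts, labels, n_roi):
    # voxel_ts: (t, n_voxels) fMRI series; labels: (n_voxels,) atlas ROI id per voxel.
    # Averages voxel series within each ROI, then computes the (n_roi, n_roi)
    # Pearson correlation matrix commonly used as functional-connectivity edges.
    roi_ts = np.stack(
        [voxel_ts[:, labels == r].mean(axis=1) for r in range(n_roi)], axis=1
    )
    return roi_ts, np.corrcoef(roi_ts.T)
```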
2. Core Model Architectures and Pretraining Objectives
Contemporary brain-trained foundation models employ a diverse set of architectures:
- Graph Transformer Models: Models such as BrainGFM represent fMRI as a graph $G = (V, A, X)$, where $V$ is the set of ROIs, $A$ the correlation matrix, and $X$ the node features. A graph transformer encoder applies self-attention and a multi-layer perceptron to node tokens and atlas/task prompts, with pretraining objectives including graph contrastive InfoNCE and graph masked autoencoding (masking nodes/edges, reconstructing features) (Wei et al., 31 May 2025).
- Voxel-wise/Atlas-free 4D Encoders: SLIM-Brain employs a two-stage pipeline—(i) SimMIM-style global sequence extraction with masked autoencoding for coarse window saliency, and (ii) a hierarchical joint embedding predictive architecture (JEPA) for high-resolution 4D voxel-level representation and local structure preservation (Wang et al., 26 Dec 2025).
- Contrastive Self-supervised Models (MRI): SimCLR-3D and similar frameworks leverage 3D volume augmentations and InfoNCE contrastive loss on ResNet or ViT backbones, achieving strong generalization on scan-level tasks including Alzheimer’s classification and stroke/age regression (Kaczmarek et al., 12 Sep 2025).
- EEG-Specific Architectures: Models such as LaBraM++, Uni-NTFM, and CSBrain introduce domain-aware tokenization (e.g., codebook quantization, decoupled time/frequency streams, cross-scale tokenization), topological embeddings, and sparse mixture-of-experts routing, optimized for masked reconstruction and contrastive objectives in the EEG domain (Barmpas et al., 22 May 2025, Chen et al., 29 Sep 2025, Zhou et al., 29 Jun 2025).
- Multimodal and Fusion Models: Brain Harmony integrates structural (T1 MRI) and functional (fMRI) encoders with geometric harmonics for cross-modal alignment, then compresses both modalities into a unified token bottleneck via a harmonizer transformer (Dong et al., 29 Sep 2025). BrainCSD and BrainFM-MRI extend multi-expert or dynamic-modality integration to address missing modalities and trait prediction (Shen et al., 7 Nov 2025, Luu et al., 4 Nov 2025).
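Several of the objectives above rely on graph masked autoencoding. A minimal sketch of the node-masking step follows; zeroing the hidden features stands in for a learned mask token (an assumption of this sketch, not a description of any cited model):

```python
import numpy as np

def mask_graph_nodes(X, mask_ratio=0.3, rng=None):
    # X: (n_roi, d) node feature matrix from an fMRI ROI graph.
    # Randomly hides a fraction of nodes; the encoder is later trained to
    # reconstruct the hidden features from the visible ones.
    if rng is None:
        rng = np.random.default_rng()
    n = X.shape[0]
    idx = rng.choice(n, size=int(round(mask_ratio * n)), replace=False)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    X_corrupt = X.copy()
    X_corrupt[mask] = 0.0        # zero features stand in for a [MASK] token
    return X_corrupt, mask
```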
Pretraining objectives universally take the composite form $\mathcal{L}_{\mathrm{SSL}} = \mathcal{L}_{\mathrm{mask}} + \mathcal{L}_{\mathrm{con}} + \ldots$, with $\mathcal{L}_{\mathrm{mask}}$ (masked signal modeling) for reconstructive SSL, $\mathcal{L}_{\mathrm{con}}$ for contrastive alignment, and, where applicable, autoregressive ($\mathcal{L}_{\mathrm{AR}}$), cross-modal ($\mathcal{L}_{\mathrm{CM}}$), prompt/meta-learning, or regularization terms (e.g., VICReg, coding-rate) (Wei et al., 31 May 2025, Zhou et al., 1 Mar 2025, Barmpas et al., 22 May 2025, Wang et al., 26 Dec 2025, Dong et al., 29 Sep 2025).
3. Pretraining Corpora, Data Representation, and Multimodal Integration
Data scale and heterogeneity are central to BFM effectiveness. Pretraining is performed on large, curated, and diverse datasets:
- fMRI: 27 public cohorts, 25k+ subjects, multiple parcellation schemes, 400k+ graph samples (BrainGFM); up to 40k UK Biobank subjects for Brain-Semantoks (Wei et al., 31 May 2025, Gijsen et al., 12 Dec 2025).
- Structural MRI: 100k+ volumes, multi-contrast (T1, T2, FLAIR), multiple protocols and sites (SSL3D, FOMO25, BraTS, OASIS) (Gordaliza et al., 19 Jan 2026).
- EEG: Up to 28k hours, 17k+ subjects, spanning resting-state, BCI, and clinical paradigms (Chen et al., 29 Sep 2025).
- Multimodal: Paired MRI/fMRI, PET, and clinical data in fusion models such as BrainCSD, Brain Harmonix (Shen et al., 7 Nov 2025, Dong et al., 29 Sep 2025).
Unified data representations are achieved by:
- Zero-padding ROI graphs to a fixed maximum node count, with prompt tokens indicating atlas/parcellation (Wei et al., 31 May 2025).
- Projecting functional signals into semantic tokens of functional networks (Brain-Semantoks) or ROI/time spatiotemporal embeddings (BrainHarmonix) (Gijsen et al., 12 Dec 2025, Dong et al., 29 Sep 2025).
- Latent codebook and topological embedding for EEG/MEG channel alignment (LaBraM++, Uni-NTFM) (Barmpas et al., 22 May 2025, Chen et al., 29 Sep 2025).
- Explicit multimodal-aligned latent spaces via joint autoencoding, cross-modal masking, or prompt-fusion (Dong et al., 29 Sep 2025, Luu et al., 4 Nov 2025, Shen et al., 7 Nov 2025).
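The fixed-size padding plus atlas-prompt scheme can be sketched as follows; the one-hot atlas token stands in for a learned prompt embedding, and all names are illustrative:

```python
import numpy as np

def pad_roi_graph(A, X, n_max, atlas_id, n_atlases=8):
    # A: (n, n) correlation matrix; X: (n, d) node features, with n <= n_max.
    # Zero-pads both to a fixed n_max so graphs from different atlases share
    # one shape, returns a validity mask over nodes, and prepends a one-hot
    # atlas token (a stand-in for a learned atlas-prompt embedding).
    n, d = X.shape
    A_pad = np.zeros((n_max, n_max))
    A_pad[:n, :n] = A
    X_pad = np.zeros((n_max, d))
    X_pad[:n, :] = X
    node_mask = np.zeros(n_max, dtype=bool)
    node_mask[:n] = True
    atlas_token = np.eye(n_atlases)[atlas_id]
    return A_pad, X_pad, node_mask, atlas_token
```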
4. Prompt-Based and Meta-Learning Adaptation
Brain-trained FMs frequently integrate advanced adaptation mechanisms:
- Graph and Language Prompting: BrainGFM introduces learnable graph prompt matrices and semantic (text-encoded) prompt tokens for task/atlas adaptation. These prompt tokens guide the transfer of the frozen backbone to novel atlases/disorders under few-shot or zero-shot regimes (Wei et al., 31 May 2025).
- Meta-learning: MAML-style meta-learning over (disorder, atlas) tasks optimizes prompt parameters $\phi$ for rapid adaptation to unseen tasks, via inner-loop updates $\phi_i' = \phi - \alpha \nabla_\phi \mathcal{L}_{\mathcal{T}_i}(\phi)$ and an outer-loop update $\phi \leftarrow \phi - \beta \nabla_\phi \sum_i \mathcal{L}_{\mathcal{T}_i}(\phi_i')$.
- Parameter-efficient transfer: Approaches include freezing the encoder and adapting only prompt, adapter, or classification head layers, leading to rapid convergence and high accuracy even at low label availability (Wei et al., 31 May 2025, Wang et al., 26 Dec 2025, Chen et al., 29 Sep 2025).
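A first-order MAML-style outer update of the kind used for prompt adaptation can be sketched as follows; the toy gradient callables stand in for real per-task prompt-tuning losses, and `alpha`/`beta` are illustrative step sizes:

```python
import numpy as np

def maml_step(theta, tasks, alpha=0.1, beta=0.01):
    # One first-order MAML outer update of parameters theta over a batch of
    # tasks. Each task is a callable grad_fn(theta) -> gradient of its loss.
    meta_grad = np.zeros_like(theta)
    for grad_fn in tasks:
        theta_prime = theta - alpha * grad_fn(theta)   # inner-loop adaptation
        meta_grad += grad_fn(theta_prime)              # first-order approximation
    return theta - beta * meta_grad / len(tasks)
```

Iterating this update drives theta toward a point from which each task can be reached in a single inner-loop step.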
5. Empirical Performance, Downstream Tasks, and Transferability
State-of-the-art brain-trained foundation models demonstrate highly competitive, sometimes best-in-class performance on a range of neuroimaging and neural decoding benchmarks:
| Task/Model | BrainGFM | SLIM-Brain | BrainHarmonix | Uni-NTFM | CoMET | 3D-SimCLR |
|---|---|---|---|---|---|---|
| fMRI disorder ACC | 72–75% (10% data) | 63.5–69.1% | up to 70% | – | – | – |
| EEG BCI/Clinical | – | – | – | 0.699–0.784 | 62.75–92.74% | – |
| MRI segmentation | – | – | – | – | – | Dice 0.9115 |
| Zero-shot | 69–72% held-out | – | – | – | – | AUC >0.92 (AD) |
| Few-shot | ~68% single-shot | – | – | – | – | >0.88 Dice (MRI) |
BrainGFM achieves notably high accuracy (69–72%) and AUC (0.72–0.75) in zero-shot disorder classification on completely held-out datasets, outperforming both ROI/time-series baselines and naive pretrained GNNs by substantial margins (Wei et al., 31 May 2025). SLIM-Brain matches or outperforms larger, less efficient voxel-level models while using only ~30% of the computational resources and 10–20× less pretraining data (Wang et al., 26 Dec 2025). For EEG, Uni-NTFM and CoMET demonstrate monotonic performance scaling with model size and universal transfer across nine BCI, clinical, and psychiatric tasks, with Uni-NTFM achieving strong performance without modification (Chen et al., 29 Sep 2025, Li et al., 30 Aug 2025).
Label efficiency is a defining attribute. Models such as BrainGFM and SimCLR-3D show minimal drops under few-shot regimes (1%–10% of labeled samples), and can perform competitively using prompt-based linear probes alone (Wei et al., 31 May 2025, Kaczmarek et al., 12 Sep 2025).
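The linear-probe regime can be sketched as a closed-form ridge classifier on frozen embeddings; this is an illustrative sketch, not any paper's exact evaluation protocol:

```python
import numpy as np

def linear_probe(Z_train, y_train, Z_test, l2=1e-2):
    # Fit a ridge-regularised linear classifier on frozen encoder embeddings Z
    # in closed form -- the cheapest way to "probe" a pretrained model.
    n, d = Z_train.shape
    Y = np.eye(y_train.max() + 1)[y_train]          # one-hot class targets
    W = np.linalg.solve(Z_train.T @ Z_train + l2 * np.eye(d), Z_train.T @ Y)
    return (Z_test @ W).argmax(axis=1)              # predicted class labels
```

Because only `W` is fit, probe accuracy directly measures how linearly separable the downstream labels are in the frozen representation.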
6. Inductive Biases, Interpretability, and Clinical Adaptability
The most effective brain-trained FMs explicitly encode neuroscientific priors:
- Connectivity-driven (graph) encoders leveraging anatomical atlases (Wei et al., 31 May 2025, Shen et al., 7 Nov 2025)
- Functional network aggregation and tokenization (Brain-Semantoks) (Gijsen et al., 12 Dec 2025)
- Geometric harmonics and Laplace–Beltrami alignment (BrainHarmonix) (Dong et al., 29 Sep 2025)
- Multi-scale spatiotemporal architectures for EEG (CSBrain, Uni-NTFM) (Zhou et al., 29 Jun 2025, Chen et al., 29 Sep 2025)
Interpretability is being advanced by explicit mapping of self-attention or prompt tokens to clinical biomarkers or network hubs, as well as by development of coding-rate penalties and diversity-promoting losses that prevent representational collapse (Wei et al., 31 May 2025, Gijsen et al., 12 Dec 2025).
Clinical adaptability is enhanced by modular architectures and cross-modal interface components (dynamic adapters, hypergraph fusion, conditional normalization), supporting robust performance across missing modalities and unseen protocols (Luu et al., 4 Nov 2025, Deng et al., 1 May 2025).
7. Open Challenges and Future Directions
Ongoing challenges for brain-trained foundation models include:
- Multi-modal integration: Extending foundation models to unify structural/functional imaging, electrophysiology, and clinical metadata via aligned contrastive or fusion objectives (Wei et al., 31 May 2025, Shen et al., 7 Nov 2025).
- Scaling laws and lifelong learning: Detailed exploration of scaling in model/data size, domain shifts (new scanners, protocols), and lifelong adaptation without catastrophic forgetting (Zhou et al., 1 Mar 2025, Ghamizi et al., 16 Jun 2025).
- Interpretability and neuro-symbolic constraints: Developing mechanisms to relate neural representations to interpretable brain maps, and to enforce anatomical/functional network consistency.
- Domain robustness: Methods such as prompt/meta-learning, domain-aware augmentations, and anatomical regularization to ensure stable out-of-distribution transfer (Gordaliza et al., 19 Jan 2026, Wei et al., 31 May 2025).
- Ethical, privacy, and data governance: Safeguarding sensitive brain-derived data and addressing demographic and cross-site bias (Zhou et al., 1 Mar 2025, Ghamizi et al., 16 Jun 2025).
Foundational research is also exploring direct integration of human neuroimaging signals as supervisory or reward signals during foundation model training, representing an emerging path towards integrating brain-level cognition into future general-purpose AI development (Donoso, 17 Jan 2026).
References:
- (Zhou et al., 1 Mar 2025) Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery
- (Wei et al., 31 May 2025) A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning for Any Atlas and Disorder
- (Wang et al., 26 Dec 2025) SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis
- (Kaczmarek et al., 12 Sep 2025) Building a General SimCLR Self-Supervised Foundation Model Across Neurological Diseases to Advance 3D Brain MRI Diagnoses
- (Gordaliza et al., 19 Jan 2026) From 100,000+ images to winning the first brain MRI foundation model challenges: Sharing lessons and models
- (Barmpas et al., 22 May 2025) Advancing Brainwave Modeling with a Codebook-Based Foundation Model
- (Chen et al., 29 Sep 2025) Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
- (Zhou et al., 29 Jun 2025) CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
- (Gijsen et al., 12 Dec 2025) Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
- (Ghamizi et al., 16 Jun 2025) Brain Imaging Foundation Models, Are We There Yet? A Systematic Review of Foundation Models for Brain Imaging and Biomedical Research
- (Luu et al., 4 Nov 2025) A Foundation Model for Brain MRI with Dynamic Modality Integration
- (Shen et al., 7 Nov 2025) BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction
- (Deng et al., 1 May 2025) Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis
- (Dong et al., 29 Sep 2025) Brain Harmony: A Multimodal Foundation Model Unifying Morphology and Function into 1D Tokens
- (Donoso, 17 Jan 2026) A New Strategy for Artificial Intelligence: Training Foundation Models Directly on Human Brain Data