
Brain-Trained Foundation Models

Updated 24 January 2026
  • Brain-trained foundation models are large neural networks pre-trained with self-supervised objectives on heterogeneous brain data, capturing intrinsic patterns of brain structure and function.
  • They employ diverse architectures such as graph transformers, 4D encoders, and multimodal fusion models to improve representation learning and transfer across disorders and tasks.
  • These models demonstrate strong downstream performance on neuroimaging benchmarks, with efficient fine-tuning and prompt-based adaptation.

Brain-trained foundation models are large-scale neural networks trained with self-supervised or weakly supervised objectives directly on heterogeneous neural, neuroimaging, or connectomic datasets. They are designed to capture and generalize intrinsic patterns of brain structure and function, providing a unified foundation for diverse neuroscience and clinical applications. These models leverage recent advances in contrastive learning, masked modeling, graph representations, multi-modal fusion, and prompt-based adaptation to address the high dimensionality, variability, and limited labeling typical of brain data, substantially improving representation learning and transfer across brain atlases, imaging protocols, disorders, and tasks (Zhou et al., 1 Mar 2025, Ghamizi et al., 16 Jun 2025, Wei et al., 31 May 2025, Wang et al., 26 Dec 2025).

1. Definition, Taxonomy, and Foundational Principles

Brain foundation models (BFMs) are formally defined as large neural architectures $f_\theta$ trained on massive, unlabeled neural or neuroimaging datasets $X$ by optimizing large-scale self-supervised objectives $\theta^* = \arg\min_{\theta}\, \mathcal{L}_{\mathrm{pretrain}}(\theta)$, where $\mathcal{L}_{\mathrm{pretrain}}$ aggregates masked signal modeling, contrastive learning, and, where relevant, multi-modal alignment or autoregressive prediction (Zhou et al., 1 Mar 2025). The resulting representations $\mathbf{z} = f_\theta(x)$ are designed to be directly usable for a wide range of downstream inference tasks with only minimal task-specific supervision (linear probing, parameter-efficient tuning, or few-shot/prompt adaptation).
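The masked-signal-modeling component of $\mathcal{L}_{\mathrm{pretrain}}$ can be made concrete with a toy sketch. The NumPy snippet below is illustrative only: the one-layer linear encoder/decoder and the `masked_signal_loss` helper are hypothetical stand-ins for $f_\theta$, not the architecture of any cited model. It masks a fraction of input tokens and scores reconstruction on the masked positions only, as in masked autoencoding:

```python
import numpy as np

rng = np.random.default_rng(0)

def masked_signal_loss(x, mask_ratio=0.3, rng=rng):
    """Toy masked-signal-modeling objective: mask a fraction of the
    tokens, reconstruct with a linear encoder-decoder, and return
    the MSE on the masked positions only."""
    n_tokens, dim = x.shape
    n_masked = int(mask_ratio * n_tokens)
    masked_idx = rng.choice(n_tokens, size=n_masked, replace=False)

    x_masked = x.copy()
    x_masked[masked_idx] = 0.0                 # replace masked tokens with zeros

    # Hypothetical one-layer encoder/decoder standing in for f_theta.
    W_enc = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    W_dec = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    x_hat = (x_masked @ W_enc) @ W_dec         # reconstruct all tokens

    # Loss is computed only on the masked positions.
    return np.mean((x_hat[masked_idx] - x[masked_idx]) ** 2)

x = rng.standard_normal((64, 16))              # 64 tokens (e.g. ROIs), 16 features
loss = masked_signal_loss(x)
```

Real BFMs replace the linear maps with deep transformer encoders and optimize this term jointly with contrastive and alignment objectives.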

BFMs can be categorized by both architectural design (CNNs, Transformers, GNNs, Mixture-of-Experts, hybrid) and data modality (fMRI, sMRI, EEG, dMRI, MEG), and by their adaptation strategy:

  • Pretrained-only: Zero-shot or prompt-based inference
  • Pretrained + fine-tuned: Model weights fully or partially updated for task-specific objectives
  • Pretrained + interpretability: Embeddings analyzed or projected for neuroscientific discovery (Zhou et al., 1 Mar 2025, Ghamizi et al., 16 Jun 2025)

Key inductive biases include anatomical prior integration (e.g., atlas-based parcellations, functional network aggregation), multi-scale spatiotemporal modeling, and modality-specific normalization or adaptation (e.g., spectral targets for EEG, topology-aware embeddings).
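As a minimal illustration of atlas-prior integration, the sketch below aggregates voxel-level time series into ROI time series given an atlas assignment (one integer label per voxel). The `parcellate` helper and the random data are hypothetical; real pipelines use standard atlases rather than random labels:

```python
import numpy as np

def parcellate(voxel_ts, parcel_labels, n_rois):
    """Average voxel time series into ROI time series, given an
    atlas assignment with one integer label per voxel."""
    T = voxel_ts.shape[0]
    roi_ts = np.zeros((T, n_rois))
    for r in range(n_rois):
        roi_ts[:, r] = voxel_ts[:, parcel_labels == r].mean(axis=1)
    return roi_ts

rng = np.random.default_rng(1)
voxel_ts = rng.standard_normal((100, 500))     # 100 timepoints, 500 voxels
labels = rng.integers(0, 10, size=500)         # toy 10-ROI "atlas"
roi_ts = parcellate(voxel_ts, labels, 10)      # shape (100, 10)
```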

2. Core Model Architectures and Pretraining Objectives

Contemporary brain-trained foundation models employ a diverse set of architectures:

  • Graph Transformer Models: Models such as BrainGFM represent fMRI as a graph $G=(V,E,X,A)$, where $V$ is the set of ROIs, $A$ the correlation matrix, and $X$ the node features. A graph transformer encoder applies self-attention and multi-layer perceptrons to node tokens and atlas/task prompts, with pretraining objectives including graph contrastive InfoNCE and graph masked autoencoding (masking nodes/edges and reconstructing their features) (Wei et al., 31 May 2025).
  • Voxel-wise/Atlas-free 4D Encoders: SLIM-Brain employs a two-stage pipeline—(i) SimMIM-style global sequence extraction with masked autoencoding for coarse window saliency, and (ii) a hierarchical joint embedding predictive architecture (JEPA) for high-resolution 4D voxel-level representation and local structure preservation (Wang et al., 26 Dec 2025).
  • Contrastive Self-supervised Models (MRI): SimCLR-3D and similar frameworks leverage 3D volume augmentations and InfoNCE contrastive loss on ResNet or ViT backbones, achieving strong generalization on scan-level tasks including Alzheimer’s classification and stroke/age regression (Kaczmarek et al., 12 Sep 2025).
  • EEG-Specific Architectures: Models such as LaBraM++, Uni-NTFM, and CSBrain introduce domain-aware tokenization (e.g., codebook quantization, decoupled time/frequency streams, cross-scale tokenization), topological embeddings, and sparse mixture-of-experts, optimized for masked reconstruction and contrastive objectives in the EEG domain (Barmpas et al., 22 May 2025, Chen et al., 29 Sep 2025, Zhou et al., 29 Jun 2025).
  • Multimodal and Fusion Models: Brain Harmony integrates structural (T1 MRI) and functional (fMRI) encoders with geometric harmonics for cross-modal alignment, then compresses both modalities into a unified token bottleneck via a harmonizer transformer (Dong et al., 29 Sep 2025). BrainCSD and BrainFM-MRI extend multi-expert or dynamic-modality integration to address missing modalities and trait prediction (Shen et al., 7 Nov 2025, Luu et al., 4 Nov 2025).
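The graph construction used by models like BrainGFM can be sketched directly. This is an illustrative NumPy version (the `build_brain_graph` helper and the threshold value are assumptions, not from the paper) that derives $A$, $X$, and an edge list $E$ from ROI time series:

```python
import numpy as np

def build_brain_graph(roi_ts, edge_threshold=0.1):
    """Build a functional brain graph from ROI time series:
    A is the Pearson correlation matrix, X uses each ROI's
    connectivity profile as node features, and E keeps the
    edges whose absolute weight exceeds a threshold."""
    A = np.corrcoef(roi_ts.T)                  # (n_rois, n_rois) adjacency weights
    np.fill_diagonal(A, 0.0)                   # drop self-loops
    X = A.copy()                               # connectivity profile as node features
    E = np.argwhere(np.abs(A) > edge_threshold)  # sparse edge list (i, j)
    return A, X, E

rng = np.random.default_rng(2)
roi_ts = rng.standard_normal((200, 30))        # 200 timepoints, 30 ROIs
A, X, E = build_brain_graph(roi_ts)
```

Using the correlation profile as node features is one common choice; raw time series or statistics per ROI are equally valid.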

Pretraining objectives generally take the form $\mathcal{L}_{\mathrm{pretrain}} = \lambda_{\mathrm{MSM}}\,\mathcal{L}_{\mathrm{MSM}} + \lambda_{\mathrm{NCE}}\,\mathcal{L}_{\mathrm{NCE}} + \dots$, with $\mathcal{L}_{\mathrm{MSM}}$ (masked signal modeling) for reconstructive SSL, $\mathcal{L}_{\mathrm{NCE}}$ for contrastive alignment, and, where applicable, autoregressive ($\mathcal{L}_{\mathrm{AR}}$), cross-modal ($\mathcal{L}_{\mathrm{fusion}}$), prompt/meta-learning, or regularization terms (e.g., VICReg, coding-rate) (Wei et al., 31 May 2025, Zhou et al., 1 Mar 2025, Barmpas et al., 22 May 2025, Wang et al., 26 Dec 2025, Dong et al., 29 Sep 2025).
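Of these terms, the contrastive component is compact enough to write out. Below is a minimal NumPy implementation of the InfoNCE loss over two batches of embeddings, with in-batch negatives; it is illustrative only, and real models compute it on learned representations with much larger batches:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE: each row of z1 should match the same-index row of z2
    among all rows of z2 (in-batch negatives)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))             # cross-entropy, targets = identity

rng = np.random.default_rng(3)
z = rng.standard_normal((8, 32))
loss_pos = info_nce(z, z + 0.01 * rng.standard_normal((8, 32)))  # aligned views
loss_rand = info_nce(z, rng.standard_normal((8, 32)))            # unrelated views
```

Aligned views yield a much lower loss than unrelated ones, which is exactly the gradient signal the contrastive term provides.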

3. Pretraining Corpora, Data Representation, and Multimodal Integration

Data scale and heterogeneity are central to BFM effectiveness. Pretraining is performed on large, curated, and diverse datasets spanning multiple modalities (fMRI, sMRI, EEG, dMRI, MEG), acquisition sites, and imaging protocols (Zhou et al., 1 Mar 2025, Ghamizi et al., 16 Jun 2025).

Unified data representations are achieved through shared atlas parcellations, modality-specific tokenization and normalization, and topology- or anatomy-aware embeddings that reconcile heterogeneous inputs into a common token space.

4. Prompt-Based and Meta-Learning Adaptation

Brain-trained FMs frequently integrate advanced adaptation mechanisms:

  • Graph and Language Prompting: BrainGFM introduces learnable graph prompt matrices and semantic (text-encoded) prompt tokens for task/atlas adaptation. These prompt tokens guide the transfer of the frozen backbone to novel atlases/disorders under few-shot or zero-shot regimes (Wei et al., 31 May 2025).
  • Meta-learning: MAML-style meta-learning over (disorder, atlas) tasks optimizes prompt parameters $\phi$ for rapid adaptation to unseen tasks:

$$\phi \leftarrow \phi - \beta \sum_{i=1}^{B} \nabla_{\phi}\, \mathcal{L}_{T_i}^{\mathrm{test}}(\theta\ \mathrm{fixed},\ \phi_i')$$

where $\phi_i'$ denotes the prompt parameters after inner-loop adaptation on task $T_i$ (Wei et al., 31 May 2025).
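A first-order version of this meta-update can be sketched in NumPy. Everything below is a toy stand-in: the "prompt parameters" $\phi$ are the weights of a linear readout, tasks are synthetic regression problems, and the fixed backbone $\theta$ is implicit in the task features:

```python
import numpy as np

rng = np.random.default_rng(4)

def task_loss(phi, X, y):
    return np.mean((X @ phi - y) ** 2)

def loss_grad(phi, X, y):
    return 2.0 * X.T @ (X @ phi - y) / len(y)

def adapted_query_loss(phi, tasks, alpha=0.05):
    """Average query loss after one inner-loop gradient step per task."""
    return float(np.mean([
        task_loss(phi - alpha * loss_grad(phi, Xs, ys), Xq, yq)
        for (Xs, ys, Xq, yq) in tasks]))

def maml_step(phi, tasks, alpha=0.05, beta=0.1):
    """First-order MAML outer update over a batch of (support, query) tasks."""
    outer_grad = np.zeros_like(phi)
    for (Xs, ys, Xq, yq) in tasks:
        phi_i = phi - alpha * loss_grad(phi, Xs, ys)   # inner adaptation (support)
        outer_grad += loss_grad(phi_i, Xq, yq)         # gradient at adapted params (query)
    return phi - beta * outer_grad / len(tasks)        # phi <- phi - beta * sum of grads

def make_task(d=5, n=20):
    w = rng.standard_normal(d)                         # task-specific ground truth
    Xs, Xq = rng.standard_normal((n, d)), rng.standard_normal((n, d))
    return Xs, Xs @ w, Xq, Xq @ w

tasks = [make_task() for _ in range(4)]
phi = np.zeros(5)
loss_before = adapted_query_loss(phi, tasks)
for _ in range(100):
    phi = maml_step(phi, tasks)
loss_after = adapted_query_loss(phi, tasks)
```

After meta-training, one inner-loop step from the learned $\phi$ yields a lower query loss than the same step from the initial $\phi$, which is the "rapid adaptation" property the update optimizes for.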

5. Empirical Performance, Downstream Tasks, and Transferability

State-of-the-art brain-trained foundation models demonstrate highly competitive, sometimes best-in-class performance on a range of neuroimaging and neural decoding benchmarks:

| Task / Model | BrainGFM | SLIM-Brain | BrainHarmonix | Uni-NTFM | CoMET | 3D-SimCLR |
|---|---|---|---|---|---|---|
| fMRI disorder ACC | 72–75% (10% data) | 63.5–69.1% | up to 70% | | | |
| EEG BCI/clinical | | | | 0.784–0.699 | 62.75–92.74% | |
| MRI segmentation (Dice) | | | | | | 0.9115 |
| Zero-shot | 69–72% (held-out) | | | | | AUC >0.92 (AD) |
| Few-shot | ~68% (single-shot) | | | | | >0.88 Dice (MRI) |

BrainGFM achieves notably high accuracy (69–72%) and AUC (0.72–0.75) in zero-shot disorder classification on completely held-out datasets, outperforming both ROI/time-series baselines and naive pretrained GNNs by substantial margins (Wei et al., 31 May 2025). SLIM-Brain matches or outperforms larger, less efficient voxel-level models while using only ~30% of the computational resources and 10–20× less pretraining data (Wang et al., 26 Dec 2025). For EEG, Uni-NTFM and CoMET demonstrate monotonic performance scaling with model size and universal transfer across nine BCI, clinical, and psychiatric tasks, with Uni-NTFM-large achieving strong performance without modification (Chen et al., 29 Sep 2025, Li et al., 30 Aug 2025).

Label efficiency is a defining attribute. Models such as BrainGFM and SimCLR-3D show minimal drops under few-shot regimes (1%–10% of labeled samples), and can perform competitively using prompt-based linear probes alone (Wei et al., 31 May 2025, Kaczmarek et al., 12 Sep 2025).
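A linear probe on frozen embeddings is simple enough to write out directly. The sketch below illustrates the evaluation protocol on toy synthetic "embeddings", using a ridge closed form in place of logistic regression (both the data and the `linear_probe` helper are assumptions for illustration):

```python
import numpy as np

def linear_probe(Z_train, y_train, Z_test, reg=1e-2):
    """Ridge-regression linear probe on frozen embeddings Z = f_theta(x):
    closed-form fit, then the sign of the score as a binary prediction."""
    d = Z_train.shape[1]
    w = np.linalg.solve(Z_train.T @ Z_train + reg * np.eye(d),
                        Z_train.T @ y_train)
    return np.sign(Z_test @ w)

rng = np.random.default_rng(5)
# Toy "embeddings": two classes separated along one direction plus noise.
y = np.concatenate([np.ones(20), -np.ones(20)])
Z = np.outer(y, np.ones(16)) + 0.5 * rng.standard_normal((40, 16))

pred = linear_probe(Z[::2], y[::2], Z[1::2])   # fit on half, test on the rest
acc = np.mean(pred == y[1::2])
```

Because only `w` is fit, probe accuracy isolates the quality of the frozen representation, which is why it is the standard label-efficiency benchmark.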

6. Inductive Biases, Interpretability, and Clinical Adaptability

The most effective brain-trained FMs explicitly encode neuroscientific priors, including atlas-based parcellations, functional network aggregation, and multi-scale spatiotemporal structure.

Interpretability is being advanced by explicit mapping of self-attention or prompt tokens to clinical biomarkers or network hubs, as well as by development of coding-rate penalties and diversity-promoting losses that prevent representational collapse (Wei et al., 31 May 2025, Gijsen et al., 12 Dec 2025).
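The coding-rate penalties mentioned above have a compact closed form, $R(Z) = \tfrac{1}{2}\log\det\!\big(I + \tfrac{d}{n\varepsilon^2} Z^\top Z\big)$, which is larger for diverse embeddings than for collapsed ones; this is the MCR²-style formulation, assumed here for illustration (the cited works may use variants). A minimal NumPy check:

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate of a batch of embeddings (rows of Z), as used in
    diversity-promoting / anti-collapse regularizers:
    R(Z) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z)."""
    n, d = Z.shape
    sign, logdet = np.linalg.slogdet(np.eye(d) + d / (n * eps ** 2) * (Z.T @ Z))
    return 0.5 * logdet

rng = np.random.default_rng(6)
diverse = rng.standard_normal((64, 8))                      # spread-out embeddings
collapsed = np.tile(rng.standard_normal((1, 8)), (64, 1))   # all rows identical

r_diverse = coding_rate(diverse)
r_collapsed = coding_rate(collapsed)
```

Maximizing this quantity (or penalizing its decrease) pushes representations away from the rank-collapsed regime.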

Clinical adaptability is enhanced by modular architectures and cross-modal interface components (dynamic adapters, hypergraph fusion, conditional normalization), supporting robust performance across missing modalities and unseen protocols (Luu et al., 4 Nov 2025, Deng et al., 1 May 2025).
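Dynamic adapters of the kind referenced here are typically small bottleneck modules inserted into a frozen backbone. A generic sketch follows (not the specific architecture of any cited model; `bottleneck_adapter` and its zero-initialization are common conventions, assumed here):

```python
import numpy as np

rng = np.random.default_rng(7)

def bottleneck_adapter(h, W_down, W_up):
    """Parameter-efficient adapter: project the frozen backbone's hidden
    state down to a small rank, apply a nonlinearity, project back up,
    and add residually. Only W_down/W_up are trained."""
    return h + np.maximum(0.0, h @ W_down) @ W_up   # ReLU bottleneck + residual

d, r = 64, 8                                        # hidden size, bottleneck rank
W_down = rng.standard_normal((d, r)) * 0.01
W_up = np.zeros((r, d))                             # zero-init: adapter starts as identity
h = rng.standard_normal((4, d))
out = bottleneck_adapter(h, W_down, W_up)           # equals h at initialization
```

Zero-initializing the up-projection means inserting the adapter leaves the pretrained model's behavior unchanged until fine-tuning begins, which is what makes such modules safe to bolt onto clinical pipelines.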

7. Open Challenges and Future Directions

Ongoing challenges for brain-trained foundation models include the high dimensionality, variability, and limited labeling of brain data, along with generalization across sites, scanners, modalities, and acquisition protocols.

Foundational research is also exploring direct integration of human neuroimaging signals as supervisory or reward signals during foundation model training, representing an emerging path towards integrating brain-level cognition into future general-purpose AI development (Donoso, 17 Jan 2026).


References:

(Zhou et al., 1 Mar 2025) Brain Foundation Models: A Survey on Advancements in Neural Signal Processing and Brain Discovery
(Wei et al., 31 May 2025) A Brain Graph Foundation Model: Pre-Training and Prompt-Tuning for Any Atlas and Disorder
(Wang et al., 26 Dec 2025) SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis
(Kaczmarek et al., 12 Sep 2025) Building a General SimCLR Self-Supervised Foundation Model Across Neurological Diseases to Advance 3D Brain MRI Diagnoses
(Gordaliza et al., 19 Jan 2026) From 100,000+ images to winning the first brain MRI foundation model challenges: Sharing lessons and models
(Barmpas et al., 22 May 2025) Advancing Brainwave Modeling with a Codebook-Based Foundation Model
(Chen et al., 29 Sep 2025) Uni-NTFM: A Unified Foundation Model for EEG Signal Representation Learning
(Zhou et al., 29 Jun 2025) CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
(Gijsen et al., 12 Dec 2025) Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model
(Ghamizi et al., 16 Jun 2025) Brain Imaging Foundation Models, Are We There Yet? A Systematic Review of Foundation Models for Brain Imaging and Biomedical Research
(Luu et al., 4 Nov 2025) A Foundation Model for Brain MRI with Dynamic Modality Integration
(Shen et al., 7 Nov 2025) BrainCSD: A Hierarchical Consistency-Driven MoE Foundation Model for Unified Connectome Synthesis and Multitask Brain Trait Prediction
(Deng et al., 1 May 2025) Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis
(Dong et al., 29 Sep 2025) Brain Harmony: A Multimodal Foundation Model Unifying Morphology and Function into 1D Tokens
(Donoso, 17 Jan 2026) A New Strategy for Artificial Intelligence: Training Foundation Models Directly on Human Brain Data
