LaBraM EEG Foundation Model
- LaBraM is a large-scale neural foundation model for EEG that leverages patch-based transformers and vector-quantized spectral tokenization to learn generic, transferable representations.
- It exhibits robust cross-dataset generalization, enabling rapid fine-tuning for diverse brain-computer interface applications such as stress detection, emotion recognition, and artifact removal.
- The model’s design incorporates advanced signal processing techniques and test-time adaptation strategies, achieving state-of-the-art performance on multiple EEG benchmarks.
LaBraM is a large-scale neural foundation model for electroencephalography (EEG), designed to learn generic, transferable representations from thousands of hours of heterogeneous brainwave recordings. Drawing inspiration from the success of self-supervised pretraining in LLMs, LaBraM employs a patch-based transformer architecture combined with a vector-quantized spectral tokenizer, enabling robust cross-dataset generalization and downstream adaptability for a wide range of brain-computer interface (BCI) tasks, including stress detection, emotion recognition, motion artifact removal, and empathy assessment. Optimized for both large-scale heterogeneous pretraining and rapid fine-tuning, LaBraM and its follow-on variants (such as LaBraM++ and domain-adapted versions) constitute the leading edge of foundation model development in EEG representation learning (Jiang et al., 2024, 2505.23042, Barmpas et al., 22 May 2025).
1. Architectural Overview and Self-Supervised Pretraining
LaBraM employs a modular encoder-decoder architecture, comprising three main stages: patchification and embedding, a deep transformer stack, and a neural tokenizer based on spectral vector quantization.
- Patch Representation: Raw EEG signals are segmented per channel into non-overlapping temporal windows of fixed length (e.g., 200 samples for 1 s at 200 Hz), yielding a sequence of patches. Each patch is processed by a stack of 1D convolutional layers (Conv → GroupNorm → GELU) to produce a fixed-dimensional embedding (Jiang et al., 2024, Barmpas et al., 22 May 2025).
- Positional Encoding: Learnable spatial (SE) and temporal (TE) embeddings are summed with each patch embedding, e_{j,k} = x_{j,k} + SE_j + TE_k for channel j and time index k, enabling flexible cross-dataset transfer and variable montages.
- Transformer Encoder: The sequence of patch embeddings is processed by stacked multi-head self-attention transformer blocks. The canonical LaBraM-Base has 12 layers, hidden dimension 200, MLP size 800, and 10 attention heads. All attention blocks use pre-attention LayerNorm and bias-free QKV projections (Jiang et al., 2024, Barmpas et al., 22 May 2025).
- Neural Tokenizer: Each patch embedding p_i is quantized against a learnable codebook {v_1, …, v_K} by nearest-neighbor search in cosine-normalized space, z_i = argmin_j ‖ℓ2(p_i) − ℓ2(v_j)‖₂, where ℓ2(·) denotes L2 normalization. The quantized tokens are used in a VQ-VAE-style setup to reconstruct the discrete Fourier amplitude and phase spectrum of each patch (Jiang et al., 2024).
- Pretraining Objective: Pretraining consists of two stages:
- Tokenizer Training: MSE losses on amplitude and phase reconstruction, plus commitment and codebook update losses.
- Masked Token Modeling: A fraction of patch tokens is randomly masked and replaced with a learnable mask embedding. The transformer predicts the original discrete tokens with a softmax classifier, minimizing the cross-entropy L_M = −Σ_{i∈M} log p(z_i | x_M) over the set of masked positions M,
with symmetric masking to maximize sequence diversity. Training uses AdamW with cosine-decay schedules on datasets totaling over 2,500 hours and up to 64 channels at sampling rates up to 1 kHz (Jiang et al., 2024, Barmpas et al., 22 May 2025).
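The tokenizer's nearest-neighbor lookup in cosine-normalized space can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes (200-dim embeddings, an 8192-entry codebook); names and dimensions are illustrative, not taken from the reference implementation:

```python
import numpy as np

def quantize(patches, codebook):
    """Assign each patch embedding to its nearest codebook entry
    after L2-normalizing both (cosine-similarity search)."""
    p = patches / np.linalg.norm(patches, axis=-1, keepdims=True)
    v = codebook / np.linalg.norm(codebook, axis=-1, keepdims=True)
    # For unit vectors, minimizing L2 distance is equivalent to
    # maximizing cosine similarity, so argmax of p @ v.T suffices.
    sim = p @ v.T                  # (num_patches, codebook_size)
    return sim.argmax(axis=-1)     # discrete token ids

rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 200))     # 16 patch embeddings
codebook = rng.standard_normal((8192, 200))  # hypothetical codebook
tokens = quantize(patches, codebook)         # ids to be masked/predicted
```

During masked token modeling, a subset of these discrete ids serves as the prediction target for the transformer's softmax classifier.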
2. Model Variants and Signal-Processing Improvements
Enhancements introduced in LaBraM++ and related variants address key challenges in EEG signal normalization, reference, and architectural flexibility:
- Common Average Reference (CAR): Subtracting the per-patch mean across channels to suppress global noise (Barmpas et al., 22 May 2025).
- Z-Scoring: Per-patch, per-channel standardization to zero mean and unit variance.
- Flexible Positional Encoding: Revised spatial embeddings to handle variable and partial channel sets.
- Phase Loss Redefinition: The phase target is represented by its sine and cosine components, with loss L_phase = ‖sin φ − sin φ̂‖² + ‖cos φ − cos φ̂‖², ensuring smooth optimization on the unit circle.
- Patch and Embedding Design: Adaptive patch length (e.g., 200 samples for 1 s windows at 200 Hz), supporting up to 256 tokens per segment (Barmpas et al., 22 May 2025).
These refinements systematically improve subject-independent performance, convergence stability, and interoperability across diverse EEG hardware (Barmpas et al., 22 May 2025).
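The two normalization steps can be sketched as follows. This is a minimal NumPy sketch under the description above; the exact ordering and windowing used in LaBraM++ may differ:

```python
import numpy as np

def common_average_reference(x):
    """Subtract the mean across channels at each time point to
    suppress globally shared noise. x: (channels, samples)."""
    return x - x.mean(axis=0, keepdims=True)

def zscore_per_channel(x, eps=1e-8):
    """Standardize each channel of a patch to zero mean, unit variance."""
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

rng = np.random.default_rng(0)
patch = rng.standard_normal((31, 200)) + 5.0  # 31 channels, 1 s at 200 Hz
clean = zscore_per_channel(common_average_reference(patch))
```

Applying CAR before z-scoring removes the shared offset first, so the per-channel statistics reflect local rather than global activity.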
3. Downstream Adaptation and Robustness
LaBraM’s versatility is demonstrated in its downstream fine-tuning protocol:
- Transfer and Fine-Tuning: The pretrained transformer’s output is average pooled or combined via a [CLS]-token, then passed to a lightweight MLP classification/regression head. All model parameters can be fine-tuned, or partial layers adapted for greater generalization (Jiang et al., 2024, 2505.23042).
- Data-Centric Pipeline: Preprocessing typically includes 1–50 Hz or 0.5–44.5 Hz band-pass, artifact subspace reconstruction, ICA, and channel rejection, followed by segmentation into fixed-length (e.g., 1–5 s) windows (2505.23042).
- Performance Metrics: Balanced accuracy, AUC-PR, and weighted F1 are used for multi-class classification tasks, with robust performance documented across stress recognition (up to 90.47% BalAcc on 5 s windows), emotion decoding, and abnormality/event detection (2505.23042, Jiang et al., 2024).
- Robustness to Channel Count and Temporal Resolution: Ablations show graceful accuracy degradation from 81.04% BalAcc (31 channels) to ≈72% (11–20 channels), outperforming task-specific comparators even at reduced spatial resolution (2505.23042).
- Random Seed/Permutation Robustness: Test splits with different seeds yield stable accuracy, illustrating limited sensitivity to minor dataset partitioning—a consequence of strong pretraining and data-centric fine-tuning (2505.23042).
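The final segmentation step of such a data-centric pipeline can be sketched as follows. This is a hypothetical helper; band-pass filtering and artifact removal are assumed to have been applied upstream:

```python
import numpy as np

def segment(x, fs, win_s):
    """Split a continuous recording into non-overlapping fixed-length
    windows. x: (channels, samples) -> (n_windows, channels, window)."""
    win = int(fs * win_s)
    n = x.shape[-1] // win           # drop any trailing partial window
    return x[:, :n * win].reshape(x.shape[0], n, win).swapaxes(0, 1)

rng = np.random.default_rng(0)
eeg = rng.standard_normal((31, 200 * 61))   # 31 channels, 61 s at 200 Hz
windows = segment(eeg, fs=200, win_s=5.0)   # 5 s windows; last 1 s dropped
```

Each window then becomes one fine-tuning example, pooled over patches before the MLP head.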
4. Application Domains and Benchmarking
LaBraM’s design allows for broad BCI applicability and competitive, often state-of-the-art, results:
- Stress Detection in Real-World Settings: Achieves up to 90.47% balanced accuracy on resting-state classroom EEG (5 s windows, 31 channels), exceeding the best classical or domain-specific models (2505.23042).
- Abnormal/Pathology Detection and Event Type Classification: Outperforms prior SOTA on TUAB and TUEV (e.g., 0.8140 BalAcc vs. BIOT's 0.796) (Jiang et al., 2024).
- Emotion Recognition and Gait Regression: Consistent accuracy gains versus previous transformer-based pipelines, with demonstrated utility across classification and regression endpoints (Jiang et al., 2024).
- Multimodal Integration and Artifact Suppression: When extended for cross-modal tasks (e.g., IMU-EEG), attention-based grafting to the LaBraM latent space and artifact-gated reconstructions yield state-of-the-art motion artifact removal while maintaining interpretability of attention maps (Zhang et al., 1 Sep 2025).
- Psychometric and Socio-emotional Prediction: Embedded in fusion/contrastive architectures (e.g., BEAM), LaBraM-encoded EEG features enable objective assessment of children's empathy and outperform competitive encoders by 8–13% absolute accuracy in cross-subject tasks (Xie et al., 8 Sep 2025).
Selected Benchmark Performance Table
| Task | LaBraM-Base | Comparator (Best SOTA) | Reference |
|---|---|---|---|
| TUAB Abnormal Detection | 0.8140 ± 0.0019 | BIOT 0.7959 ± 0.0057 | (Jiang et al., 2024) |
| TUEV Event Classification | 0.6409 ± 0.0065 | BIOT 0.5281 ± 0.0225 | (Jiang et al., 2024) |
| Stress Detection (31 ch) | 0.9047 (best seed) | N/A (task-specific SOTA <0.79) | (2505.23042) |
| Empathy Assessment (BEAM) | 64.7% ± 0.8% | BIOT 56.4%; ST-Tx <52% | (Xie et al., 8 Sep 2025) |
5. Advanced Domain Adaptation and Test-Time Training
Recent research has addressed the inherent mismatch between generic pretraining objectives and specific downstream EEG tasks, as well as the challenge of cross-subject session generalization.
- Self-Supervised Domain Fine-Tuning: Augmented supervision leveraging task-relevant pretext tasks, such as stopped-band prediction (spectral), anterior-posterior flip detection (spatial), and temporal jigsaw classification (temporal), has been used to regularize and align LaBraM's internal features with downstream distributions (Wang et al., 30 Sep 2025).
- Test-Time Training (TTT): Two key approaches are used:
- Self-Supervised Sample-Level Adaptation: Per-test-sample gradient steps on pretext SSL objectives with lightweight heads.
- BatchNorm Entropy Minimization (Tent): Online calibration via entropy loss to adapt only normalization statistics without modifying general network weights.
- Empirical Gains: Across imagined speech, mental stress, and motor imagery, additive pipelines (NeuroTTT) leveraging these methods with LaBraM backbones consistently improve accuracy, Cohen's κ, and F1 by 2–11 pp compared to linear or vanilla fine-tuning (Wang et al., 30 Sep 2025).
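The Tent objective itself is simply the entropy of the model's softmax output over a test batch. A minimal NumPy sketch of the loss follows; the actual method backpropagates this quantity into the BatchNorm affine parameters only, which requires an autodiff framework and is omitted here:

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tent_entropy(logits):
    """Mean Shannon entropy of predictions over a test batch;
    minimizing it sharpens the output distribution online."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

confident = np.array([[8.0, 0.0, 0.0]])  # near one-hot prediction
uncertain = np.array([[1.0, 1.0, 1.0]])  # uniform prediction
```

Confident predictions yield low entropy and uniform ones the maximum (log of the class count), so gradient steps on this loss push the normalization statistics toward decision-consistent calibration.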
6. Limitations, Interpretability, and Future Directions
Key limitations and areas of ongoing work include:
- Model Size and Computational Overhead: At 5.8 to 369 million parameters, LaBraM is large for EEG but orders-of-magnitude smaller than LLMs; edge deployment and wearable EEG applications remain constrained by GPU/TPU requirements (2505.23042, Jiang et al., 2024).
- Interpretability: While attention map-based motion artifact suppression maps (e.g., over EEG and IMU) afford some channel-wise insight, most transformer-derived representations remain black-box; improved attribution and neuroscientific interpretability techniques are needed (Zhang et al., 1 Sep 2025, 2505.23042).
- Scalability and Efficiency: There is ongoing exploration of partial fine-tuning, parameter-efficient routers, adapters, and distillation techniques to support on-device use and minimize memory footprint without sacrificing accuracy (Jiang et al., 2024, 2505.23042).
- Multimodal and Population-Specific Extensions: Further training on pediatric EEG corpora, adaptable patch/window strategies, and multi-modal integration (fNIRS, EMG, eye-tracking) are promising directions for both foundational learning and applied BCI development (Xie et al., 8 Sep 2025).
- Ablation and Pretraining Dependency: All studies consistently show that the absence of large-scale pretraining or of key tokenizer/embedding designs leads to precipitous accuracy drops, underscoring the necessity of high-quality foundation model initialization (Jiang et al., 2024, 2505.23042).
7. Summary and Significance
LaBraM establishes a scalable, data-centric paradigm for universal EEG representation learning, leveraging masked transformer modeling and semantic vector-quantized tokenization to enable cross-task and cross-population transfer in BCIs and neuroscience. It offers a robust backbone for both unimodal and multimodal signal interpretation while setting a new methodological baseline for EEG foundation models and their deployment in practical, real-world and multi-subject scenarios (Jiang et al., 2024, 2505.23042, Barmpas et al., 22 May 2025, Wang et al., 30 Sep 2025, Xie et al., 8 Sep 2025, Zhang et al., 1 Sep 2025).