- The paper demonstrates that transformer-based architectures dominate the field, while state-space models offer linear-time alternatives for long-sequence EEG data.
- The paper evaluates diverse SSL objectives, including masked autoencoding and composite losses, revealing superior cross-dataset generalization and label efficiency.
- The paper highlights challenges such as channel heterogeneity and corpus bias, calling for diversified datasets and unified benchmarks to enhance model robustness.
Systematic Review of Self-Supervised Foundation Models for EEG Brain Network Representation
Context and Motivation
The paradigm of automated electroencephalography (EEG) analysis has undergone substantial transformation with the proliferation of self-supervised learning (SSL) and the adoption of advanced model architectures, notably transformers and state-space models (SSMs). This systematic review targets the subset of foundation models that learn representations from whole-brain multichannel EEG recordings using SSL, summarizing developments in architectural choices, pretraining corpus diversity, SSL objectives, spatial encoding mechanisms, and downstream clinical or cognitive tasks. The review intentionally excludes channel-limited models, focusing on approaches with potential for robust cross-task generalization.
Pretraining Data and Corpus Diversity
A major constraint in the current landscape is reliance on the Temple University Hospital EEG (TUEG) corpus, utilized by over half of the reviewed models. While TUEG supplies extensive clinical data, it introduces bias, particularly in recording context and task domain, since typical sessions capture resting-state or pathological activity. Few models incorporate datasets from OpenNeuro or similar repositories, and task-based datasets in particular remain underrepresented. The scale of pretraining data in recent studies exceeds 10,000 hours in several cases. Sampling-rate normalization (typically 125–256 Hz), systematic bandpass filtering (commonly 0.1–100 Hz), and channel variability (8–128 channels) are prevalent, yet strategies for harmonizing diverse electrode layouts are non-standardized and have not been rigorously compared.
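These common preprocessing steps can be sketched as follows; the target sampling rate and band edges are illustrative values within the ranges reported above, not any specific model's configuration:

```python
import numpy as np
from math import gcd
from scipy.signal import butter, sosfiltfilt, resample_poly

def preprocess_eeg(x, fs_in, fs_out=256, band=(0.1, 100.0)):
    """Resample a (channels, samples) EEG array and bandpass-filter it.

    fs_out and band are illustrative values inside the ranges the review
    reports (125-256 Hz sampling, 0.1-100 Hz bandpass); real pipelines vary.
    """
    # Rational resampling to the target rate.
    g = gcd(int(fs_in), int(fs_out))
    x = resample_poly(x, int(fs_out) // g, int(fs_in) // g, axis=-1)
    # Zero-phase 4th-order Butterworth bandpass (SOS form for numerical stability
    # at the very low 0.1 Hz edge).
    sos = butter(4, band, btype="band", fs=fs_out, output="sos")
    return sosfiltfilt(sos, x, axis=-1)
```

Channel-layout harmonization, by contrast, has no comparably standard recipe, which is precisely the gap the review identifies.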
Architectural Trends
Transformer architectures dominate, with 17/19 models employing them as backbones. Transformer configurations exhibit heterogeneity:
Attention mechanisms span temporal, spatial, and joint spatiotemporal dimensions. Spatial encoding is addressed using learnable channel embeddings, fixed embeddings, positional encoding with head models, and learnable 2D convolutions. Roughly 75% of models augment their architecture with explicit spatial encoding, reflecting the necessity of modeling electrode topology.
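The most common of these options, learnable channel embeddings, can be sketched minimally; the tensor sizes below are illustrative and not taken from any reviewed model:

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_patches, d_model = 19, 16, 64  # illustrative sizes

# Patch tokens produced by a temporal encoder: (channels, patches, d_model).
tokens = rng.standard_normal((n_channels, n_patches, d_model))

# Learnable channel-embedding table, one vector per electrode. In a real
# model these rows are trained parameters; random values stand in here.
channel_emb = 0.02 * rng.standard_normal((n_channels, d_model))

# Broadcast-add each electrode's identity vector to all of its time patches,
# injecting spatial (montage) information into otherwise position-free tokens.
tokens_with_space = tokens + channel_emb[:, None, :]
```

Fixed embeddings replace the trained table with electrode coordinates; the addition itself is the same.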
State-space models (Mamba and S4) emerge as computationally efficient alternatives, delivering linear time complexity (O(L)) compared to the quadratic complexity of transformer self-attention (O(L²)). Mamba introduces input-dependent selection mechanisms for parameter adaptation, whereas S4 leverages FFT-based convolutions for efficient long-context modeling. Direct empirical comparisons between SSMs and transformers on EEG are presently lacking.
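The FFT-based convolution behind S4-style efficiency fits in a few lines; this is a generic long convolution, not S4's full parameterization:

```python
import numpy as np

def fft_long_conv(u, k):
    """Convolve input u with kernel k (both length L) via FFT.

    Direct convolution over a length-L context costs O(L^2); the FFT route
    costs O(L log L), the trick S4-style SSMs use for long-context modeling.
    """
    L = u.shape[-1]
    n = 2 * L  # zero-pad so circular convolution matches linear convolution
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(k, n), n)
    return y[..., :L]  # keep the causal part
```

The result matches direct convolution up to floating-point error, which is what makes the substitution safe for long EEG sequences.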
Model size varies considerably, with parameter counts ranging from 3M to 540M. No consensus exists regarding scaling principles or the optimal parameter regime for whole-brain EEG modeling, and the applicability of scaling laws observed in other modalities remains an open question.
Self-Supervised Pretraining Objectives
SSL objectives employed include:
- Masked autoencoding: Continuous signal reconstruction/regression is the most common (12 models).
- Autoregressive loss: Used infrequently, despite success in other domains.
- Contrastive learning: Implemented in BENDR and several other models.
- Composite/combined objectives: Losses are summed across tasks (masked reconstruction, contrastive, autoregression, frequency band estimation).
Discrete token prediction via vector quantization (VQ) or classification heads is adopted in multi-stage pretraining setups. The field has shifted emphatically toward masked autoencoding and composite SSL objectives, with contrastive paradigms in relative decline.
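The dominant masked-autoencoding objective reduces to a regression loss computed only on masked positions. The sketch below uses an identity stand-in where a real model would reconstruct, and an illustrative 50% mask ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 1000))   # (channels, samples) EEG segment

# Mask ~50% of time points (the ratio is illustrative, not from the review).
mask = rng.random(x.shape) < 0.5
x_in = np.where(mask, 0.0, x)        # encoder sees zeros at masked positions

# A real model would reconstruct x from x_in; the identity is a stand-in.
x_hat = x_in

# Masked-autoencoding loss: regression error on masked positions only,
# so the model is never rewarded for merely copying visible inputs.
loss = np.mean((x_hat[mask] - x[mask]) ** 2)
```

Composite objectives simply sum this term with contrastive, autoregressive, or frequency-band losses computed on the same batch.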
Fine-Tuning Protocols and Downstream Application
Downstream adaptation predominantly follows single-task fine-tuning with all model parameters updated, although some studies use linear probing, adapter-based parameter-efficient strategies, or incremental module updating. Full-parameter fine-tuning is empirically favored on several benchmarks. Multi-task fine-tuning remains rarely explored, with ALFEE a notable exception.
Application domains encompass clinical seizure detection, sleep staging, abnormal event detection, and BCI-oriented tasks (motor imagery, emotion recognition, artifact detection). Diversity across benchmarks, preprocessing protocols, input feature domain, and classifier head design impedes any direct inter-model comparison.
Numerical Outcomes and Claims
- Several foundation models demonstrate superior cross-dataset generalization and label efficiency relative to classical supervised approaches.
- No universally robust, zero-shot EEG foundation model currently exists.
- Models with spatial encoding mechanisms and full-parameter fine-tuning frequently yield the best downstream performance.
- SSMs offer clear computational advantages for long-sequence tasks but lack comprehensive validation in neurophysiological contexts against transformers.
Theoretical and Practical Implications
Current progress confirms the capacity of SSL-based foundation models to extract transferable representations from large-scale, heterogeneous EEG corpora. The unresolved challenges are numerous: harmonization of channel configuration, effective spatial encoding across standards, mitigation of corpus-specific bias, and development of general-purpose, multi-task EEG models. The lack of standard evaluation protocols and unified benchmarks severely restricts reproducibility and comparative advances.
Prospects and Future Directions
The trajectory of EEG foundation model research will be determined by:
- Curating larger, more diverse, and task-rich pretraining corpora, potentially leveraging community datasets (OpenNeuro, BIDS-compliant).
- Benchmark standardization, such as proposed by EEG-Bench (2512.08959), to enable rigorous scaling law exploration and cross-model comparison.
- Architectural innovation to enhance spatial and temporal dependency modeling and scalability, with emphasis on empirical SSM vs. transformer comparisons.
- Investigation into parameter-efficient and multi-task fine-tuning techniques to approach truly generalizable, zero-shot EEG models suitable for clinical and BCI deployment.
Conclusion
This systematic review delineates the rapid advancements and persistent limitations in self-supervised EEG foundation modeling. While SSL and transformer-based approaches have greatly expanded representational quality and cross-task potential for whole-brain EEG, practical deployment is still constrained by data diversity, channel heterogeneity, and the absence of standard benchmarks. Resolution of these challenges is critical for the transition from specialized models to robust, general-purpose EEG analysis tools with direct translational utility in neuroscience and clinical domains.
Reference:
"Systematic review of self-supervised foundation models for brain network representation using electroencephalography" (2602.03269)