SEED-IV Dataset for EEG Emotion Recognition
- The SEED-IV dataset is a publicly available corpus designed for EEG-based emotion recognition, featuring four discrete emotions elicited by naturalistic film clips.
- The dataset employs a standardized preprocessing pipeline with band-pass filtering, ICA, and differential entropy feature extraction to ensure data quality and robust model calibration.
- Its integration of synchronous EEG and eye-tracking data supports advanced multimodal analysis and has enabled benchmark studies in cross-dataset deep learning applications.
The SEED-IV dataset is a publicly available corpus for EEG-based emotion recognition, situated within the SEED family of datasets. SEED-IV is characterized by four discrete emotion classes elicited by naturalistic film stimuli and captured via high-density 62-channel EEG. Its protocol and data structure are engineered for subject-independent modeling and cross-dataset generalization studies, as exemplified by its pivotal role in recent contrastive learning architectures for affective brain-computer interfaces (Liao et al., 2024).
1. Participant Pool and Recording Protocol
SEED-IV consists of recordings from 15 healthy university students (7 male, 8 female), each participating in three separate sessions. Informed consent protocols were followed in accordance with ethical standards. No further demographic stratification is specified beyond the original release [27]. Each session includes 24 trials, so each subject completes a total of 72 trials ($3 \times 24 = 72$), where each trial corresponds to viewing an emotionally targeted film clip.
Recording was performed with a 62-channel ESI NeuroScan system using the international 10–20 montage, with Cz as the initial reference electrode. The sampling rate is 200 Hz, yielding 200 samples per channel per second and matching the hardware configuration of SEED and SEED-V for cap type and amplifier bandwidth (Liao et al., 2024).
2. Stimulus Design and Emotion Categories
Emotion induction in SEED-IV is achieved through the presentation of 72 validated, Chinese-language film-scene clips. These clips are selected to maximize ecological validity and evoke specific emotional responses, forming four target categories:
- Joy
- Sorrow
- Neutrality
- Anxiety
During each session, 24 clips are presented in randomized order, ensuring balanced exposure and statistical power for each emotional class. Only the last 30 s of each trial are retained for analysis, allowing emotional states to stabilize after stimulus onset.
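The 30 s retention rule amounts to a simple slice along the time axis of each trial. A minimal NumPy sketch on synthetic data (the 2-minute clip length here is illustrative; actual clip durations vary):

```python
import numpy as np

FS = 200           # Hz, SEED-IV sampling rate
KEEP_SECONDS = 30  # only the final 30 s of each trial are analyzed

# Illustrative trial: 62 channels x 2 min of signal (clip lengths vary).
rng = np.random.default_rng(0)
trial = rng.standard_normal((62, 120 * FS))

# Keep only the last 30 s along the time axis.
trial_last30 = trial[:, -KEEP_SECONDS * FS:]
print(trial_last30.shape)  # (62, 6000)
```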
3. Data Structure and Feature Extraction
Raw SEED-IV data is organized as a four-dimensional tensor

$$X \in \mathbb{R}^{S \times T \times C \times N},$$

where $S = 15$ is the number of subjects, $T = 72$ is the number of trials, $C = 62$ is the number of channels, and $N = 6000$ is the number of time points per trial ($30\,\mathrm{s} \times 200\,\mathrm{Hz}$). Each element corresponds to a single EEG sample.
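The tensor layout above can be sketched directly in NumPy; the zero-filled array below is a placeholder standing in for the real recordings (dimension values taken from the dataset description):

```python
import numpy as np

# Dimensions from the text: S subjects, T trials, C channels, N time points.
S, T, C, N = 15, 72, 62, 6000  # 6000 = 30 s x 200 Hz

# Placeholder tensor standing in for the real recordings.
X = np.zeros((S, T, C, N), dtype=np.float32)

# One trial for one subject is a (channels x time) matrix.
trial = X[0, 0]
print(X.shape)      # (15, 72, 62, 6000)
print(trial.shape)  # (62, 6000)
```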
For downstream machine learning, the CLDTA architecture initially transforms this raw tensor via band-by-band differential entropy (DE) feature extraction. For a signal segment assumed to follow a Gaussian distribution, the DE is given by:

$$h(X) = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx = \frac{1}{2}\log\left(2\pi e \sigma^2\right),$$

where $p(x)$ is the probability density and $\sigma^2$ is the variance in the specified frequency band. DE features are computed for five canonical bands: $\delta$ (0.1–4 Hz), $\theta$ (4–8 Hz), $\alpha$ (8–13 Hz), $\beta$ (13–31 Hz), $\gamma$ (31–50 Hz).
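Under the Gaussian assumption, band-wise DE reduces to the closed form above: filter into the band, then take half the log of $2\pi e$ times the variance. A minimal Python sketch with SciPy (the function name `band_de` and the filter order are illustrative assumptions, not the CLDTA implementation):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 200  # Hz, SEED-IV sampling rate

# Canonical band edges from the text (delta through gamma).
BANDS = {"delta": (0.1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 31), "gamma": (31, 50)}

def band_de(signal, low, high, fs=FS):
    """Differential entropy of one channel in one band, under the
    Gaussian assumption: 0.5 * ln(2 * pi * e * variance)."""
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    filtered = sosfiltfilt(sos, signal)
    return 0.5 * np.log(2 * np.pi * np.e * np.var(filtered))

# Example on synthetic data: one 30 s channel yields 5 DE features.
rng = np.random.default_rng(0)
x = rng.standard_normal(30 * FS)
features = {name: band_de(x, lo, hi) for name, (lo, hi) in BANDS.items()}
```

Applied per channel and per band, this turns each 62-channel trial into a 62 × 5 DE feature matrix.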
4. Preprocessing Workflow
SEED-IV implements a standardized EEG preprocessing pipeline with EEGLAB:
a. Band-pass filter (0.01–48 Hz) and 50 Hz notch filter
b. Channel rejection for prolonged flat signals (5 s), excessive variance (4× overall std), or low correlation (0.6)
c. Trial (epoch) rejection if windowed variance exceeds channel variance
d. Spherical spatial interpolation for rejected channels
e. ICA decomposition (up to 5 artifact components manually discarded)
f. Re-referencing to average
g. Truncation to last 30 s (6000 samples/trial, all channels)
h. DE feature extraction and temporal smoothing (linear dynamic system model, cf. [12])
All subjects are processed identically, ensuring consistency required for domain adaptation studies.
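The filtering in step (a) can be sketched in Python with SciPy; this is an illustrative approximation (the original pipeline used EEGLAB, and the function name `basic_filters`, the filter order, and the notch quality factor are assumptions):

```python
import numpy as np
from scipy.signal import butter, filtfilt, iirnotch, sosfiltfilt

FS = 200  # Hz, SEED-IV sampling rate

def basic_filters(eeg, fs=FS):
    """Step (a) of the pipeline: 0.01-48 Hz band-pass plus 50 Hz notch.
    `eeg` is a (channels x samples) array; filtering is along time."""
    # Band-pass: second-order sections keep the very low cutoff stable.
    sos = butter(4, [0.01, 48], btype="bandpass", fs=fs, output="sos")
    out = sosfiltfilt(sos, eeg, axis=-1)
    # Notch at the 50 Hz mains frequency (quality factor 30, an assumption).
    b, a = iirnotch(50, Q=30, fs=fs)
    return filtfilt(b, a, out, axis=-1)

# Example: filter one synthetic 30 s trial, shape preserved.
rng = np.random.default_rng(1)
eeg = rng.standard_normal((62, 6000))
clean = basic_filters(eeg)
print(clean.shape)  # (62, 6000)
```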
5. Data Composition and Unique Attributes
SEED-IV situates itself as an intermediate benchmark between SEED (3 emotions) and SEED-V (5 emotions). With four discrete emotional states and a total of 72 trials/subject, it affords robust within-subject and cross-subject statistical modeling while maintaining manageable data scale.
A distinguishing feature is synchronous EEG and eye-tracking acquisition; SEED-IV is the first in the series to be released with both modalities aligned per trial. While CLDTA uses only EEG channels, the inclusion of gaze data supports future multimodal emotion modeling. SEED-IV thus offers richer class labels (joy, sorrow, neutral, anxiety) and a higher trial count per subject compared to its predecessors.
6. Benchmarking and Application in Cross-Dataset Research
SEED-IV is regularly employed in training and validation of deep learning models designed for cross-dataset domain adaptation, such as CLDTA ("Contrastive Learning based on Diagonal Transformer Autoencoder") (Liao et al., 2024). Its channel-dense, film-stimulus protocol enables robust transfer learning across domain shifts in BCI research.
Researchers leverage SEED-IV for:
- Subject-independent emotion decoding
- Model calibration on minimal samples for new subjects
- Visualization and interpretability of brain network representations (via information separation mechanisms)
- Band-wise feature ablation and frequency-specific emotional signature studies
Table: SEED-IV Summary
| Property | Value | Notes |
|---|---|---|
| Subjects ($S$) | 15 (7 M / 8 F) | University students |
| EEG Channels ($C$) | 62 | 10–20 montage, ESI NeuroScan |
| Sampling Rate ($f_s$) | 200 Hz | Standard for SEED family |
| Trials/Session | 24 | 3 sessions per subject |
| Total Trials ($T$) | 72 | Last 30 s of each retained |
| Emotions | Joy, Sorrow, Neutral, Anxiety | Validated film stimuli |
| Data Tensor Shape | $15 \times 72 \times 62 \times 6000$ | Subjects × trials × channels × time points |
| Eye Tracking | Yes | Synchronous with EEG |
7. Context and Implications
SEED-IV’s design allows for testing deep-learning approaches targeting universality across acquisition devices, population samples, and stimulus formats, and it has plausibly played a key role in the shift from controlled laboratory settings toward more generalizable emotion recognition pipelines.
Its moderate subject pool, granular emotional states, and dense EEG coverage make SEED-IV an essential dataset in modern affective BCI benchmarking, particularly for projects leveraging contrastive, transformer-based, or calibration-driven methods.
SEED-IV anchors the development of transferable models bridging laboratory and real-world emotional state decoding, and provides a robust, well-preprocessed platform for comparative studies across the SEED family and other canonical EEG emotion corpora (Liao et al., 2024).