
Three-Way Emotion Classification

Updated 7 February 2026
  • Three-way emotion classification is a supervised learning task categorizing affective data into positive, neutral, or negative classes, essential for robust emotion analysis.
  • It employs modality-specific preprocessing and advanced ensemble methods, CNNs, and tailored feature engineering to handle EEG, audio, and text data.
  • Comparative studies show that hybrid model architectures and domain-adapted features improve accuracy and model robustness across varied datasets.

Three-way emotion classification refers to the supervised learning task in which affective data—such as neurophysiological signals, audio recordings, or text—is categorized into one of three discrete classes, typically labeled as positive, neutral, or negative. This trichotomy recognizes that many emotion-related applications require not only the identification of clear affective valence (positive/negative) but also the robust detection of neutral or affectless states. Three-way frameworks are prominent in research spanning EEG-based affective computing, audio-visual sentiment analysis, and natural language processing. This article provides a comprehensive review of three-way emotion classification, focusing on signal modalities, feature engineering, model architectures, evaluation protocols, comparative performance, and outstanding challenges.

1. Problem Formulation and Labeling Protocols

Three-way emotion classification constrains the emotional label space to three categories, generally defined as:

  • Negative: Data corresponding to clearly adverse, unpleasant, or distressing affective states.
  • Neutral: Affective states judged to lack salient positive or negative valence; often associated with resting or baseline conditions, or text/audio reflecting objective content.
  • Positive: Stimuli or data eliciting pleasant, rewarding, or otherwise positive affect.

Precise operationalization varies by modality and experimental protocol:

  • In EEG-based paradigms, emotional state labels are induced by presentation of emotionally valenced film clips or standardized stimuli. Windowed EEG segments are associated with the class of the stimulus presented during their recording (Purwar et al., 31 Jan 2026).
  • In audio classification of movie scenes, class labels ("Good," "Neutral," "Bad") are assigned to segmented audio clips via independent rater consensus or mapping from source dataset taxonomies (Xiong et al., 22 Nov 2025).
  • For text such as customer feedback, Spanish utterances are labeled directly as positive, neutral, or negative based on content analysis, often with majority-vote adjudication and cross-language validation (Thapa et al., 26 May 2025).

This three-class labeling supports both balanced and imbalanced distribution handling, with class priors either preserved or enforced via sampling to ensure robust model comparison.

2. Data Preprocessing and Feature Engineering

The pipeline for emotion classification critically depends on tailored preprocessing and high-quality feature extraction specific to the data modality.

EEG Signals

  • Segmentation: Continuous EEG is segmented into short windows (1 s, with or without overlap).
  • Filtering: Bandpass filtering (e.g., 0.5–45 Hz) suppresses both low-frequency drifts and high-frequency noise (Purwar et al., 31 Jan 2026).
  • Artifact Rejection: Visual/manual criteria or thresholding remove windows contaminated by eye blinks, muscle activity, or motion.
  • Normalization: Feature-wise z-score normalization for algorithms sensitive to scale, such as SVM (Purwar et al., 31 Jan 2026).
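The windowing and normalization steps above can be sketched as follows (an illustrative pure-Python sketch, not code from the cited papers; function names and the 1 s default are assumptions for illustration):

```python
import math

def segment(signal, fs, win_s=1.0, overlap=0.0):
    """Split a 1-D signal into fixed-length windows, e.g. 1 s EEG segments."""
    size = int(fs * win_s)
    step = int(size * (1.0 - overlap))
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]

def zscore(values):
    """Feature-wise z-score normalization for scale-sensitive models such as SVM."""
    mu = sum(values) / len(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [(v - mu) / sd for v in values] if sd > 0 else [0.0] * len(values)
```

For example, a 10-sample signal at fs = 2 Hz with non-overlapping 1 s windows yields five 2-sample segments.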

Extracted features include:

  • Time-domain: Mean, standard deviation, variance, skewness, kurtosis, Hjorth parameters.
  • Frequency-domain: Band power (Delta, Theta, Alpha, Beta, Gamma), spectral centroid, spectral entropy, and relative band-power.
  • Information-theoretic: Differential entropy (DE), defined as $DE = \frac{1}{2}\ln(2\pi e \sigma^2)$ per channel/band.

Derived features such as pairwise DE differences are compiled into "AsMap" tensors of all-channel asymmetries, which are then processed by CNNs for automatic spatial feature discovery (Ahmed et al., 2022).
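The DE formula and the pairwise-difference construction behind AsMap can be illustrated directly (a minimal sketch assuming approximately Gaussian band-filtered signals; the dictionary-of-pairs layout is a simplification of the actual tensor format):

```python
import math
from itertools import combinations

def differential_entropy(samples):
    """DE = 0.5 * ln(2*pi*e*sigma^2) for one channel/band, under a Gaussian assumption."""
    mu = sum(samples) / len(samples)
    var = sum((x - mu) ** 2 for x in samples) / len(samples)
    return 0.5 * math.log(2 * math.pi * math.e * var)

def asmap_differences(de_per_channel):
    """All pairwise DE differences across channels: the raw material of an AsMap."""
    return {(i, j): de_per_channel[i] - de_per_channel[j]
            for i, j in combinations(range(len(de_per_channel)), 2)}
```

A unit-variance channel has DE = 0.5·ln(2πe) ≈ 1.419, and C channels produce C(C−1)/2 pairwise differences.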

Audio Data

  • Segmentation: Non-overlapping windows (e.g., 7 s) form the basis for clip-level classification (Xiong et al., 22 Nov 2025).
  • Feature Sets: Extraction spans MFCCs (24-band Mel scale), log-Mel spectrogram energies, zero-crossing rate, spectral centroid, spectral roll-off, and chroma vectors.
  • Feature Engineering: Each feature is condensed into mean, range, and mean-absolute deviation within the window.
  • Filtering: Outlier rejection via IQR/median replacement, min–max normalization, variance thresholding, statistical ($\chi^2$ and Spearman correlation) and train/test drift analysis for dimensionality reduction.
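The per-window condensation (mean, range, mean-absolute deviation) and min–max normalization can be sketched as follows (illustrative pure Python; the function names are assumptions, not the papers' code):

```python
def window_stats(feature_series):
    """Condense a per-frame feature series into mean, range, and mean-absolute deviation."""
    mu = sum(feature_series) / len(feature_series)
    rng = max(feature_series) - min(feature_series)
    mad = sum(abs(x - mu) for x in feature_series) / len(feature_series)
    return mu, rng, mad

def minmax(values):
    """Min-max normalization of one feature column to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values] if hi > lo else [0.0] * len(values)
```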

Text Data

  • Cleaning and Tokenization: Standard routines to remove duplicates, normalize case and accents, strip punctuation, and tokenize for vectorization (Thapa et al., 26 May 2025).
  • Vectorization: TF-IDF representations complement deep BERT embeddings (“bert-base-multilingual-uncased”), with document features derived from mean or [CLS] embeddings (768-dim).
  • Feature Concatenation: Combined TF-IDF and BERT vectors form the base input for downstream models, maximizing lexical and contextual semantic coverage.
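The hybrid vectorization idea, sparse lexical TF-IDF scores concatenated with a dense contextual embedding, can be sketched in miniature (a toy sketch with a simplified, unsmoothed IDF; real pipelines would use library implementations and actual BERT outputs):

```python
import math
from collections import Counter

def tfidf(docs):
    """Minimal TF-IDF: term frequency times log inverse document frequency."""
    vocab = sorted({t for d in docs for t in d.split()})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d.split()) for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d.split())
        total = sum(counts.values())
        vectors.append([(counts[t] / total) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

def concat_features(tfidf_vec, embedding):
    """Concatenate lexical and contextual (e.g. 768-dim BERT) features into one input."""
    return tfidf_vec + embedding
```

Terms appearing in every document get zero IDF weight, while class-discriminative terms retain positive scores.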

3. Model Architectures and Training Strategies

A variety of supervised learning frameworks are deployed for three-way classification depending on the feature and data landscape.

Classical ML Approaches

  • Logistic Regression (LR): Multi-class, one-vs-rest formulation with $L_2$ regularization. Training optimizes $L(\beta) = -\sum_{i=1}^N \sum_{k=1}^3 \mathbf{1}\{y_i=k\} \log \sigma_k(\beta^T x_i) + \lambda \|\beta\|_2^2$ (Purwar et al., 31 Jan 2026).
  • Support Vector Machine (SVM): RBF kernels with hyperparameters $(C, \gamma)$ tuned via cross-validation. Output is determined by $f(x) = \mathrm{sign}\left(\sum_{i \in SV} \alpha_i y_i K(x_i, x) + b\right)$ (Purwar et al., 31 Jan 2026).
  • Random Forest (RF): Ensembles of 100 decision trees, Gini impurity splits, bootstrap samples, and node-wise feature subsetting ($\sqrt{n_\text{features}}$). Majority vote produces predictions (Purwar et al., 31 Jan 2026).
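The regularized multinomial objective for LR can be made concrete with a toy evaluation, taking $\sigma_k$ to be the softmax over class scores (an illustrative sketch of the loss only, not a trained model):

```python
import math

def softmax(scores):
    """Numerically stable softmax over per-class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multiclass_log_loss(beta, xs, ys, lam=0.0):
    """L(beta) = -sum_i log softmax(beta_k . x_i)[y_i] + lam * ||beta||_2^2."""
    loss = 0.0
    for x, y in zip(xs, ys):
        scores = [sum(b * xi for b, xi in zip(beta_k, x)) for beta_k in beta]
        loss -= math.log(softmax(scores)[y])
    reg = lam * sum(b * b for beta_k in beta for b in beta_k)
    return loss + reg
```

With all-zero weights the three classes are equiprobable, so each sample contributes $\log 3$ to the loss.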

Ensemble and Stacking Approaches

  • Custom Stacking Ensembles: In text-based paradigms, base learners include LR, KNN, bagged LGBM, AdaBoost, all trained on concatenated TF-IDF and BERT features, with a one-vs-all LR meta-classifier processing base model outputs $Z_i = [h_1(x_i), \dots, h_4(x_i)]$ (Thapa et al., 26 May 2025).
  • Stacked Audio Models: Base layer of 10 RBF SVMs and 6 neural networks (3 fully-connected NNs, 3 1D-CNNs), meta-learner SVM, and LOOCV for the stacking input space (Xiong et al., 22 Nov 2025).
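The meta-feature construction $Z_i = [h_1(x_i), \dots, h_M(x_i)]$ common to both stacking designs can be sketched generically (base learners here are stand-in callables returning class-probability triples; the real systems use trained SVMs, NNs, etc.):

```python
def stack_meta_features(base_learners, x):
    """Concatenate each base learner's per-class outputs into one meta-feature vector."""
    z = []
    for h in base_learners:
        z.extend(h(x))  # e.g. [P(neg), P(neu), P(pos)] from one base model
    return z
```

The meta-classifier (e.g. LR or SVM) is then trained on these vectors, typically built via cross-validation or LOOCV so base models never score their own training data.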

Deep Feature Extraction

  • AsMap+CNN for EEG: AsMap tensors (all-channel asymmetric DE differences) serve as spatial “images” for CNNs (2 Conv2D layers with ReLU and max-pooling, followed by two 512-unit dense and softmax layers) (Ahmed et al., 2022).

Hyperparameters (e.g., $C$, $\gamma$, n_neighbors, n_estimators) are tuned via cross-validation or grid-search; deep learning experiments typically employ early stopping and dropout to regularize small datasets.

4. Performance Metrics and Comparative Results

Evaluation protocols rely on both macro-averaged and class-specific metrics:

  • Precision: $P = \frac{TP}{TP + FP}$
  • Recall: $R = \frac{TP}{TP + FN}$
  • F1-score: $F_1 = \frac{2PR}{P + R}$
  • Accuracy: Proportion of all correct predictions, macro-averaged or per-class.
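These metrics follow directly from per-class confusion counts; a minimal reference implementation (pure Python, for illustration):

```python
def per_class_prf(y_true, y_pred, label):
    """Precision, recall, and F1 for one class from raw label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_prf(y_true, y_pred, l)[2] for l in labels) / len(labels)
```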

EEG-based Classification

  • Random Forest achieves state-of-the-art test accuracy (97.5%) and macro-F1 (0.975) on consumer EEG, outperforming SVM (93.9%) and LR (95.6%) (Purwar et al., 31 Jan 2026).
  • AsMap+CNN achieves 97.1% accuracy on the SEED dataset for three-class classification using the $\gamma$ band, surpassing raw DE and other asymmetry-derived features (Ahmed et al., 2022).

Audio-based Classification

  • Stacking ensemble achieves 86% accuracy on real-world movie audio clips (balanced across Good/Neutral/Bad), with per-class F1-scores of 0.84–0.88 (Xiong et al., 22 Nov 2025).
  • The ensemble is less effective on small or simulated datasets (67% accuracy), indicating sensitivity to domain mismatch and dataset scale.

Text-based (Spanish)

  • Custom stacking ensemble with TF-IDF+BERT features achieves 93.3% accuracy and balanced per-class F1-scores (0.92–0.94) on native Spanish customer feedback (Thapa et al., 26 May 2025).
  • Translation from Spanish to English results in a ~5% accuracy drop.
  • Ensemble approach surpasses both individual ML models and fine-tuned transformers (baseline BERT accuracy: 90%).

Table: Summary of Three-Way Emotion Classification Performance (Selected Modalities & Methods)

| Modality | Model / Features | Accuracy (%) | Macro-F1 | Reference |
|---|---|---|---|---|
| EEG (SEED) | AsMap+CNN (γ-band) | 97.1 | – | (Ahmed et al., 2022) |
| EEG (Muse) | Random Forest (~80 features) | 97.5 | 0.975 | (Purwar et al., 31 Jan 2026) |
| Audio (movie) | SVM+NN stack | 86 | 0.86 | (Xiong et al., 22 Nov 2025) |
| Text (Spanish) | Custom stacking ensemble (TF-IDF+BERT) | 93.3 | 0.93 | (Thapa et al., 26 May 2025) |

Within each domain, advanced feature engineering and model ensembles consistently outperform classical baselines and simple deep learning under small-to-moderate data regimes.

5. Comparative Analysis and Methodological Insights

Three-way emotion classification exhibits modality-dependent feature–model synergy:

  • EEG: Nonlinear models (RF, AsMap+CNN) outperform classical linear or regionally reduced pipelines by capturing interactions among statistical and spectral features as well as spatial asymmetry structures. The inclusion of all pairwise DE differences permits CNNs to discover high-level asymmetry patterns absent in manual asymmetry metrics (Ahmed et al., 2022).
  • Audio: Stacking ensemble learning leveraging complementary SVMs (effective for “Good” class) and neural nets (robust on “Bad”) yields significant performance wins over single-model approaches, while low computational overhead enables embedded deployment (Xiong et al., 22 Nov 2025).
  • Text: Coupling TF-IDF for lexical polarity with BERT’s deep context representations allows stacking ensembles to navigate high-dimensional and idiomatic linguistic distinctions, particularly in Spanish, mitigating translation-related performance loss (Thapa et al., 26 May 2025).

A common thread is the efficacy of combining domain-adapted feature engineering with model stacking or hybrid architectures, rather than relying on vanilla end-to-end deep learning, especially in constrained-data scenarios.

6. Limitations, Challenges, and Future Directions

Principal limitations include:

  • Sample Size and Generalizability: Many studies deploy small subject pools (EEG: N=2 for Muse; N=15 for SEED) or limited real-world text datasets, constraining the external validity of reported results (Purwar et al., 31 Jan 2026).
  • Modality-specific Artifacts: EEG results may be affected by subjective artifact rejection. Consumer-grade devices with few channels (e.g., Muse, 4 channels) inherently limit spatial resolution versus full-headcaps.
  • Model Complexity vs. Data Scale: Deep models (CNNs, NNs) risk overfitting on small datasets unless combined with strong regularization or used in conjunction with interpretable (manual) feature constructs.
  • Window and Feature Sensitivity: EEG classification accuracy deteriorates with increased window size; higher frequency bands deliver superior asymmetry-based separation, but findings may not generalize beyond specific features or band selections (Ahmed et al., 2022).
  • Translation and Semantic Integrity: In NLP, translational pipelines degrade emotion classification (accuracy drop from 0.93 to 0.88), highlighting the importance of preserving linguistic nuance for high-fidelity affective modeling (Thapa et al., 26 May 2025).

Future research directions prioritize:

  • Expanding datasets to encompass more participants, languages, and naturalistic scenarios (Purwar et al., 31 Jan 2026).
  • Incorporating advanced deep-learning architectures for both spatial (CNN) and temporal (LSTM) modeling, and hybridizing statistical ML with representation learning (Purwar et al., 31 Jan 2026).
  • Automating artifact handling (e.g., ICA for EEG), and adopting adaptive, frequency-dependent windowing methods for multi-scale feature extraction (Ahmed et al., 2022).
  • Pursuing subject-independent and cross-dataset validation to improve robustness and applicability.

7. Application Domains and Deployment Considerations

Three-way emotion classification underpins a range of affective computing and human–computer interaction tasks:

  • EEG-based systems facilitate real-time affect monitoring for BCI, stress detection, and adaptive environments, with efficient feature sets and forest-based classifiers enabling low-latency inference suitable for consumer-grade hardware (Purwar et al., 31 Jan 2026).
  • Audio-based models support movie recommendation, content curation, and ambient intelligence, with stacking architectures running in real time (<200 ms per 7 s clip) and deployable on resource-constrained devices due to modest memory and computational requirements (Xiong et al., 22 Nov 2025).
  • NLP approaches inform customer experience management, sentiment-driven response, and multilingual feedback analytics, with hybrid vectorization and stacking ensembles providing both accuracy and efficiency in small- to mid-sized enterprise settings (Thapa et al., 26 May 2025).

A plausible implication is that the high performance of hybrid and ensemble pipelines in three-way emotion classification suggests ongoing value in sophisticated preprocessing and feature construction, even in domains with rapidly advancing deep learning methodologies.
