- The paper introduces a framework that fuses synchronized E-nose and EEG signals to overcome inter-individual variability in odor preference recognition.
- It employs advanced architectures, including crossmodal and self-attention within BMFNet, with the distilled BMFNet-S achieving 92.79% accuracy and a 92.64% F1-score.
- Knowledge distillation reduces computational cost by 40%, making the model both effective and efficient for practical applications.
Human-Machine Cooperative Multimodal Learning for Cross-Subject Olfactory Preference Recognition
Introduction
Odor sensory evaluation exerts significant influence in diverse application domains, from food science to textiles and cosmetics. Traditional subjective assessments suffer from poor repeatability and inherent inter-individual variability; conversely, machine olfaction technologies such as electronic nose (E-nose) systems are objective but lack the capacity to capture human emotional preferences. Olfactory EEG incorporates neural correlates of olfactory perception and emotional valence, promising improved recognition of olfactory preference. However, cross-subject generalization remains challenging due to pronounced EEG inter-individual variability.
The paper introduces a collaborative human-machine multimodal recognition framework, leveraging both E-nose and olfactory EEG signals to realize robust cross-subject olfactory preference classification. The approach exploits the complementary strengths of each modality: E-nose signals provide consistent odor information, while EEG signals encode subjective preference and emotion. This fusion addresses limitations in traditional monomodal systems and advances odor evaluation practices.
Figure 1: Motivation for multimodal fusion of olfactory EEG and E-nose signals to reconcile subjective and objective information in scent preference recognition tasks.
System Architecture and Methodology
The proposed methodology comprises several stages: data acquisition, preprocessing, feature extraction, alignment, multimodal interaction, individual feature mining, and final decision fusion.
Data Acquisition and Instrumentation
A custom experimental setup captures synchronized EEG and E-nose signals during odor presentations. The E-nose system deploys a 10-channel metal oxide sensor array for comprehensive odor profiling, interfaced with parallel EEG acquisition using a 21-channel cap (10-20 system) at 256 Hz. Four complex odors are used as stimuli (aroma, stink, fishy, fruity), providing broad stimulus variance for model generalization across 24 healthy subjects.
Figure 2: Hardware architecture of the E-nose sampling system used for synchronized odor signal acquisition.
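Because the two modalities are recorded in parallel at different sampling rates (256 Hz EEG; the E-nose rate is not stated in the summary, so 10 Hz is assumed below for illustration), each trial must be epoched around odor onset before feature extraction. A minimal numpy sketch of that windowing step, with hypothetical trial timings:

```python
import numpy as np

def epoch(signal, fs, onset_s, dur_s):
    """Cut a fixed-length window from a (channels, samples) recording,
    starting at the odor-onset time given in seconds."""
    start = int(round(onset_s * fs))
    stop = start + int(round(dur_s * fs))
    if stop > signal.shape[1]:
        raise ValueError("window extends past end of recording")
    return signal[:, start:stop]

# Synthetic stand-ins for one trial: 21-ch EEG at 256 Hz, 10-ch E-nose
# at an assumed 10 Hz (the actual E-nose rate is not given in the text).
rng = np.random.default_rng(0)
eeg = rng.standard_normal((21, 256 * 30))    # 30 s of EEG
enose = rng.standard_normal((10, 10 * 30))   # 30 s of E-nose readings

eeg_trial = epoch(eeg, fs=256, onset_s=5.0, dur_s=10.0)
enose_trial = epoch(enose, fs=10, onset_s=5.0, dur_s=10.0)
print(eeg_trial.shape, enose_trial.shape)    # (21, 2560) (10, 100)
```

Both windows cover the same 10 s of wall-clock time, which is what keeps the modalities synchronized downstream.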
Multimodal Feature Extraction and Alignment
Initial feature extraction utilizes AlexNet for E-nose data and a deeper ResNet architecture for EEG data, reflecting modality-driven differences in underlying signal complexity. To facilitate subsequent fusion, the initial features are dimensionally aligned using mean squared error minimization.
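The alignment step can be pictured as a learned linear projection trained to minimize the MSE between the two feature spaces. A minimal numpy sketch with hypothetical feature dimensions (64-d E-nose features projected into a 128-d EEG feature space); the paper's actual alignment network may be more elaborate:

```python
import numpy as np

rng = np.random.default_rng(1)
f_enose = rng.standard_normal((32, 64))   # batch of E-nose features (hypothetical dim)
f_eeg = rng.standard_normal((32, 128))    # batch of EEG features (hypothetical dim)

W = np.zeros((64, 128))                   # learnable linear projection
mse_init = np.mean((f_enose @ W - f_eeg) ** 2)
for _ in range(200):                      # plain gradient descent on the MSE
    residual = f_enose @ W - f_eeg
    W -= 0.01 * (f_enose.T @ residual) / len(f_enose)
mse_final = np.mean((f_enose @ W - f_eeg) ** 2)
print(f_enose.shape, "->", (f_enose @ W).shape)   # features now share the EEG dim
```

After alignment, both modalities live in a common dimensionality, which is the precondition for the attention-based interaction described next.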
Multimodal Feature Interaction and Fusion
The core innovation is the BMFNet architecture, organized into four modules:
- Feature Mining and Alignment (FMA): Initial monomodal features are mined and dimensionally aligned to a shared space in preparation for fusion.
- Multimodal Feature Interaction (MFI): Crossmodal attention and self-attention are employed to jointly mine common features representative of objective odor information.
- Aligned EEG Feature Mining (AEFM): Sequential self-attention layers emphasize individual EEG-derived variations linked to personal preference.
- Feature Fusion (FF): Classification tokens from both common and individual features are concatenated and processed for final prediction.
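The crossmodal attention at the heart of MFI can be sketched as scaled dot-product attention whose queries come from one modality and whose keys/values come from the other. The sketch below is a single-head, numpy-only illustration with hypothetical token counts and dimensions; BMFNet's actual blocks (head count, projections, normalization) may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_feats, kv_feats, Wq, Wk, Wv):
    """Scaled dot-product attention: queries from one modality,
    keys/values from the other (single head, no output projection)."""
    Q, K, V = q_feats @ Wq, kv_feats @ Wk, kv_feats @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

d = 64
rng = np.random.default_rng(2)
eeg_tokens = rng.standard_normal((16, d))    # 16 aligned EEG tokens (hypothetical)
enose_tokens = rng.standard_normal((8, d))   # 8 E-nose tokens (hypothetical)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * d**-0.5 for _ in range(3))

# EEG queries attend over E-nose keys/values -> common, odor-driven features
common = cross_attention(eeg_tokens, enose_tokens, Wq, Wk, Wv)
print(common.shape)   # (16, 64)
```

Self-attention is the same operation with queries, keys, and values all drawn from one modality, which is how AEFM mines the individual EEG features.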
Knowledge Distillation for Model Efficiency
Knowledge distillation refines the student (BMFNet-S) model by transferring multimodal knowledge from a larger teacher (BMFNet-T) network. Transformer block distillation proceeds via attention and hidden state matching losses, facilitating compression with minimal performance degradation. The distilled student model achieves near-equivalent accuracy (92.79%) with 40% reduction in computational cost.
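The attention- and hidden-state-matching losses mentioned above are typically MSE terms, combined with a temperature-softened cross-entropy on the teacher's logits. A minimal numpy sketch under those assumptions (shapes, temperature, and equal loss weights are all hypothetical, not taken from the paper):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distill_loss(attn_t, attn_s, hid_t, hid_s, logits_t, logits_s, T=2.0):
    """Sum of attention-map MSE, hidden-state MSE, and a temperature-
    softened cross-entropy between teacher and student logits."""
    l_attn = np.mean((attn_t - attn_s) ** 2)
    l_hid = np.mean((hid_t - hid_s) ** 2)
    p_t = softmax(logits_t / T)
    log_p_s = np.log(softmax(logits_s / T))
    l_soft = -np.mean(np.sum(p_t * log_p_s, axis=-1)) * T * T
    return l_attn + l_hid + l_soft

rng = np.random.default_rng(3)
attn = rng.random((4, 8, 8))          # teacher attention maps (4 heads, hypothetical)
hid = rng.standard_normal((8, 64))    # teacher hidden states (hypothetical)
logits = rng.standard_normal((8, 2))  # like/dislike logits (hypothetical)

# A student matching the teacher exactly pays only the irreducible
# soft-label entropy term; a perturbed student pays more.
exact = distill_loss(attn, attn, hid, hid, logits, logits)
worse = distill_loss(attn, attn + 0.1, hid, hid + 0.1, logits, logits)
print(exact < worse)   # True
```

Minimizing such a loss pushes the smaller student to reproduce the teacher's intermediate representations, not just its final predictions, which is what allows compression with little accuracy loss.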
Experimental Results and Comparative Analysis
Extensive experiments demonstrate superior performance of the proposed approach over prevailing monomodal and multimodal baselines. Monomodal models such as ResNet18 (EEG) and AlexNet (E-nose) exhibit clear modality-dependent limitations: below 70% accuracy for EEG, reflecting inter-individual variability, and roughly 90% for E-nose, reflecting its inability to encode preference. Multimodal competitors (e.g., MulT, ViLT, mBERT) reach up to 90.85% but fail to consistently extract individual preference features.
BMFNet-S achieves 92.79% accuracy and 92.64% F1-score, significantly outperforming all monomodal and multimodal baselines. Ablation studies corroborate the necessity of both MFI (for common feature mining) and AEFM (for individualized preference extraction). t-SNE visualizations reveal highly separable feature spaces for BMFNet-S.
Visualization of Feature Spaces
t-SNE mappings provide clear insights into the discriminative capacity of various modules:
Figure 3: t-SNE projections of FC layer features in monomodal models—ResNet18 for EEG and AlexNet for E-nose—demonstrating superior separability in E-nose signals.
Figure 4: t-SNE visualizations of feature mappings obtained via ViLT, MulT, and BMFNet-S on cross-subject recognition tasks.
Figure 5: Feature mapping for BMFNet-S modules (AEFM, MFI, FF) revealing progressive improvement in class separability and cross-subject generalization.
Figure 6: Classification performance across BNet-S, MNet-S, and BMFNet-S, demonstrating the proposed model’s resilience to individual differences.
Figure 7: Ablation study contrasting BMFNet-S, BMFNet-S w/o AEFM, and BMFNet-S w/o contrastive loss, evidencing architectural contributions to improved recognition.
Implications and Future Directions
The collaborative multimodal paradigm substantially enhances cross-subject generalization in olfactory preference recognition, a key challenge for practical deployment in industrial sensory evaluation. The methodological framework is extensible to other EEG-based cross-subject recognition scenarios (e.g., sleep staging, motor imagery), advocating for strategic fusion with auxiliary modalities that capture objective task features. However, expanding dataset scale (both subject pool and odor diversity) and increasing E-nose sensor granularity are critical for improved ecological validity and closer correspondence between machine and human olfaction.
Conclusion
This work establishes a robust human-machine cooperative multimodal learning method integrating E-nose and olfactory EEG signals for reliable cross-subject olfactory preference recognition. Experimental results validate its superiority over current monomodal and multimodal methods, attributed to effective common and individual feature mining, advanced attention-based fusion, and knowledge distillation for computational efficiency. The approach informs broader development of generalizable odor evaluation tools and cross-subject biosignal analysis systems, advancing practical applications in industrial and clinical domains.