Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition

Published 1 Dec 2023 in cs.LG, cs.AI, and cs.CV | (2312.02185v1)

Abstract: Various types of sensors can be used for Human Activity Recognition (HAR), and each of them has different strengths and weaknesses. Sometimes a single sensor cannot fully observe the user's motions from its perspective, which causes wrong predictions. While sensor fusion provides more information for HAR, it comes with many inherent drawbacks like user privacy and acceptance, costly set-up, operation, and maintenance. To deal with this problem, we propose Virtual Fusion - a new method that takes advantage of unlabeled data from multiple time-synchronized sensors during training, but only needs one sensor for inference. Contrastive learning is adopted to exploit the correlation among sensors. Virtual Fusion gives significantly better accuracy than training with the same single sensor, and in some cases, it even surpasses actual fusion using multiple sensors at test time. We also extend this method to a more general version called Actual Fusion within Virtual Fusion (AFVF), which uses a subset of training sensors during inference. Our method achieves state-of-the-art accuracy and F1-score on UCI-HAR and PAMAP2 benchmark datasets. Implementation is available upon request.


Summary

  • The paper introduces Virtual Fusion, which uses contrastive learning to exploit correlations between multiple sensors during training while relying on a single sensor during inference.
  • It extends the approach with Actual Fusion within Virtual Fusion (AFVF), which performs inference with a subset of the training sensors, combining them through early or late fusion; late fusion performs better.
  • Experimental results on UCI-HAR and PAMAP2 benchmarks demonstrate improved accuracy and F1-score compared to traditional single-sensor methods, validating the framework's effectiveness.

Virtual Fusion for Activity Recognition

The paper "Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition" (2312.02185) introduces a novel approach to Human Activity Recognition (HAR) that leverages contrastive learning to exploit correlations between multiple sensors during training, while relying on only a single sensor during inference. This method addresses the limitations of traditional sensor fusion, which often entails significant costs and complexities related to setup, operation, and maintenance. The authors also present an extension of this approach named Actual Fusion within Virtual Fusion (AFVF), which allows for inference using a subset of the sensors used during training.

Problem Formulation and Approach

The core idea behind Virtual Fusion is to train models using both labeled and unlabeled data from multiple time-synchronized sensors. The labeled dataset, denoted $D_{lbl}$, consists of data-label pairs $(x_i^m, y_i)$, where $x_i^m$ is the data from sensor $m$ and $y_i$ is the corresponding activity label. The unlabeled dataset, $D_{ulb}$, contains data from multiple sensors without labels, represented as $x_i^m$. The method trains a classification model for each modality $m \in M_{cls}$, where $M_{cls}$ is the set of modalities used for classification. Each classification model consists of a feature extractor $f^m$ that maps the input $x^m$ to a latent feature vector $z^m$, and a classifier $c^m$ that maps $z^m$ to the predicted activity label $y$.

Figure 1: Overall training process of Virtual Fusion. Dotted lines are optional, depending on label availability.
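For illustration, one per-modality branch of this formulation can be sketched as below. This is a minimal PyTorch sketch, not the authors' architecture: the convolutional encoder and all layer sizes are placeholder assumptions.

```python
import torch
import torch.nn as nn

class ModalityModel(nn.Module):
    """One classification branch: feature extractor f^m followed by classifier c^m."""

    def __init__(self, in_channels: int, feat_dim: int = 128, num_classes: int = 6):
        super().__init__()
        # f^m: maps a sensor window x^m of shape (B, C, T) to a latent vector z^m of shape (B, feat_dim).
        self.extractor = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # c^m: maps z^m to activity logits.
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor):
        z = self.extractor(x)
        return z, self.classifier(z)
```

During training, one such branch would be instantiated per sensor modality; at inference time only the branch for the deployed sensor is needed.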

The Virtual Fusion framework (Figure 1) employs a contrastive learning approach using a multi-view NT-Xent loss function to exploit the correlation between different sensor modalities. The NT-Xent loss, derived from the SimCLR framework, is used to maximize the similarity between feature vectors from different sensors that correspond to the same activity. For two modalities $m_1$ and $m_2$, the NT-Xent loss for the sample at index $i$ is defined as:

$$\ell(z^{m_1}_i, z^{m_2}_i) = -\log \frac{\exp(\text{sim}(z^{m_1}_i, z^{m_2}_i) / \tau)}{\sum_{j=1}^{B} \exp(\text{sim}(z^{m_1}_i, z^{m_2}_j) / \tau)},$$

where $\text{sim}$ is the cosine similarity function, $\tau$ is a temperature hyper-parameter, and $B$ is the mini-batch size.
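The two-modality term above can be implemented compactly. The sketch below assumes PyTorch and covers only a single modality pair; the paper's multi-view loss aggregates such terms over all sensor pairs, which is not shown here.

```python
import torch
import torch.nn.functional as F

def nt_xent_two_modalities(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """NT-Xent loss between two modalities for a time-synchronized mini-batch.

    z1, z2: (B, D) feature vectors; row i of z1 and row i of z2 come from the
    same window observed by two different sensors (the positive pair).
    """
    # Normalize so that the dot product equals cosine similarity.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    # sim(z_i^{m1}, z_j^{m2}) / tau for all pairs (i, j) in the batch.
    logits = z1 @ z2.t() / tau                              # (B, B)
    targets = torch.arange(z1.size(0), device=z1.device)    # positives lie on the diagonal
    # Cross-entropy with the diagonal as the positive class reproduces
    # -log( exp(sim_ii / tau) / sum_j exp(sim_ij / tau) ), averaged over i.
    return F.cross_entropy(logits, targets)
```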

Actual Fusion within Virtual Fusion (AFVF)

The authors extend Virtual Fusion to AFVF, which enables inference using a subset of the sensors used during training. This is particularly useful in scenarios where certain sensors may not be available or practical during deployment. AFVF supports both early and late fusion techniques to combine data from multiple sensors. In early fusion, data from different sensors are fused at the data level, while in late fusion, features extracted from individual sensors are fused at the feature level.

Figure 2: Examples of AFVF that fuses 2 out of multiple modalities. The dotted line connections are only applicable if $m \in M_{lbl}$.

The authors found that late fusion generally yields better results than early fusion in AFVF due to its ability to capture more nuanced information from each sensor using dedicated feature extractors (Figure 2). They argue that the fused modality should be included in the contrastive loss computation to directly support the classification task. The fused feature vector $z^{fused}$ is computed as:

$$z^{fused} = \text{project}(\text{concatenate}(z^1, \dots, z^n)),$$

where $z^1, \dots, z^n$ are the feature vectors from the individual sensors, and "project" refers to a fully connected layer used as a projector.

Figure 3: Example of AFVF that fuses all modalities. Early fusion is not applicable.

In scenarios where all training sensors are available during testing, late AFVF is advantageous, as it allows for the fusion of all modalities (Figure 3). Early AFVF is not applicable in this case because it does not produce features for the source modalities.
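As a rough sketch of this late-fusion step, the module below concatenates per-sensor features and passes them through a fully connected projector, as the equation above describes. It assumes PyTorch; the class name `LateFusionProjector` and the feature dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class LateFusionProjector(nn.Module):
    """Late-fusion sketch: concatenate per-sensor features, then project them.

    feat_dims lists the feature dimension of each sensor branch; fused_dim is
    the dimension of the fused vector z^{fused}. Both are placeholder choices.
    """

    def __init__(self, feat_dims: list, fused_dim: int = 128):
        super().__init__()
        self.project = nn.Linear(sum(feat_dims), fused_dim)

    def forward(self, zs: list) -> torch.Tensor:
        # z^{fused} = project(concatenate(z^1, ..., z^n))
        return self.project(torch.cat(zs, dim=1))

# Example usage: fuse features from two hypothetical sensor branches.
# fuser = LateFusionProjector([128, 128], fused_dim=128)
# z_fused = fuser([z_accelerometer, z_skeleton])
```

The fused vector can then be fed to a classifier and, as the paper argues, included alongside the single-sensor features in the contrastive loss computation.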

Experimental Results

The authors conducted experiments on several benchmark datasets, including UCI-HAR and PAMAP2, to evaluate the performance of Virtual Fusion and AFVF. The results demonstrate that Virtual Fusion consistently outperforms single-sensor training, and in some cases, it even surpasses actual sensor fusion. Specifically, AFVF achieved state-of-the-art accuracy and F1-score on both benchmark datasets.

The authors also performed ablation studies to validate the design choices of Virtual Fusion, such as the use of a multi-view NT-Xent loss and the inclusion of the fused modality in the contrastive loss computation. The results of these studies support the effectiveness of the proposed approach.

Conclusion

The paper presents a compelling approach to HAR that addresses the limitations of traditional sensor fusion by leveraging contrastive learning and virtual fusion techniques. The proposed method offers increased flexibility in sensor selection and deployment, while achieving state-of-the-art performance on benchmark datasets.

The use of unlabeled multimodal data for representation learning is a promising avenue for future research, particularly given the relative ease and lower cost of collecting unlabeled data compared to labeled data. Future work could explore the use of domain adaptation or generalization techniques to further improve the performance of Virtual Fusion, as well as investigate the effects of the number of sensors and sensor characteristics on the method's accuracy.
