Learning to Find Missing Video Frames with Synthetic Data Augmentation: A General Framework and Application in Generating Thermal Images Using RGB Cameras
Abstract: Advanced Driver Assistance Systems (ADAS) in intelligent vehicles rely on accurate driver perception within the vehicle cabin, often leveraging a combination of sensing modalities. However, these modalities operate at varying frame rates, posing challenges for real-time, comprehensive driver state monitoring. This paper addresses the problem of missing data caused by sensor frame rate mismatches, introducing a generative-model approach to create synthetic yet realistic thermal imagery. We propose using conditional generative adversarial networks (cGANs), specifically comparing the pix2pix and CycleGAN architectures. Experimental results show that pix2pix outperforms CycleGAN, and that multi-view input styles, especially stacked views, improve the accuracy of the generated thermal images. The study also evaluates the model's generalizability across different subjects, revealing the importance of individualized training for optimal performance. These findings demonstrate the potential of generative models to fill in missing frames and advance driver state monitoring in intelligent vehicles, while underscoring the need for continued research into model generalization and customization.
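As a rough illustration of the "stacked views" input style mentioned in the abstract, multiple synchronized RGB cabin views can be concatenated channel-wise into a single conditioning tensor before being passed to a pix2pix-style generator. This is a minimal sketch under assumed conventions (image shapes, view count, and the helper `stack_views` are illustrative; the paper's exact preprocessing is not specified here):

```python
import numpy as np

def stack_views(views):
    """Concatenate per-camera RGB frames of shape (H, W, 3) channel-wise
    into one conditioning input of shape (H, W, 3 * n_views)."""
    if not views:
        raise ValueError("need at least one view")
    h, w, _ = views[0].shape
    for v in views:
        if v.shape != (h, w, 3):
            raise ValueError("all views must share shape (H, W, 3)")
    return np.concatenate(views, axis=-1)

# Hypothetical example: two 256x256 RGB cabin views (e.g. front and side cameras)
front_view = np.zeros((256, 256, 3), dtype=np.float32)
side_view = np.ones((256, 256, 3), dtype=np.float32)

stacked = stack_views([front_view, side_view])
print(stacked.shape)  # (256, 256, 6)
```

The stacked tensor then plays the role of the conditioning image in the cGAN: the generator maps it to a single-channel thermal frame, letting the network exploit complementary viewpoints when a thermal frame is missing.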