MISFIT-V: Misaligned Image Synthesis and Fusion using Information from Thermal and Visual
Abstract: Detecting humans from airborne visual and thermal imagery is a fundamental challenge for Wilderness Search-and-Rescue (WiSAR) teams, who must perform this function accurately in the face of immense pressure. The ability to fuse these two sensor modalities can potentially reduce the cognitive load on human operators and/or improve the effectiveness of computer vision object detection models. However, the fusion task is particularly challenging in the context of WiSAR due to hardware limitations and extreme environmental factors. This work presents Misaligned Image Synthesis and Fusion using Information from Thermal and Visual (MISFIT-V), a novel two-pronged unsupervised deep learning approach that utilizes a Generative Adversarial Network (GAN) and a cross-attention mechanism to capture the most relevant features from each modality. Experimental results show MISFIT-V offers enhanced robustness against misalignment and poor lighting/thermal environmental conditions compared to existing visual-thermal image fusion methods.
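The cross-attention mechanism mentioned in the abstract can be illustrated with a minimal sketch: tokens from one modality (e.g., thermal features) act as queries that attend over tokens from the other modality (visual features), so the fused representation emphasizes the most relevant cross-modal features. This is a generic scaled dot-product cross-attention in NumPy, not MISFIT-V's actual architecture; the learned query/key/value projections and multi-head structure of a full implementation are omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats):
    """Scaled dot-product cross-attention: each query token (one modality)
    attends over all context tokens (the other modality) and returns a
    weighted combination of the context features."""
    d_k = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d_k)  # (Nq, Nc)
    weights = softmax(scores, axis=-1)                     # rows sum to 1
    return weights @ context_feats                         # (Nq, d_k)

# Toy example: 4 thermal feature tokens attend to 6 visual tokens, dim 8.
rng = np.random.default_rng(0)
thermal = rng.standard_normal((4, 8))
visual = rng.standard_normal((6, 8))
fused = cross_attention(thermal, visual)
print(fused.shape)  # (4, 8)
```

In a symmetric fusion design, the same operation is typically applied in both directions (thermal attending to visual and vice versa) and the two outputs are combined, which is one way a network can remain robust to misalignment: attention matches features by content rather than by pixel location.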