MISFIT-V: Misaligned Image Synthesis and Fusion using Information from Thermal and Visual

Published 22 Sep 2023 in cs.CV, cs.AI, cs.HC, and cs.RO | (2309.13216v1)

Abstract: Detecting humans from airborne visual and thermal imagery is a fundamental challenge for Wilderness Search-and-Rescue (WiSAR) teams, who must perform this function accurately in the face of immense pressure. The ability to fuse these two sensor modalities can potentially reduce the cognitive load on human operators and/or improve the effectiveness of computer vision object detection models. However, the fusion task is particularly challenging in the context of WiSAR due to hardware limitations and extreme environmental factors. This work presents Misaligned Image Synthesis and Fusion using Information from Thermal and Visual (MISFIT-V), a novel two-pronged unsupervised deep learning approach that utilizes a Generative Adversarial Network (GAN) and a cross-attention mechanism to capture the most relevant features from each modality. Experimental results show MISFIT-V offers enhanced robustness against misalignment and poor lighting/thermal environmental conditions compared to existing visual-thermal image fusion methods.
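The abstract does not say how the cross-attention mechanism is wired into MISFIT-V, so the following is only a minimal sketch of generic scaled dot-product cross-attention between two modalities, not the authors' implementation. It assumes visual feature tokens act as queries and thermal feature tokens supply keys and values. The function name `cross_attention`, the projection matrices, and all dimensions are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Scaled dot-product cross-attention (illustrative, single head).

    query_tokens:   (n_q, d_model) features from one modality (assumed: visual)
    context_tokens: (n_c, d_model) features from the other (assumed: thermal)
    w_q, w_k, w_v:  (d_model, d_head) projection matrices (hypothetical)
    Returns the attended features (n_q, d_head) and the weights (n_q, n_c).
    """
    q = query_tokens @ w_q
    k = context_tokens @ w_k
    v = context_tokens @ w_v
    # Each query token scores every context token; scale by sqrt(d_head).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy inputs: random arrays stand in for real encoder features.
rng = np.random.default_rng(0)
n_visual, n_thermal, d_model, d_head = 16, 12, 32, 8
visual = rng.standard_normal((n_visual, d_model))
thermal = rng.standard_normal((n_thermal, d_model))
w_q = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
w_k = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
w_v = rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)

fused, weights = cross_attention(visual, thermal, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

Because each visual token attends over all thermal tokens rather than only the one at the same pixel location, a layer of this kind does not require the two images to be pixel-aligned, which is one plausible reason to use it when the inputs are misaligned.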
