Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection
Abstract: The primary bottleneck to good recognition performance on infrared (IR) images is the lack of sufficient labeled training data, owing to the cost of acquiring such data. Object detection methods for the RGB modality are quite robust (at least for some commonplace classes, such as person and car) thanks to the very large training sets that exist. In this work, we therefore seek to leverage cues from the RGB modality to scale object detectors to the IR modality while preserving model performance in the RGB modality. At the core of our method is a novel tensor decomposition method called TensorFact, which splits the convolution kernels of a layer of a Convolutional Neural Network (CNN) into low-rank factor matrices with fewer parameters than the original CNN. We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist. We then add only a few trainable parameters for training on the IR modality, which avoids over-fitting, and encourage these new parameters to capture cues complementary to those learned on the RGB modality alone. We validate our approach empirically in two steps: we first assess how well the TensorFact-decomposed network detects objects in RGB images compared with the original network, and we then examine how well it adapts to IR images of the FLIR ADAS v1 dataset. For the latter, we train models under scenarios that pose challenges stemming from data paucity. From the experiments, we observe that: (i) TensorFact shows performance gains on RGB images; and (ii) this pre-trained model, when fine-tuned, outperforms a standard state-of-the-art object detector on the FLIR ADAS v1 dataset by about 4% in terms of mAP50 score.
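To make the idea of low-rank kernel factors with a small set of added parameters concrete, the following is a minimal NumPy sketch. It is not the TensorFact decomposition itself, whose exact form the abstract does not specify. As assumptions for illustration only, it flattens a kernel of shape (C_out, C_in, k, k) into a matrix, factorizes it with a truncated SVD, adds a few extra rank-one components for the second modality, and uses an orthogonality penalty as one possible way to encourage complementary cues. All function names, shapes, and ranks are hypothetical.

```python
import numpy as np


def factorize_kernel(kernel, rank):
    """Approximate a conv kernel (C_out, C_in, k, k) by two low-rank factors.

    Assumption: a plain truncated SVD of the flattened kernel stands in for
    the paper's decomposition, which is not described in the abstract.
    """
    c_out = kernel.shape[0]
    mat = kernel.reshape(c_out, -1)              # (C_out, C_in * k * k)
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    root = np.sqrt(s[:rank])
    left = u[:, :rank] * root                    # (C_out, rank)
    right = root[:, None] * vt[:rank]            # (rank, C_in * k * k)
    return left, right


def augment_factors(left, right, extra_rank, rng):
    """Create a few new factors for the second modality (hypothetical scheme).

    The new left factor starts at zero, so the composed kernel initially
    equals the pretrained one; only these new factors would be trained.
    """
    left_new = np.zeros((left.shape[0], extra_rank))
    right_new = 0.01 * rng.standard_normal((extra_rank, right.shape[1]))
    return left_new, right_new


def compose_kernel(left, right, left_new, right_new, shape):
    """Rebuild the full kernel from frozen and newly added factors."""
    return (left @ right + left_new @ right_new).reshape(shape)


def complementarity_penalty(right, right_new):
    """Squared overlap between pretrained and new row spaces.

    Assumption: this is one possible regularizer, zero when the new factors
    are orthogonal to the pretrained ones.
    """
    return float(np.sum((right @ right_new.T) ** 2))


def param_count(*arrays):
    """Total number of scalar parameters across the given arrays."""
    return sum(a.size for a in arrays)


rng = np.random.default_rng(0)
kernel = rng.standard_normal((64, 32, 3, 3))     # stand-in for a pretrained layer
left, right = factorize_kernel(kernel, rank=8)
left_new, right_new = augment_factors(left, right, extra_rank=2, rng=rng)
adapted = compose_kernel(left, right, left_new, right_new, kernel.shape)

print("full kernel parameters:   ", kernel.size)
print("factor parameters:        ", param_count(left, right))
print("added trainable parameters:", param_count(left_new, right_new))
print("overlap penalty:          ", complementarity_penalty(right, right_new))
```

With these hypothetical sizes, the full kernel holds 64 × 32 × 3 × 3 = 18432 parameters, the rank-8 factors hold 2816, and the rank-2 addition contributes 704 trainable parameters. In a training loop, the pretrained factors would stay frozen and the penalty would be added to the detection loss with a weight chosen by validation.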