Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning

Published 5 Mar 2020 in cs.CV, cs.LG, and eess.IV | (2003.02437v2)

Abstract: Drone-based vehicle detection aims at finding the vehicle locations and categories in an aerial image. It empowers smart city traffic management and disaster rescue. Researchers have made great efforts in this area and achieved considerable progress. Nevertheless, it is still a challenge when the objects are hard to distinguish, especially in low light conditions. To tackle this problem, we construct a large-scale drone-based RGB-Infrared vehicle detection dataset, termed DroneVehicle. Our DroneVehicle collects 28,439 RGB-Infrared image pairs, covering urban roads, residential areas, parking lots, and other scenarios from day to night. Due to the great gap between RGB and infrared images, cross-modal images provide both effective information and redundant information. To address this dilemma, we further propose an uncertainty-aware cross-modality vehicle detection (UA-CMDet) framework to extract complementary information from cross-modal images, which can significantly improve the detection performance in low light conditions. An uncertainty-aware module (UAM) is designed to quantify the uncertainty weights of each modality, which are calculated from the cross-modal Intersection over Union (IoU) and the RGB illumination value. Furthermore, we design an illumination-aware cross-modal non-maximum suppression algorithm to better integrate the modal-specific information in the inference phase. Extensive experiments on the DroneVehicle dataset demonstrate the flexibility and effectiveness of the proposed method for cross-modality vehicle detection. The dataset can be downloaded from https://github.com/VisDrone/DroneVehicle.


Citations (181)


Summary

The paper introduces UA-CMDet, a framework that integrates RGB and infrared modalities using an uncertainty-aware module to enhance detection in complex scenes.
It leverages the large-scale DroneVehicle dataset, comprising 28,439 image pairs and 953,087 annotations, to robustly train and evaluate vehicle detection models.
Experimental results show significant improvements in mean average precision over single-modality detectors, demonstrating the efficacy of cross-modality learning.


Introduction

The paper "Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning" (2003.02437) focuses on improving vehicle detection in aerial images captured by drones. Such images are hard to interpret in low light or against complex backgrounds. To address this, the authors construct DroneVehicle, a dataset of 28,439 RGB-Infrared image pairs, and propose an uncertainty-aware cross-modality vehicle detection framework, termed UA-CMDet, that extracts complementary information from the RGB and infrared modalities.

Dataset and Motivation

DroneVehicle is a significant contribution, providing extensive coverage across diverse scenarios, including urban roads, residential areas, and parking lots, in both day and night conditions. Its 953,087 annotations span five categories (car, truck, bus, van, and freight car), which supports robust model training and evaluation.

Figure 1: Some example annotated images of the DroneVehicle dataset. The first row shows some examples in the RGB modality, and the second row shows the corresponding examples in the infrared modality.

The paper argues that single-modality datasets fall short in scenarios with poor lighting, where RGB images can be ineffective and infrared provides critical complementary information. This motivated the creation of DroneVehicle, which pairs each RGB image with an infrared image so that detectors can draw on the strengths of both modalities.

Methodology

The proposed UA-CMDet framework has three main branches: one for each of the two modalities and one for a fused feature map. The training phase involves predicting classification scores and bounding box coordinates, with the uncertainty-aware module (UAM) computing uncertainty weights from the cross-modal IoU and the RGB illumination value. This arrangement supports cross-modality learning and improves detection performance.
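One way to picture how uncertainty weights could enter training is as per-box multipliers on the loss. The sketch below is only an illustration under stated assumptions: the function name `weighted_detection_loss`, the example loss values, and the choice to normalize by the sum of the weights are all hypothetical and are not taken from the paper.

```python
from typing import Sequence


def weighted_detection_loss(per_box_losses: Sequence[float],
                            uncertainty_weights: Sequence[float]) -> float:
    """Weighted average of per-box losses (hypothetical helper).

    A weight near 1 means the box is trusted; a weight near 0 means the
    box is uncertain and should barely influence training.
    """
    if len(per_box_losses) != len(uncertainty_weights):
        raise ValueError("one weight is required per box")
    if any(w < 0.0 or w > 1.0 for w in uncertainty_weights):
        raise ValueError("weights are assumed to lie in [0, 1]")

    total_weight = sum(uncertainty_weights)
    if total_weight == 0.0:
        # Every box is fully uncertain, so nothing contributes.
        return 0.0

    weighted_sum = sum(loss * w
                       for loss, w in zip(per_box_losses, uncertainty_weights))
    # Assumption: normalize by the weight mass, not by the box count.
    return weighted_sum / total_weight


# The second box has a large loss but is fully uncertain, so it is ignored.
confident_only = weighted_detection_loss([1.0, 3.0], [1.0, 0.0])
```

Normalizing by the weight sum keeps the loss scale comparable across batches with different numbers of uncertain boxes; dividing by the box count instead would be an equally plausible choice.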

Figure 2: The architecture of the proposed UA-CMDet.

The UAM computes the cross-modal IoU and assigns uncertainty weights, which lets the model cope with inaccuracies and misalignments between modalities. In the inference phase, the illumination-aware non-maximum suppression (IA-NMS) step accounts for illumination differences when integrating modality-specific detections, further refining the detector's outputs.
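To make the cross-modal IoU concrete, the following sketch computes the overlap between a box annotated in the RGB image and the corresponding box in the infrared image. It is a simplification under assumptions: DroneVehicle uses oriented bounding boxes, whereas this example uses axis-aligned boxes in `(x_min, y_min, x_max, y_max)` form, and the function names are illustrative rather than the paper's.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    width = max(0.0, box[2] - box[0])
    height = max(0.0, box[3] - box[1])
    return width * height


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_box = (max(box_a[0], box_b[0]), max(box_a[1], box_b[1]),
                 min(box_a[2], box_b[2]), min(box_a[3], box_b[3]))
    intersection = box_area(inter_box)
    union = box_area(box_a) + box_area(box_b) - intersection
    return intersection / union if union > 0.0 else 0.0


# Same vehicle, shifted by 5 pixels between the two cameras.
rgb_box = (0.0, 0.0, 10.0, 10.0)
infrared_box = (5.0, 0.0, 15.0, 10.0)
cross_modal_iou = iou(rgb_box, infrared_box)  # 50 / 150
```

A high value indicates that the two modalities agree on where the vehicle is; a low value signals misalignment or a missing annotation in one modality, which is the kind of case an uncertainty weight is meant to down-weight.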

Implementation

UA-CMDet is built on a ResNet-FPN backbone and optimized with SGD. Because the UAM is removed after training, quantifying uncertainty does not increase inference complexity. Experimental results indicate substantial performance gains over single-modality detectors, illustrating UA-CMDet's capacity for tackling cross-modal detection challenges in low-light and geometrically complex environments.

Evaluation

The paper provides evidence of UA-CMDet's effectiveness by comparing its performance with various state-of-the-art detectors on the DroneVehicle dataset. It reports significant improvements in mean average precision (mAP), in both the modality-specific and fused branches.
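For readers unfamiliar with the metric, the sketch below shows one common way to compute average precision (AP) for a single class and then average it over classes to obtain mAP. It uses all-point interpolation of the precision-recall curve; the IoU threshold and interpolation scheme used in the paper are not stated in this summary, so the inputs and function names here are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def average_precision(detections: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """AP from (confidence, is_true_positive) pairs for one class."""
    if num_ground_truth <= 0:
        return 0.0
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)

    precisions: List[float] = []
    recalls: List[float] = []
    true_positives = 0
    for rank, (_, is_hit) in enumerate(ranked, start=1):
        if is_hit:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)

    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the precision-recall curve.
    area = 0.0
    previous_recall = 0.0
    for precision, recall in zip(precisions, recalls):
        area += (recall - previous_recall) * precision
        previous_recall = recall
    return area


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP is the unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Three detections of one class, two ground-truth vehicles.
example_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
```

Whether a detection counts as a true positive depends on an IoU threshold against the ground truth, which is decided before this function is called.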

Figure 3: Visualization of UA-CMDet detection results on DroneVehicle. The first row shows the detection results in the night scenarios. The third row shows the detection results in the daytime scenarios. The second row and the fourth row respectively represent the detection results of the corresponding infrared images.

Conclusion

UA-CMDet demonstrates a robust capability to integrate information from RGB and infrared modalities for improved vehicle detection. The provision of a large-scale RGB-Infrared dataset plays an integral role in this achievement, paving the way for further studies. Future work may involve addressing the dataset's long-tail distribution to enhance model robustness further. The findings underscore the importance of uncertainty quantification in multi-modality learning, marking a step forward in smart city and disaster rescue applications via drone-based aerial imagery.


Explain it Like I'm 14

Overview

This paper is about teaching drones to spot and label vehicles in pictures taken from the sky, both during the day and at night. The key idea is to use two kinds of images together:

RGB images (like normal color photos)
Infrared images (which show heat and work well in the dark)

The authors build a big new dataset and a smart method that knows when to trust one image type more than the other. This helps the drone find cars more accurately in tough situations, like low light or busy backgrounds.

Objectives and Questions

The researchers set out to answer a few simple questions:

Can combining color photos (RGB) with infrared images help find vehicles better, especially at night?
How do we decide which image type to trust more for each scene or each detected object?
How can we handle the fact that the RGB and infrared cameras don’t perfectly line up, causing small position differences?
Can we build a large, real-world dataset to train and test these ideas?

Methods and Approach

Think of the approach like a team effort with a smart coach:

The “day teammate” is the RGB camera: great in good light, but struggles in the dark.
The “night teammate” is the infrared camera: sees heat, so it’s strong at night, but sometimes mistakes warm objects or patterns for vehicles during the day.
The “coach” is an uncertainty-aware system that decides, for each object and each scene, which teammate to trust more.

Here’s how it works in everyday terms:

Building the dataset (DroneVehicle):
- 28,439 pairs of matching RGB and infrared images (56,878 images total).
- 953,087 labeled vehicles across five types: car, truck, bus, van, and freight car.
- Collected over many places (roads, residential areas, parking lots) and times (day, night, very dark night).
- Uses rotated rectangles (oriented bounding boxes) to match vehicles seen at angles, not just straight-on boxes.
Measuring “uncertainty” with simple signals:
- Overlap check (IoU): Imagine drawing a box around a car in the RGB image and another box in the infrared image. If the boxes overlap a lot, they agree; if they barely overlap, they disagree. This overlap score helps measure alignment and confidence.
- Brightness check: The average brightness of the RGB image tells if it’s daytime, night, or very dark. If it’s dark, the system gives less weight to RGB and more to infrared.
Three-branch detector (UA-CMDet):
- RGB branch: learns to detect vehicles from RGB images.
- Infrared branch: learns from infrared images.
- Fusion branch: combines features from both to make a joint prediction.
- A small “uncertainty-aware module” gives each branch a weight (how much to trust it) based on overlap and brightness. During training, boxes that are uncertain contribute less to the learning, so the model doesn’t get confused.
Smarter final decision (Illumination-Aware NMS):
- Detectors often produce multiple overlapping boxes for the same car. Non-Maximum Suppression (NMS) keeps the best ones and removes duplicates.
- The paper’s version, IA-NMS, lowers the influence of the RGB branch when it’s dark, so nighttime mistakes from the RGB camera don’t mess up the final result.
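The brightness check and the illumination-aware final decision described above can be sketched together in a few lines. This is a minimal illustration, not the paper's algorithm: scaling RGB scores linearly by mean brightness is an assumed rule, boxes are axis-aligned instead of rotated, and names such as `illumination_aware_nms` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Detection = Tuple[Box, float, str]       # (box, score, "rgb" or "ir")


def mean_brightness(gray_pixels: Sequence[Sequence[int]]) -> float:
    """Average of 8-bit grayscale pixel values, scaled to [0, 1]."""
    values = [pixel for row in gray_pixels for pixel in row]
    return sum(values) / (255.0 * len(values)) if values else 0.0


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = max(0.0, a[2] - a[0]) * max(0.0, a[3] - a[1])
    area_b = max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


def illumination_aware_nms(detections: List[Detection], brightness: float,
                           iou_threshold: float = 0.5) -> List[Detection]:
    """Greedy NMS after down-weighting RGB scores in dark scenes."""
    # Assumed rule: RGB scores shrink with brightness, infrared is unchanged.
    rescored = [(box, score * brightness if modality == "rgb" else score,
                 modality)
                for box, score, modality in detections]
    rescored.sort(key=lambda det: det[1], reverse=True)

    kept: List[Detection] = []
    for det in rescored:
        # Keep a box only if it does not heavily overlap a better one.
        if all(iou(det[0], other[0]) <= iou_threshold for other in kept):
            kept.append(det)
    return kept


# Two overlapping boxes for the same car, one from each camera.
candidates = [((0.0, 0.0, 10.0, 10.0), 0.9, "rgb"),
              ((1.0, 0.0, 11.0, 10.0), 0.8, "ir")]
night_result = illumination_aware_nms(candidates, brightness=0.2)
day_result = illumination_aware_nms(candidates, brightness=1.0)
```

With these example values, the infrared box survives in the dark scene and the RGB box survives in the bright one, which mirrors the idea of trusting the RGB camera less when it is dark.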

Main Findings and Why They Matter

Here are the most important takeaways from the experiments on the new dataset:

Combining RGB and infrared beats using either one alone, especially in low-light scenes.
The uncertainty-aware “coach” helps the system know when to trust which camera more, reducing mistakes like:
- Missing vehicles at night in RGB images.
- False “ghost” vehicles in infrared images caused by heat reflections or lookalike shapes.
The new dataset is large and diverse, covering day, night, and dark night, multiple heights and angles, and five vehicle types—this makes the model more robust in real-world drone scenarios.
The full system (UA-CMDet) improves accuracy over standard baselines and over simple fusion methods without uncertainty handling. In short: smarter fusion and trust weighting lead to better results.

Implications and Impact

This research can help cities and emergency teams:

Monitor traffic more reliably, day and night.
Improve safety by accurately detecting vehicles in low visibility (e.g., dark roads, power outages).
Support disaster response by finding vehicles quickly in difficult conditions.

Beyond vehicles, the idea of “uncertainty-aware” fusion is useful anywhere you combine different kinds of sensors (like cameras, thermal cameras, radar, etc.). The dataset is publicly available, so other researchers can build on this work to make aerial detection smarter and more dependable.


GitHub

GitHub - VisDrone/DroneVehicle: Drone-based RGB-Infrared Cross-Modality Vehicle Detection via Uncertainty-Aware Learning (409 stars)
