Transformation-Invariant Network for Few-Shot Object Detection in Remote Sensing Images

Published 13 Mar 2023 in cs.CV | (2303.06817v3)

Abstract: Object detection in remote sensing images relies on a large amount of labeled data for training. However, the increasing number of new categories and class imbalance make exhaustive annotation impractical. Few-shot object detection (FSOD) addresses this issue by leveraging meta-learning on seen base classes and fine-tuning on novel classes with limited labeled samples. Nonetheless, the substantial scale and orientation variations of objects in remote sensing images pose significant challenges to existing few-shot object detection methods. To overcome these challenges, we propose integrating a feature pyramid network and utilizing prototype features to enhance query features, thereby improving existing FSOD methods. We refer to this modified FSOD approach as a Strong Baseline, which has demonstrated significant performance improvements compared to the original baselines. Furthermore, we tackle the issue of spatial misalignment caused by orientation variations between the query and support images by introducing a Transformation-Invariant Network (TINet). TINet ensures geometric invariance and explicitly aligns the features of the query and support branches, resulting in additional performance gains while maintaining the same inference speed as the Strong Baseline. Extensive experiments on three widely used remote sensing object detection datasets, i.e., NWPU VHR-10.v2, DIOR, and HRRSD demonstrated the effectiveness of the proposed method.

Citations (15)

Summary

The paper introduces TINet, which employs transformation invariance and a strong baseline with FPN and depth-wise convolution for improved few-shot object detection in remote sensing images.
It demonstrates significant performance gains in average precision and robustness across challenging datasets like DIOR, NWPU VHR-10.v2, and HRRSD.
Extensive ablation studies validate the effectiveness of various transformation methods and consistency regularizations, paving the way for more reliable detection outcomes.

Introduction

Few-shot object detection (FSOD) in remote sensing images (RSIs) is challenging because objects vary widely in scale and orientation. Traditional object detection methods require extensive annotated data, which is often impractical to collect for RSIs given their growing number of categories and class imbalance. The paper proposes an FSOD approach that leverages transformation invariance to address these issues.

Strong Baseline Approach

The paper introduces a modified FSOD method, termed the 'Strong Baseline', which integrates a Feature Pyramid Network (FPN) to handle scale variations. It improves upon existing FSOD methods by using prototype features to enhance query features, which helps the detector adapt to RSIs.
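As a rough illustration of how prototype features can enhance query features, the sketch below pools a support feature map into a per-channel prototype and uses it as a 1x1 depth-wise kernel on the query. The function names, the use of global average pooling, and the toy tensors are assumptions made for illustration; the paper's actual aggregation module may differ.

```python
def global_average_pool(feature):
    """Collapse a C x H x W feature map (nested lists) into a length-C prototype."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature
    ]


def depthwise_enhance(query, prototype):
    """Scale each query channel by its prototype value (a 1x1 depth-wise kernel)."""
    return [
        [[value * weight for value in row] for row in channel]
        for channel, weight in zip(query, prototype)
    ]


# Hypothetical toy tensors: 2 channels, 2x2 spatial size.
support = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0 is active for the support class
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1 is inactive
]
query = [
    [[0.5, 2.0], [1.0, 0.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]

prototype = global_average_pool(support)
enhanced = depthwise_enhance(query, prototype)
print(prototype)  # one weight per channel
print(enhanced)   # query channels rescaled by the class prototype
```

In this toy case the channel that is inactive in the support example is zeroed out in the query, so only class-relevant channels survive the aggregation.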

Key Components of the Strong Baseline:

Feature Pyramid Network (FPN): Enables multi-scale feature extraction, crucial for handling scale variations prevalent in RSIs.
Depth-wise Convolution: Enhances the interaction between query features and support features, promoting better adaptation to diverse object scales.
Increased IoU Threshold: Raises the non-maximum suppression (NMS) IoU threshold in the region proposal network from 0.7 to 0.9, so that fewer overlapping proposals are suppressed and boxes covering novel-category objects are less likely to be discarded by mistake.
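The effect of the NMS threshold change can be seen with a small greedy NMS sketch. The boxes and scores below are made-up values, and the implementation is a plain-Python illustration rather than the code used in the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold):
    """Greedy NMS: keep a box unless it overlaps a kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical proposals: boxes 0 and 1 overlap with IoU 0.8, box 2 is separate.
boxes = [
    (0.0, 0.0, 10.0, 10.0),
    (0.0, 0.0, 10.0, 8.0),
    (20.0, 20.0, 30.0, 30.0),
]
scores = [0.9, 0.8, 0.7]

kept_strict = nms(boxes, scores, iou_threshold=0.7)
kept_loose = nms(boxes, scores, iou_threshold=0.9)
print(kept_strict, kept_loose)
```

With the threshold at 0.7 the second proposal is suppressed, while at 0.9 it survives and is passed on to the later detection stages.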

Figure 1: (a) Number of object instances per class in the DIOR dataset. (b-e) Comparison of the detection results of the Strong Baseline and TINet.

Transformation-Invariant Network (TINet)

TINet is introduced to handle the spatial misalignment caused by orientation variations between query and support images. The network applies a transformation to the query image and enforces geometric invariance, so that bounding box predictions stay consistent across object poses.

TINet Architecture:

Transformation of Inputs: The network processes both the original query image and a transformed version to enforce consistency in predictions.
Consistency Loss: Incorporates both classification and regression consistency losses, ensuring alignment of spatial features between query and support branches.
Depth-wise Convolution: Used for effective feature aggregation, making TINet robust against orientation perturbations.
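The consistency idea above can be sketched as follows: predictions for the transformed query are mapped back to the original frame and compared with the predictions for the original query. The sketch assumes the transformation is a flip about the main diagonal (a transpose), uses a mean squared difference for both terms, and weights them equally; all names and these choices are illustrative assumptions, not the paper's exact formulation.

```python
def transpose_box(box):
    """Map (x1, y1, x2, y2) through a flip about the main diagonal: swap x and y."""
    x1, y1, x2, y2 = box
    return (y1, x1, y2, x2)


def mean_squared_difference(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def consistency_loss(cls_orig, cls_trans, box_orig, box_trans):
    """Sum of a classification and a regression consistency term (equal weights assumed)."""
    cls_term = mean_squared_difference(cls_orig, cls_trans)
    # A transpose is its own inverse, so applying it again aligns the box
    # predicted on the transformed image with the original image frame.
    aligned_box = transpose_box(box_trans)
    reg_term = mean_squared_difference(box_orig, aligned_box)
    return cls_term + reg_term


# Hypothetical predictions for one object.
cls_orig = [0.5, 0.5]
cls_trans = [0.5, 0.5]
box_orig = (2.0, 4.0, 6.0, 10.0)

consistent = consistency_loss(cls_orig, cls_trans, box_orig, (4.0, 2.0, 10.0, 6.0))
inconsistent = consistency_loss(cls_orig, cls_trans, box_orig, (4.0, 2.0, 10.0, 8.0))
print(consistent, inconsistent)
```

The loss is zero when the two branches agree after alignment and grows when the box predicted on the transformed image drifts from its counterpart.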

Experimental Evaluation

Extensive experiments were conducted on the NWPU VHR-10.v2, DIOR, and HRRSD datasets. TINet achieved state-of-the-art few-shot object detection performance on all three.

Key Findings:

Performance Gains: TINet improved average precision (AP) over existing FSOD methods, especially under challenging conditions with substantial orientation variations.
Robustness: TINet consistently outperformed other methods, showcasing superior adaptability to varying object scales and orientations.

Figure 2: Comparison of confusion matrix between the Strong Baseline and TINet. (a) Confusion matrix of the Strong Baseline. (b) Confusion matrix of TINet.

Ablation Studies

The effectiveness of various components was validated through ablation studies. Alternative transformations and consistency regularizations were examined to highlight their impact on performance.

Transformation Methods: Diagonal flipping was found to be more effective than horizontal or vertical flips due to less distortion of object appearance.
Consistency Regularization: $L_2$ loss outperformed Jensen-Shannon divergence (JSD) and Kullback-Leibler divergence (KLD) regularization in enforcing prediction consistency, and was particularly beneficial in low-shot settings.
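For reference, the three consistency measures compared above can be written in a few lines. This is a minimal sketch using the standard textbook definitions on made-up class-probability vectors; how the paper reduces or weights these terms is not stated here and is left out.

```python
import math


def l2(p, q):
    """Squared L2 distance between two probability vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kld(p, q):
    """Kullback-Leibler divergence KL(p || q); asymmetric in its arguments."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: symmetrised KL against the mixture of p and q."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kld(p, m) + 0.5 * kld(q, m)


# Hypothetical class probabilities from the original and transformed branches.
p = [0.9, 0.1]
q = [0.5, 0.5]

print("L2 :", l2(p, q))
print("KLD:", kld(p, q), kld(q, p))
print("JSD:", jsd(p, q), jsd(q, p))
```

All three are zero when the two predictions match. KLD depends on the argument order, whereas $L_2$ and JSD are symmetric.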

Conclusion

The proposed TINet effectively addresses the limitations of conventional FSOD approaches in RSIs by offering enhanced feature alignment through transformation invariance. The network's architecture ensures robust performance even in complex scenarios with significant scale and orientation variations. Future work may explore additional geometric transformations to further improve detection outcomes.

By advancing FSOD in RSIs, TINet supports more efficient and accurate remote sensing data analysis, which matters for applications such as environmental monitoring and resource surveys.