
ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection

Published 9 Mar 2023 in cs.CV and cs.AI (arXiv:2303.04989v3)

Abstract: Existing oriented object detection methods commonly use the metric AP$_{50}$ to measure model performance. We argue that AP$_{50}$ is inherently unsuitable for oriented object detection due to its large tolerance in angle deviation. Therefore, we advocate using a high-precision metric, e.g., AP$_{75}$, to measure the performance of models. In this paper, we propose an Aspect Ratio Sensitive Oriented Object Detector with Transformer, termed ARS-DETR, which exhibits competitive performance in high-precision oriented object detection. Specifically, a new angle classification method, called Aspect Ratio aware Circle Smooth Label (AR-CSL), is proposed to smooth the angle label in a more reasonable way and discard the hyperparameter introduced by previous work (e.g., CSL). Then, a rotated deformable attention module is designed to rotate the sampling points with the corresponding angles and eliminate the misalignment between region features and sampling points. Moreover, a dynamic weight coefficient according to the aspect ratio is adopted to calculate the angle loss. Comprehensive experiments on several challenging datasets show that our method achieves competitive performance on the high-precision oriented object detection task.


Summary

  • The paper introduces ARS-DETR with AR-CSL, which dynamically adjusts angle classification smoothing based on object aspect ratios.
  • The paper incorporates a rotated deformable attention module that fine-tunes sampling points to align with angled objects for enhanced precision.
  • The paper demonstrates superior performance on datasets like DOTA-v1.0, achieving higher AP75 than baseline detectors.

An Analysis of "ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection"

The paper "ARS-DETR: Aspect Ratio-Sensitive Detection Transformer for Aerial Oriented Object Detection" focuses on advancing the detection of oriented objects in aerial images, a task that remains challenging because of the high localization precision it demands. The authors propose ARS-DETR, a novel model that leverages a transformer-based architecture specifically refined for oriented object detection.

Key Contributions

  1. Aspect Ratio Aware Circle Smooth Label (AR-CSL): The authors introduce an innovative angle classification method termed AR-CSL. Unlike traditional methods that apply a uniform smoothing approach to angle classification, AR-CSL dynamically adjusts the smoothing according to an object’s aspect ratio, acknowledging that objects with different aspect ratios have varying sensitivities to angle deviations.
  2. Rotated Deformable Attention Module (RDA): To address misalignments of sampling points with object regions, the paper presents the RDA module, which incorporates angle information into the attention mechanism, ensuring sampling points align accurately with angled objects.
  3. Aspect Ratio Sensitive Matching and Loss (ARM and ARL): These components are designed to adaptively weigh the influence of angles during the training and matching processes. By dynamically adjusting this focus based on an object's aspect ratio, ARS-DETR can achieve high precision in angle prediction.
  4. Denoising Training Strategy: A denoising strategy is implemented to stabilize the training process by introducing noisy ground truth data, which is beneficial for convergence and model robustness.
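The AR-CSL idea in contribution 1 can be sketched numerically. One plausible, hyperparameter-free reading is to score each angle bin by the IoU between the ground-truth box and a copy of itself rotated to that bin's angle: the label then sharpens automatically as the aspect ratio grows. The grid-based IoU approximation, the `self_iou_label` name, and the one-degree bins below are illustrative assumptions, not the paper's implementation.

```python
import math

def self_iou_label(w, h, gt_angle_deg, num_bins=180, grid=40):
    """Smooth angle-classification target: label[b] approximates the IoU
    between the ground-truth box and the same box rotated to bin b's angle,
    so the label's sharpness follows the box's aspect ratio automatically."""
    def inside(px, py, c, s):
        # Rotate the point into the box's frame, then test axis-aligned bounds.
        x, y = px * c + py * s, -px * s + py * c
        return abs(x) <= w / 2 and abs(y) <= h / 2

    r = math.hypot(w, h) / 2  # sampling square covers the box at any rotation
    pts = [(-r + (i + 0.5) * 2 * r / grid, -r + (j + 0.5) * 2 * r / grid)
           for i in range(grid) for j in range(grid)]
    gc, gs = math.cos(math.radians(gt_angle_deg)), math.sin(math.radians(gt_angle_deg))
    label = []
    for b in range(num_bins):  # one degree per bin (illustrative)
        a = math.radians(b)
        bc, bs = math.cos(a), math.sin(a)
        inter = union = 0
        for px, py in pts:
            in_gt, in_cand = inside(px, py, gc, gs), inside(px, py, bc, bs)
            inter += in_gt and in_cand
            union += in_gt or in_cand
        label.append(inter / union if union else 0.0)
    return label

# An elongated box (aspect ratio 10) produces a sharply peaked label,
# while a square-like box (aspect ratio ~1.1) stays tolerant of deviation:
sharp = self_iou_label(10, 1, 45)
flat = self_iou_label(10, 9, 45)
```

For a 10:1 box the label collapses to a narrow peak around the ground-truth angle, while for a 10:9 box the same construction leaves broad tolerance, which is exactly the aspect-ratio sensitivity the method targets.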
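The rotated deformable attention of contribution 2 hinges on one small geometric step: rotating each query's predicted sampling offsets by the predicted angle before adding them to the reference point. A minimal sketch of that step follows; real deformable attention operates on batched tensors with normalized coordinates, and the function names here are illustrative.

```python
import math

def rotate_offsets(offsets, angle_rad):
    """Rotate each (dx, dy) sampling offset by the object's predicted angle."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * dx - s * dy, s * dx + c * dy) for dx, dy in offsets]

def sampling_points(reference, offsets, angle_rad):
    """Absolute sampling locations: reference point plus rotated offsets."""
    rx, ry = reference
    return [(rx + dx, ry + dy) for dx, dy in rotate_offsets(offsets, angle_rad)]

# With angle 0 the points match plain deformable attention; at pi/2 the
# sampling grid turns with the object:
axis_aligned = sampling_points((100.0, 50.0), [(4.0, 0.0), (0.0, 2.0)], 0.0)
rotated = sampling_points((100.0, 50.0), [(4.0, 0.0), (0.0, 2.0)], math.pi / 2)
```

At angle zero this reduces to standard deformable attention; at any other angle the sampling grid follows the object's orientation, which is what removes the misalignment between region features and sampling points.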

Empirical Results

The experiments conducted on several challenging datasets like DOTA-v1.0, DIOR-R, and OHD-SJTU indicate that ARS-DETR achieves competitive performance, especially in high-precision oriented object detection. The model demonstrates improved results in terms of AP$_{75}$, a metric that places greater demands on detection accuracy, compared to baseline and contemporary methods.

Detailed Evaluation

  • Performance Metrics: AP$_{50}$ and AP$_{75}$ were used as evaluation metrics, with the latter being emphasized for high-precision tasks. The paper critiques the reliance on AP$_{50}$ for failing to reflect the nuanced differences in angle precision required for many applications.
  • Comparison to Baselines: ARS-DETR was shown to outperform several state-of-the-art oriented object detectors. Notably, this paper highlights instances where models that perform well under AP$_{50}$ do not maintain their lead under the more stringent AP$_{75}$, underscoring the relevance of the new approaches introduced.
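To make the AP$_{50}$ vs. AP$_{75}$ contrast concrete, here is a minimal single-class AP computation. The toy scores and IoU values are illustrative, and real evaluators (VOC/COCO style) additionally take the running maximum of precision over recall and enforce one-to-one matching per image.

```python
def average_precision(dets, num_gt, iou_thresh):
    """AP as the area under a simple precision-recall curve.
    Each detection carries a confidence score and the IoU of its
    best-matching ground-truth box."""
    dets = sorted(dets, key=lambda d: -d["score"])  # high confidence first
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for d in dets:
        if d["iou"] >= iou_thresh:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # accumulate curve area
        prev_recall = recall
    return ap

# Three detections whose angles are slightly off: all clear IoU 0.5, but
# only one clears 0.75, the regime typical of elongated oriented boxes.
dets = [{"score": 0.9, "iou": 0.82},
        {"score": 0.8, "iou": 0.61},
        {"score": 0.7, "iou": 0.55}]
ap50 = average_precision(dets, num_gt=3, iou_thresh=0.50)
ap75 = average_precision(dets, num_gt=3, iou_thresh=0.75)
```

All three detections pass the 0.5 threshold, so AP$_{50}$ is perfect, but only one passes 0.75, so AP$_{75}$ collapses to one third: precisely the gap in angle precision that the paper argues AP$_{50}$ hides.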

Implications and Future Directions

The implications of these innovations are significant for applications in remote sensing where precision in object orientation can be crucial, such as in military and urban planning contexts. The approach taken by ARS-DETR can inspire further improvements in how angle information is used and optimized in transformer-based object detection models.

For future developments, this research opens up avenues for exploring how transformer architectures can be further adapted for varying tasks in computer vision beyond the aerial domain. More broadly, the integration of geometric and morphological information, like aspect ratios, into end-to-end models presents a rich field for exploration.

Conclusion

ARS-DETR contributes substantial methodological advances that improve the precision of detecting oriented objects in aerial imagery. By incorporating aspect ratio-sensitive mechanisms into a transformer-based framework, it sets a strong benchmark for high-precision detection in this domain. Such precision-focused design choices matter most in applications that demand high accuracy and reliability.
