UDTIRI: An Online Open-Source Intelligent Road Inspection Benchmark Suite

Published 18 Apr 2023 in cs.CV, cs.AI, cs.LG, and cs.RO | arXiv:2304.08842v3

Abstract: In the nascent domain of urban digital twins (UDT), the prospects for leveraging cutting-edge deep learning techniques are vast and compelling. Particularly within the specialized area of intelligent road inspection (IRI), a noticeable gap exists, underscored by the current dearth of dedicated research efforts and the lack of large-scale, well-annotated datasets. To foster advancements in this burgeoning field, we have launched an online open-source benchmark suite, referred to as UDTIRI. Along with this article, we introduce the road pothole detection task, the first online competition published within this benchmark suite. This task provides a well-annotated dataset comprising 1,000 RGB images with pixel/instance-level ground-truth annotations, captured in diverse real-world scenarios under different illumination and weather conditions. Our benchmark provides a systematic and thorough evaluation of state-of-the-art object detection, semantic segmentation, and instance segmentation networks, built on either convolutional neural networks or Transformers. We anticipate that our benchmark will serve as a catalyst for the integration of advanced UDT techniques into IRI. By providing algorithms with a more comprehensive understanding of diverse road conditions, we seek to unlock their untapped potential and foster innovation in this critical domain.
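The abstract does not spell out UDTIRI's exact evaluation protocol, but pixel-level segmentation benchmarks of this kind are typically scored with intersection-over-union (IoU) between a predicted mask and the ground-truth annotation. The sketch below is a minimal, illustrative implementation of that standard metric for a binary pothole mask; the function name and the toy masks are hypothetical, not part of the benchmark suite.

```python
import numpy as np

def pothole_iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Pixel-level intersection-over-union for a binary pothole mask.

    Both inputs are H x W arrays where nonzero marks 'pothole'.
    Returns 1.0 when both masks are empty (a common convention).
    """
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else inter / union

# Toy 4x4 example: the prediction recovers 2 of 3 ground-truth pothole pixels
# with no false positives, so IoU = 2 / 3.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[1, 1:4] = 1      # 3 ground-truth pothole pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1, 1:3] = 1    # 2 predicted pixels, both correct
print(round(pothole_iou(pred, gt), 3))  # → 0.667
```

Instance-level evaluation (as in COCO-style benchmarks) builds on the same per-mask IoU, matching predicted instances to ground-truth instances at one or more IoU thresholds before computing average precision.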
