Dual-Stream Attention Transformers for Sewer Defect Classification
Abstract: We propose a dual-stream multi-scale hybrid vision transformer (DS-MSHViT) architecture that processes RGB and optical-flow inputs for efficient sewer defect classification. Unlike existing methods, which combine the predictions of two separate networks trained on each modality, we jointly train a single network with two branches, one for RGB and one for motion. Our key idea is to use self-attention regularization to harness the complementary strengths of the two streams: the motion stream alone struggles to generate accurate attention maps, as motion images lack the rich visual features present in RGB images. To address this, we introduce an attention consistency loss between the dual streams. By leveraging motion cues through this self-attention regularizer, we align and enhance the RGB attention maps, enabling the network to concentrate on pertinent input regions. We evaluate our method on a public dataset and cross-validate its performance on a novel dataset. Our method outperforms existing models based on either convolutional neural networks (CNNs) or multi-scale hybrid vision transformers (MSHViTs) that do not employ attention regularization between the two streams.
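The abstract does not specify the exact form of the attention consistency loss. A minimal sketch of one plausible formulation — a mean-squared-error penalty between softmax-normalized attention maps of the two streams — is shown below; the function names and the choice of MSE (rather than, say, a KL divergence) are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_consistency_loss(rgb_attn_logits, motion_attn_logits):
    """Hypothetical consistency regularizer: MSE between the
    softmax-normalized attention maps of the RGB and motion streams.
    Inputs are unnormalized attention scores of identical shape,
    e.g. (num_queries, num_keys)."""
    a_rgb = softmax(rgb_attn_logits)
    a_motion = softmax(motion_attn_logits)
    return float(np.mean((a_rgb - a_motion) ** 2))

# Identical attention maps incur zero penalty; diverging maps are penalized.
identical = attention_consistency_loss(np.zeros((2, 4)), np.zeros((2, 4)))
diverging = attention_consistency_loss(np.zeros((2, 4)),
                                       np.eye(2, 4) * 3.0)
```

In a joint training setup, a term like this would be added to the classification loss with a weighting coefficient, pushing the two branches to attend to the same input regions while each retains its own features.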