HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene
Abstract: Many established vision perception systems for autonomous driving overlook the influence of lighting conditions, a key factor in driving safety. To address this problem, we present HawkDrive, a novel perception system combining hardware and software solutions. The hardware pairs a stereo vision setup, which has been shown to estimate depth more reliably than monocular vision, with the Nvidia Jetson AGX Xavier edge computing device. The software performs low-light enhancement, depth estimation, and semantic segmentation with transformer-based neural networks, and the full stack, which enables fast inference and noise reduction, is packaged as system modules in Robot Operating System 2 (ROS 2). Experimental results show that the proposed end-to-end system improves both depth estimation and semantic segmentation performance. Our dataset and code will be released at https://github.com/ZionGo6/HawkDrive.
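The abstract describes packaging each perception stage (low-light enhancement, depth estimation, semantic segmentation) as a ROS 2 module. Below is a minimal sketch, not the authors' released code, of how one such stage could be wrapped as a ROS 2 node using Python's rclpy API; the topic names and the `enhance()` placeholder are illustrative assumptions rather than HawkDrive's actual interfaces.

```python
# Hypothetical sketch of a single HawkDrive-style perception module as a
# ROS 2 node. Topic names and the enhance() stand-in are assumptions.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge


class LowLightEnhanceNode(Node):
    """Subscribes to a raw camera stream and republishes enhanced frames."""

    def __init__(self):
        super().__init__('low_light_enhance')
        self.bridge = CvBridge()
        # Hypothetical topic names; the released stack may differ.
        self.sub = self.create_subscription(
            Image, '/stereo/left/image_raw', self.on_image, 10)
        self.pub = self.create_publisher(
            Image, '/stereo/left/image_enhanced', 10)

    def on_image(self, msg: Image):
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        enhanced = self.enhance(frame)
        out = self.bridge.cv2_to_imgmsg(enhanced, encoding='bgr8')
        out.header = msg.header  # preserve timestamp for downstream sync
        self.pub.publish(out)

    def enhance(self, frame):
        # Stand-in identity transform. In the real system this step would
        # invoke a transformer-based low-light enhancement network on GPU.
        return frame


def main():
    rclpy.init()
    node = LowLightEnhanceNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()
```

Running each stage as its own node lets downstream consumers (depth estimation, segmentation) subscribe to the enhanced stream over standard ROS 2 topics, which matches the modular composition the abstract describes.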