
Enhanced Parking Perception by Multi-Task Fisheye Cross-view Transformers

Published 22 Aug 2024 in cs.CV and cs.AI | (2408.12575v2)

Abstract: Current parking area perception algorithms primarily focus on detecting vacant slots within a limited range, relying on error-prone homographic projection for both labeling and inference. However, recent advancements in Advanced Driver Assistance Systems (ADAS) require interaction with end-users through comprehensive and intelligent Human-Machine Interfaces (HMIs). These interfaces should present a complete perception of the parking area, ranging from the entry lines of vacant slots to the orientation of other parked vehicles. This paper introduces Multi-Task Fisheye Cross View Transformers (MT F-CVT), which leverages features from a four-camera fisheye Surround-view Camera System (SVCS) with multi-head attention to create a detailed Bird's-Eye View (BEV) grid feature map. The features are processed by both a segmentation decoder and a Polygon-Yolo-based object detection decoder for parking slots and vehicles. Trained on data labeled using LiDAR, MT F-CVT positions objects within 25 m x 25 m real open-road scenes with an average error of only 20 cm. Our larger model achieves an F1 score of 0.89. Moreover, the smaller model operates at 16 fps on an Nvidia Jetson Orin embedded board, with detection results similar to those of the larger one. MT F-CVT demonstrates robust generalization capability across different vehicles and camera rig configurations. A demo video from an unseen vehicle and camera rig is available at: https://streamable.com/jjw54x.
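
To make the scale of the BEV grid concrete, the following is a minimal sketch of how a metric position could be mapped to a cell of a square grid covering the 25 m x 25 m area mentioned in the abstract. It is not the authors' implementation: the function name `metric_to_bev_cell`, the ego-centred origin, the axis convention, and the 200 x 200 cell count are all assumptions chosen for illustration, since the abstract does not state the grid resolution.

```python
import math


def metric_to_bev_cell(x_m, y_m, extent_m=25.0, cells=200):
    """Map a metric point to (row, col) in a square BEV grid.

    Assumptions (not taken from the paper): the grid is centred on the
    ego vehicle, x maps to columns, y maps to rows, and the grid has
    `cells` x `cells` cells. Returns None for points outside the area.
    """
    half = extent_m / 2.0
    if not (-half <= x_m < half and -half <= y_m < half):
        return None
    cell_size = extent_m / cells  # metres per cell
    col = min(math.floor((x_m + half) / cell_size), cells - 1)
    row = min(math.floor((y_m + half) / cell_size), cells - 1)
    return row, col


def error_in_cells(error_m, extent_m=25.0, cells=200):
    """Express a metric localization error as a number of grid cells."""
    return error_m / (extent_m / cells)


if __name__ == "__main__":
    print(metric_to_bev_cell(0.0, 0.0))    # grid centre
    print(metric_to_bev_cell(30.0, 0.0))   # outside the 25 m x 25 m area
    print(error_in_cells(0.20))            # 20 cm error at the assumed resolution
```

Under the assumed 200 x 200 grid each cell spans 12.5 cm, so the reported 20 cm average error would correspond to fewer than two cells; a different resolution would change that figure proportionally.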

