End-to-End Vectorized HD-map Construction with Piecewise Bezier Curve
Abstract: Vectorized high-definition map (HD-map) construction, which focuses on perceiving centimeter-level environmental information, has attracted significant research interest in the autonomous driving community. Most existing approaches first obtain a rasterized map with a segmentation-based pipeline and then apply heavy post-processing to produce a downstream-friendly vectorized result. In this paper, by delving into parameterization-based methods, we pioneer a concise and elegant scheme that adopts a unified piecewise Bezier curve representation. To vectorize highly variable map elements end-to-end, we design a simple yet effective architecture, named Piecewise Bezier HD-map Network (BeMapNet), which is formulated as a direct set prediction paradigm and requires no post-processing. Concretely, we first introduce a novel IPM-PE Align module that injects a 3D geometry prior into BEV features through the common position encoding used in Transformers. We then propose a Piecewise Bezier Head that outputs the details of each map element, including the coordinates of its control points and the number of curve segments. In addition, based on the progressive restoration of the Bezier curve, we present an efficient Point-Curve-Region Loss that supervises more robust and precise HD-map modeling. Extensive comparisons show that our method outperforms existing state-of-the-art methods by at least 18.0 mAP.
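As a rough illustration of the representation named in the abstract, the sketch below samples a piecewise Bezier curve from a flat list of 2D control points. It is not the BeMapNet implementation: the function names (`bezier_point`, `piecewise_bezier`), the assumption that adjacent pieces share their joint control point (so k pieces of degree n use k * n + 1 points), and the uniform sampling are all illustrative choices made here.

```python
from math import comb


def bezier_point(ctrl, t):
    """Evaluate one Bezier segment at t in [0, 1] using Bernstein polynomials."""
    n = len(ctrl) - 1  # degree of this segment
    x = y = 0.0
    for i, (px, py) in enumerate(ctrl):
        b = comb(n, i) * (t ** i) * ((1 - t) ** (n - i))
        x += b * px
        y += b * py
    return (x, y)


def piecewise_bezier(ctrl, degree, samples_per_piece=5):
    """Sample a curve built from consecutive Bezier pieces.

    Assumption (hypothetical layout): neighbouring pieces share their joint
    control point, so k pieces of the given degree need k * degree + 1 points.
    """
    if degree < 1 or samples_per_piece < 2:
        raise ValueError("degree must be >= 1 and samples_per_piece >= 2")
    if len(ctrl) < degree + 1 or (len(ctrl) - 1) % degree != 0:
        raise ValueError("expected k * degree + 1 control points")
    pieces = (len(ctrl) - 1) // degree
    points = []
    for k in range(pieces):
        seg = ctrl[k * degree : k * degree + degree + 1]
        for s in range(samples_per_piece):
            if k > 0 and s == 0:
                continue  # the joint was already emitted by the previous piece
            points.append(bezier_point(seg, s / (samples_per_piece - 1)))
    return points


# Two quadratic pieces describing an S-shaped element (made-up coordinates).
control_points = [(0, 0), (1, 2), (2, 0), (3, -2), (4, 0)]
polyline = piecewise_bezier(control_points, degree=2, samples_per_piece=5)
```

Under these assumptions, a map element is fully described by its control points plus the number of pieces, which is the kind of output the abstract attributes to the Piecewise Bezier Head.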