HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning
Abstract: Recent advances in constructing high-definition (HD) maps from surround-view images have highlighted the cost-effectiveness of this approach in deployment. However, prevailing techniques often fall short in extracting and utilizing road features accurately, and in how they implement the view transformation. In response, we introduce HeightMapNet, a framework that establishes a dynamic relationship between image features and road surface height distributions. By integrating height priors, our approach improves the accuracy of Bird's-Eye-View (BEV) features over conventional methods. HeightMapNet also introduces a foreground-background separation network that distinguishes critical road elements from extraneous background components, enabling the model to focus on fine-grained road features. Additionally, our method leverages multi-scale features within the BEV space, making fuller use of spatial geometric information to boost model performance. HeightMapNet achieves strong results on the challenging nuScenes and Argoverse 2 datasets, outperforming several widely recognized approaches. The code will be available at https://github.com/adasfag/HeightMapNet/.
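To make the idea of a height-conditioned view transformation concrete, the following is a minimal sketch of one generic way a predicted height distribution can weight image features when filling a single BEV cell. It is not the authors' implementation: the function names (`project_points`, `height_weighted_bev_feature`), the nearest-pixel sampling, the discrete height bins, and the single-camera pinhole model are all assumptions made for illustration.

```python
import numpy as np


def project_points(points_ego, ego_to_cam, K):
    """Project (N, 3) ego-frame points to pixels with a pinhole model.

    Hypothetical helper. Returns (uv, valid): uv is (N, 2) pixel
    coordinates, valid marks points in front of the camera.
    """
    homo = np.concatenate([points_ego, np.ones((len(points_ego), 1))], axis=1)
    cam = (ego_to_cam @ homo.T).T[:, :3]          # points in the camera frame
    depth = cam[:, 2]
    valid = depth > 1e-6
    safe_depth = np.where(valid, depth, 1.0)      # avoid division by zero
    pix = (K @ cam.T).T
    uv = pix[:, :2] / safe_depth[:, None]
    return uv, valid


def height_weighted_bev_feature(feat, x, y, heights, height_logits, ego_to_cam, K):
    """Fill one BEV cell at (x, y) from an image feature map of shape (C, H, W).

    Each candidate height is projected into the image, the feature at the
    nearest pixel is read, and the samples are combined with softmax
    weights derived from the (assumed) per-cell height logits.
    """
    heights = np.asarray(heights, dtype=float)
    logits = np.asarray(height_logits, dtype=float)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over height bins

    pts = np.stack(
        [np.full_like(heights, x), np.full_like(heights, y), heights], axis=1
    )
    uv, valid = project_points(pts, ego_to_cam, K)

    C, H, W = feat.shape
    out = np.zeros(C)
    for (u, v), ok, p in zip(uv, valid, probs):
        ui, vi = int(round(u)), int(round(v))
        if ok and 0 <= ui < W and 0 <= vi < H:    # skip out-of-view samples
            out += p * feat[:, vi, ui]
    return out
```

In this sketch, a sharply peaked height distribution makes the BEV cell read from essentially one image location, whereas a flat distribution averages over the whole vertical ray. This is the intuition behind using height priors to reduce ambiguity in the view transformation.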