
HeightMapNet: Explicit Height Modeling for End-to-End HD Map Learning

Published 3 Nov 2024 in cs.CV and cs.AI | (2411.01408v1)

Abstract: Recent advances in high-definition (HD) map construction from surround-view images have highlighted their cost-effectiveness in deployment. However, prevailing techniques often fall short in accurately extracting and utilizing road features, as well as in the implementation of view transformation. In response, we introduce HeightMapNet, a novel framework that establishes a dynamic relationship between image features and road surface height distributions. By integrating height priors, our approach refines the accuracy of Bird's-Eye-View (BEV) features beyond conventional methods. HeightMapNet also introduces a foreground-background separation network that sharply distinguishes between critical road elements and extraneous background components, enabling precise focus on detailed road micro-features. Additionally, our method leverages multi-scale features within the BEV space, optimally utilizing spatial geometric information to boost model performance. HeightMapNet has shown exceptional results on the challenging nuScenes and Argoverse 2 datasets, outperforming several widely recognized approaches. The code will be available at https://github.com/adasfag/HeightMapNet/.

