Papers
Topics
Authors
Recent
Search
2000 character limit reached

LVIC: Multi-modality segmentation by Lifting Visual Info as Cue

Published 8 Mar 2024 in cs.CV | (2403.05159v1)

Abstract: Multi-modality fusion is proven an effective method for 3d perception for autonomous driving. However, most current multi-modality fusion pipelines for LiDAR semantic segmentation have complicated fusion mechanisms. Point painting is a quite straight forward method which directly bind LiDAR points with visual information. Unfortunately, previous point painting like methods suffer from projection error between camera and LiDAR. In our experiments, we find that this projection error is the devil in point painting. As a result of that, we propose a depth aware point painting mechanism, which significantly boosts the multi-modality fusion. Apart from that, we take a deeper look at the desired visual feature for LiDAR to operate semantic segmentation. By Lifting Visual Information as Cue, LVIC ranks 1st on nuScenes LiDAR semantic segmentation benchmark. Our experiments show the robustness and effectiveness. Codes would be make publicly available soon.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (37)
  1. Localbins: Improving depth estimation by learning local distributions. In European Conference on Computer Vision, pages 480–496. Springer, 2022.
  2. Zoedepth: Zero-shot transfer by combining relative and metric depth. arXiv preprint arXiv:2302.12288, 2023.
  3. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020.
  4. Efficientvit: Enhanced linear attention for high-resolution low-computation visual recognition. arXiv preprint arXiv:2205.14756, 2022.
  5. 3d u-net: learning dense volumetric segmentation from sparse annotation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016: 19th International Conference, Athens, Greece, October 17-21, 2016, Proceedings, Part II 19, pages 424–432. Springer, 2016.
  6. Pep: a point enhanced painting method for unified point cloud tasks, 2024.
  7. An image is worth 16x16 words: Transformers for image recognition at scale, 2020.
  8. Rethinking bisenet for real-time semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9716–9725, 2021.
  9. Deepstereo: Learning to predict new views from the world’s imagery. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5515–5524, 2016.
  10. Panoptic nuscenes: A large-scale benchmark for lidar panoptic segmentation and tracking. arXiv preprint arXiv:2109.03805, 2021.
  11. Learning 3d semantic segmentation with only 2d image supervision. In 2021 International Conference on 3D Vision (3DV), pages 361–372. IEEE, 2021.
  12. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
  13. Bridging nonlinearities and stochastic regularizers with gaussian error linear units. CoRR, abs/1606.08415, 2016.
  14. Revisiting multi-modal 3d semantic segmentation in real-world autonomous driving. arXiv preprint arXiv:2310.08826, 2023.
  15. Monocular depth estimation using relative depth maps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2019.
  16. Mseg3d: Multi-modal 3d semantic segmentation for autonomous driving. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21694–21704, 2023.
  17. A two-streamed network for estimating fine-scaled depth maps from single rgb images. In Proceedings of the IEEE international conference on computer vision, pages 3372–3380, 2017.
  18. Cpgnet: Cascade point-grid fusion network for real-time lidar semantic segmentation. In 2022 International Conference on Robotics and Automation (ICRA), pages 11117–11123. IEEE, 2022.
  19. Bevformer: Learning bird’s-eye-view representation from multi-camera images via spatiotemporal transformers. arXiv preprint arXiv:2203.17270, 2022.
  20. Visual instruction tuning, 2023.
  21. A convnet for the 2020s. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
  22. Rangenet++: Fast and accurate lidar semantic segmentation. In 2019 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 4213–4220. IEEE, 2019.
  23. Deep learning for monocular depth estimation: A review. Neurocomputing, 438:14–33, 2021.
  24. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660, 2017.
  25. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.
  26. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  27. On the importance of stereo for accurate depth estimation: An efficient semi-supervised deep neural network approach. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pages 1007–1015, 2018.
  28. Searching efficient 3d architectures with sparse point-voxel convolution. In European conference on computer vision, pages 685–702. Springer, 2020.
  29. Pointpainting: Sequential fusion for 3d object detection. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4604–4612, 2020.
  30. Convnext v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16133–16142, 2023.
  31. Point transformer v3: Simpler, faster, stronger. In CVPR, 2024.
  32. Point transformer v2: Grouped vector attention and partition-based pooling. In NeurIPS, 2022.
  33. Rpvnet: A deep and efficient range-point-voxel fusion network for lidar point cloud segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16024–16033, 2021.
  34. Multimodal virtual point 3d detection. Advances in Neural Information Processing Systems, 34:16494–16507, 2021.
  35. Exploring self-attention for image recognition. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10076–10085, 2020.
  36. Point transformer. In ICCV, 2021.
  37. Perception-aware multi-sensor fusion for 3d lidar semantic segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 16280–16290, 2021.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.