- The paper's main contribution is the innovative sequential fusion technique that decorates lidar points with image segmentation scores, boosting detection performance.
- Experiments on KITTI and nuScenes show that PointPainting improves mAP and resolves detection ambiguities in challenging scenarios.
- The method's flexibility and efficiency make it suitable for real-time applications, enhancing various lidar-based detection networks.
PointPainting: Sequential Fusion for 3D Object Detection
The paper "PointPainting: Sequential Fusion for 3D Object Detection" introduces an innovative approach to sensor fusion in the context of 3D object detection, specifically for applications like self-driving vehicles. The authors propose the PointPainting method, a sequential fusion technique that decorates lidar points with semantic segmentation outputs from images to enhance the overall detection performance.
Methodology
PointPainting operates through a three-stage process:
- Image-Based Semantic Segmentation: The first stage involves processing the RGB image using a semantic segmentation network to obtain pixel-wise class scores. This network generates detailed semantic maps which are crucial for subsequent stages.
- Fusion (Painting): Lidar points are projected into the image space where they are decorated with segmentation scores, effectively painting the point cloud. These enriched point clouds, now containing both spatial and semantic information, are ready for the final detection stage.
- Lidar-Based Detection: The painted point cloud is fed into any existing lidar-based object detection network, leveraging the added semantic context. The authors demonstrate compatibility with several prominent lidar networks including PointPillars, VoxelNet, and PointRCNN.
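The fusion (painting) step above can be sketched in a few lines of numpy. This is an illustrative reconstruction, not the authors' code: the function name, array shapes, and calibration inputs (`lidar_to_cam`, `cam_intrinsics`) are assumptions, and a real pipeline would use the dataset's actual calibration matrices.

```python
import numpy as np

def paint_points(points, seg_scores, lidar_to_cam, cam_intrinsics):
    """Decorate lidar points with per-pixel segmentation scores.

    points:         (N, 4) array of x, y, z, reflectance in the lidar frame
    seg_scores:     (H, W, C) per-pixel class scores from the segmentation net
    lidar_to_cam:   (4, 4) homogeneous transform from lidar to camera frame
    cam_intrinsics: (3, 3) pinhole camera intrinsic matrix
    Returns an (M, 4 + C) array of painted points that project into the image.
    """
    H, W, C = seg_scores.shape

    # Transform lidar points into the camera frame (homogeneous coordinates).
    xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1))])
    cam_pts = (lidar_to_cam @ xyz1.T).T[:, :3]

    # Keep only points in front of the camera (positive depth).
    in_front = cam_pts[:, 2] > 0
    cam_pts, kept = cam_pts[in_front], points[in_front]

    # Project onto the image plane with the pinhole model.
    uvw = (cam_intrinsics @ cam_pts.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)

    # Discard points that land outside the image bounds.
    in_img = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    kept, u, v = kept[in_img], u[in_img], v[in_img]

    # Append the segmentation scores of the pixel each point lands on.
    return np.hstack([kept, seg_scores[v, u]])
```

The painted output keeps the original point features and simply appends C extra channels, which is why any downstream lidar detector can consume it with only an input-dimension change.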
Experimental Insights
Experiments on the KITTI and nuScenes datasets demonstrate the method's efficacy. The painted version of PointRCNN achieved state-of-the-art results on the KITTI bird's-eye view test benchmark. Notably, performance gains were most pronounced in challenging cases such as pedestrian and cyclist detection.
- Quantitative Results: All three lidar networks benefited from PointPainting, with mAP improvements across most categories. On KITTI, painted PointRCNN showed substantial mAP gains, and on nuScenes, Painted PointPillars+ achieved a remarkable increase of 6.3 mAP points.
- Qualitative Analysis: The qualitative evaluation illustrates PointPainting's ability to resolve ambiguities common in lidar-only methods, such as distinguishing pedestrians from similarly shaped objects like poles. In the examples shown, detection accuracy improved consistently without a noticeable increase in false positives.
Analysis and Implications
The research highlights several key insights:
- General Applicability: PointPainting demonstrates flexibility, augmenting various lidar-only detection methods effectively, thus reinforcing its general applicability in sensor fusion tasks.
- Robustness: Across different datasets and sensor configurations, the method consistently improved detection metrics, indicating robustness in diverse scenarios.
- Efficiency Considerations: The sequential design allows PointPainting to be implemented with minimal latency overhead when leveraging methods such as pipelining, making it viable for real-time applications in autonomous systems.
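The pipelining idea can be sketched abstractly: instead of waiting for segmentation of the current image, the detector consumes the segmentation result from the previous frame, so the two stages can run concurrently. The generator below is a minimal illustration under that assumption; the function names and one-frame lag are illustrative, not the paper's implementation.

```python
def painted_stream(frames, segment, paint, detect):
    """Pipeline sketch: paint sweep t with segmentation from frame t-1.

    frames:  iterable of (image, lidar_points) pairs
    segment: image -> per-pixel class scores
    paint:   (points, scores) -> painted points
    detect:  painted points -> 3D detections
    Segmentation of each image is computed one step ahead of its use,
    so at deployment it could overlap with detection rather than run
    serially, hiding most of the segmentation latency.
    """
    prev_scores = None
    for image, points in frames:
        if prev_scores is not None:
            # Detect on the current sweep using last frame's semantics.
            yield detect(paint(points, prev_scores))
        prev_scores = segment(image)
```

The trade-off is a one-frame staleness in the semantic scores, which the sequential design tolerates because consecutive frames are highly correlated.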
Future Perspectives
PointPainting opens up avenues for further improvements in multi-sensor fusion methodologies. Future directions could explore enhanced segmentation networks for deeper semantic insights, robust methods for dealing with segmentation quality variations, and further integration with other sensor modalities like radar. Additionally, methods to optimize the trade-off between detection quality and computational overhead could be investigated to maximize real-time applicability in resource-constrained environments.
Conclusion
By projecting semantic segmentation results onto lidar point clouds, PointPainting achieves effective fusion of visual and spatial information, setting a new paradigm in 3D object detection for autonomous systems. Its significant improvements in detection performance across standard benchmarks emphasize its potential impact and applicability in real-world systems.