- The paper introduces a single-stage 3D object detection method that reduces computational overhead while maintaining competitive accuracy.
- It employs a statistically computed input representation enriched with semantic segmentation to efficiently process both LiDAR and stereo data.
- It achieves notable performance gains on KITTI benchmarks, improving AP for moderate Car detection and demonstrating potential for real-time embedded applications.
Single Shot 3D Object Detection (SS3D): A Technical Exposition
Introduction
The "Single Shot 3D Object Detector" (SS3D) represents a significant contribution to the field of 3D object detection, particularly in contexts requiring the integration of LiDAR and stereo data for autonomous driving applications. Established detectors such as PointPillars have set a high benchmark in this domain but rely on learned feature encoders built from fully connected layers, which complicates real-time deployment on embedded systems. SS3D challenges this paradigm by proposing a single-stage method with a cheaper, statistically computed input encoding that maintains competitive accuracy.
Methodology
At the core of SS3D is its input representation strategy, which draws inspiration from the computational efficiency of the Apollo Auto perception stack. Unlike PointPillars, which utilizes computationally intensive fully connected layers to learn feature representations, SS3D opts for a statistically computed input representation that reduces processing overhead without sacrificing accuracy. Concretely, per-cell statistics such as an occupancy bit, the point count, and mean height and intensity are computed from LiDAR or stereo-generated point clouds organized on a structured grid. This simplification enables the effective application of the Single Shot MultiBox Detector (SSD) architecture adapted for 3D environments.
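A minimal sketch of such a statistically computed bird's-eye-view encoding is shown below. The grid extents and cell size are illustrative assumptions (KITTI-like ranges), not values taken from the paper, and the exact feature set may differ from SS3D's; the point is that every channel is a simple statistic, with no learned parameters, in contrast to PointPillars' PointNet-style encoder.

```python
import numpy as np

def encode_bev_features(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0), cell=0.16):
    """points: (N, 4) array of (x, y, z, intensity) from LiDAR or pseudo-LiDAR.

    Returns an (H, W, 4) grid whose channels are per-cell statistics:
    occupancy bit, point count, mean height, mean intensity.
    """
    H = int((x_range[1] - x_range[0]) / cell)
    W = int((y_range[1] - y_range[0]) / cell)
    feats = np.zeros((H, W, 4), dtype=np.float32)

    # Keep only points that fall inside the grid.
    in_x = (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1])
    in_y = (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    pts = points[in_x & in_y]

    # Map each point to its cell index.
    xi = ((pts[:, 0] - x_range[0]) / cell).astype(int)
    yi = ((pts[:, 1] - y_range[0]) / cell).astype(int)

    # Unbuffered scatter-add accumulates sums per cell.
    np.add.at(feats[:, :, 1], (xi, yi), 1.0)        # point count
    np.add.at(feats[:, :, 2], (xi, yi), pts[:, 2])  # sum of heights
    np.add.at(feats[:, :, 3], (xi, yi), pts[:, 3])  # sum of intensities

    occupied = feats[:, :, 1] > 0
    feats[:, :, 0] = occupied.astype(np.float32)    # occupancy bit
    feats[occupied, 2] /= feats[occupied, 1]        # mean height
    feats[occupied, 3] /= feats[occupied, 1]        # mean intensity
    return feats
```

The resulting grid can be fed directly to an SSD-style 2D convolutional backbone, which is what makes the single-stage pipeline so cheap relative to learned per-pillar encoders.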
SS3D further extends its applicability to stereo vision by employing methodologies from Pseudo-LiDAR, converting stereo disparity maps into point cloud representations. Additionally, the integration of semantic segmentation information (captured by DeepLabV3+ with a MobileNetV2 backbone) into the input representation enhances detection accuracy, particularly for stereo input. The resulting variant, SS3D-Seg, processes these semantic features and exhibits improved detection performance.
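The Pseudo-LiDAR conversion and the semantic augmentation can be sketched together as follows. This is an illustrative implementation of the standard disparity-to-depth back-projection, not SS3D's exact code; the intrinsics (fx, fy, cx, cy) and baseline are assumed inputs (on KITTI they come from the calibration files), and attaching the per-pixel segmentation label as a fourth channel stands in for the SS3D-Seg-style augmentation.

```python
import numpy as np

def disparity_to_pseudo_lidar(disparity, seg_labels, fx, fy, cx, cy, baseline):
    """Convert a stereo disparity map into a pseudo-LiDAR point cloud and
    attach a per-pixel semantic label as an extra channel.

    Returns an (N, 4) array of (x, y, z, semantic_label) in the camera frame,
    one row per pixel with positive disparity.
    """
    h, w = disparity.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = disparity > 0                           # zero disparity = no match
    z = fx * baseline / disparity[valid]            # depth from disparity
    x = (u[valid] - cx) * z / fx                    # back-project to 3D
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z, seg_labels[valid].astype(np.float32)], axis=1)
```

The output has the same shape as a LiDAR sweep with intensity replaced by a class label, so the same grid-encoding and detection pipeline can consume either modality unchanged.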
Results and Evaluation
The performance of SS3D was rigorously evaluated on the KITTI 3D Object Detection Benchmark, a respected standard for assessing autonomous vehicle perceptual capabilities. The results indicate that SS3D outperforms established methods like PointPillars in several key metrics. For instance, using LiDAR input, SS3D improved the Average Precision (AP) for 3D detections of Car objects in the moderate category from 74.99% to 76.84%. Moreover, when employing stereo input, SS3D-Seg achieved a notable increase in AP from 38.13% to 45.13%, underscoring the effectiveness of semantic segmentation augmentation.
The single-stage design of SS3D presents a distinct advantage in computational efficiency, making it a suitable candidate for embedded systems in autonomous vehicles. This lightweight design does not compromise performance; rather, its streamlined data processing improves execution speed and reduces latency, which is critical for real-time applications.
Implications and Future Directions
SS3D's successful application of a single-stage architecture to both LiDAR and stereo input data highlights its potential to simplify the implementation of 3D object detection in real-world scenarios. The methodology promotes efficient processing without undue compromise on detection accuracy. Such attributes are indispensable for enabling cost-effective and energy-efficient deployment of autonomous systems.
Looking ahead, further refinements could involve adaptive learning mechanisms to optimize the input representation according to dynamic environmental and contextual variables. Moreover, expanding the scope of SS3D's applicability to encompass a broader range of object classes and environmental conditions could provide critical improvements in the robustness of autonomous driving systems.
Conclusion
SS3D advances the field of 3D object detection by offering a practical, efficient, and adaptable solution that meets the performance requirements of modern autonomous systems. Its ability to deliver a single-stage detection process with competitive accuracy to two-stage systems exemplifies the potential for streamlined, scalable algorithms in embedded applications. As the field progresses, SS3D's contributions underscore the importance of innovation in algorithm simplification and computational efficiency.