H3DNet: 3D Object Detection Using Hybrid Geometric Primitives

Published 10 Jun 2020 in cs.CV | (2006.05682v3)

Abstract: We introduce H3DNet, which takes a colorless 3D point cloud as input and outputs a collection of oriented object bounding boxes (or BB) and their semantic labels. The critical idea of H3DNet is to predict a hybrid set of geometric primitives, i.e., BB centers, BB face centers, and BB edge centers. We show how to convert the predicted geometric primitives into object proposals by defining a distance function between an object and the geometric primitives. This distance function enables continuous optimization of object proposals, and its local minimums provide high-fidelity object proposals. H3DNet then utilizes a matching and refinement module to classify object proposals into detected objects and fine-tune the geometric parameters of the detected objects. The hybrid set of geometric primitives not only provides more accurate signals for object detection than using a single type of geometric primitives, but it also provides an overcomplete set of constraints on the resulting 3D layout. Therefore, H3DNet can tolerate outliers in predicted geometric primitives. Our model achieves state-of-the-art 3D detection results on two large datasets with real 3D scans, ScanNet and SUN RGB-D.

Citations (176)

Summary

H3DNet: Enhancing 3D Object Detection with Hybrid Geometric Primitives

The paper presents H3DNet, a 3D object detection method for colorless 3D point clouds that predicts a hybrid set of geometric primitives, namely bounding box (BB) centers, BB face centers, and BB edge centers, to improve detection accuracy and robustness. Rather than relying on a single type of primitive, H3DNet converts these primitives into object proposals through continuous optimization of a distance function, then classifies the proposals and fine-tunes their geometric parameters in a matching and refinement module.

Technical Overview

H3DNet employs a three-module architecture:

Geometric Primitive Module: This module predicts an overcomplete set of geometric primitives from input 3D point clouds. It computes dense pointwise descriptors and outputs varied geometric primitives including BB centers, face centers, and edge centers, each of which contributes distinct constraints to the object detection task. This diversity enables the model to leverage different strengths of these features across various object types and scenes.
Proposal Generation Module: The predicted primitives are converted into object proposals by defining a distance function between an object and the primitives; the local minima of this function serve as proposals. Because the function can be optimized continuously, an imprecise initial proposal can be refined toward a nearby local minimum.
Classification and Refinement Module: The final module classifies each object proposal as a detected object or not, and predicts offsets that fine-tune the BB center, size, and orientation, together with a semantic label. It aggregates features from the geometric primitives associated with each proposal.
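
The relationship between a box and its primitives, and the idea of refining a proposal by minimizing a distance to predicted primitives, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the paper's formulation: the box parameterization (center, full size, heading about the z axis), the function names box_primitives, box_to_primitive_distance, and refine_center, the truncated nearest-neighbor distance, and the finite-difference gradient descent that updates only the center. The method described above optimizes a learned, weighted distance over all box parameters.

```python
import itertools
import math


def box_primitives(center, size, heading):
    """Return the 19 primitives of a box: 1 center, 6 face centers, 12 edge centers.

    Hypothetical parameterization: center (x, y, z), full size (l, w, h),
    and a heading angle in radians about the z axis.
    """
    cos_h, sin_h = math.cos(heading), math.sin(heading)
    half = [s / 2.0 for s in size]
    kinds = {0: "center", 1: "face", 2: "edge"}
    prims = {"center": [], "face": [], "edge": []}
    for signs in itertools.product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for v in signs if v != 0)
        if nonzero == 3:
            continue  # corners are not part of the hybrid set
        x, y, z = (v * h for v, h in zip(signs, half))
        # Rotate the box-frame offset about z, then translate to the center.
        prims[kinds[nonzero]].append((
            center[0] + cos_h * x - sin_h * y,
            center[1] + sin_h * x + cos_h * y,
            center[2] + z,
        ))
    return prims


def box_to_primitive_distance(box, predicted, truncation=0.5):
    """Sum of squared distances from each box primitive to its nearest prediction.

    Each distance is clipped at `truncation` (an assumed robustness device),
    so a far-away outlier prediction cannot dominate the total.
    """
    total = 0.0
    for kind, points in box_primitives(*box).items():
        for p in points:
            nearest = min(math.dist(p, q) for q in predicted[kind])
            total += min(nearest, truncation) ** 2
    return total


def refine_center(box, predicted, steps=50, lr=0.02, eps=1e-4):
    """Refine only the box center by finite-difference gradient descent."""
    center, size, heading = list(box[0]), box[1], box[2]
    for _ in range(steps):
        grad = []
        for i in range(3):
            hi, lo = center.copy(), center.copy()
            hi[i] += eps
            lo[i] -= eps
            f_hi = box_to_primitive_distance((hi, size, heading), predicted)
            f_lo = box_to_primitive_distance((lo, size, heading), predicted)
            grad.append((f_hi - f_lo) / (2 * eps))
        center = [c - lr * g for c, g in zip(center, grad)]
    return center


# Usage: treat the primitives of a reference box as "predictions", then
# refine a proposal whose center is slightly off.
reference = ([0.0, 0.0, 0.0], [2.0, 4.0, 6.0], 0.0)
predicted = box_primitives(*reference)
proposal = ([0.1, 0.0, 0.0], [2.0, 4.0, 6.0], 0.0)
print(box_to_primitive_distance(proposal, predicted))
print(refine_center(proposal, predicted))
```

Because all 19 primitives constrain the same box, a handful of bad predictions changes the total only by a bounded amount in this sketch, which is one simple way to picture why an overcomplete set of constraints can tolerate outliers.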

Empirical Evaluation

H3DNet demonstrates state-of-the-art performance on two large datasets of real 3D scans: ScanNet and SUN RGB-D. On ScanNet, it achieves an mAP of 67.2% at an IoU threshold of 0.25, marking an 8.5% improvement over previous methods relying solely on geometric data. Similarly, on SUN RGB-D, it records an mAP improvement of 2.4% at the same IoU threshold.
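
To make the IoU threshold concrete, the following is a minimal sketch of the overlap test that decides whether a predicted box counts as a match for a ground-truth box. It assumes axis-aligned boxes given as (min_corner, max_corner) pairs, which is a simplification, since the detector outputs oriented boxes; the function name iou_3d_axis_aligned is hypothetical and this is not the benchmarks' evaluation code.

```python
import math


def iou_3d_axis_aligned(box_a, box_b):
    """Intersection over union of two axis-aligned 3D boxes.

    Each box is a (min_corner, max_corner) pair of 3-tuples.
    """
    (a_min, a_max), (b_min, b_max) = box_a, box_b
    intersection = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a_min, a_max, b_min, b_max):
        overlap = min(hi_a, hi_b) - max(lo_a, lo_b)
        if overlap <= 0:
            return 0.0  # the boxes are disjoint along this axis
        intersection *= overlap
    volume_a = math.prod(hi - lo for lo, hi in zip(a_min, a_max))
    volume_b = math.prod(hi - lo for lo, hi in zip(b_min, b_max))
    return intersection / (volume_a + volume_b - intersection)


# Usage: a unit cube compared with a copy shifted by half its width.
ground_truth = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
prediction = ((0.5, 0.0, 0.0), (1.5, 1.0, 1.0))
iou = iou_3d_axis_aligned(ground_truth, prediction)
print(iou, iou >= 0.25)
```

A prediction is counted as correct at this threshold only if its IoU with a ground-truth box of the same class is at least 0.25; mAP then averages the resulting per-class average precision.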

The model shows particular strength in detecting thin and irregular objects such as windows and shower curtains. This capability is attributed to the effective use of face and edge center predictions, which provide additional geometric cues absent in simpler models.

Implications and Future Directions

H3DNet's ability to handle diverse geometric primitives demonstrates promising potential for enhanced 3D scene understanding applications. By allowing continuous optimization of bounding boxes, it provides high-fidelity object proposals even in cluttered or occluded environments. Practically, this has implications for augmented reality, robotics, and autonomous systems, which require precise object detection within complex scenes.

The theoretical implications suggest that integrating multiple types of geometric primitives could aid other 3D understanding tasks, such as segmentation and CAD model reconstruction. Future work could explore the expansion of the primitive set to include other features like BB corners or utilize this methodology across broader applications.

In conclusion, H3DNet introduces a robust framework for 3D object detection, demonstrates the advantage of a hybrid set of geometric primitives over a single primitive type, and achieves state-of-the-art results on ScanNet and SUN RGB-D.