P2B: Point-to-Box Network for 3D Object Tracking in Point Clouds

Published 28 May 2020 in cs.CV | (2005.13888v1)

Abstract: Towards 3D object tracking in point clouds, a novel point-to-box network termed P2B is proposed in an end-to-end learning manner. Our main idea is to first localize potential target centers in 3D search area embedded with target information. Then point-driven 3D target proposal and verification are executed jointly. In this way, the time-consuming 3D exhaustive search can be avoided. Specifically, we first sample seeds from the point clouds in template and search area respectively. Then, we execute permutation-invariant feature augmentation to embed target clues from template into search area seeds and represent them with target-specific features. Consequently, the augmented search area seeds regress the potential target centers via Hough voting. The centers are further strengthened with seed-wise targetness scores. Finally, each center clusters its neighbors to leverage the ensemble power for joint 3D target proposal and verification. We apply PointNet++ as our backbone and experiments on KITTI tracking dataset demonstrate P2B's superiority (~10%'s improvement over state-of-the-art). Note that P2B can run with 40FPS on a single NVIDIA 1080Ti GPU. Our code and model are available at https://github.com/HaozheQi/P2B.




Summary

The paper introduces a point-to-box network that leverages target-specific feature augmentation and Hough voting to achieve end-to-end 3D tracking.
It reports a ~10% improvement in Success and Precision metrics on the KITTI dataset while operating at 40 FPS on a single NVIDIA 1080Ti GPU.
The approach advances real-time object tracking in autonomous driving and robotics by effectively handling the sparsity and disorder in point cloud data.

Point-to-Box Network for 3D Object Tracking in Point Clouds: A Technical Assessment

The paper introduces a method for 3D object tracking in point clouds, aimed at autonomous driving and robotics. Its core is the Point-to-Box (P2B) network, an end-to-end trainable architecture that tracks objects using only sparse, unordered point cloud data.

Core Methodology

The P2B network has two principal components: target-specific feature augmentation and joint 3D target proposal and verification. A PointNet++ backbone first samples seeds from the template and the search area and computes point-wise features for them. The search area seeds are then augmented with target-specific features from the template, using point-wise similarities computed in a way that respects the permutation invariance of point clouds. Next, the augmented seeds regress potential target centers via Hough voting, which avoids a time-consuming exhaustive 3D search.
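The permutation-invariance property can be illustrated with a minimal sketch. It assumes cosine similarity between seed features and a plain max over template seeds as the pooling step; the function names are hypothetical, and the actual network learns this mapping rather than hard-coding it. Because max is a symmetric function, reordering the template seeds leaves the augmented features unchanged.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def augment_search_seeds(template_feats, search_feats):
    """Append to each search seed its best similarity to any template seed.

    Assumption for illustration: pooling is a plain max over template seeds.
    The result does not depend on the order of `template_feats`.
    """
    augmented = []
    for s in search_feats:
        sims = [cosine(t, s) for t in template_feats]
        augmented.append(list(s) + [max(sims)])
    return augmented

if __name__ == "__main__":
    template = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    search = [[1.0, 0.0], [2.0, 2.0]]
    # Shuffling the template order gives the same augmented seeds.
    print(augment_search_seeds(template, search)
          == augment_search_seeds(template[::-1], search))
```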

The paper also predicts seed-wise targetness scores, which serve two purposes: they improve the earlier feature learning, and they strengthen the discriminative power of the potential target centers. Each center then clusters its neighbors for joint 3D target proposal and verification, which improves the quality of P2B's target proposals.
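A rough sketch of voting followed by targetness-aware clustering is given below. It is a hand-written simplification, not the paper's learned modules: the offsets and scores would come from the network, and the radius, the summed-targetness cluster score, and the function names are all assumptions made for illustration.

```python
import math

def vote_centers(seeds, offsets):
    """Each seed votes for a target center: its position plus a predicted offset."""
    return [tuple(p + d for p, d in zip(seed, off))
            for seed, off in zip(seeds, offsets)]

def best_cluster_center(votes, targetness, radius=0.3):
    """Pick the vote cluster with the highest summed targetness.

    Assumptions: clusters are balls of `radius` around each vote, and the
    returned center is the targetness-weighted mean of the cluster members.
    Returns (None, 0.0) when every score is zero.
    """
    best_center, best_score = None, 0.0
    for anchor in votes:
        members = [i for i, v in enumerate(votes)
                   if math.dist(anchor, v) <= radius]
        score = sum(targetness[i] for i in members)
        if score > best_score:
            best_score = score
            best_center = tuple(
                sum(targetness[i] * votes[i][k] for i in members) / score
                for k in range(3)
            )
    return best_center, best_score

if __name__ == "__main__":
    seeds = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.5, 0.5, 0.0), (5.0, 5.0, 5.0)]
    offsets = [(0.5, 0.25, 0.0), (-0.5, 0.25, 0.0), (0.0, -0.25, 0.0), (1.0, 1.0, 1.0)]
    scores = [0.9, 0.8, 0.7, 0.1]  # low score marks a likely background seed
    print(best_cluster_center(vote_centers(seeds, offsets), scores))
```

In this toy input, three seeds agree on one center while a low-targetness outlier votes elsewhere, so the outlier's cluster loses.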

Experimental Results and Contributions

The authors evaluate P2B on the KITTI tracking dataset, a challenging autonomous-driving benchmark because of its sparse points and occluded objects. Measured by the Success and Precision metrics under the One Pass Evaluation (OPE) protocol, P2B improves on the previous state of the art by about 10% in both metrics.
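For readers unfamiliar with these metrics, the sketch below follows their common definition in single-object tracking: Success summarizes how often the overlap (IoU) between predicted and ground-truth boxes exceeds a threshold, and Precision summarizes how often the center error stays under a distance threshold. The threshold grid, the 2 m distance cap, and the averaging used to approximate the area under each curve are assumptions here, not values taken from the summary above.

```python
def success_score(ious, n_thresholds=21):
    """Average, over IoU thresholds in [0, 1], of the fraction of frames
    whose overlap exceeds the threshold (approximates area under the curve)."""
    if not ious:
        raise ValueError("need at least one frame")
    thresholds = [i / (n_thresholds - 1) for i in range(n_thresholds)]
    rates = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return 100.0 * sum(rates) / len(rates)

def precision_score(center_errors, max_dist=2.0, n_thresholds=21):
    """Average, over distance thresholds in [0, max_dist] meters, of the
    fraction of frames whose center error is within the threshold."""
    if not center_errors:
        raise ValueError("need at least one frame")
    thresholds = [max_dist * i / (n_thresholds - 1) for i in range(n_thresholds)]
    rates = [sum(e <= t for e in center_errors) / len(center_errors)
             for t in thresholds]
    return 100.0 * sum(rates) / len(rates)

if __name__ == "__main__":
    # Hypothetical per-frame results for one tracked sequence.
    print(success_score([0.8, 0.6, 0.1]), precision_score([0.1, 0.4, 3.0]))
```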

P2B runs at 40 frames per second on a single NVIDIA 1080Ti GPU, which makes it practical for demanding real-time applications.
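A frame rate of 40 FPS corresponds to a budget of 25 ms per frame. The helper below is a generic timing sketch, not the authors' benchmark script; `step` stands in for one hypothetical tracker update, and the warm-up count is an arbitrary choice.

```python
import time

def latency_budget_ms(fps):
    """Per-frame time budget in milliseconds for a target frame rate."""
    return 1000.0 / fps

def measure_fps(step, frames, warmup=5):
    """Call `step(frame)` on every frame and return average frames per second.

    The first `warmup` frames are run once beforehand and not timed. For GPU
    inference, asynchronous work must be synchronized inside `step`,
    otherwise the clock stops before the computation has finished.
    """
    for frame in frames[:warmup]:
        step(frame)
    start = time.perf_counter()
    for frame in frames:
        step(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")

if __name__ == "__main__":
    print(latency_budget_ms(40))
    print(measure_fps(lambda frame: sum(range(1000)), list(range(200))))
```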

Implications and Future Prospects

The practical implications of P2B lie mainly in autonomous vehicles and robotics, where robust and efficient 3D object tracking is essential. Because the network handles sparse and unordered point cloud data, it is a potential alternative to methods that rely on RGB-D input, which degrade in low-visibility environments.

From a research perspective, P2B's end-to-end trainable framework provides a foundation for extending deep learning models to inherently unordered data such as point clouds. Possible future directions include improving robustness to sparse initial point clouds, testing how well the method generalizes, and addressing the data dependency issues highlighted in the experiments.

The architecture could also be adapted into hybrid models that integrate complementary data modalities to improve performance in challenging tracking scenarios. Subsequent research could explore variants of the target-specific feature augmentation that further improve the model's tracking precision and adaptability.

In summary, the P2B network represents a significant stride in 3D object tracking by leveraging deep learning paradigms tailored for point cloud data, proving especially suitable for applications in autonomous systems where real-time processing and reliability are crucial.
