HccePose(BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation

Published 11 Oct 2025 in cs.CV and cs.AI | (2510.10177v2)

Abstract: In pose estimation for seen objects, a prevalent pipeline involves using neural networks to predict dense 3D coordinates of the object surface on 2D images, which are then used to establish dense 2D-3D correspondences. However, current methods primarily focus on more efficient encoding techniques to improve the precision of predicted 3D coordinates on the object's front surface, overlooking the potential benefits of incorporating the back surface and interior of the object. To better utilize the full surface and interior of the object, this study predicts 3D coordinates of both the object's front and back surfaces and densely samples 3D coordinates between them. This process creates ultra-dense 2D-3D correspondences, effectively enhancing pose estimation accuracy based on the Perspective-n-Point (PnP) algorithm. Additionally, we propose Hierarchical Continuous Coordinate Encoding (HCCE) to provide a more accurate and efficient representation of front and back surface coordinates. Experimental results show that the proposed approach outperforms existing state-of-the-art (SOTA) methods listed on the BOP website across seven classic BOP core datasets. Code is available at https://github.com/WangYuLin-SEU/HCCEPose.


Summary

The paper predicts the 3D coordinates of both the front and back surfaces of an object and introduces Hierarchical Continuous Coordinate Encoding (HCCE) to represent them, enriching the dense 2D-3D correspondences used for pose estimation.
It constructs ultra-dense correspondences by sampling intermediate points between the two surfaces, which improves the accuracy of the RANSAC-PnP solver.
Experiments on the BOP core datasets show that adding back-surface prediction significantly improves pose estimation accuracy in both RGB and RGB-D settings.

Overview of HccePose(BF)

"HccePose(BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation" presents an object pose estimation approach that predicts both the front and the back surface of an object. Combined with Hierarchical Continuous Coordinate Encoding (HCCE), this aims to improve the accuracy of poses recovered with the Perspective-n-Point (PnP) algorithm. Existing dense-correspondence methods focus predominantly on the front surface; the study shows that the back surface and the object interior can also supply useful correspondences.

Methodology

Ultra-Dense 2D-3D Correspondences

The key idea is to predict the 3D coordinates of both the object's front surface ($\tilde{Q}_f$) and back surface ($\tilde{Q}_b$), and then densely sample intermediate points ($\tilde{Q}_m$) between them. Because the front and back points for a pixel lie on the same camera ray, every intermediate point sampled between them projects to that same pixel, so each 2D location is paired with many 3D points instead of one. This greatly increases the density of 2D-3D correspondences available to the RANSAC-PnP solver: each RANSAC iteration can draw from a more diverse set of 3D points per 2D projection, which makes the estimated pose more reliable.
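A minimal sketch of the correspondence construction in pure Python. It assumes the network's front and back predictions have already been decoded into 3D object coordinates for each foreground pixel, and that the intermediate points are spaced evenly by linear interpolation; the helper names and the `num_mid` parameter are hypothetical, not taken from the HccePose code.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[int, int]


def lerp(a: Point3, b: Point3, t: float) -> Point3:
    """Linear interpolation between two 3D points (t=0 gives a, t=1 gives b)."""
    return (a[0] + t * (b[0] - a[0]),
            a[1] + t * (b[1] - a[1]),
            a[2] + t * (b[2] - a[2]))


def ultra_dense_correspondences(
    pixels: List[Pixel],
    front: List[Point3],
    back: List[Point3],
    num_mid: int = 3,
) -> List[Tuple[Pixel, Point3]]:
    """Hypothetical helper: pair each foreground pixel with its front point,
    num_mid evenly spaced intermediate points, and its back point."""
    pairs: List[Tuple[Pixel, Point3]] = []
    for px, qf, qb in zip(pixels, front, back):
        pairs.append((px, qf))
        for i in range(1, num_mid + 1):
            t = i / (num_mid + 1)  # strictly between 0 and 1
            pairs.append((px, lerp(qf, qb, t)))
        pairs.append((px, qb))
    return pairs
```

The resulting (pixel, point) pairs are what a RANSAC-PnP solver consumes; OpenCV's `cv2.solvePnPRansac`, for example, takes matching arrays of 3D object points and 2D image points. Raising `num_mid` trades solver runtime for correspondence density.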

Hierarchical Continuous Coordinate Encoding (HCCE)

Instead of the binary encoding used in prior work, HCCE encodes each surface coordinate component (x, y, z) as a set of hierarchical, multi-level continuous codes, which neural networks can learn more effectively. A hierarchical mirroring technique keeps transitions between encoding levels smooth, which makes the network easier to train.
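The sketch below shows one plausible reading of hierarchical continuous codes with mirroring, for a single coordinate component normalized to [0, 1]. It assumes each level k is a triangle wave with 2**k monotone segments (a sawtooth in which every other segment is mirrored); `encode_level`, `hcce_encode`, and `hcce_decode` are hypothetical names, and the paper's exact formulation may differ.

```python
from typing import List


def encode_level(x: float, k: int) -> float:
    """Level-k continuous code of a normalized coordinate x in [0, 1].

    Assumed form: a triangle wave with 2**k monotone segments. Odd segments
    are mirrored, so the code never jumps at a segment boundary."""
    return 1.0 - abs((x * 2 ** k) % 2.0 - 1.0)


def hcce_encode(x: float, levels: int) -> List[float]:
    """Codes for levels 0..levels-1 (level 0 equals x itself)."""
    return [encode_level(x, k) for k in range(levels)]


def hcce_decode(codes: List[float]) -> float:
    """Coarse-to-fine decoding: each level picks its segment from the current
    estimate, undoes the mirroring, and refines the position in that segment."""
    est = min(max(codes[0], 0.0), 1.0)
    for k in range(1, len(codes)):
        n = 2 ** k
        seg = min(int(est * n), n - 1)
        c = min(max(codes[k], 0.0), 1.0)
        local = c if seg % 2 == 0 else 1.0 - c  # mirrored on odd segments
        est = (seg + local) / n
    return est
```

Each higher level only refines the position inside a segment of width 2**-k, so an error in a fine-level code shifts the decoded value by roughly that error divided by 2**k. Without mirroring, the plain sawtooth `frac(x * 2**k)` would jump from 1 back to 0 at every segment boundary, a discontinuity that is hard for a network to regress.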

Loss Function and Hierarchical Learning

The loss combines a mask loss with a hierarchical loss on the predicted hierarchical continuous codes. Rather than the single-histogram strategy of earlier methods, it uses multiple histograms to adjust the weights of the different hierarchical levels dynamically during training, which improves training stability and precision.
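The summary does not give the paper's weighting formula, so the sketch below substitutes a simple hypothetical rule to illustrate the structure: a mask binary cross-entropy plus an L1 loss over code levels, where each level has its own two-bin error histogram (the fraction of pixels whose error exceeds a threshold `tau`) and receives the weight exp(-sigma * error_rate). The rule, `tau`, and `sigma` are assumptions; a real training loop would compute these on tensors over foreground pixels and treat the weights as constants without gradients.

```python
import math
from typing import List, Sequence


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy for the predicted object mask."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def level_error_rate(pred: Sequence[float], target: Sequence[float],
                     tau: float = 0.1) -> float:
    """Two-bin error histogram for one level: fraction of pixels with |error| > tau."""
    return sum(abs(p - t) > tau for p, t in zip(pred, target)) / len(pred)


def hierarchical_loss(pred_levels: List[Sequence[float]],
                      target_levels: List[Sequence[float]],
                      sigma: float = 3.0) -> float:
    """L1 over code levels, each level weighted by its own histogram.

    Hypothetical rule: w_k = exp(-sigma * error_rate_k), normalized so the
    weights sum to one."""
    weights, l1s = [], []
    for p, t in zip(pred_levels, target_levels):
        weights.append(math.exp(-sigma * level_error_rate(p, t)))
        l1s.append(sum(abs(a - b) for a, b in zip(p, t)) / len(p))
    return sum(w * l for w, l in zip(weights, l1s)) / sum(weights)


def total_loss(mask_pred, mask_gt, pred_levels, target_levels) -> float:
    """Mask loss plus hierarchical code loss."""
    return bce(mask_pred, mask_gt) + hierarchical_loss(pred_levels, target_levels)
```

Keeping one histogram per level lets a level that is still poorly learned be down-weighted without affecting the weights of the levels that are already accurate.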

Experimental Results and Comparisons

Experiments on the seven BOP core datasets (LM-O, T-LESS, TUD-L, IC-BIN, ITODD, HB, and YCB-V) show that HccePose(BF) outperforms state-of-the-art approaches on 6D localization in both RGB and RGB-D settings. Adding the ultra-dense 2D-3D correspondences yields a significant performance boost, particularly when back-surface information is incorporated.

Conclusion

HccePose(BF) improves object pose estimation by constructing dense 2D-3D correspondences from simultaneous front- and back-surface predictions, while HCCE improves encoding accuracy and training stability. Future work could extend the approach to unseen objects or incorporate additional sensor data to further improve precision and generalization.
