DPC-Net: Deep Pose Correction for Visual Localization

Published 10 Sep 2017 in cs.CV | (1709.03128v4)

Abstract: We present a novel method to fuse the power of deep networks with the computational efficiency of geometric and probabilistic localization algorithms. In contrast to other methods that completely replace a classical visual estimator with a deep network, we propose an approach that uses a convolutional neural network to learn difficult-to-model corrections to the estimator from ground-truth training data. To this end, we derive a novel loss function for learning SE(3) corrections based on a matrix Lie groups approach, with a natural formulation for balancing translation and rotation errors. We use this loss to train a Deep Pose Correction network (DPC-Net) that predicts corrections for a particular estimator, sensor and environment. Using the KITTI odometry dataset, we demonstrate significant improvements to the accuracy of a computationally-efficient sparse stereo visual odometry pipeline, that render it as accurate as a modern computationally-intensive dense estimator. Further, we show how DPC-Net can be used to mitigate the effect of poorly calibrated lens distortion parameters.

Summary

The paper introduces DPC-Net, a Deep Pose Correction network that enhances traditional visual localization by learning complex, hard-to-model pose corrections.
DPC-Net achieves significant improvements on the KITTI dataset, reducing translational and rotational errors by over 70% compared to uncorrected sparse visual odometry.
The hybrid approach also mitigates the effect of poorly calibrated lens distortion parameters, and the authors release an open-source implementation; together these suggest broad applicability in navigation and localization.

Overview of DPC-Net: Deep Pose Correction for Visual Localization

The paper by Peretroukhin and Kelly presents an approach to visual localization that combines deep learning with classical geometric and probabilistic estimation, retaining the computational efficiency of the classical pipeline while improving its accuracy. The research introduces DPC-Net, a Deep Pose Correction network that augments an existing visual localization system by predicting learned corrections to its pose estimates.

The study focuses on visual odometry (VO): estimating a camera's motion through a scene from its images alone, without external positioning data. Classical VO systems can suffer from systematic errors caused by estimator biases, poor calibration, or environmental factors. The paper proposes using a convolutional neural network (CNN) to learn corrections for these errors, which are hard to model with classical methods alone.
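
To see why a small systematic error matters, the toy sketch below (not from the paper) chains relative-pose estimates the way a VO pipeline composes frame-to-frame motion, once with perfect steps and once with a hypothetical constant heading bias of 0.002 rad per frame. The helper names, step length, and bias value are illustrative assumptions.

```python
import numpy as np

def step_transform(yaw, forward=1.0):
    """Hypothetical relative pose: move `forward` metres along the current x axis and yaw about z."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[0, 3] = forward
    return T

def integrate(num_frames, yaw_per_frame):
    """Chain relative poses T_k = T_{k-1} @ step, as VO accumulates frame-to-frame estimates."""
    T = np.eye(4)
    step = step_transform(yaw_per_frame)
    for _ in range(num_frames):
        T = T @ step
    return T

T_true = integrate(500, 0.0)    # ground truth: a straight 500 m line
T_vo = integrate(500, 0.002)    # hypothetical estimator with a small, constant heading bias
drift = np.linalg.norm(T_vo[:3, 3] - T_true[:3, 3])
print(f"endpoint drift: {drift:.1f} m")
```

Although each step is only slightly wrong, the estimated endpoint ends up more than 200 m from the true one. Because the bias is consistent rather than random, it is the kind of error a learned correction can target.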

Key Contributions:

Deep Corrective Approach: The paper presents a novel fusion method for egomotion estimation that complements, rather than entirely replaces, a classical visual estimator. This reduces the learning burden on the CNN, which would otherwise have to learn projective geometry, environmental context, and sensor calibration explicitly.
Loss Function for SE(3) Corrections: Using a matrix Lie group formulation, the authors derive a novel loss function for learning SE(3) corrections. The loss balances translation and rotation errors naturally, without manual tuning of a weighting parameter, and builds on the SE(3) geodesic distance so that pose errors are measured on the group itself.
Open-Source Implementation: The authors provide an implementation of DPC-Net in PyTorch, enabling replication and further research building on their work.
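
The summary does not spell out the loss, so the sketch below assumes one natural form consistent with a matrix Lie group formulation: the network outputs a correction xi in R^6 (translation first, then rotation), the corrected estimate is exp(xi^) T_est, and the loss is 0.5 * g^T Sigma^-1 g with g = log(exp(xi^) T_est T_gt^-1)^vee, where T_gt is ground truth and Sigma is a 6x6 weighting matrix trading off translation against rotation. The function names, the pose values, and the diagonal Sigma are hypothetical; this is not the authors' PyTorch code.

```python
import numpy as np

def hat3(v):
    """Map a 3-vector to its skew-symmetric matrix (so(3) 'hat')."""
    x, y, z = v
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def vee3(M):
    """Inverse of hat3: recover the 3-vector from a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])

def se3_exp(xi):
    """Exponential map se(3) -> SE(3), with xi = [rho (translation), phi (rotation)]."""
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    P = hat3(phi)
    if theta < 1e-6:  # Taylor expansions near the identity
        R = np.eye(3) + P + 0.5 * P @ P
        J = np.eye(3) + 0.5 * P + P @ P / 6.0
    else:
        R = np.eye(3) + np.sin(theta) / theta * P + (1 - np.cos(theta)) / theta**2 * P @ P
        J = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * P
             + (theta - np.sin(theta)) / theta**3 * P @ P)  # SO(3) left Jacobian
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = J @ rho
    return T

def se3_log(T):
    """Logarithmic map SE(3) -> se(3); assumes the rotation angle is well below pi."""
    R, t = T[:3, :3], T[:3, 3]
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if theta < 1e-6:
        phi = vee3(R - R.T) / 2.0
        P = hat3(phi)
        J_inv = np.eye(3) - 0.5 * P + P @ P / 12.0
    else:
        phi = theta / (2.0 * np.sin(theta)) * vee3(R - R.T)
        P = hat3(phi)
        coeff = 1.0 / theta**2 - (1 + np.cos(theta)) / (2.0 * theta * np.sin(theta))
        J_inv = np.eye(3) - 0.5 * P + coeff * P @ P  # inverse left Jacobian
    return np.concatenate([J_inv @ t, phi])

def correction_loss(xi, T_est, T_gt, Sigma):
    """Assumed loss: 0.5 * g^T Sigma^-1 g, with g = log(exp(xi^) T_est T_gt^-1)^vee."""
    g = se3_log(se3_exp(xi) @ T_est @ np.linalg.inv(T_gt))
    return 0.5 * g @ np.linalg.solve(Sigma, g)

# Usage with hypothetical poses: ground truth vs. a slightly wrong estimate.
T_gt = se3_exp(np.array([1.0, 0.0, 0.1, 0.0, 0.05, 0.0]))
T_est = se3_exp(np.array([0.95, 0.02, 0.1, 0.0, 0.04, 0.01]))
Sigma = np.diag([1e-2, 1e-2, 1e-2, 1e-4, 1e-4, 1e-4])  # hypothetical weighting

xi_perfect = se3_log(T_gt @ np.linalg.inv(T_est))  # correction mapping T_est onto T_gt
print(correction_loss(np.zeros(6), T_est, T_gt, Sigma))  # no correction: positive loss
print(correction_loss(xi_perfect, T_est, T_gt, Sigma))   # perfect correction: ~0
```

Because g lives in the Lie algebra, translation and rotation components are weighted jointly through Sigma rather than by a separately tuned scalar. Training a network on such a loss would also require its gradient, for example via automatic differentiation of differentiable versions of these maps.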

Technical Insights

The authors use a CNN architecture similar to those in prior deep learning work, but tailored to regression on SE(3), and train and evaluate it on the KITTI odometry dataset. The network takes stereo image pairs as input and predicts pose corrections, which are then fused with the classical estimator's pose estimates through pose graph relaxation.
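
Pose graph relaxation fuses overlapping relative-motion estimates by least squares. As a deliberately simplified stand-in for the full nonlinear SE(3) problem, the sketch below relaxes a 1-D, translation-only pose graph made of biased frame-to-frame VO edges plus one hypothetical corrected edge spanning the whole window. The `relax_1d` helper, edge values, and weights are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def relax_1d(num_nodes, edges, anchor_weight=1e6):
    """Weighted linear least squares over 1-D node positions.

    edges: list of (i, j, z, w) meaning "x_j - x_i was measured as z with weight w".
    A strong prior pins node 0 at the origin to remove the global offset ambiguity.
    """
    rows, rhs, weights = [], [], []
    for i, j, z, w in edges:
        row = np.zeros(num_nodes)
        row[i], row[j] = -1.0, 1.0
        rows.append(row)
        rhs.append(z)
        weights.append(w)
    anchor = np.zeros(num_nodes)
    anchor[0] = 1.0
    rows.append(anchor)
    rhs.append(0.0)
    weights.append(anchor_weight)

    # Scale each row by sqrt(weight) so ordinary least squares minimizes the weighted cost.
    sqrt_w = np.sqrt(np.array(weights))
    A = np.array(rows) * sqrt_w[:, None]
    b = np.array(rhs) * sqrt_w
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Hypothetical data: true positions are 0, 1, 2, 3, 4 m.
vo_edges = [(k, k + 1, 1.05, 1.0) for k in range(4)]  # biased frame-to-frame VO
corrected_edges = [(0, 4, 4.0, 10.0)]                  # corrected long-baseline estimate
x = relax_1d(5, vo_edges + corrected_edges)
print(np.round(x, 3))
```

With the corrected edge weighted more heavily, the relaxed endpoint moves from the biased 4.2 m to within a few millimetres of the true 4.0 m, and the VO edges spread the adjustment evenly across the intermediate nodes. On SE(3) the same idea requires iterative nonlinear optimization, such as Gauss-Newton on the manifold.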

Numerical Evaluations and Results

The evaluation uses stereo image sequences from the KITTI odometry dataset, with a sparse, feature-based visual odometry pipeline as the base estimator. With full SE(3) corrections, DPC-Net reduced the translational m-ATE by 72% and the rotational error by 75% relative to the uncorrected sparse visual odometry pipeline. It also raised the sparse pipeline's accuracy to a level comparable with a dense estimator, which typically demands considerably more computation.
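
The headline results are relative reductions in error metrics. The sketch below assumes that translational m-ATE is a mean Euclidean distance between estimated and ground-truth positions, with both trajectories already in a common frame (no alignment step), and measures rotational error as the mean geodesic angle between rotations; the paper's exact metric definitions may differ, and the two-pose example is hypothetical.

```python
import numpy as np

def mean_translation_error(p_est, p_gt):
    """Mean Euclidean distance between estimated and ground-truth positions (N x 3)."""
    return float(np.mean(np.linalg.norm(p_est - p_gt, axis=1)))

def mean_rotation_error(R_est, R_gt):
    """Mean geodesic angle in radians between rotation stacks (N x 3 x 3)."""
    R_err = np.einsum("nji,njk->nik", R_gt, R_est)  # R_gt^T @ R_est for each pose
    cos_angle = np.clip((np.trace(R_err, axis1=1, axis2=2) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.mean(np.arccos(cos_angle)))

def percent_reduction(baseline, corrected):
    """Relative error reduction in percent."""
    return 100.0 * (1.0 - corrected / baseline)

def rot_z(angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Tiny hypothetical example with two poses.
p_gt = np.zeros((2, 3))
p_est = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]])
R_gt = np.stack([np.eye(3), np.eye(3)])
R_est = np.stack([rot_z(0.1), rot_z(-0.3)])
print(mean_translation_error(p_est, p_gt))
print(mean_rotation_error(R_est, R_gt))
print(percent_reduction(1.0, 0.28))
```

Read this way, a 72% reduction means the corrected error is 28% of the uncorrected error.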

The network also mitigated the effect of poorly calibrated lens distortion parameters, which suggests it is useful in real-world settings where perfect sensor calibration is not feasible.
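
Miscalibrated distortion produces errors that depend systematically on where a feature falls in the image, which is what makes them learnable. The toy sketch below uses a single-coefficient radial model, x_d = x_u (1 + k1 r^2), on normalized image coordinates; the coefficient values are hypothetical, and the paper's distortion model and experiment are not reproduced here.

```python
import numpy as np

def distort(points, k1):
    """Apply one-coefficient radial distortion to normalized image points (N x 2)."""
    r2 = np.sum(points**2, axis=1, keepdims=True)
    return points * (1.0 + k1 * r2)

K1_TRUE = -0.30     # hypothetical true coefficient
K1_ASSUMED = -0.25  # hypothetical miscalibrated value used by the pipeline

# Points at increasing distance from the optical axis.
radii = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
points = np.stack([radii, np.zeros_like(radii)], axis=1)

# Gap between where the pipeline models each point and where it actually lands.
error = np.linalg.norm(distort(points, K1_TRUE) - distort(points, K1_ASSUMED), axis=1)
for r, e in zip(radii, error):
    print(f"r = {r:.1f}: modeling error = {e:.2e}")
```

The gap equals |k1_true - k1_assumed| * r^3, so features near the image edge are displaced far more than those near the center, and the resulting motion-estimate error is structured rather than random.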

Implications and Future Directions

Peretroukhin and Kelly's work opens several avenues for exploration. The integration of deep learning as a corrective tool rather than a standalone estimator invites the adoption of deep learning methods across various localization and mapping applications. Their approach offers promise in environments where sensor noise and calibration errors are prevalent, hinting at broader applicability including multimodal sensor data fusion.

Future work may involve extending DPC-Net's architecture to incorporate recurrent neural networks for capturing temporal dependencies, adapting it to other sensor types (e.g., lidar), and exploring probabilistic techniques for uncertainty quantification in deep learning predictions.

Overall, the research makes a strong case for hybrid models in autonomous navigation and localization that combine the strengths of deep learning with those of classical geometric estimation.