
Learning a Depth Covariance Function

Published 21 Mar 2023 in cs.CV, cs.LG, and cs.RO (arXiv:2303.12157v2)

Abstract: We propose learning a depth covariance function with applications to geometric vision tasks. Given RGB images as input, the covariance function can be flexibly used to define priors over depth functions, predictive distributions given observations, and methods for active point selection. We leverage these techniques for a selection of downstream tasks: depth completion, bundle adjustment, and monocular dense visual odometry.


Summary

  • The paper introduces a depth covariance function that leverages RGB images and Gaussian Process modeling to improve depth prediction.
  • It employs a nonstationary kernel and a log-depth representation to mitigate scale ambiguity and capture local scene details.
  • The framework demonstrates enhanced performance in tasks such as depth completion, bundle adjustment, and monocular dense visual odometry through active sampling and variational optimization.

Critical Examination of "Learning a Depth Covariance Function"

This paper introduces the concept of learning a depth covariance function to support geometric vision tasks, exploiting the natural correlation between image pixels to inform depth predictions. The proposed framework combines principles from multiple-view geometry with data-driven priors, addressing the inconsistencies that traditional methods often exhibit when fusing 2D image information into 3D space.

Core Contributions and Methodology

The primary contribution of the paper is a depth covariance function that, given an RGB image, predicts the correlation in depth between pairs of pixels. A neural network maps the image to a per-pixel feature space, and a base kernel over these features defines a Gaussian process (GP) prior over depth. This modular separation between image processing and the GP prior yields a model that adapts to scene complexity, promotes locality, and remains flexible at test time.
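
To make this structure concrete, here is a minimal, self-contained sketch of a deep-kernel-style depth covariance: a hand-crafted `feature_map` stands in for the paper's learned CNN encoder, and a squared-exponential base kernel acts on those features. All names, features, and constants here are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def feature_map(rgb, coords):
    # Hypothetical stand-in for a learned CNN encoder: each query pixel
    # gets a 3-D feature of normalized row, column, and mean intensity.
    h, w, _ = rgb.shape
    return np.array([[r / h, c / w, rgb[r, c].mean() / 255.0]
                     for r, c in coords])

def base_kernel(F1, F2, lengthscale=0.2):
    # Stationary squared-exponential kernel applied in feature space;
    # the (learned) feature map is what makes the resulting depth
    # covariance nonstationary in image space.
    d2 = ((F1[:, None, :] - F2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rgb = np.random.default_rng(0).integers(0, 256, (48, 64, 3)).astype(float)
pix = [(10, 10), (10, 12), (40, 60)]
F = feature_map(rgb, pix)
K = base_kernel(F, F)
print(K.shape)            # (3, 3)
print(K[0, 1] > K[0, 2])  # nearby pixels correlate more strongly: True
```

The key design point mirrored here is that the kernel itself is generic; all image-dependent structure enters through the feature map.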

Key components of the proposed methodology include:

  • Depth Representation: Depths are modeled in log space, so errors scale with distance; this emphasizes local structure and directly addresses the scale ambiguity of monocular images.
  • Covariance Function: A nonstationary kernel exploits locality so the covariance function does not over-correlate depths across independent structures, improving prediction in complex 3D environments.
  • Optimization Objective: Training minimizes a variational free energy that replaces the GP's cubic-cost operations with a Nyström approximation, making training practical at scale.
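
The Nyström structure behind this objective can be sketched as follows, using a plain RBF kernel over random 2-D features in place of the learned image features (all shapes and values are illustrative):

```python
import numpy as np

def rbf(A, B, lengthscale=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 2))                  # n "pixel" features
Z = X[rng.choice(500, size=30, replace=False)]  # m << n inducing points

Knn_diag = np.ones(len(X))               # k(x, x) = 1 for this RBF
Kmm = rbf(Z, Z) + 1e-8 * np.eye(len(Z))  # jitter for numerical stability
Knm = rbf(X, Z)

# Nystrom approximation K ~ Knm Kmm^{-1} Kmn: O(n m^2) instead of O(n^3)
Qnn_diag = np.einsum('ij,ij->i', Knm @ np.linalg.inv(Kmm), Knm)

# tr(K - Q) is the extra regularizer in the variational free energy
# bound; it is non-negative and shrinks as inducing points cover the data.
trace_term = float(np.sum(Knn_diag - Qnn_diag))
print(trace_term >= -1e-6)  # True
```

Only the diagonal of the full n-by-n kernel is ever needed, which is what makes the bound affordable at image resolution.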

Applications and Performance

The applicability of the depth covariance function is illustrated through three geometric vision tasks: depth completion, bundle adjustment, and monocular dense visual odometry (DVO). Across these tasks, the model demonstrates notable efficacy:

  1. Depth Completion: The GP-based approach outperforms several existing techniques by achieving lower RMSE and demonstrating well-calibrated depth uncertainties.
  2. Bundle Adjustment: Incorporating the depth covariance in small-baseline scenarios produces more consistent and coherent geometry estimates than traditional approaches.
  3. Monocular Dense Visual Odometry: The integration of depth priors facilitated accurate pose estimation and depth prediction, showcasing superior performance on established benchmarks.
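
As a toy illustration of the depth-completion step, the sketch below conditions a GP with a plain RBF kernel on sparse synthetic "log-depth" observations and predicts a dense posterior mean and variance. The kernel and features are simple stand-ins for the learned depth covariance, not the paper's model.

```python
import numpy as np

def rbf(A, B, lengthscale=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

rng = np.random.default_rng(2)
X_obs = rng.uniform(size=(20, 2))     # pixels with sparse depth readings
y_obs = np.sin(3.0 * X_obs[:, 0])     # toy "log-depth" observations
X_query = rng.uniform(size=(200, 2))  # dense pixels to complete

sigma2 = 1e-4  # observation noise variance on the sparse measurements
K = rbf(X_obs, X_obs) + sigma2 * np.eye(len(X_obs))
k_star = rbf(X_query, X_obs)

mean = k_star @ np.linalg.solve(K, y_obs)  # dense completed log-depth
var = 1.0 - np.einsum('ij,ij->i', k_star @ np.linalg.inv(K), k_star)

print(mean.shape, var.shape)  # (200,) (200,)
```

The posterior variance is what supports the calibrated-uncertainty claim: it is small near observations and reverts to the prior far from them.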

The active sampling framework further optimizes depth map construction by selectively querying the most informative pixels, thus minimizing predictive variance. This adaptability is crucial for handling complex scenes efficiently.
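
One simple instance of such variance-driven selection is a greedy loop: repeatedly pick the candidate pixel with the largest current posterior variance, then condition on it. The sketch below uses a plain RBF kernel over random 2-D features as a stand-in for the learned covariance; `greedy_select` and its parameters are illustrative, not the paper's exact procedure.

```python
import numpy as np

def rbf(A, B, lengthscale=0.3):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def greedy_select(X, k, noise=1e-4):
    # Greedy active point selection: at each step, choose the candidate
    # with the highest posterior variance under the points chosen so far.
    chosen = []
    var = np.ones(len(X))  # prior variance of this RBF kernel is 1
    for _ in range(k):
        chosen.append(int(np.argmax(var)))
        S = X[chosen]
        K = rbf(S, S) + noise * np.eye(len(chosen))
        k_star = rbf(X, S)
        var = 1.0 - np.einsum('ij,ij->i', k_star @ np.linalg.inv(K), k_star)
    return chosen

X = np.random.default_rng(3).uniform(size=(100, 2))
picks = greedy_select(X, 5)
print(len(picks))  # 5
```

Each pick collapses the variance around itself, so subsequent picks naturally spread toward under-observed regions.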

Implications and Future Directions

This paper positions the depth covariance function as a versatile tool that enhances geometric vision tasks by balancing learned priors with test-time optimization, allowing the system to scale its complexity to the demands of the scene. The decoupled architecture also suggests compatibility with different geometric representations, potentially setting a new standard for depth prediction in computer vision.

Future exploration could extend the approach by considering alternative kernel functions to further capture geometric complexity, as well as scaling the method to handle higher resolution imagery efficiently. The connection between probabilistic modeling and deep learning underlying this work opens up avenues for integrating advanced Bayesian methods into 3D vision tasks.

In conclusion, the paper offers a compelling framework to improve depth estimation tasks, pertinent for advancing state-of-the-art in both academic research and practical applications in fields such as robotics, augmented reality, and autonomous systems.
