Anytime Stereo Image Depth Estimation on Mobile Devices

Published 26 Oct 2018 in cs.CV | (1810.11408v2)

Abstract: Many applications of stereo depth estimation in robotics require the generation of accurate disparity maps in real time under significant computational constraints. Current state-of-the-art algorithms force a choice between either generating accurate mappings at a slow pace, or quickly generating inaccurate ones, and additionally these methods typically require far too many parameters to be usable on power- or memory-constrained devices. Motivated by these shortcomings, we propose a novel approach for disparity prediction in the anytime setting. In contrast to prior work, our end-to-end learned approach can trade off computation and accuracy at inference time. Depth estimation is performed in stages, during which the model can be queried at any time to output its current best estimate. Our final model can process $1242 \times 375$ resolution images within a range of 10-35 FPS on an NVIDIA Jetson TX2 module with only marginal increases in error -- using two orders of magnitude fewer parameters than the most competitive baseline. The source code is available at https://github.com/mileyan/AnyNet.


Authors (7)

Citations (173)


Summary

The paper introduces a novel anytime approach that refines disparity maps progressively, balancing speed and accuracy for real-time tasks.
It employs a U-Net architecture with residual predictions and a Spatial Propagation Network to optimize efficiency on power-limited systems.
Empirical results on KITTI benchmarks show competitive accuracy at 10-35 FPS with significantly fewer parameters than traditional methods.

Anytime Stereo Image Depth Estimation on Mobile Devices: An Expert Overview

The paper "Anytime Stereo Image Depth Estimation on Mobile Devices" addresses the challenge of generating disparity maps in real time under tight computational constraints, particularly on power- and memory-limited mobile devices. The study introduces the Anytime Stereo Network (AnyNet), which trades off computation against accuracy at inference time, in contrast to prior methods whose computational cost is fixed before inference.

Core Contributions

The primary contribution of the paper is the development of an anytime computational approach to stereo image depth estimation. The proposed AnyNet model differentiates itself by allowing real-time querying at any stage of computation, which facilitates rapid initial predictions for urgent tasks such as obstacle avoidance in autonomous robots, while enabling more detailed calculations when time permits. This flexible approach is achieved through a novel model architecture that performs depth estimation in stages, refining disparity maps progressively.
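The anytime behavior described above can be pictured as a loop that keeps the most recent estimate and stops once the time budget is spent. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_anytime`, `simulated_stages`, and the injectable `clock` are hypothetical names, and the stage outputs are placeholders rather than real disparity maps.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional

Disparity = List[List[float]]  # placeholder type for a disparity map


def run_anytime(
    stages: Iterable[Disparity],
    budget_s: float,
    clock: Callable[[], float] = time.perf_counter,
) -> Optional[Disparity]:
    """Return the latest estimate produced before the budget ran out.

    The first stage always runs; a new stage is started only if the
    deadline has not yet passed when the previous stage finishes.
    """
    deadline = clock() + budget_s
    best: Optional[Disparity] = None
    for estimate in stages:      # each item refines the previous one
        best = estimate          # keep the most recent result
        if clock() >= deadline:  # budget spent: do not start another stage
            break
    return best


def simulated_stages(state: List[float], costs: List[float]) -> Iterator[Disparity]:
    """Hypothetical stand-in for the network: stage i advances a simulated
    clock by costs[i] seconds and yields a 1x1 map holding the stage index."""
    for i, cost in enumerate(costs):
        state[0] += cost
        yield [[float(i)]]


if __name__ == "__main__":
    now = [0.0]  # simulated time in seconds
    result = run_anytime(simulated_stages(now, [0.03] * 4), 0.05, clock=lambda: now[0])
    print(result)
```

Because the stages are consumed lazily, a caller with an urgent deadline pays only for the stages that actually ran. In this sketch a stage that overruns the deadline is still returned; a stricter scheduler would need per-stage time estimates, which are not given here.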

Technical Overview

1. Network Architecture:

The AnyNet model employs a U-Net architecture for feature extraction, generating multi-resolution feature maps. The disparity estimation process begins at a low-resolution scale and refines this estimation by progressively up-sampling and correcting initial predictions through residual prediction.

Residual Prediction: Key to AnyNet’s efficiency is its use of residual predictions at higher resolutions, which minimizes computational load by restricting disparity searches to a narrow range.
Spatial Propagation Network (SPNet): A final stage applies SPNet to sharpen the disparity map, improving accuracy further when the time budget allows the extra computation.
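The coarse-to-fine idea behind residual prediction can be illustrated on a single scanline. The sketch below is a toy under loud assumptions: it matches raw intensities with an absolute-difference cost and a hard argmin, whereas AnyNet works on learned feature maps with a learned regression. The function names, the 2x downsampling by subsampling, and the residual radius of 2 are illustrative choices, not values taken from the summary.

```python
from typing import List, Sequence


def match_cost(left: Sequence[float], right: Sequence[float], x: int, d: int) -> float:
    """Absolute intensity difference between left[x] and right[x - d]."""
    return abs(left[x] - right[x - d])


def full_search(left: Sequence[float], right: Sequence[float], max_disp: int) -> List[int]:
    """Exhaustive search: test every disparity in 0..max_disp at each pixel."""
    out = []
    for x in range(len(left)):
        candidates = range(min(max_disp, x) + 1)  # keep x - d inside the row
        out.append(min(candidates, key=lambda d: match_cost(left, right, x, d)))
    return out


def residual_refine(left: Sequence[float], right: Sequence[float],
                    coarse: Sequence[int], radius: int = 2) -> List[int]:
    """Upsample a half-resolution disparity row and correct it by searching
    only within +/- radius of the upsampled guess."""
    out = []
    for x in range(len(left)):
        # Disparities double when the resolution doubles.
        guess = 2 * coarse[min(x // 2, len(coarse) - 1)]
        candidates = sorted({min(max(guess + r, 0), x)
                             for r in range(-radius, radius + 1)})
        out.append(min(candidates, key=lambda d: match_cost(left, right, x, d)))
    return out


def coarse_to_fine(left: Sequence[float], right: Sequence[float],
                   max_disp: int, radius: int = 2) -> List[int]:
    """Full search at half resolution, then a narrow residual search at full resolution."""
    coarse = full_search(left[::2], right[::2], max_disp // 2)
    return residual_refine(left, right, coarse, radius)


if __name__ == "__main__":
    right = [i * i for i in range(40)]                     # synthetic scanline
    left = [right[max(x - 7, 0)] for x in range(40)]       # true disparity is 7
    print(coarse_to_fine(left, right, max_disp=16)[8:12])
```

With `max_disp=16`, exhaustive search tests up to 17 candidates at every full-resolution pixel, while the two-step version tests up to 9 candidates at half as many pixels plus at most 5 per full-resolution pixel. The saving grows with the disparity range and the number of image dimensions, which is the motivation for restricting the search at higher resolutions.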

2. Computational Efficiency:

The model processes $1242 \times 375$ images at 10 to 35 FPS on an NVIDIA Jetson TX2 module, with only marginal increases in error at the faster settings. Notably, it operates with two orders of magnitude fewer parameters than the most competitive baseline, making it well suited to mobile applications.
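To put the frame rates in perspective, they can be converted into a per-frame time budget and a pixel throughput. This is plain arithmetic on the figures quoted above (the resolution comes from the abstract); `frame_budget_ms` is a hypothetical helper name.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


WIDTH, HEIGHT = 1242, 375          # input resolution quoted in the abstract
pixels_per_frame = WIDTH * HEIGHT  # 465750 pixels

if __name__ == "__main__":
    for fps in (10, 35):
        megapixels_per_s = fps * pixels_per_frame / 1e6
        print(f"{fps} FPS: {frame_budget_ms(fps):.1f} ms per frame, "
              f"{megapixels_per_s:.2f} Mpx/s")
```

The slow end of the range leaves 100 ms per frame and the fast end roughly 29 ms, so the choice of where to query the model changes the latency by more than a factor of three.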

Evaluation and Results

The empirical evaluation covers several benchmarks, including the KITTI-2012 and KITTI-2015 datasets. AnyNet exhibits competitive accuracy relative to state-of-the-art approaches while using far fewer parameters. Compared with existing methods operating under real-time constraints, AnyNet is reported to achieve superior results. The system is especially advantageous on embedded platforms, where the computational budget is restricted.

Implications and Future Directions

This research carries significant implications for robotics and mobile computer vision applications. The ability to adaptively balance computation and accuracy opens pathways for more responsive and context-aware systems, such as drones and autonomous vehicles operating in dynamic environments.

Future Developments:

Given its efficiency, AnyNet could potentially be adapted for use in other vision-related tasks. Future work might explore further optimization of the network for lower power consumption or adaptation to more complex environments and larger disparity scales. Additionally, integration with other sensor modalities like LiDAR might enhance depth estimation accuracy in challenging conditions.

In conclusion, the Anytime Stereo Network is positioned as a viable and efficient solution for real-time depth estimation on mobile devices, and it represents a valuable step forward in the field of computational depth estimation.
