CHOSEN: Contrastive Hypothesis Selection for Multi-View Depth Refinement
Abstract: We propose CHOSEN, a simple yet flexible, robust and effective multi-view depth refinement framework. It can be employed in any existing multi-view stereo pipeline and generalizes straightforwardly to different multi-view capture systems, for example systems that differ in relative camera positioning and lenses. Given an initial depth estimate, CHOSEN iteratively re-samples and selects the best hypotheses, and automatically adapts to the different metric or intrinsic scales determined by the capture system. The key to our approach is the application of contrastive learning in an appropriate solution space, together with a carefully designed hypothesis feature, based on which positive and negative hypotheses can be effectively distinguished. Integrated into a simple baseline multi-view stereo pipeline, CHOSEN delivers impressive quality in terms of depth and normal accuracy compared to many current deep learning based multi-view stereo pipelines.
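The re-sample-and-select loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `refine_depth`, its parameters, the relative (multiplicative) sampling rule, and the halving schedule are all assumptions made for illustration, and the `score` callable is a hypothetical stand-in for the learned, contrastively trained hypothesis ranking that the abstract mentions.

```python
import random


def refine_depth(initial, score, num_iters=4, num_hypotheses=8,
                 rel_spread=0.1, seed=0):
    """Iteratively re-sample depth hypotheses and keep the best-scoring one.

    `score` is a hypothetical stand-in for a learned hypothesis ranking;
    a higher score means a better hypothesis.
    """
    rng = random.Random(seed)
    depth = initial
    spread = rel_spread
    for _ in range(num_iters):
        # Assumption: perturb relative to the current depth, so the step
        # size follows the scale of the scene instead of a fixed unit.
        candidates = [depth] + [
            depth * (1.0 + rng.uniform(-spread, spread))
            for _ in range(num_hypotheses)
        ]
        # Select the best hypothesis. The current estimate is always a
        # candidate, so the selected score never decreases.
        depth = max(candidates, key=score)
        # Narrow the search range for the next iteration.
        spread *= 0.5
    return depth


# Toy usage: the score peaks at an assumed true depth of 2.0.
refined = refine_depth(2.3, lambda d: -abs(d - 2.0))
print(refined)
```

Keeping the current estimate in the candidate set is a design choice of this sketch: it guarantees that an iteration can only keep or improve the selected score, at the cost of one extra evaluation per step.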