Robust Synthetic-to-Real Transfer for Stereo Matching
Abstract: With advancements in domain generalized stereo matching networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, few studies have investigated the robustness after fine-tuning them in real-world scenarios, during which the domain generalization ability can be seriously degraded. In this paper, we explore fine-tuning stereo matching networks without compromising their robustness to unseen domains. Our motivation stems from comparing Ground Truth (GT) versus Pseudo Label (PL) for fine-tuning: GT degrades, but PL preserves the domain generalization ability. Empirically, we find the difference between GT and PL implies valuable information that can regularize networks during fine-tuning. We also propose a framework to utilize this difference for fine-tuning, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network. The core idea is to utilize the EMA Teacher to measure what the Student has learned and dynamically improve GT and PL for fine-tuning. We integrate our framework with state-of-the-art networks and evaluate its effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the domain generalization ability during fine-tuning.
- There are many consistent explanations of unlabeled data: Why you should average. arXiv preprint arXiv:1806.05594, 2018.
- Matching-space stereo networks for cross-domain generalization. In 2020 International Conference on 3D Vision (3DV), pages 364–373. IEEE, 2020.
- Pyramid stereo matching network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5410–5418, 2018.
- Domain generalized stereo matching via hierarchical visual transformation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9559–9568, 2023.
- Learning depth with convolutional spatial propagation network. IEEE transactions on pattern analysis and machine intelligence, 42(10):2361–2379, 2019.
- Hierarchical neural architecture search for deep stereo matching. Advances in Neural Information Processing Systems, 33:22158–22169, 2020.
- Itsa: An information-theoretic approach to automatic shortcut avoidance and domain generalization in stereo matching networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13022–13032, 2022.
- Learning depth estimation for transparent and mirror surfaces. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9244–9255, 2023.
- Born again neural networks. In International Conference on Machine Learning, pages 1607–1616. PMLR, 2018.
- Are we ready for autonomous driving? the kitti vision benchmark suite. In 2012 IEEE conference on computer vision and pattern recognition, pages 3354–3361. IEEE, 2012.
- Knowledge distillation: A survey. International Journal of Computer Vision, 129:1789–1819, 2021.
- Group-wise correlation stereo network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3273–3282, 2019.
- Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9729–9738, 2020.
- Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- Heiko Hirschmuller. Accurate and efficient stereo processing by semi-global matching and mutual information. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), pages 807–814. IEEE, 2005.
- End-to-end learning of geometry and context for deep stereo regression. In Proceedings of the IEEE International Conference on Computer Vision, pages 66–75, 2017.
- Domain generalization with adversarial feature learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5400–5409, 2018.
- Practical stereo matching via cascaded recurrent network with adaptive correlation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 16263–16272, 2022.
- Learning for disparity estimation through feature constancy. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2811–2820, 2018.
- Stereo matching using multi-level cost volume and multi-scale feature constancy. IEEE transactions on pattern analysis and machine intelligence, 43(1):300–315, 2019.
- Raft-stereo: Multilevel recurrent field transforms for stereo matching. In 2021 International Conference on 3D Vision (3DV), pages 218–227. IEEE, 2021.
- A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 4040–4048, 2016.
- Object scene flow for autonomous vehicles. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3061–3070, 2015.
- Continual adaptation for deep stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021.
- Open challenges in deep stereo: the booster dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21168–21178, 2022.
- Masked representation learning for domain generalized stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5435–5444, 2023.
- Deep learning, dark knowledge, and dark matter. In NIPS 2014 Workshop on High-energy Physics and Machine Learning, pages 81–87. PMLR, 2015.
- High-resolution stereo datasets with subpixel-accurate ground truth. In German conference on pattern recognition, pages 31–42. Springer, 2014.
- A multi-view stereo benchmark with high-resolution images and multi-camera videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3260–3269, 2017.
- Cfnet: Cascade and fused cost volume for robust stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13906–13915, 2021.
- Pcw-net: Pyramid combination and warping cost volume for stereo matching. In European Conference on Computer Vision, pages 280–297. Springer, 2022.
- Digging into uncertainty-based pseudo-label for robust stereo matching. arXiv preprint arXiv:2307.16509, 2023.
- Weight-averaged consistency targets improve semi-supervised deep learning results. corr. abs/1703.01780 (2017). arXiv preprint arXiv:1703.01780, 2017.
- Raft: Recurrent all-pairs field transforms for optical flow. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part II 16, pages 402–419. Springer, 2020.
- Real-time self-adaptive deep stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 195–204, 2019.
- Nerf-supervised deep stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 855–866, 2023.
- Pvstereo: Pyramid voting module for end-to-end self-supervised stereo matching. IEEE Robotics and Automation Letters, 6(3):4353–4360, 2021.
- Improved cross-view completion pre-training for stereo matching. arXiv preprint arXiv:2211.10408, 2022.
- Attention concatenation volume for accurate and efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12981–12990, 2022.
- Iterative geometry encoding volume for stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21919–21928, 2023a.
- Cgi-stereo: Accurate and real-time stereo matching via context and geometry interaction. arXiv preprint arXiv:2301.02789, 2023b.
- Aanet: Adaptive aggregation network for efficient stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1959–1968, 2020.
- Hierarchical deep stereo matching on high-resolution images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5515–5524, 2019a.
- Drivingstereo: A large-scale dataset for stereo matching in autonomous driving scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 899–908, 2019b.
- Multi-label knowledge distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17271–17280, 2023.
- Hierarchical discrete distribution decomposition for match density estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6044–6053, 2019.
- Revisiting knowledge distillation via label smoothing regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3903–3911, 2020.
- Computing the stereo matching cost with a convolutional neural network. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1592–1599, 2015.
- Parameterized cost volume for stereo matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 18347–18357, 2023.
- Ga-net: Guided aggregation net for end-to-end stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 185–194, 2019.
- Domain-invariant stereo matching networks. In European Conference on Computer Vision, pages 420–439. Springer, 2020a.
- Revisiting domain generalized stereo matching networks from a feature consistency perspective. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 13001–13011, 2022.
- Adaptive unimodal cost volume filtering for deep stereo matching. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 12926–12934, 2020b.
- Decoupled knowledge distillation. In Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pages 11953–11962, 2022.
- High-frequency stereo matching network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1327–1336, 2023.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.