UWStereo: A Large Synthetic Dataset for Underwater Stereo Matching

Published 3 Sep 2024 in cs.CV (arXiv:2409.01782v1)

Abstract: Despite recent advances in stereo matching, its extension to intricate underwater settings remains unexplored, primarily owing to: 1) the reduced visibility, low contrast, and other adverse effects of underwater images; 2) the difficulty of obtaining ground-truth data for training deep learning models, i.e., simultaneously capturing an image and estimating its corresponding pixel-wise depth in underwater environments. To enable further advances in underwater stereo matching, we introduce a large synthetic dataset called UWStereo. Our dataset includes 29,568 synthetic stereo image pairs with dense and accurate disparity annotations for the left view. We design four distinct underwater scenes filled with diverse objects such as corals, ships, and robots, and induce additional variations in camera model, lighting, and environmental effects. Compared with existing underwater datasets, UWStereo is superior in scale, variation, annotation, and photo-realistic image quality. To substantiate the efficacy of the UWStereo dataset, we conduct a comprehensive evaluation of nine state-of-the-art algorithms as benchmarks. The results indicate that current models still struggle to generalize to new domains. We therefore design a new strategy that learns to reconstruct cross-domain masked images before stereo matching training, and integrate a cross-view attention enhancement module that aggregates long-range content information to improve generalization.
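The cross-view attention enhancement described above can be illustrated with a minimal sketch: features from the left view act as queries that attend over the right view's features, aggregating long-range content into a residual enhancement. This is a hypothetical NumPy illustration of generic cross-view attention, not the paper's actual module; the real module operates on CNN feature maps inside the network, and its exact design (heads, projections, normalization) is not specified here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_view_attention(feat_left, feat_right):
    """Aggregate long-range content from the right view into the left view.

    feat_left, feat_right: (N, C) arrays of flattened per-pixel features.
    Queries come from the left view; keys/values from the right view.
    """
    d_k = feat_left.shape[-1]
    scores = feat_left @ feat_right.T / np.sqrt(d_k)  # (N_left, N_right) similarity
    attn = softmax(scores, axis=-1)                   # attention over right-view pixels
    return feat_left + attn @ feat_right              # residual enhancement of left features

# toy usage: 16 "pixels" with 8-dimensional features per view
rng = np.random.default_rng(0)
fl = rng.standard_normal((16, 8))
fr = rng.standard_normal((16, 8))
out = cross_view_attention(fl, fr)
print(out.shape)  # (16, 8)
```

Because every left-view position can attend to every right-view position, this kind of aggregation captures content beyond a local correlation window, which is the intuition the abstract invokes for better cross-domain generalization.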

