3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

Published 7 Apr 2024 in cs.CV (arXiv:2404.04823v1)

Abstract: 3D building reconstruction from monocular remote sensing images is an important and challenging research problem that has received increasing attention in recent years, owing to its low cost of data acquisition and its suitability for large-scale applications. However, existing methods rely on expensive 3D-annotated samples for fully supervised training, which restricts their application to large-scale cross-city scenarios. In this work, we propose MLS-BRN, a multi-level supervised building reconstruction network that can flexibly utilize training samples with different annotation levels to achieve better reconstruction results in an end-to-end manner. To reduce the demand for full 3D supervision, we design two new modules, Pseudo Building Bbox Calculator and Roof-Offset guided Footprint Extractor, as well as new tasks and training strategies for different types of samples. Experimental results on several public and new datasets demonstrate that our proposed MLS-BRN achieves competitive performance using far fewer 3D-annotated samples, and significantly improves footprint extraction and 3D reconstruction performance compared with current state-of-the-art methods. The code and datasets of this work will be released at https://github.com/opendatalab/MLS-BRN.git.
