- The paper presents a hybrid framework that combines YOLOv7 detection and ResNet18 regression to infer building heights from shadow lengths and solar angles.
- The methodology integrates deep learning with classical photogrammetry, significantly reducing RMSE and outperforming traditional techniques.
- A novel dataset from 42 Chinese cities with detailed annotations was curated, enhancing model reliability and evaluation.
Building Height Estimation Using Shadow Length in Satellite Imagery
Estimating building heights from the shadow lengths visible in satellite imagery is a promising direction in remote sensing. The paper describes a framework that combines detection, regression, and photogrammetry, and reports higher accuracy and broader applicability than contemporary methods.
Framework Overview
The methodology employs monocular satellite imagery, addressing the inherent 3D spatial information loss by utilizing shadow lengths as compensatory cues. The process begins with the detection of buildings and their shadows using a modified YOLOv7 object detection model, capable of accurately localizing these structures within satellite images. Subsequently, ResNet18 is employed to regress the shadow lengths, which, along with solar elevation angles, are used to infer building heights using photogrammetric principles.
Figure 1: Overview of the proposed framework for building height estimation using shadow length.
The proposed method pairs deep learning with a classical photogrammetric model: the networks locate buildings and estimate shadow lengths, and a closed-form relation converts those lengths and the solar elevation angle into building heights. The authors report that this combination outperforms existing frameworks.
Dataset and Annotation
The authors built a new dataset by extending an existing one covering 42 Chinese cities, adding detailed annotations: building heights, shadow lengths, and bounding boxes. They also developed a custom annotation tool for marking shadow lengths accurately; the tool records the geographic metadata needed to compute solar elevation angles precisely.
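To illustrate how geographic metadata can yield a solar elevation angle, here is a minimal sketch based on the textbook spherical-astronomy relation sin(elevation) = sin(lat)·sin(decl) + cos(lat)·cos(decl)·cos(hour angle). It is an approximation for intuition only and is not necessarily the procedure the authors used. The function names, the cosine fit for declination, and the use of local solar time (rather than clock time corrected for longitude and the equation of time) are all assumptions made for this example.

```python
import math


def solar_declination_deg(day_of_year: int) -> float:
    """Approximate solar declination in degrees (simple cosine fit, an assumption)."""
    return -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))


def solar_elevation_deg(latitude_deg: float, day_of_year: int, solar_hour: float) -> float:
    """Approximate solar elevation in degrees.

    solar_hour is local *solar* time in hours (12.0 = solar noon), not clock time.
    """
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    # The hour angle advances 15 degrees per hour away from solar noon.
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elev))))
```

At solar noon the relation reduces to 90° minus the absolute difference between latitude and declination, which is a convenient sanity check. A production pipeline would typically rely on a full solar position algorithm that also accounts for longitude, time zone, and atmospheric refraction.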

Figure 2: (a) Box plots of root mean square error (RMSE) on the dataset, plotted across values of ground truth height. (b) Bar plot of the mean RMSE against values of ground truth height. The spread of RMSE is small for buildings in the 12-30 m range. For buildings in the 3-9 m range the spread is considerably larger, which suggests noise. Buildings taller than 30 m show very large RMSE.
An analysis of the dataset revealed notable label noise and an imbalanced label distribution, particularly among short structures, which prompted filtering and adjustments to improve model reliability.
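The per-height error breakdown described in the Figure 2 caption can be reproduced in spirit with a short sketch like the one below. It groups samples by ground truth height and computes the RMSE within each bin. The function name `rmse_by_height_bin` and the 3 m bin width are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import defaultdict


def rmse_by_height_bin(truth_m, pred_m, bin_width_m=3.0):
    """Return {bin_lower_edge_m: RMSE} with samples grouped by ground-truth height."""
    if len(truth_m) != len(pred_m):
        raise ValueError("truth_m and pred_m must have the same length")
    squared_errors = defaultdict(list)
    for t, p in zip(truth_m, pred_m):
        # Assign each sample to the bin whose lower edge is at or below its true height.
        lower_edge = math.floor(t / bin_width_m) * bin_width_m
        squared_errors[lower_edge].append((p - t) ** 2)
    return {
        edge: math.sqrt(sum(errs) / len(errs))
        for edge, errs in sorted(squared_errors.items())
    }
```

Bins with unusually high RMSE, or with very few samples, are candidates for the kind of filtering the analysis motivated. Which thresholds to apply is a judgment call that this sketch does not settle.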
Methodology
The framework's core consists of three primary stages: localization, shadow estimation, and height determination.
- Localization uses the modified YOLOv7 to draw bounding boxes around buildings and their shadows.
- Shadow Estimation passes the localized image regions to a ResNet18 regression model, which predicts the shadow length.
- Height Determination utilizes these shadow lengths in a well-defined photogrammetric equation, incorporating solar elevation angles for calculating building heights:
$$H = S_l \tan(\sigma)$$
where $H$ is the building height, $S_l$ is the shadow length, and $\sigma$ is the solar elevation angle [REDA2004].
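The height relation above is easy to check numerically. The following is a minimal sketch, assuming the shadow length has already been converted to meters, the shadow falls on flat ground, and the solar elevation angle is given in degrees. The function name `building_height_m` and its input checks are hypothetical and introduced only for this example.

```python
import math


def building_height_m(shadow_length_m: float, solar_elevation_deg: float) -> float:
    """Height = shadow length * tan(solar elevation), assuming flat ground."""
    if shadow_length_m < 0:
        raise ValueError("shadow length must be non-negative")
    if not 0.0 < solar_elevation_deg < 90.0:
        raise ValueError("solar elevation must lie strictly between 0 and 90 degrees")
    return shadow_length_m * math.tan(math.radians(solar_elevation_deg))
```

For instance, a 20 m shadow under a 45° sun corresponds to a building roughly 20 m tall, since tan(45°) = 1. Because the tangent grows quickly at high elevation angles, the same error in shadow length translates into a larger height error when the sun is high.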
The methodology emphasizes a tight integration between empirical deep learning models and mathematical frameworks, thereby enhancing both computational efficiency and the interpretability of results.
Results and Evaluation
The framework was evaluated against baseline models, including MM3Net, and achieved a significantly lower root mean square error (RMSE) on building height estimates. This result indicates that monocular imagery combined with analytical techniques can rival, and in some cases exceed, more complex approaches based on multi-spectral imagery.

Figure 3: (a) YOLOv7 Predictions (b) Bounding box ground truth.
Conclusions
This study presents a comprehensive methodology utilizing shadow length to estimate building heights from satellite imagery, integrating powerful detection and regression models with classical photogrammetry. This results in a hybrid framework with high precision and applicability across diverse urban environments. Future work may expand into alternative photogrammetric methods and broaden evaluation datasets to consolidate and validate these findings further. The integration of this technique into urban planning and management tools could provide substantial benefits, offering low-cost, high-scale solutions for monitoring urban sprawl and infrastructure development.