Performance Analysis of Various EfficientNet Based U-Net++ Architecture for Automatic Building Extraction from High Resolution Satellite Images

Published 5 Sep 2023 in cs.CV, cs.LG, and eess.IV | (2310.06847v1)

Abstract: Building extraction is an essential component of study in the science of remote sensing, and applications for building extraction heavily rely on semantic segmentation of high-resolution remote sensing imagery. Semantic information extraction gap constraints in the present deep learning based approaches, however can result in inadequate segmentation outcomes. To address this issue and extract buildings with high accuracy, various efficientNet backbone based U-Net++ has been proposed in this study. The designed network, based on U-Net, can improve the sensitivity of the model by deep supervision, voluminous redesigned skip-connections and hence reducing the influence of irrelevant feature areas in the background. Various effecientNet backbone based encoders have been employed when training the network to enhance the capacity of the model to extract more relevant feature. According on the experimental findings, the suggested model significantly outperforms previous cutting-edge approaches. Among the 5 efficientNet variation Unet++ based on efficientb4 achieved the best result by scoring mean accuracy of 92.23%, mean iou of 88.32%, and mean precision of 93.2% on publicly available Massachusetts building dataset and thus showing the promises of the model for automatic building extraction from high resolution satellite images.

Abstract PDF Upgrade to Chat

Summary

The paper introduces EfficientNet backbones into U-Net++ to significantly improve building extraction accuracy.
The study employs the Massachusetts Buildings Dataset, achieving superior metrics with EfficientNetb4 (92.23% accuracy, 88.32% IoU).
Results underscore improved segmentation performance for urban planning, though challenges like shadows remain for future research.

Performance Analysis of Various EfficientNet Based U-Net++ Architecture for Automatic Building Extraction from High Resolution Satellite Images

Introduction

The advent of high-resolution satellite imagery has significantly enhanced our ability to perform automatic building extraction, a crucial task in remote sensing for applications such as urban planning and population density estimation. However, challenges such as noise, occlusion, and complex backgrounds remain. U-Net++, an extension of the traditional U-Net, seeks to address these issues by introducing densely nested skip-connections and deep supervision, enhancing feature extraction accuracy. In this study, EfficientNet backbones were incorporated into U-Net++ to improve building extraction efficacy and efficiency from satellite images.

Dataset

The Massachusetts Buildings Dataset is employed, comprising 151 aerial images of 1500x1500 pixels each, divided into training, test, and validation sets. Preprocessing involves down-sampling images to 256x256 and normalizing pixel values between -1 and 1.

Figure 1: Example of images from the dataset: (a--c)~Input image and (d--f)~Target image.

Methodology

U-Net++

U-Net++ enhances the traditional U-Net by using redesigned skip connections to bridge the semantic gap between encoder and decoder stages, fostering better feature map integration. Its key features include densely connected encoder-decoder paths and deep supervision, enabling superior accuracy in semantic segmentation tasks.

Figure 2: U-Net++ Vs Unet Architecture.

EfficientNet

EfficientNet utilizes a compound scaling method which jointly scales depth, width, and resolution to increase accuracy and efficiency over previous CNN models like ResNet. This makes it particularly suitable as a backbone for U-Net++ in this study.

Figure 3: EfficientNet(Model Architecture).

Proposed Architecture

Incorporating EfficientNet as the backbone in U-Net++ aims to balance model complexity while enhancing accuracy. The extensive connections and scales of EfficientNet improve feature extraction and robustness in segmentation tasks. Each EfficientNet-based U-Net++ variant was evaluated, analyzing their performances in real-world scenarios.

Figure 4: Research and Evaluation Workflow.

Result Analysis

Performance evaluation metrics included accuracy, precision, recall, and IoU. No overfitting was observed during training, and results showcased how EfficientNetb4-based U-Net++ achieved superior metrics:

Mean Accuracy: 92.23%
Mean IoU: 88.32%
Mean Precision: 93.2%

Comparisons with other architectures underscore the proposed method's effectiveness in building extraction tasks.

Figure 5: Improvement of U-Net++(efficientNetb4) over other methods (a)~IOU (b)~Accuracy (c)~Precision and (d)~Recall.

Figure 6: Training History of IOU score.

Figure 7: Example of Predicted Image Vs Original Ground-Truth of U-Net++(efficientNetb4).

Conclusion

The study successfully demonstrates the integration of EfficientNet into U-Net++, yielding enhanced performance in automatic building extraction from high-resolution satellite images. Despite promising results, challenges such as distinguishing between shadows and buildings persist, suggesting future directions to incorporate larger datasets and attention mechanisms for even greater accuracy and reliability.