- The paper introduces Spatially-Adaptive Convolution (SAC) to dynamically adjust filters for LiDAR segmentation, achieving a 3.7% mIoU improvement on SemanticKITTI.
- It integrates SAC into an enhanced RangeNet architecture, balancing computational efficiency with high accuracy in real-time autonomous driving scenarios.
- Empirical evaluations reveal that SAC variants, especially SAC-ISK, deliver competitive speed and improved small object detection performance.
Spatially-Adaptive Convolution in LiDAR Point-Cloud Segmentation: An Overview of SqueezeSegV3
The paper presents a method titled "SqueezeSegV3: Spatially-Adaptive Convolution for Efficient Point-Cloud Segmentation," addressing the challenge of accurate and efficient segmentation of LiDAR point-clouds through a novel approach known as Spatially-Adaptive Convolution (SAC). This method is situated in the domain of autonomous driving, which extensively uses LiDAR sensors to generate point-cloud representations for interpreting vehicular surroundings.
Background and Motivation
LiDAR point-cloud segmentation is fundamental to applications such as autonomous vehicle operation and environmental mapping. Traditional approaches often project the 3D point-cloud into a 2D LiDAR range image so that efficient 2D convolutions can be applied. The paper identifies a core limitation of this pipeline: standard convolutions share their weights across all image locations, so they cannot adapt to the pronounced feature-distribution shifts across a LiDAR range image, where channels such as the x, y, z coordinates vary systematically with pixel position. The authors therefore argue for a convolutional operation that adapts spatially to these variations, making fuller use of the model's capacity.
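To make the projection step concrete, a minimal spherical-projection sketch is shown below. The image size (64 x 2048) and the vertical field of view are illustrative defaults for a 64-beam sensor, not values taken from the paper.

```python
import numpy as np

def spherical_projection(points, H=64, W=2048, fov_up=3.0, fov_down=-25.0):
    """Project 3D LiDAR points (N, 3) onto an H x W range image."""
    fov_up_r = np.radians(fov_up)
    fov_down_r = np.radians(fov_down)
    fov = fov_up_r - fov_down_r

    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)                  # range of each point
    yaw = np.arctan2(y, x)                              # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / np.maximum(r, 1e-8), -1.0, 1.0))

    u = 0.5 * (1.0 - yaw / np.pi) * W                   # column index
    v = (1.0 - (pitch - fov_down_r) / fov) * H          # row index
    u = np.clip(np.floor(u), 0, W - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, H - 1).astype(np.int32)

    image = np.zeros((H, W), dtype=np.float32)
    image[v, u] = r                                     # later points overwrite earlier ones
    return image
```

In practice, additional channels (x, y, z, intensity) are stacked alongside the range channel to form the multi-channel input image that the segmentation network consumes.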
Proposed Method: Spatially-Adaptive Convolution
The paper's central contribution, SAC, addresses this shortcoming by applying different filters to different image locations, conditioned on the input. This spatial adaptiveness keeps the model content-aware and lets it handle diverse feature distributions without an excessive increase in computation. SAC is operationalized by factorizing the adaptive filter into a static convolution weight and an attention map, where the attention map is computed by a lightweight convolution over the raw input. The paper explores several variants (SAC-S, SAC-IS, SAC-SK, and SAC-ISK) that differ in which dimensions of the filter the attention map adapts, with SAC-ISK yielding the most favorable balance between accuracy and computational overhead.
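As a rough illustration of the factorization, the sketch below implements the simplest variant in the spirit of SAC-S: a single-channel attention map is predicted from the raw input channels and rescales the features before a shared static convolution is applied. The layer sizes, the sigmoid activation, and the 7x7 attention kernel are assumptions made for this sketch, not details confirmed by the summary above.

```python
import torch
import torch.nn as nn

class SACS(nn.Module):
    """Sketch of a SAC-S-style layer: pixel-wise attention times a static conv."""

    def __init__(self, in_ch, out_ch, raw_ch=5, kernel_size=3):
        super().__init__()
        # Lightweight conv predicting one attention value per pixel from the
        # raw input channels (e.g. x, y, z, range, intensity).
        self.attn = nn.Sequential(
            nn.Conv2d(raw_ch, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        # Static, spatially shared convolution weight.
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, features, raw_input):
        a = self.attn(raw_input)        # (N, 1, H, W) attention map
        return self.conv(features * a)  # adapt features, then convolve
```

Because the attention map is broadcast over the feature channels, multiplying the features by it is mathematically equivalent to scaling the effective filter at each pixel, which is the factorized form of an adaptive convolution. The richer variants extend the attention map across input channels (I) and kernel positions (K) at additional cost.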
The Architecture of SqueezeSegV3
Building on SAC, the authors introduce the SqueezeSegV3 architecture, which modifies RangeNet by incorporating SAC into its layers. A noteworthy aspect of the model is its balanced design: it maintains computational efficiency without sacrificing accuracy, achieving a 3.7% mIoU improvement over previous methods on the SemanticKITTI dataset, a widely used benchmark in the field.
Implementation and Evaluation
The authors train the SqueezeSegV3-21 and SqueezeSegV3-53 models end-to-end with a carefully scheduled learning rate. Their evaluation on the SemanticKITTI dataset shows superior mIoU scores and improved handling of small objects, a typical weakness of existing systems. A comprehensive comparison against methods such as PointNet and RangeNet underscores SqueezeSegV3's competitive speed and accuracy, supporting its practical applicability in real-time scenarios.
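The summary does not specify the exact schedule, so the snippet below is only an illustrative example of the kind of scheduled learning rate commonly used in such training setups: linear warm-up followed by stepwise exponential decay. All constants here are assumptions, not values from the paper.

```python
def lr_at_step(step, base_lr=0.01, warmup_steps=1000,
               decay=0.99, decay_every=1000):
    """Learning rate with linear warm-up, then stepwise exponential decay.

    Illustrative schedule only; the paper's actual hyperparameters may differ.
    """
    if step < warmup_steps:
        # Ramp linearly from base_lr / warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Multiply by `decay` once every `decay_every` steps after warm-up.
    return base_lr * decay ** ((step - warmup_steps) // decay_every)
```

A schedule like this would typically be plugged into an optimizer via a per-step callback (e.g. PyTorch's `torch.optim.lr_scheduler.LambdaLR`).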
Conclusion and Future Directions
SqueezeSegV3 marks a significant step forward in LiDAR point-cloud segmentation by introducing a framework that dynamically adapts convolution to spatial variations in the input. This has direct implications for autonomous navigation, where more precise environmental interpretation is crucial for safety and efficiency. Potential avenues for further work include reducing SAC's computational cost and integrating it with other sensing modalities for comprehensive sensor fusion, enhancing environmental awareness and decision-making in autonomous systems.