- The paper introduces a fully convolutional network that integrates region proposal and classification in a single forward pass for rapid object detection.
- The paper demonstrates that SqueezeDet is 30.4 times smaller, 19.7 times faster, and consumes 35.2 times less energy than previous models while maintaining competitive accuracy.
- The paper highlights the potential of its design for resource-constrained applications beyond autonomous driving, including UAVs, robotics, and edge devices.
Unified Fully Convolutional Neural Networks for Object Detection in Autonomous Driving
This paper introduces SqueezeDet, a compact and efficient fully convolutional neural network designed for real-time object detection in autonomous driving. Motivated by the need for high accuracy, fast inference, and low energy consumption, SqueezeDet is designed to satisfy all three constraints simultaneously.
Methodology
The core of SqueezeDet's architecture is its fully convolutional design, which eliminates the fully connected layers that have traditionally inflated model size and computational cost. Instead, convolutional layers extract feature maps and directly compute bounding boxes alongside class probabilities. Inspired by YOLO, the network performs detection in a single forward pass, which streamlines processing and improves speed.
The key innovation is the ConvDet layer. ConvDet generates region proposals with a convolutional operation, outputting bounding box coordinates, confidence scores, and class probabilities at each position of the feature map. This significantly reduces the parameter count compared to YOLO's fully connected detection layers. By integrating region proposal and classification in one layer, SqueezeDet achieves object detection with minimal resource utilization.
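The parameter savings of a convolutional detection layer over a fully connected one can be sketched with a quick back-of-the-envelope calculation. The dimensions below (feature-map size, channel count, anchor count, class count) are illustrative assumptions, not the paper's exact configuration; the point is only that a convolution's parameter count is independent of the feature-map size, while a fully connected layer's grows with both its input and output spatial extents.

```python
def conv_det_params(fw, fh, ch_in, k, n_classes):
    """Parameters of a ConvDet-style layer: an fw x fh convolution whose
    output has K * (4 + 1 + C) channels per position -- 4 box coordinates,
    1 confidence score, and C class scores for each of K anchors."""
    return fw * fh * ch_in * k * (4 + 1 + n_classes)

def fc_det_params(wf, hf, ch_in, wo, ho, k, n_classes):
    """Parameters of a fully connected detection layer: it flattens the
    whole Wf x Hf x Ch feature map and maps it to Wo * Ho * K * (4 + 1 + C)
    outputs, so its size scales with both spatial extents."""
    return (wf * hf * ch_in) * (wo * ho * k * (4 + 1 + n_classes))

# Illustrative (assumed) numbers: 76x22 feature map, 768 channels,
# 9 anchors, 3 classes, detection grid matching the feature map.
conv_p = conv_det_params(3, 3, 768, 9, 3)
fc_p = fc_det_params(76, 22, 768, 76, 22, 9, 3)
print(f"ConvDet-style params: {conv_p:,}")
print(f"FC-style params:      {fc_p:,}")
print(f"ratio: {fc_p / conv_p:,.0f}x")
```

Even with modest assumed dimensions, the fully connected variant requires orders of magnitude more parameters, which is the intuition behind ConvDet's compactness.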
Strong Numerical Results
SqueezeDet demonstrates remarkable efficiency. Key results show that the model is 30.4 times smaller, 19.7 times faster, and consumes 35.2 times less energy than previous baselines, while achieving competitive accuracy on the KITTI object detection challenge. SqueezeDet reaches a mean average precision (mAP) of 76.7% and runs at 57.2 frames per second (FPS), well above standard real-time speed benchmarks. Furthermore, an enhanced variant, SqueezeDet+, reaches 80.4% mAP, closely rivaling larger architectures such as Faster R-CNN.
Implications and Future Directions
The implications of this research extend beyond autonomous vehicles, potentially transforming any domain requiring embedded and mobile vision applications where resource constraints are critical. The use of SqueezeDet could facilitate advancements in UAVs, robotics, and edge devices, where energy efficiency and low latency are paramount.
Theoretically, SqueezeDet challenges the status quo by demonstrating that smaller, faster models can maintain competitive accuracy without relying on larger parameter sets. Future developments could explore adaptive architectures that dynamically adjust their complexity based on the available computational resources or specific task requirements.
Conclusion
SqueezeDet presents a significant advancement in deploying CNNs for real-time applications in resource-constrained environments. By achieving a delicate balance between size, speed, and accuracy, it signifies a step forward in efficient neural network design. Future research may expand on this framework, aiming to seamlessly integrate higher-level reasoning or multimodal sensor fusion to enhance the perceptual capabilities of autonomous systems.