
Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation

Published 25 Jun 2024 in cs.AR, cs.CV, and cs.LG | arXiv:2406.17749v1

Abstract: The proliferation of complex deep learning (DL) models has revolutionized various applications, including computer vision-based solutions, prompting their integration into real-time systems. However, the resource-intensive nature of these models poses challenges for deployment on low-computational power and low-memory devices, like embedded and edge devices. This work empirically investigates the optimization of such complex DL models to analyze their functionality on an embedded device, particularly on the NVIDIA Jetson Nano. It evaluates the effectiveness of the optimized models in terms of their inference speed for image classification and video action detection. The experimental results reveal that, on average, optimized models exhibit a 16.11% speed improvement over their non-optimized counterparts. This not only emphasizes the critical need to consider hardware constraints and environmental sustainability in model development and deployment but also underscores the pivotal role of model optimization in enabling the widespread deployment of AI-assisted technologies on resource-constrained computational systems. It also serves as proof that prioritizing hardware-specific model optimization leads to efficient and scalable solutions that substantially decrease energy consumption and carbon footprint.


Summary

  • The paper shows that TensorRT-based optimization of DL models on the NVIDIA Jetson Nano yields significant inference speed-ups (up to 16.7×).
  • It employs a conversion pipeline from PyTorch to TensorRT and evaluates various architectures like MobileNet-V2 and ShuffleNet-V2 for enhanced edge performance.
  • The study highlights that model architecture and computational metrics, such as FLOPS, impact optimization outcomes, offering insights for real-time AI on resource-constrained devices.

Benchmarking Deep Learning Models on NVIDIA Jetson Nano: An Empirical Investigation

Introduction

In recent years, the widespread adoption of deep learning (DL) models has revolutionized numerous applications, including computer vision. Despite their success, the deployment of these models on low-computational power and memory-constrained devices such as embedded systems and edge devices presents significant challenges. This paper examines DL model optimization for such devices, specifically the NVIDIA Jetson Nano. The study aims to quantify performance improvements in inference speed for image classification and video action detection through model optimization.

Methodology

The NVIDIA Jetson Nano, depicted below with its components and connectivity options, was employed to test the optimized DL models.

Figure 1: Layout of the NVIDIA Jetson Nano Developer Kit showcasing its key components and connectivity options.

The optimization process involves converting PyTorch models to TensorRT formats using the pipeline depicted below. This approach significantly enhances the inference speed on the Jetson Nano.
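In code, the core of such a conversion pipeline is a single tracing-and-build call. The sketch below is illustrative only: it assumes the open-source torch2trt package (NVIDIA-AI-IOT/torch2trt) and a CUDA-capable Jetson device, and the function name and defaults are ours, not the paper's.

```python
# Hypothetical sketch of a PyTorch -> TensorRT conversion step via torch2trt.
# Requires a CUDA-capable device (e.g., Jetson Nano) to actually build an engine.

def convert_to_tensorrt(model, input_shape=(1, 3, 224, 224), fp16=True):
    """Trace a PyTorch model and build a TensorRT engine for it."""
    # Imports are deferred so this sketch can be read and imported off-device.
    import torch
    from torch2trt import torch2trt

    model = model.eval().cuda()               # TensorRT engines are built on-GPU
    example = torch.ones(input_shape).cuda()  # dummy input fixes the engine's input shape
    # torch2trt applies TensorRT's layer/tensor fusion and kernel auto-tuning internally.
    return torch2trt(model, [example], fp16_mode=fp16)
```

The returned object exposes the same call interface as the original PyTorch module, so downstream inference code needs no changes.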

Figure 2: The PyTorch deep learning model optimization process for an NVIDIA Jetson Nano edge device using TensorRT.

The experimental setup features image classification models including well-known architectures such as AlexNet, VGG, ResNet, and MobileNet-V2. Additionally, custom models for human action recognition were developed and evaluated. The models underwent optimization employing TensorRT, where various techniques like layer and tensor fusion, and kernel auto-tuning were applied to achieve low latency and high throughput for inference.

Figure 3: Inference process of the TensorRT engine on the NVIDIA Jetson Nano.
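The latency comparisons that follow rest on wall-clock timing of repeated inference calls. A minimal, framework-agnostic harness might look like the sketch below; the warm-up count and run count are illustrative choices, not the paper's protocol.

```python
import time

def benchmark(infer_fn, n_warmup=10, n_runs=100):
    """Return mean per-call latency (ms) of infer_fn over n_runs timed calls."""
    for _ in range(n_warmup):      # warm-up: fills caches, triggers lazy init
        infer_fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        infer_fn()
    elapsed = time.perf_counter() - start
    return elapsed / n_runs * 1000.0

# Stand-in workload in place of a real model's forward pass.
latency_ms = benchmark(lambda: sum(i * i for i in range(1000)))
```

On-device, `infer_fn` would wrap the (optimized or baseline) model's forward pass on a fixed input batch.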

Results and Discussion

The optimization achieved by utilizing TensorRT resulted in a substantial reduction in inference time, as documented in the paper. For instance, models such as ShuffleNet-V2 and MobileNet-V2 exhibited speed-ups of 13.6× and 16.7×, respectively, post-optimization. These results highlight the effectiveness of optimization techniques used to tailor models for specific hardware capabilities without compromising on computational performance.
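The reported speed-up factor is simply the ratio of baseline to optimized latency. The latencies below are hypothetical placeholders chosen to reproduce the paper's factors, not its measurements.

```python
def speedup(baseline_ms, optimized_ms):
    """Speed-up factor: baseline latency divided by optimized latency."""
    return baseline_ms / optimized_ms

# Hypothetical latencies chosen only to illustrate the reported factors.
shufflenet_speedup = speedup(34.0, 2.5)    # -> 13.6x
mobilenet_speedup = speedup(41.75, 2.5)    # -> 16.7x
```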

Figure 4: Inference time speedup of the optimized models on NVIDIA Jetson Nano compared to their non-optimized baseline counterparts.

The observed trends indicate that models with lower FLOPS show a more pronounced inference-speed gain post-optimization. However, exceptions exist, such as ResNet-V2, suggesting that architecture-specific factors influence optimization efficiency beyond simple computational metrics like FLOPS.
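FLOPS here refers to a model's per-inference arithmetic cost. For a single convolutional layer it can be estimated with the standard 2-FLOPs-per-multiply-accumulate formula; the layer shapes below are illustrative, not taken from the paper.

```python
def conv2d_flops(c_in, c_out, k, h_out, w_out):
    """Approximate FLOPs of one Conv2d layer: 2 FLOPs per multiply-accumulate."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# Illustrative: a MobileNet-V2-style stem conv (3x3, 3->32 channels)
# producing a 112x112 feature map from a 224x224 input.
flops = conv2d_flops(c_in=3, c_out=32, k=3, h_out=112, w_out=112)
```

Summing such per-layer counts over a network gives the whole-model FLOPS figure that the speed-up trend is measured against.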

Conclusion

This study demonstrates that DL models optimized for edge devices, such as the NVIDIA Jetson Nano, can reduce inference times by an average factor of 7.011× across both image classification and video action detection tasks. These optimizations are critical for enabling the deployment of advanced AI applications on resource-limited systems, addressing issues of latency and sustainability, and reducing the carbon footprint.

Future work should investigate additional optimization techniques, such as quantization-aware training and network pruning, to further enhance model efficiency. As AI continues to integrate into edge computing environments, the significance of optimizing DL models will only grow, ensuring scalable, high-performance solutions in various domains. For more detailed insights and code implementations, refer to the project's repository.
