Adaptive Neural Networks for Efficient Inference

Published 25 Feb 2017 in cs.LG, cs.CV, cs.NE, and stat.ML | (1702.07811v2)

Abstract: We present an approach to adaptively utilize deep neural networks in order to reduce the evaluation time on new examples without loss of accuracy. Rather than attempting to redesign or approximate existing networks, we propose two schemes that adaptively utilize networks. We first pose an adaptive network evaluation scheme, where we learn a system to adaptively choose the components of a deep network to be evaluated for each example. By allowing examples correctly classified using early layers of the system to exit, we avoid the computational time associated with full evaluation of the network. We extend this to learn a network selection system that adaptively selects the network to be evaluated for each example. We show that computational time can be dramatically reduced by exploiting the fact that many examples can be correctly classified using relatively efficient networks and that complex, computationally costly networks are only necessary for a small fraction of examples. We pose a global objective for learning an adaptive early exit or network selection policy and solve it by reducing the policy learning problem to a layer-by-layer weighted binary classification problem. Empirically, these approaches yield dramatic reductions in computational cost, with up to a 2.8x speedup on state-of-the-art networks from the ImageNet image recognition challenge with minimal (<1%) loss of top5 accuracy.

Citations (331)

Summary

The paper introduces adaptive early exit strategies and network selection systems that reduce computation time with minimal loss of accuracy.
The methodology employs a weighted binary classification mechanism for early exits and a directed acyclic graph for network selection, achieving up to 2.8x speed-ups.
Experimental evaluations on ImageNet demonstrate that adaptive inference maintains high accuracy while significantly lowering test-time computing costs.

The paper "Adaptive Neural Networks for Efficient Inference" proposes a novel approach to improving the efficiency of deep neural networks by adaptively selecting computation paths. Two schemes are introduced: early exit strategies and network selection systems, which reduce computational time without significantly affecting accuracy. This summary details the methodology, experimental results, and implications of these schemes.

Introduction

Deep neural networks (DNNs) achieve high accuracy across many applications but are computationally expensive at test time. The proposed approach reduces this cost by routing "easy" examples through cheaper computation and reserving complex networks for "difficult" ones. It comprises two strategies, adaptive early exits and network selection policies, both aimed at minimizing the computation spent per example.

Figure 1: Performance versus evaluation complexity of the DNN architectures that won the ImageNet challenge over the past several years. The model evaluation times increase exponentially with respect to the increase in accuracy.

Adaptive Early Exit Networks

The adaptive early exit network strategy allows examples to bypass parts of the network, exiting early when sufficient confidence is achieved in predictions. The decision-making process is modeled as a layer-by-layer weighted binary classification (WBC), optimizing computation versus accuracy trade-offs. Specifically, models are trained to identify when examples can confidently exit without proceeding through the entire network, thereby saving computational resources.
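
The control flow described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `early_exit_predict`, `Stage`, and `ExitHead` are hypothetical, and a fixed confidence threshold stands in for the learned exit policy.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces: a stage transforms features; an exit head returns
# (predicted_label, confidence) from the features available at that depth.
Stage = Callable[[Sequence[float]], Sequence[float]]
ExitHead = Callable[[Sequence[float]], Tuple[int, float]]


def early_exit_predict(
    x: Sequence[float],
    stages: List[Stage],
    heads: List[ExitHead],
    stage_costs: List[float],
    threshold: float,
) -> Tuple[int, float, int]:
    """Run stages in order and stop at the first sufficiently confident head.

    Returns (label, cost_spent, index_of_exit_stage). The last stage always
    answers, so every example exits somewhere.
    Assumption: a fixed threshold replaces the learned exit decision.
    """
    features = x
    spent = 0.0
    last = len(stages) - 1
    for i, (stage, head, cost) in enumerate(zip(stages, heads, stage_costs)):
        features = stage(features)
        spent += cost  # pay only for the stages actually evaluated
        label, confidence = head(features)
        if i == last or confidence >= threshold:
            return label, spent, i
    raise ValueError("stages must be non-empty")
```

An example that exits at the first head pays only `stage_costs[0]`; one that runs to the end pays the sum of all stage costs, which is where the savings for "easy" examples come from.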

In practice, classifiers are placed at specific exit points within a network, and each decides whether an example needs further computation. This system can lead to significant speed-ups, with experimental results showing up to 30% reduction in evaluation time for networks like GoogLeNet and ResNet50.
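
One way to picture the reduction to weighted binary classification is to build a label and an importance weight for each training example at a single exit. The sketch below is an assumption about how such targets could be formed, not the paper's exact formulation: `exit_targets`, `tradeoff`, and `extra_cost` are hypothetical names, and the per-example losses are assumed to be known from running both the early exit and the rest of the network on training data.

```python
from typing import List, Tuple


def exit_targets(
    loss_if_exit: List[float],
    loss_if_continue: List[float],
    extra_cost: float,
    tradeoff: float,
) -> Tuple[List[int], List[float]]:
    """Build (labels, weights) for training one exit decision.

    Label 1 means "exit here", 0 means "continue". The weight is how much the
    combined objective (loss + tradeoff * cost) changes if the policy gets
    that example wrong, so near-ties contribute almost nothing.
    """
    if len(loss_if_exit) != len(loss_if_continue):
        raise ValueError("per-example loss lists must have equal length")
    labels: List[int] = []
    weights: List[float] = []
    for early, late in zip(loss_if_exit, loss_if_continue):
        # Continuing costs the later loss plus the price of the extra layers.
        gap = (late + tradeoff * extra_cost) - early
        labels.append(1 if gap >= 0 else 0)
        weights.append(abs(gap))
    return labels, weights
```

The resulting labels and weights can be handed to any binary classifier that accepts per-sample weights. Raising `tradeoff` makes continuing more expensive and pushes more examples toward the "exit" label.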

Figure 2: The plots show the accuracy gains at different layers for early exits for networks GoogLeNet (top) and ResNet50 (bottom).

Network Selection

The network selection approach orders multiple pretrained networks in a directed acyclic graph, allowing adaptive decisions on which network to use. For each example, a decision policy selects the cheapest model able to deliver accurate results. Policies are learned bottom-up, optimizing for speed and accuracy on datasets such as ImageNet, with substantial speed-ups reported when switching dynamically across networks.
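
The selection procedure can be sketched as a walk over a small graph of networks. Everything named here is hypothetical (`select_and_predict`, the `Network` and `Policy` interfaces, the `"exit"` sentinel), and the sketch assumes that every example starts at one designated network and that the policy sees only that network's label and confidence.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: a network returns (label, confidence); a policy
# looks at the current node's output and returns "exit" or the next node.
Network = Callable[[object], Tuple[int, float]]
Policy = Callable[[str, int, float], str]


def select_and_predict(
    x: object,
    networks: Dict[str, Network],
    costs: Dict[str, float],
    edges: Dict[str, List[str]],
    policy: Policy,
    start: str,
) -> Tuple[int, float, List[str]]:
    """Walk the DAG from `start`, paying for every network that is evaluated.

    Returns (label, total_cost, path_of_evaluated_networks).
    """
    node, spent, path = start, 0.0, []
    while True:
        label, confidence = networks[node](x)
        spent += costs[node]
        path.append(node)
        choice = policy(node, label, confidence)
        # A node with no outgoing edges always ends the walk.
        if choice == "exit" or not edges.get(node):
            return label, spent, path
        if choice not in edges[node]:
            raise ValueError(f"policy chose {choice!r}, not reachable from {node!r}")
        node = choice
```

Because the graph is acyclic, the loop always terminates. Allowing an edge that skips a middle network lets the policy send a hard example straight to the most expensive model without paying for the intermediate one.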

Figure 3: Performance of network selection policy on ImageNet (left: top-5 error; right: top-1 error). Our full adaptive system (denoted with blue dots) significantly outperforms any individual network for almost all budget regions and is close to the performance of the oracle.

Experimental Evaluation

The methods were evaluated on the ImageNet 2012 classification dataset using AlexNet, GoogLeNet, and ResNet50 models. Results indicated substantial reductions in computational time, with speed-ups of up to 2.8x at minimal loss in accuracy. The advantage is most notable in network selection, where the steep growth in evaluation cost from one network to the next can be avoided for most examples.
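
A speed-up figure of this kind can be read as the ratio between the time of always running the most expensive network and the average time of the adaptive system. The helper names `average_time` and `speedup` are hypothetical, and the numbers in the usage lines are made up for illustration rather than taken from the paper.

```python
from typing import Dict


def average_time(exit_fraction: Dict[str, float], path_time: Dict[str, float]) -> float:
    """Mean per-example time when a fraction of examples stops at each exit.

    path_time[name] is the total time paid by an example that stops at `name`,
    including every cheaper network it passed through first.
    """
    total = sum(exit_fraction.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("exit fractions must sum to 1")
    return sum(exit_fraction[name] * path_time[name] for name in exit_fraction)


def speedup(baseline_time: float, adaptive_time: float) -> float:
    """How many times faster the adaptive system is than the baseline."""
    if adaptive_time <= 0:
        raise ValueError("adaptive_time must be positive")
    return baseline_time / adaptive_time


if __name__ == "__main__":
    # Illustrative values only: half the examples stop at the cheapest network.
    fractions = {"A": 0.5, "G": 0.25, "R": 0.25}
    times = {"A": 1.0, "G": 3.0, "R": 7.0}  # cumulative time along the chain
    adaptive = average_time(fractions, times)
    print(adaptive, speedup(4.0, adaptive))
```

The sketch also shows why the fraction of samples exiting at a network and the fraction of total time spent there can differ sharply: a small share of examples reaching the most expensive network can still account for most of the time.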

Figure 4: (Left) Different network selection topologies that we considered. Arrows denote possible jumps allowed to the policy. A, G, and R denote AlexNet, GoogLeNet, and ResNet50, respectively. (Right) Statistics for proportion of total time spent on different networks and proportion of samples that exit at each network. Top row is sampled at 2.0 ms and bottom row is sampled at 2.8 ms system evaluation.

Conclusion

The proposed adaptive inference strategies offer significant reductions in test-time computational demands for DNNs while maintaining their accuracy. The research supports deployment in resource-constrained environments, such as mobile and IoT devices, where computation costs are critical. Future developments may involve hybrid models and integration with distributed processing systems to leverage fog computing capabilities.

This investigation demonstrates the potential of adaptive systems to make better use of computational resources, positioning them as an important step toward efficient AI deployment on constrained devices.
