
Discrimination-aware Channel Pruning for Deep Neural Networks

Published 28 Oct 2018 in cs.CV (arXiv:1810.11809v3)

Abstract: Channel pruning is one of the predominant approaches for deep model compression. Existing pruning methods either train from scratch with sparsity constraints on channels, or minimize the reconstruction error between the pre-trained feature maps and the compressed ones. Both strategies suffer from some limitations: the former kind is computationally expensive and difficult to converge, whilst the latter kind optimizes the reconstruction error but ignores the discriminative power of channels. To overcome these drawbacks, we investigate a simple-yet-effective method, called discrimination-aware channel pruning, to choose those channels that really contribute to discriminative power. To this end, we introduce additional losses into the network to increase the discriminative power of intermediate layers and then select the most discriminative channels for each layer by considering the additional loss and the reconstruction error. Last, we propose a greedy algorithm to conduct channel selection and parameter optimization in an iterative way. Extensive experiments demonstrate the effectiveness of our method. For example, on ILSVRC-12, our pruned ResNet-50 with 30% reduction of channels even outperforms the original model by 0.39% in top-1 accuracy.

Citations (576)

Summary

  • The paper presents a novel method that integrates discrimination-aware losses into intermediate layers to retain channels critical for network accuracy.
  • It employs a greedy channel selection strategy using ℓ2,0-norm constrained optimization to effectively prune redundant channels.
  • Experimental results on CIFAR-10 and ILSVRC demonstrate significant model compression, with up to a 15.58× reduction in size while maintaining or improving accuracy.

Discrimination-Aware Channel Pruning for Deep Neural Networks

The paper introduces a method called Discrimination-aware Channel Pruning (DCP) aimed at enhancing deep neural network compression techniques. Unlike existing approaches that either impose sparsity constraints from scratch or minimize reconstruction errors between original and compressed models, DCP prioritizes identifying channels that contribute significantly to a network’s discriminative power.

Overview

Channel pruning is well-regarded for reducing model size and computational cost by removing redundant channels. However, conventional methods often face challenges. Training-from-scratch is computationally heavy and prone to convergence issues, while reconstruction-based strategies may retain non-informative channels due to their focus on minimizing feature map errors.

DCP takes a different tack by integrating discrimination-aware losses into intermediate layers of the network. These losses are designed to steer fine-tuning toward preserving the network's discriminative abilities. By combining the additional losses with reconstruction errors, the method balances compression against accuracy.
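The combined objective can be sketched as follows. This is a minimal NumPy illustration of the idea (the feature maps, intermediate-layer logits, labels, and trade-off weight `lam` are all hypothetical stand-ins), not the authors' implementation:

```python
import numpy as np

def reconstruction_loss(feat_pruned, feat_orig):
    # Mean squared error between the compressed and pre-trained feature maps.
    return np.mean((feat_pruned - feat_orig) ** 2)

def discrimination_loss(logits, labels):
    # Softmax cross-entropy on an intermediate classifier head.
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(log_probs[np.arange(len(labels)), labels])

def joint_objective(feat_pruned, feat_orig, logits, labels, lam=1.0):
    # Trade off reconstruction fidelity against discriminative power;
    # `lam` weights the discrimination-aware term.
    return (reconstruction_loss(feat_pruned, feat_orig)
            + lam * discrimination_loss(logits, labels))
```

A perfect reconstruction drives the first term to zero, so minimizing the joint objective then reduces to keeping the intermediate representation discriminative.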

Methodology

  1. Discrimination-Aware Loss: DCP inserts supplementary loss functions at various layers to increase the discriminative strength of intermediate outputs. This is achieved through a combination of softmax cross-entropy losses and mean squared errors.
  2. Greedy Channel Selection: An ℓ2,0-norm constrained optimization identifies informative channels. A greedy algorithm iteratively selects the channels that most improve the discrimination-aware objective.
  3. Stage-Wise Process: Pruning proceeds stage by stage: at each stage the network is fine-tuned with respect to the newly introduced loss, channels are selected, and the parameters of the retained channels are optimized accordingly.
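The greedy selection in step 2 can be sketched as a simple loop. Here `joint_loss` is a hypothetical stand-in for the paper's discrimination-aware objective, evaluated on the subset of channels that are kept; the budget `budget_k` enforces the ℓ2,0-style cardinality constraint:

```python
def greedy_channel_selection(joint_loss, num_channels, budget_k):
    # joint_loss(subset) -> loss value when only channels in `subset` are kept.
    selected = set()
    current = joint_loss(selected)
    for _ in range(budget_k):
        # Evaluate each unselected channel and pick the one that lowers
        # the objective the most.
        candidates = [(joint_loss(selected | {c}), c)
                      for c in range(num_channels) if c not in selected]
        best_loss, best_c = min(candidates)
        if best_loss >= current:  # stopping condition: no further improvement
            break
        selected.add(best_c)
        current = best_loss
    return sorted(selected)

# Toy objective: each channel has an "importance" weight, and the loss is
# the total importance of the channels that were pruned away.
importance = [0.5, 3.0, 0.1, 2.0]
def toy_loss(subset):
    return sum(w for i, w in enumerate(importance) if i not in subset)

print(greedy_channel_selection(toy_loss, 4, 2))  # -> [1, 3]
```

With a budget of two, the loop keeps the two most important channels (indices 1 and 3). The early-exit check also illustrates how a stopping condition can determine the pruning rate automatically, as discussed later in the summary.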

Experimental Results and Reliability

Extensive experiments on datasets like CIFAR-10 and ILSVRC-12 demonstrate DCP's efficacy. Notable results include a pruned ResNet-50 model that, despite a 30% reduction in channels, improves top-1 accuracy by 0.39%. On CIFAR-10, VGGNet achieved a 15.58× reduction in model size while outperforming its original version.

Implications and Future Work

DCP presents an efficient channel pruning solution that trims models without compromising accuracy. It is also adaptive: a stopping condition allows the pruning rate to be determined automatically, yielding substantial reductions in model size while maintaining performance.

The research points towards future exploration in combining DCP with other compression techniques, such as quantization, to achieve further reductions in model size and computational overhead.

Conclusion

DCP effectively highlights the importance of discriminative power in neural network compression. Its balanced approach to channel selection preserves the integrity and performance of deep neural networks, facilitating deployment on resource-constrained devices like smartphones. The introduction of discrimination-aware losses provides pathways for more fine-grained control over model compression and hints towards promising developments in efficient neural network design.
