Differentiable Search for Finding Optimal Quantization Strategy
Abstract: Many network quantization algorithms have been proposed to accelerate and compress deep neural networks (DNNs). Although the quantization strategy of a given state-of-the-art algorithm may outperform others on some network architectures, it is hard to prove that the strategy is always better than the others, and it cannot even be established that the strategy is the best choice for every layer in a network. In other words, existing quantization algorithms are suboptimal because they ignore the different characteristics of different layers and quantize all layers with a uniform quantization strategy. To solve this issue, we propose differentiable quantization strategy search (DQSS), which assigns an optimal quantization strategy to each individual layer by taking advantage of the benefits of different quantization algorithms. Specifically, we formulate DQSS as a differentiable neural architecture search problem and adopt an efficient convolution to explore the mixed quantization strategies from a global perspective by gradient-based optimization. We apply DQSS to post-training quantization so that the quantized models perform comparably to their full-precision counterparts. We also employ DQSS in quantization-aware training to further validate its effectiveness. To circumvent the expensive optimization cost of DQSS in quantization-aware training, we update the hyper-parameters and the network parameters in a single forward-backward pass. In addition, we adjust the optimization process to avoid a potential under-fitting problem. Comprehensive experiments on a high-level computer vision task (image classification) and a low-level computer vision task (image super-resolution) with various network architectures show that DQSS can outperform state-of-the-art methods.
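The idea of searching over quantization strategies per layer can be illustrated with a minimal sketch. It assumes a DARTS-style relaxation in which each layer holds one learnable strategy parameter per candidate quantizer, and the layer's quantized weights are the softmax-weighted sum of the candidates' outputs. The two candidate strategies, the function names, and the `keep` parameter are hypothetical and chosen only for illustration; they are not taken from the paper, and the gradient update of the strategy parameters is omitted.

```python
import math

def quantize(values, scale, bits=8):
    """Symmetric uniform fake-quantization: snap to an integer grid, then rescale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(scale, 1e-12)  # guard against an all-zero input
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def minmax_strategy(values, bits=8):
    """Hypothetical candidate 1: scale set by the largest magnitude."""
    qmax = 2 ** (bits - 1) - 1
    return quantize(values, max(abs(v) for v in values) / qmax, bits)

def clipped_strategy(values, bits=8, keep=0.9):
    """Hypothetical candidate 2: scale set by a lower magnitude quantile, clipping outliers."""
    qmax = 2 ** (bits - 1) - 1
    mags = sorted(abs(v) for v in values)
    idx = int(keep * (len(mags) - 1))
    return quantize(values, mags[idx] / qmax, bits)

def softmax(alphas):
    """Numerically stable softmax over the strategy parameters."""
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_quantize(values, alphas, strategies):
    """Relaxed choice: softmax-weighted sum of every candidate's quantized output."""
    weights = softmax(alphas)
    outputs = [strategy(values) for strategy in strategies]
    return [
        sum(w * out[i] for w, out in zip(weights, outputs))
        for i in range(len(values))
    ]

def select_strategy(alphas, strategies):
    """After the search, keep the candidate with the largest strategy parameter."""
    best = max(range(len(alphas)), key=lambda i: alphas[i])
    return strategies[best]

# Example usage on one layer's (toy) weights.
layer_weights = [0.1, -0.2, 0.05, 3.0]
candidates = [minmax_strategy, clipped_strategy]
relaxed = mixed_quantize(layer_weights, [0.0, 0.0], candidates)
chosen = select_strategy([0.2, 1.5], candidates)
```

During a real search the strategy parameters would be trained by backpropagation together with, or alternating with, the network weights; a framework with automatic differentiation would replace the plain lists used here.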