Pruning vs Quantization: Which is Better?
Abstract: Neural network pruning and quantization techniques are almost as old as neural networks themselves. However, to date, only ad-hoc comparisons between the two have been published. In this paper, we set out to answer the question of which is better: neural network quantization or pruning? By answering this question, we hope to inform design decisions made on neural network hardware going forward. We provide an extensive comparison between the two techniques for compressing deep neural networks. First, we give an analytical comparison of the expected quantization and pruning error for general data distributions. Then, we provide lower bounds on the per-layer pruning and quantization error in trained networks and compare these to the empirical error after optimization. Finally, we provide an extensive experimental comparison, training 8 large-scale models on 3 tasks. Our results show that in most cases quantization outperforms pruning; only in some scenarios with very high compression ratios might pruning be beneficial from an accuracy standpoint.
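To make the kind of comparison described above concrete, the following is a minimal sketch (not the paper's actual protocol) that contrasts the mean-squared error of symmetric uniform quantization against magnitude pruning on synthetic Gaussian weights. The pairing of bit-widths with sparsity levels (b-bit quantization vs. keeping b/16 of the weights) is an arbitrary assumption made only for illustration.

```python
import numpy as np

# Illustrative sketch: compare the MSE of uniform quantization vs. magnitude
# pruning on synthetic Gaussian "weights". The bit-width / sparsity pairing
# below is an assumption for illustration, not the paper's exact setup.

rng = np.random.default_rng(0)
w = rng.normal(0.0, 1.0, size=100_000)  # stand-in weight tensor

def quantization_mse(w, bits):
    """Asymmetric uniform quantization with round-to-nearest."""
    levels = 2 ** bits
    scale = (w.max() - w.min()) / (levels - 1)
    w_q = np.round((w - w.min()) / scale) * scale + w.min()
    return np.mean((w - w_q) ** 2)

def pruning_mse(w, sparsity):
    """Magnitude pruning: zero out the smallest-magnitude fraction of weights."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w))[k - 1] if k > 0 else -np.inf
    w_p = np.where(np.abs(w) > threshold, w, 0.0)
    return np.mean((w - w_p) ** 2)

# Roughly matched compression: b-bit quantization vs. keeping b/16 of the weights.
for bits in [2, 3, 4, 8]:
    sparsity = 1.0 - bits / 16.0
    print(f"{bits}-bit quant MSE = {quantization_mse(w, bits):.5f}  |  "
          f"{sparsity:.0%} pruning MSE = {pruning_mse(w, sparsity):.5f}")
```

On a roughly Gaussian tensor, a sketch like this tends to show quantization incurring lower MSE than pruning at moderate compression, which is consistent in spirit with the abstract's claim; the paper's analytical and per-layer results make this precise.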