Differentiable Search for Finding Optimal Quantization Strategy

Published 10 Apr 2024 in cs.LG and eess.IV (arXiv:2404.08010v2)

Abstract: To accelerate and compress deep neural networks (DNNs), many network quantization algorithms have been proposed. Although the quantization strategy of a state-of-the-art algorithm may outperform others on some network architectures, it is hard to prove that the strategy is always better than the others, or even that it is the best choice for every layer in a network. In other words, existing quantization algorithms are suboptimal because they ignore the different characteristics of different layers and quantize all layers with a uniform quantization strategy. To solve this issue, in this paper we propose a differentiable quantization strategy search (DQSS) that assigns an optimal quantization strategy to each individual layer by taking advantage of the benefits of different quantization algorithms. Specifically, we formulate DQSS as a differentiable neural architecture search problem and adopt an efficient convolution to explore the mixed quantization strategies from a global perspective by gradient-based optimization. We apply DQSS to post-training quantization, enabling the quantized models to perform comparably to full-precision models. We also employ DQSS in quantization-aware training to further validate its effectiveness. To circumvent the expensive optimization cost of employing DQSS in quantization-aware training, we update the hyper-parameters and the network parameters in a single forward-backward pass. Besides, we adjust the optimization process to avoid a potential under-fitting problem. Comprehensive experiments on a high-level computer vision task, i.e., image classification, and a low-level computer vision task, i.e., image super-resolution, with various network architectures show that DQSS can outperform state-of-the-art methods.
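
The abstract states that, to avoid the expensive cost of running DQSS inside quantization-aware training, the search hyper-parameters and the network parameters are updated in a single forward-backward pass, instead of alternating between them as bi-level differentiable architecture search typically does. The toy sketch below illustrates what such a joint update can look like for one scalar weight. Everything in it is an assumption for illustration: the two step-size quantizers in the hypothetical `STEPS` list, the softmax mixture over them, the straight-through estimator used to pass gradients through rounding, and the learning rates. It is not the paper's training procedure, and it omits the under-fitting adjustment the abstract mentions.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def quantize(w, step):
    """Hypothetical scalar quantizer: round w to the nearest multiple of `step`."""
    return round(w / step) * step

# Hypothetical candidate strategies: a coarse and a fine quantization step.
STEPS = [0.5, 0.125]

def joint_step(w, alpha, data, lr_w=0.05, lr_a=0.5):
    """Update weight w and search logits alpha from one shared gradient computation."""
    probs = softmax(alpha)
    q_k = [quantize(w, s) for s in STEPS]          # output of each candidate quantizer
    q = sum(p * qk for p, qk in zip(probs, q_k))   # relaxed (mixed) quantized weight
    # Forward pass of the toy model y = q * x with loss L = mean((y - t)^2).
    loss = sum((q * x - t) ** 2 for x, t in data) / len(data)
    dL_dq = sum(2 * (q * x - t) * x for x, t in data) / len(data)
    # Straight-through estimator (assumption): d quantize(w) / dw is taken as 1,
    # so dq/dw = sum_k p_k = 1.
    grad_w = dL_dq * sum(probs)
    # Softmax Jacobian: dq/dalpha_k = p_k * (q_k - q).
    grad_alpha = [dL_dq * p * (qk - q) for p, qk in zip(probs, q_k)]
    # Both parameter groups are updated in the same step, from the same pass.
    new_w = w - lr_w * grad_w
    new_alpha = [a - lr_a * g for a, g in zip(alpha, grad_alpha)]
    return new_w, new_alpha, loss
```

For example, with `data = [(x, 0.3 * x) for x in (1.0, 2.0, 3.0)]`, calling `joint_step(0.3, [0.0, 0.0], data)` lowers `w` and shifts probability toward the fine step: the mixed weight (0.375) overshoots the target slope 0.3, so the gradient favors the candidate whose output lies below it (0.25). Because `new_w` and `new_alpha` come from the same forward pass and gradient computation, no separate pass is spent on the search parameters.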

Summary

The paper’s main contribution is the DQSS framework that uses gradient-based optimization to tailor quantization for each network layer.
It introduces an efficient convolution mechanism that reduces computational complexity while enabling mixed quantization strategies across diverse architectures.
Experiments on image classification and super-resolution tasks demonstrate that DQSS achieves accuracy comparable to or better than full precision models.

Towards Optimal Layer-wise Quantization Strategy: A Differentiable Approach

Introduction

The pursuit of compressing and accelerating deep neural networks (DNNs) for efficient deployment has produced many techniques, among which network quantization has emerged as a compelling approach. By reducing the numerical precision of a network's weights and activations, quantization shrinks model size and speeds up inference, meeting the constraints of resource-limited platforms. However, existing quantization algorithms typically apply a single strategy to all layers, disregarding how differently individual layers respond to quantization and contribute to overall network performance. This paper introduces a Differentiable Quantization Strategy Search (DQSS) framework that addresses this limitation by automatically selecting an optimal quantization strategy for each layer.

Differentiable Quantization Strategy Search (DQSS)

DQSS is grounded in the observation that different layers within a network may respond differently to quantization, which a uniform quantization strategy cannot exploit. By formulating the search for a quantization strategy as a differentiable problem, in the manner of differentiable neural architecture search, DQSS applies gradient-based optimization to a continuous space of quantization configurations. This lets it identify a layer-specific strategy from a predefined set of quantization algorithms, tailoring quantization to the characteristics of each layer.
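
The continuous relaxation described above can be sketched in the style of DARTS-like differentiable architecture search: each layer holds one learnable logit per candidate quantizer, the forward pass uses the softmax-weighted sum of all candidates' outputs, and after the search only the highest-weighted candidate is kept. This is a minimal pure-Python sketch under stated assumptions: the two candidates (`uniform_minmax`, `clipped_symmetric`) and their settings are hypothetical stand-ins, not the algorithms in the paper's search space, and the gradient updates of the logits are omitted.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_minmax(x, bits=4):
    """Hypothetical candidate 1: asymmetric uniform quantizer over the tensor's min/max range."""
    lo, hi = min(x), max(x)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    return [lo + round((v - lo) / scale) * scale for v in x]

def clipped_symmetric(x, bits=4, clip_ratio=0.5):
    """Hypothetical candidate 2: symmetric quantizer clipped to a fraction of max |x|."""
    clip = clip_ratio * max(abs(v) for v in x)
    clip = clip if clip > 0 else 1.0
    levels = 2 ** (bits - 1) - 1
    scale = clip / levels
    return [max(-clip, min(clip, round(v / scale) * scale)) for v in x]

# Hypothetical candidate set; the paper's actual quantization algorithms are not reproduced here.
CANDIDATES = [uniform_minmax, clipped_symmetric]

def mixed_quantize(x, alpha):
    """Relaxed forward pass: softmax(alpha)-weighted sum of every candidate's output."""
    probs = softmax(alpha)
    outs = [q(x) for q in CANDIDATES]
    return [sum(p * out[i] for p, out in zip(probs, outs)) for i in range(len(x))]

def derive_strategy(alpha):
    """Discretize after the search: keep the candidate with the largest logit."""
    best = max(range(len(alpha)), key=lambda k: alpha[k])
    return CANDIDATES[best].__name__
```

With equal logits the output is the average of the two candidates; as one logit dominates, the mixture approaches that single quantizer, which is what makes the final argmax in `derive_strategy` a reasonable way to read off one strategy per layer.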

Core Contributions

The primary innovation of DQSS is its gradient-based exploration of mixed quantization strategies. By treating the search for an optimal quantization strategy as a differentiable problem, DQSS departs from conventional, heuristic-driven approaches.
An efficient convolution mechanism reduces the computational cost of exploring mixed strategies, so the search can be applied across various network architectures without prohibitive overhead.
DQSS applies not only to post-training quantization (PTQ) but also to quantization-aware training (QAT), where it updates the hyper-parameters and the network parameters in a single forward-backward pass to avoid expensive optimization, demonstrating its versatility in improving model performance under quantization.
Comprehensive experiments across tasks of varying complexity show DQSS outperforming state-of-the-art quantization methods. Notably, DQSS not only competes closely with full-precision (FP32) models but, in certain cases, surpasses their performance.
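
The efficient convolution listed among the contributions is not detailed in this summary. One plausible reading, which is an assumption here and not a description of the paper's exact mechanism, relies on convolution being linear in both its input and its kernel: if a layer's relaxed output is a weighted sum over all M×N pairs of (activation quantizer i, weight quantizer j) with coefficients a_i·b_j, then conv(Σ_i a_i·Q_i(x), Σ_j b_j·Q_j(w)) = Σ_i Σ_j a_i·b_j·conv(Q_i(x), Q_j(w)), so the M×N convolutions collapse into one. The sketch checks this identity on a 1-D toy; the pre-quantized variants and coefficients are hypothetical inputs.

```python
def conv1d(x, w):
    """'Valid' 1-D cross-correlation, i.e. what deep-learning libraries call convolution."""
    k = len(w)
    return [sum(x[i + j] * w[j] for j in range(k)) for i in range(len(x) - k + 1)]

def weighted_sum(tensors, coeffs):
    """Elementwise sum over k of coeffs[k] * tensors[k]."""
    return [sum(c * t[i] for c, t in zip(coeffs, tensors)) for i in range(len(tensors[0]))]

def naive_mixed_conv(x_variants, w_variants, a, b):
    """One convolution per (activation quantizer, weight quantizer) pair: M*N convolutions."""
    out = [0.0] * (len(x_variants[0]) - len(w_variants[0]) + 1)
    for ai, xq in zip(a, x_variants):
        for bj, wq in zip(b, w_variants):
            y = conv1d(xq, wq)
            out = [o + ai * bj * v for o, v in zip(out, y)]
    return out

def efficient_mixed_conv(x_variants, w_variants, a, b):
    """Mix the quantized activations and weights first, then convolve once."""
    return conv1d(weighted_sum(x_variants, a), weighted_sum(w_variants, b))
```

Both functions return the same result up to floating-point rounding, but `efficient_mixed_conv` performs a single convolution regardless of how many candidate quantizers the search space contains; the cost of mixing the variants grows only with tensor size times the number of candidates.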

Experimental Validation

Evaluated on a high-level computer vision task (image classification) and a low-level task (image super-resolution) with a variety of network architectures, DQSS consistently outperformed conventional quantization approaches. This is particularly evident in PTQ, where DQSS retained, and occasionally improved, the accuracy of quantized models relative to their FP32 counterparts. Applying DQSS to QAT further validated its effectiveness, with notable improvements over leading QAT methods, particularly on challenging architectures such as MobileNet-V2.

Ablation Studies and Observations

Ablation studies shed light on how DQSS operates, showing that different quantization strategies are selected for activations and for weights across layers. This layer-specific selection is key to its success, allowing DQSS to exploit the strengths of diverse quantization algorithms according to the demands of each layer. The studies further relate the performance improvements of DQSS to the computational savings of its efficient convolution mechanism.

Future Directions

The groundwork laid by DQSS opens several avenues for future work. Incorporating a broader array of quantization algorithms into its search space could further improve its ability to match quantization strategies to the idiosyncrasies of each layer. Extending it to tasks beyond image classification and super-resolution would also give a more complete picture of its versatility and limitations.

Conclusion

DQSS is a notable advance in network quantization. By moving beyond uniform quantization strategies, it provides a framework for optimizing layer-wise quantization in a principled, automated manner. Its demonstrated efficacy across diverse tasks and architectures underscores its immediate utility and its potential as a tool for optimizing DNNs for resource-constrained environments.