Task-specific Compression for Multi-task Language Models using Attribution-based Pruning

Published 9 May 2022 in cs.CL and cs.AI | arXiv:2205.04157v2

Abstract: Multi-task language models show outstanding performance on various natural language understanding tasks with only a single model. However, these models use an unnecessarily large number of parameters, even when applied to only a specific task. This paper proposes a novel training-free compression method for multi-task language models based on pruning. Specifically, we use an attribution method to determine which neurons are essential for performing a specific task, then prune the unimportant neurons task-specifically, leaving only the task-relevant parameters. Furthermore, we extend our method to low-resource and unsupervised settings. Because the compression is training-free, it requires few computing resources and does not destroy the pre-trained knowledge of the language models. Experimental results on six widely used datasets show that the proposed pruning method significantly outperforms baseline pruning methods. In addition, we demonstrate that our method preserves performance even in an unseen-domain setting.
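
As a rough illustration of the pruning idea described in the abstract, the sketch below scores the inner neurons of a single feed-forward layer with a gradient-times-activation attribution (one common attribution approximation; the paper's exact attribution method and pruning granularity may differ) and zeroes out the lowest-scoring neurons. The toy model, the stand-in task loss, and the keep ratio are all placeholders, not details from the paper.

```python
# Minimal sketch of attribution-based neuron pruning for one feed-forward block.
# Attribution is approximated as |activation * gradient| averaged over a batch of
# task examples; this is an illustrative approximation, not the authors' exact method.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy feed-forward block standing in for one transformer FFN layer.
hidden, inner = 32, 128
ffn = nn.Sequential(nn.Linear(hidden, inner), nn.ReLU(), nn.Linear(inner, hidden))

# Toy "task" batch: random inputs and a scalar placeholder loss.
x = torch.randn(16, hidden)

# Hook the inner activations so we can read them and their gradients.
acts = {}
def save_act(module, inp, out):
    out.retain_grad()
    acts["inner"] = out
ffn[1].register_forward_hook(save_act)

out = ffn(x)
loss = out.pow(2).mean()          # stand-in for a real task loss
loss.backward()

# Attribution score per inner neuron: mean |activation * gradient| over the batch.
a = acts["inner"]
scores = (a * a.grad).abs().mean(dim=0)        # shape: [inner]

# Keep the top-k neurons; prune the rest by zeroing their weights (no retraining).
keep_ratio = 0.5                                # placeholder compression rate
k = int(inner * keep_ratio)
keep = torch.topk(scores, k).indices
mask = torch.zeros(inner, dtype=torch.bool)
mask[keep] = True

with torch.no_grad():
    ffn[0].weight[~mask] = 0.0     # rows of the first linear feed the pruned neurons
    ffn[0].bias[~mask] = 0.0
    ffn[2].weight[:, ~mask] = 0.0  # columns of the second linear read from them

print(f"kept {k}/{inner} inner neurons; the block still maps {hidden} -> {hidden}")
```

Because only low-attribution neurons are zeroed and no weights are updated, the surviving parameters are exactly the pre-trained ones, which is the sense in which such a pruning step is training-free.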

Citations (6)
