An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2
Published 15 Nov 2024 in cs.CL, cs.AI, and cs.SE (arXiv:2411.12758v1)
Abstract: This study examines quantisation and pruning strategies for reducing the energy consumption of code LLM inference. Using StarCoder2, we observe that quantisation increases energy demands due to lower throughput, and also incurs some accuracy loss. Conversely, pruning reduces energy usage but impairs performance. These results highlight the challenges and trade-offs of LLM model compression. We suggest future work on hardware-optimised quantisation to improve efficiency with minimal loss in accuracy.