An exploration of the effect of quantisation on energy consumption and inference time of StarCoder2

Published 15 Nov 2024 in cs.CL, cs.AI, and cs.SE | arXiv:2411.12758v1

Abstract: This study examines quantisation and pruning strategies for reducing the energy consumption of code LLM inference. Using StarCoder2, we observe that quantisation increases energy demands, owing to lower throughput, and also incurs some accuracy loss. Conversely, pruning reduces energy usage but impairs performance. These results highlight the challenges and trade-offs inherent in LLM compression. We suggest future work on hardware-optimised quantisation to improve efficiency with minimal loss of accuracy.
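The quantisation the abstract refers to maps floating-point weights onto a small set of low-bit integers, trading precision for memory and bandwidth. As a purely illustrative sketch (not the paper's actual setup, which quantises StarCoder2 with dedicated inference libraries), symmetric per-tensor int8 quantisation can be expressed as:

```python
def quantise_int8(weights):
    """Symmetric int8 quantisation: each weight w is approximated by scale * q,
    where q is an integer in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantise(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [v * scale for v in q]

# Toy example: quantise a few weights and check the reconstruction error.
weights = [0.8, -1.27, 0.003, 0.51, -0.9]
q, scale = quantise_int8(weights)
recovered = dequantise(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, recovered))
# Rounding error is bounded by half a quantisation step.
assert max_err <= scale / 2 + 1e-9
```

The accuracy losses the study reports stem from exactly this rounding error accumulating across layers, while the throughput cost arises because the quantise/dequantise steps add per-inference overhead that the underlying hardware may not execute efficiently.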