Quantize What Counts: Bit Allocation Insights Informed by Spectral Gaps in Keys and Values

Published 20 Feb 2025 in cs.LG | (2502.15075v2)

Abstract: Large language models (LLMs) have significantly advanced the capabilities of natural language processing (NLP) in recent years. However, as these models continue to scale in size, memory constraints pose a substantial challenge. Key and Value cache (KV cache) quantization is well documented as a promising solution to this limitation. In this work, we provide two novel theorems aimed at enhancing KV quantization methods. Our first theorem, termed Key-Value Norm Disparity, states that the key weight matrices by nature carry richer information than the value weight matrices, as evidenced by higher spectral and Frobenius norms across most of the layers. Our second theorem, Key-Driven Quantization, posits that prioritizing the quantization precision of keys over values significantly improves overall quantization performance. In particular, assigning greater precision to the keys than to the values achieves a higher degree of precision reduction with minimal impact on model accuracy. We validate these theorems through theory and extensive experiments on several state-of-the-art LLM architectures and benchmarks. These findings offer valuable guidelines for improving KV cache quantization strategies, facilitating more efficient memory utilization without compromising model performance across diverse NLP tasks. Source code is available at https://github.com/mohsenhariri/spectral-kv.
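
The two quantities named in the abstract, the spectral and Frobenius norms, and the idea of giving keys more bits than values can be illustrated with a small toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes synthetic Gaussian matrices `K` and `V` in which `K` is drawn with twice the standard deviation of `V`, a plain min-max uniform quantizer, and squared reconstruction error as a stand-in for model accuracy. All function names are hypothetical.

```python
import math
import random


def frobenius_norm(m):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in m for x in row))


def spectral_norm(m, iters=100):
    """Estimate the largest singular value by power iteration on M^T M.

    Assumes the starting vector is not orthogonal to the top singular vector.
    """
    rows, cols = len(m), len(m[0])
    v = [1.0 / math.sqrt(cols)] * cols  # unit-length starting vector
    for _ in range(iters):
        u = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]  # u = M v
        w = [sum(m[i][j] * u[i] for i in range(rows)) for j in range(cols)]  # w = M^T u
        norm_w = math.sqrt(sum(x * x for x in w))
        if norm_w == 0.0:
            return 0.0
        v = [x / norm_w for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(cols)) for i in range(rows)]
    return math.sqrt(sum(x * x for x in mv))  # ||M v|| with ||v|| = 1


def quantize_dequantize(m, bits):
    """Uniform min-max quantization to 2**bits levels, mapped back to floats."""
    lo = min(min(row) for row in m)
    hi = max(max(row) for row in m)
    step = (hi - lo) / (2 ** bits - 1) or 1.0  # guard against a constant matrix
    return [[lo + round((x - lo) / step) * step for x in row] for row in m]


def squared_error(a, b):
    """Sum of squared entry-wise differences between two matrices."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def random_matrix(rows, cols, std, rng):
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


# Synthetic stand-ins (an assumption, not real model tensors):
# K has the larger scale, mimicking the norm disparity described in the abstract.
rng = random.Random(0)
K = random_matrix(32, 16, 2.0, rng)
V = random_matrix(32, 16, 1.0, rng)


def total_error(k_bits, v_bits):
    """Combined reconstruction error for a given (key bits, value bits) split."""
    return (squared_error(K, quantize_dequantize(K, k_bits))
            + squared_error(V, quantize_dequantize(V, v_bits)))


err_k4v2 = total_error(4, 2)  # more precision for keys
err_k2v4 = total_error(2, 4)  # more precision for values, same total bit budget

print("spectral  K, V:", spectral_norm(K), spectral_norm(V))
print("Frobenius K, V:", frobenius_norm(K), frobenius_norm(V))
print("error K4/V2:", err_k4v2, " error K2/V4:", err_k2v4)
```

In this toy setting the matrix with the larger norms also has the wider value range, so spending the extra bits on it lowers the combined reconstruction error for the same bit budget. The paper's claim concerns model accuracy on real architectures, which this sketch does not measure.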
