Downstream tasks and long-context evaluation

Beyond perplexity on WikiText-2, evaluate the impact of TurboAngle on downstream task accuracy and on long-context benchmarks such as LongBench, to characterize performance in broader and more realistic application settings.

Background

The paper’s evaluations focus on perplexity on WikiText-2 and do not cover downstream tasks or long-context scenarios. As a result, it is unknown whether the observed perplexity benefits generalize to practical tasks and very long contexts.

Assessing TurboAngle on downstream tasks and long-context benchmarks would determine whether near-lossless perplexity translates into preserved task performance and stable behavior at long sequence lengths.
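A minimal sketch of the comparison such an assessment rests on: computing perplexity from per-token negative log-likelihoods for a baseline model and for the same model with a compressed KV cache, then checking the relative gap. All names and numbers here are illustrative assumptions, not values from the paper.

```python
import math

def perplexity(nlls):
    # Perplexity = exp(mean negative log-likelihood per token).
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical per-token NLLs from a baseline model and from the same
# model with a compressed KV cache (purely illustrative numbers).
baseline_nlls = [2.10, 1.95, 2.30, 2.05]
compressed_nlls = [2.11, 1.96, 2.31, 2.06]

ppl_base = perplexity(baseline_nlls)
ppl_comp = perplexity(compressed_nlls)

# A "near-lossless" claim hinges on this relative gap staying tiny;
# the open question is whether a small gap here also implies small
# degradation on downstream tasks and long-context benchmarks.
rel_delta = (ppl_comp - ppl_base) / ppl_base
```

The same harness structure extends to task-level metrics: replace per-token NLLs with accuracy or F1 per benchmark example and compare the baseline and compressed runs under identical prompts and sequence lengths.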

References

We evaluate perplexity on WikiText-2 only; downstream task accuracy and long-context benchmarks (e.g., LongBench) remain untested.

TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization  (2603.27467 - Patel, 29 Mar 2026) in Conclusion — Limitations