Runtime overhead of the FWHT encode/decode path

Measure the runtime overhead introduced by TurboAngle’s FWHT-based encode/decode path under realistic batch sizes and sequence lengths, and quantify its impact on inference latency and throughput.

Background

TurboAngle’s pipeline applies a random ±1 diagonal rotation and the normalized fast Walsh-Hadamard transform (FWHT) during both encoding and decoding. The transform is algorithmically efficient, at O(d log d) operations for a length-d vector, but the paper does not report end-to-end runtime costs in realistic serving scenarios.
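
To make the measured operation concrete, the following is a minimal NumPy sketch of a random ±1 diagonal rotation combined with a normalized FWHT. It is an illustration under stated assumptions, not the paper's implementation: the function names (`fwht_normalized`, `encode_rotation`, `decode_rotation`) are hypothetical, the order "signs first, then FWHT" on the encode side is assumed, and the angle quantization step that would sit between encode and decode is omitted.

```python
import numpy as np

def fwht_normalized(x):
    """Normalized fast Walsh-Hadamard transform along the last axis.

    The last-axis length d must be a power of two. With the 1/sqrt(d)
    scaling the transform is orthogonal and is its own inverse.
    """
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[-1]
    if d == 0 or d & (d - 1):
        raise ValueError("last dimension must be a power of two")
    out = x.copy()
    h = 1
    while h < d:
        # View as (..., blocks, 2, h) so element i is paired with i + h.
        v = out.reshape(*x.shape[:-1], d // (2 * h), 2, h)
        a = v[..., 0, :] + v[..., 1, :]   # butterfly sum
        b = v[..., 0, :] - v[..., 1, :]   # butterfly difference
        out = np.stack((a, b), axis=-2).reshape(x.shape)
        h *= 2
    return out / np.sqrt(d)

def encode_rotation(x, signs):
    # Assumed order: multiply by the ±1 diagonal, then apply the FWHT.
    return fwht_normalized(x * signs)

def decode_rotation(y, signs):
    # Inverse of encode_rotation: the FWHT undoes itself and the
    # ±1 diagonal is its own inverse.
    return fwht_normalized(y) * signs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signs = rng.choice([-1.0, 1.0], size=128)   # hypothetical head dimension
    x = rng.standard_normal((4, 16, 128))       # (batch, seq_len, head_dim)
    x_hat = decode_rotation(encode_rotation(x, signs), signs)
    print(np.allclose(x, x_hat))
```

Without quantization in between, the round trip should reproduce the input up to floating-point error, which is a useful sanity check before timing anything.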

Quantifying overhead is necessary for deployment decisions, especially where latency constraints are tight or batch/sequence sizes vary significantly.
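
One way such a measurement could be structured is sketched below: a small harness that times the rotation round trip over a grid of batch sizes and sequence lengths. Everything here is an assumption made for illustration: the helper names, the head dimension of 128, the grid values, and the use of CPU NumPy wall-clock time. Numbers from this sketch would not represent GPU serving performance; a real study would time the actual encode/decode kernels inside the serving stack.

```python
import statistics
import time

import numpy as np

def fwht_normalized(x):
    """Normalized FWHT along the last axis (length must be a power of two)."""
    x = np.asarray(x, dtype=np.float64)
    d = x.shape[-1]
    if d == 0 or d & (d - 1):
        raise ValueError("last dimension must be a power of two")
    out = x.copy()
    h = 1
    while h < d:
        v = out.reshape(*x.shape[:-1], d // (2 * h), 2, h)
        a = v[..., 0, :] + v[..., 1, :]
        b = v[..., 0, :] - v[..., 1, :]
        out = np.stack((a, b), axis=-2).reshape(x.shape)
        h *= 2
    return out / np.sqrt(d)

def time_rotation_roundtrip(batch, seq_len, head_dim=128, repeats=5, seed=0):
    """Time sign-flip + FWHT encode and decode for one (batch, seq_len) shape.

    head_dim=128 is a hypothetical default, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=head_dim)
    x = rng.standard_normal((batch, seq_len, head_dim))
    fwht_normalized(x[:1] * signs)  # warm-up call, excluded from timing
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        y = fwht_normalized(x * signs)        # encode-side rotation
        _ = fwht_normalized(y) * signs        # decode-side rotation
        samples.append(time.perf_counter() - t0)
    median_s = max(statistics.median(samples), 1e-12)
    return {
        "batch": batch,
        "seq_len": seq_len,
        "median_s": median_s,
        "vectors_per_s": batch * seq_len / median_s,
    }

def sweep(batches, seq_lens, **kwargs):
    """Run the timing over every (batch, seq_len) combination."""
    return [time_rotation_roundtrip(b, s, **kwargs)
            for b in batches for s in seq_lens]

if __name__ == "__main__":
    # Illustrative grid only; realistic serving shapes are workload-specific.
    for row in sweep([1, 8], [128, 1024]):
        print(row)
```

Reporting the median over several repeats, after a warm-up call, reduces sensitivity to one-off stalls. Comparing these timings against the same workload without the transform would give the overhead figure the open question asks for.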

References

Runtime overhead of the FWHT encode/decode path has not been measured under realistic batch and sequence sizes.

TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization (2603.27467 - Patel, 29 Mar 2026) in Conclusion (Limitations)