
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity

Published 1 Oct 2024 in cs.CL | (2410.01028v1)

Abstract: We present a simple on-the-fly method for faster inference of LLMs. Unlike other (self-)speculative decoding techniques, our method does not require fine-tuning or black-box optimization to generate a fixed draft model, relying instead on simple rules to generate varying draft models adapted to the input context. We show empirically that our lightweight algorithm is competitive with the current state of the art (SOTA) for self-speculative decoding, while being a truly plug-and-play method.
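The abstract does not spell out the rules used to build the draft model, so the following is only a toy sketch under stated assumptions, not the paper's algorithm. It assumes a hypothetical rule in the spirit of the title: the draft model is the full model with some layers skipped, where a layer is skipped if, averaged over the context tokens, the cosine similarity between its input and output hidden states is at least a threshold (i.e. the layer barely changes the representation). `ToyModel`, `choose_skipped_layers`, `threshold` and `k` are hypothetical names. The model is a tiny attention-free stand-in (the next token depends only on the current token), and verification is greedy, so the output must match plain greedy decoding with the full model.

```python
import math
import random


class ToyModel:
    """Hypothetical stand-in for an LLM: residual layers h -> h + tanh(W h), no attention."""

    def __init__(self, vocab=16, dim=8, n_layers=6, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.embed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]
        # Later layers get tiny weights, so their output is nearly parallel to their input.
        self.layers = [
            [[rng.gauss(0, 1) * (0.5 if i < 2 else 0.02) for _ in range(dim)] for _ in range(dim)]
            for i in range(n_layers)
        ]
        self.unembed = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]

    def layer(self, i, h):
        return [x + math.tanh(sum(w * y for w, y in zip(row, h))) for x, row in zip(h, self.layers[i])]

    def hidden_states(self, token, skip=frozenset()):
        # Returns the hidden state before layer 0 and after every layer (skipped layers are identity).
        h = self.embed[token]
        states = [h]
        for i in range(len(self.layers)):
            if i not in skip:
                h = self.layer(i, h)
            states.append(h)
        return states

    def next_token(self, token, skip=frozenset()):
        h = self.hidden_states(token, skip)[-1]
        scores = [sum(u * x for u, x in zip(row, h)) for row in self.unembed]
        return max(range(self.vocab), key=scores.__getitem__)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def choose_skipped_layers(model, context, threshold=0.99):
    # Hypothetical rule: skip layers whose input/output cosine similarity, averaged
    # over the context tokens, is at least `threshold`.
    n = len(model.layers)
    sims = [0.0] * n
    for tok in context:
        states = model.hidden_states(tok)
        for i in range(n):
            sims[i] += cosine(states[i], states[i + 1]) / len(context)
    return frozenset(i for i in range(n) if sims[i] >= threshold)


def greedy_decode(model, context, n_new):
    out = list(context)
    for _ in range(n_new):
        out.append(model.next_token(out[-1]))
    return out


def self_speculative_decode(model, context, n_new, k=4, threshold=0.99):
    skip = choose_skipped_layers(model, context, threshold)  # draft model chosen per input
    out = list(context)
    target_len = len(context) + n_new
    while len(out) < target_len:
        # 1) Draft up to k tokens with the cheaper, layer-skipping model.
        draft, prev = [], out[-1]
        for _ in range(min(k, target_len - len(out))):
            prev = model.next_token(prev, skip)
            draft.append(prev)
        # 2) Verify with the full model; keep its token, stop at the first mismatch.
        prev = out[-1]
        for tok in draft:
            full = model.next_token(prev)
            out.append(full)
            if full != tok:
                break  # discard the remaining draft tokens
            prev = tok
    return out
```

In a real LLM the verification step scores all drafted tokens in a single forward pass of the full model, which is where the speedup comes from; this toy verifies them one at a time, so it only illustrates the control flow and the fact that greedy verification leaves the output unchanged.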
