Perfect synchronization between draft and target models in speculative decoding
Develop methods that achieve perfect synchronization between an independent draft language model and the target language model during speculative decoding, so that the token sequences the draft model generates stay fully aligned with the target model’s verification, preserving both accuracy and efficiency.
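One way to make "alignment" concrete is the accept/reject rule commonly used in speculative sampling. The sketch below is illustrative only and assumes that rule: a drafted token x is accepted with probability min(1, p(x)/q(x)), where p is the target distribution and q is the draft distribution. The problem statement does not specify a verification scheme. The function names and the toy distributions are hypothetical. Under this assumption, the chance that a single drafted token survives verification is the sum over tokens of min(p(x), q(x)), which reaches 1 only when the two distributions coincide.

```python
import random


def acceptance_probability(target, draft):
    """Chance that one drafted token is accepted: sum over x of min(p(x), q(x))."""
    return sum(min(p, q) for p, q in zip(target, draft))


def speculative_sample(target, draft, rng):
    """Draw one token with the assumed accept/reject rule.

    Returns (token_index, accepted).
    """
    # The draft model proposes a token from its own distribution q.
    x = rng.choices(range(len(draft)), weights=draft)[0]
    # Accept the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, target[x] / draft[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights internally.
    residual = [max(0.0, p - q) for p, q in zip(target, draft)]
    x = rng.choices(range(len(residual)), weights=residual)[0]
    return x, False


# Toy next-token distributions over a three-token vocabulary (hypothetical values).
target_probs = [0.5, 0.3, 0.2]
aligned_draft = [0.5, 0.3, 0.2]
misaligned_draft = [0.2, 0.3, 0.5]

aligned_rate = acceptance_probability(target_probs, aligned_draft)
misaligned_rate = acceptance_probability(target_probs, misaligned_draft)
```

In this toy setting a perfectly synchronized draft model never has a token rejected, while any mismatch lowers the acceptance rate and with it the speedup. The resampling step is what keeps the output distribution equal to the target's even when the draft is misaligned.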
References
Methods such as knowledge distillation and online adaptation have been proposed to enhance this alignment, though perfect synchronization remains an open challenge.
— Small Language Models (SLMs) Can Still Pack a Punch: A survey
(2501.05465 - Subramanian et al., 3 Jan 2025) in Section “SLMs as Draft models” (Subsection under “Approaches to Create SLMs”)