Joint design and trade-off space for composing SSD with EAGLE and token-tree speculation

Investigate the joint design and performance trade-offs of integrating Speculative Speculative Decoding (SSD) with EAGLE-style draft models and token-tree speculative decoding. Specifically, determine how three components should be coordinated when SSD is composed with EAGLE and tree-based speculation: verification-outcome prediction (fan-out allocation), cache-aware sampling for residual control, and fallback strategies on cache misses. The goal is to maximize end-to-end speedup while preserving lossless correctness guarantees across batch sizes and sampling temperatures.
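For context on what "lossless" and "residual" refer to, the following is a minimal sketch of the standard single-token speculative sampling rule: a draft token sampled from the draft distribution q is accepted with probability min(1, p/q) under the target distribution p, and on rejection a replacement is drawn from the normalized residual max(p - q, 0). This is the generic rule from the speculative decoding literature, not SSD's or Saguaro's cache-aware variant; the function names and toy distributions are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def acceptance_probability(p: Sequence[float], q: Sequence[float], token: int) -> float:
    """Probability of accepting a draft token sampled from q, given target p."""
    # The token was sampled from q, so q[token] > 0 is assumed.
    return min(1.0, p[token] / q[token])


def residual_distribution(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(p - q, 0), used to resample after a rejection."""
    raw = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(raw)
    if total == 0.0:
        # p == q: every draft token is accepted, so this branch is never
        # sampled from in practice; return p as a harmless default.
        return list(p)
    return [r / total for r in raw]


def verify_token(
    p: Sequence[float], q: Sequence[float], token: int, rng: random.Random
) -> Tuple[int, bool]:
    """Return (emitted token, accepted?) for one draft token."""
    if rng.random() < acceptance_probability(p, q, token):
        return token, True
    weights = residual_distribution(p, q)
    resampled = rng.choices(range(len(p)), weights=weights)[0]
    return resampled, False
```

Because the emitted token is distributed exactly according to p, any composition with EAGLE or token trees has to keep an equivalent accept-or-resample step intact to remain lossless.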

Background

The paper introduces Speculative Speculative Decoding (SSD), which parallelizes drafting and verification by precomputing speculations for likely verification outcomes while the target model verifies the prior round. Saguaro is an optimized SSD algorithm that addresses three problems: predicting verification outcomes, balancing acceptance rate against cache-hit rate via a novel sampling scheme, and handling cache misses.
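The idea of precomputing speculations per verification outcome can be sketched as a small lookup table. The sketch below is a toy illustration under stated assumptions, not Saguaro's implementation: it assumes an outcome is keyed by the number of accepted draft tokens plus the token the verifier emits next, and that a cache miss simply falls back to drafting on demand. `build_outcome_cache`, `next_speculation`, and `draft_fn` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed cache key: (number of accepted draft tokens, next token from the verifier).
Outcome = Tuple[int, int]
DraftFn = Callable[[List[int]], List[int]]


def _extended_prefix(prefix: Sequence[int], draft: Sequence[int], outcome: Outcome) -> List[int]:
    """Context that the next round would start from under a given outcome."""
    n_accepted, next_token = outcome
    return list(prefix) + list(draft[:n_accepted]) + [next_token]


def build_outcome_cache(
    prefix: Sequence[int],
    draft: Sequence[int],
    likely_outcomes: Sequence[Outcome],
    draft_fn: DraftFn,
) -> Dict[Outcome, List[int]]:
    """Pre-draft the next round for each predicted outcome.

    In SSD this work would overlap with target-model verification; here it
    runs sequentially for clarity.
    """
    return {
        outcome: draft_fn(_extended_prefix(prefix, draft, outcome))
        for outcome in likely_outcomes
    }


def next_speculation(
    cache: Dict[Outcome, List[int]],
    actual: Outcome,
    prefix: Sequence[int],
    draft: Sequence[int],
    draft_fn: DraftFn,
) -> Tuple[List[int], bool]:
    """Return (next draft, cache_hit). On a miss, fall back to drafting now."""
    if actual in cache:
        return cache[actual], True
    return draft_fn(_extended_prefix(prefix, draft, actual)), False


def toy_draft(context: List[int]) -> List[int]:
    """Toy drafter: propose the two integers following the last token."""
    return [context[-1] + 1, context[-1] + 2]


cache = build_outcome_cache([1, 2], [3, 4], [(2, 7), (1, 5)], toy_draft)
hit_result = next_speculation(cache, (2, 7), [1, 2], [3, 4], toy_draft)
miss_result = next_speculation(cache, (0, 3), [1, 2], [3, 4], toy_draft)
```

The number of entries in `likely_outcomes` plays the role of the fan-out: a larger fan-out raises the chance of a cache hit at the cost of more speculative drafting per round.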

Prior methods such as EAGLE and tree-based speculative decoding (e.g., token trees) improve acceptance rates or offer multiple candidate paths, but they operate sequentially: speculate, then verify. The authors note that SSD can be combined with these approaches, but the optimal combined design, spanning cache construction, sampling, and fallback, has not been characterized. This motivates a systematic study of the joint design and trade-offs when composing SSD with EAGLE and token-tree speculation.

References

Much remains open. SSD composes naturally with EAGLE and token-tree speculation (see the paper's appendix); the joint design and tradeoff space is largely unexplored.

Speculative Speculative Decoding (2603.03251 - Kumar et al., 3 Mar 2026) in Conclusion and Limitations, Section 6