Split-path KV-Cache read scheduling across prefill and decode engines
Develop a read-splitting mechanism within DualPath's KV-Cache read task scheduling that partitions a single request's KV-Cache retrieval across both the prefill engine and the decode engine, instead of reading exclusively on the side with the shorter storage read queue, so that both storage NICs are utilized concurrently.
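One plausible splitting policy is to size the two read segments so that both NICs finish at the same time, accounting for each engine's existing read backlog. The sketch below illustrates this idea; the names `EngineReader` and `split_kv_read`, the queue-depth-based bandwidth model, and the equal-finish-time heuristic are all assumptions for illustration, not DualPath's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class EngineReader:
    """Models one engine's storage NIC and its pending read backlog (hypothetical)."""
    name: str
    bandwidth_gbps: float   # storage NIC bandwidth in Gbit/s
    queue_bytes: int = 0    # bytes already queued for reading on this side

def split_kv_read(total_bytes: int,
                  prefill: EngineReader,
                  decode: EngineReader) -> tuple[int, int]:
    """Partition one request's KV-Cache read so both NICs finish together.

    Solve for x (bytes routed to the prefill engine) such that
        (Q_p + x) / B_p == (Q_d + (total - x)) / B_d
    where Q is the queued backlog and B the NIC bandwidth in bytes/s,
    then clamp to [0, total_bytes] so a heavily backlogged side may
    receive no share at all.
    """
    bp = prefill.bandwidth_gbps * 1e9 / 8   # bytes per second
    bd = decode.bandwidth_gbps * 1e9 / 8
    x = (bp * (decode.queue_bytes + total_bytes)
         - bd * prefill.queue_bytes) / (bp + bd)
    x = max(0, min(total_bytes, int(x)))
    return x, total_bytes - x

# With equal NICs and no backlog, the read splits evenly across both engines;
# if the decode side already has a large backlog, the split shifts toward prefill.
p = EngineReader("prefill", bandwidth_gbps=100.0)
d = EngineReader("decode", bandwidth_gbps=100.0)
print(split_kv_read(1000, p, d))  # → (500, 500)
```

A real scheduler would also need a minimum-segment threshold, since splitting very small reads across two NICs can cost more in per-request overhead than the added bandwidth recovers.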
References
It is probably better to split the request into two parts and read them from both sides, and we leave it as future work.
— DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
(2602.21548 - Wu et al., 25 Feb 2026) in Section 6.1, Inter-Engine Scheduling, KV-Cache Read Task Scheduling