Split-path KV-Cache read scheduling across prefill and decode engines

Develop a read-splitting mechanism within DualPath’s KV-Cache read task scheduling. Instead of reading a request’s KV-Cache exclusively on the side with the shorter storage read queue, partition the retrieval across both the prefill engine and the decode engine so that both storage NICs are utilized concurrently.

Background

DualPath introduces two data paths for KV-Cache loading: a storage-to-prefill (PE read) path and a storage-to-decode (DE read) path. The request scheduler must decide how to leverage these paths to balance I/O and compute across engines and NICs.

In the current implementation, after selecting a prefill engine and a decode engine for a request, the system reads the KV-Cache entirely on the side with the shorter storage read queue. The authors explicitly note that splitting a request’s KV-Cache read across both sides may be better, but defer this to future work, leaving a concrete unresolved scheduling problem.
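One natural split policy is to divide the request’s KV blocks so that both storage NICs finish at roughly the same time, given each side’s current queue backlog and bandwidth. The sketch below illustrates this idea; the names (`ReadPath`, `split_kv_read`) and the queue/bandwidth model are assumptions for illustration, not part of the DualPath paper, which leaves the split policy as future work.

```python
from dataclasses import dataclass


@dataclass
class ReadPath:
    """One storage read path (PE-side or DE-side). Hypothetical model:
    each path is characterized by its queued backlog and NIC bandwidth."""
    queue_bytes: int        # bytes already queued on this path's storage NIC
    bandwidth_gbs: float    # usable NIC bandwidth in GB/s


def split_kv_read(total_bytes: int, pe: ReadPath, de: ReadPath,
                  block_bytes: int) -> tuple[int, int]:
    """Split a request's KV-Cache read so both paths finish together.

    Solving (pe.queue + x) / bw_pe == (de.queue + (total - x)) / bw_de
    for x equalizes the estimated completion times of the two NICs; the
    result is clamped to [0, total] and rounded to whole KV blocks.
    Returns (pe_bytes, de_bytes).
    """
    bw_pe, bw_de = pe.bandwidth_gbs, de.bandwidth_gbs
    # Ideal PE share that equalizes the two finish times.
    x = (bw_pe * (de.queue_bytes + total_bytes)
         - bw_de * pe.queue_bytes) / (bw_pe + bw_de)
    x = min(max(x, 0.0), float(total_bytes))
    # Round to block granularity so each side reads whole KV blocks.
    pe_bytes = round(x / block_bytes) * block_bytes
    pe_bytes = min(max(pe_bytes, 0), total_bytes)
    return pe_bytes, total_bytes - pe_bytes
```

Note that when one side’s backlog dominates, the clamping sends the entire read to the other side, so the policy degenerates gracefully to the paper’s current shorter-queue behavior.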

References

“It is probably better to split the request into two parts and read them from both sides, and we leave it as future work.”

DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference  (2602.21548 - Wu et al., 25 Feb 2026) in Section 6.1, Inter-Engine Scheduling, KV-Cache Read Task Scheduling