Papers
Topics
Authors
Recent
Search
2000 character limit reached

DEER: Deep Runahead for Instruction Prefetching on Modern Mobile Workloads

Published 29 Apr 2025 in cs.PF and cs.AR | (2504.20387v1)

Abstract: Mobile workloads incur heavy frontend stalls due to increasingly large code footprints as well as long repeat cycles. Existing instruction-prefetching techniques suffer from low coverage, poor timeliness, or high cost. We provide a SW/HW co-designed I-prefetcher; DEER uses profile analysis to extract metadata information that allow the hardware to prefetch the most likely future instruction cachelines, hundreds of instructions earlier. This profile analysis skips over loops and recursions to go deeper into the future, and uses a return-address stack on the hardware side to allow prefetch on the return-path from large call-stacks. The produced metadata table is put in DRAM, pointed to by an in-hardware register; the high depth of the lookahead allows to preload the metadata in time and thus nearly no on-chip metadata storage is needed. Gem5 evaluation on real-world modern mobile workloads shows up to 45% reduction in L2 instruction-miss rate (19.6% on average), resulting in up to 8% speedup (4.7% on average). These gains are up to 4X larger than full-hardware record-and-replay prefetchers, while needing two orders of magnitude smaller on-chip storage.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We found no open problems mentioned in this paper.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.