Sufficiency of Compute in Processing-in-Memory under DRAM Process Constraints

Determine whether processing-in-memory (PIM) architectures fabricated in DRAM process nodes can provide sufficient compute capability given the very limited power and thermal budgets of memory dies, particularly for large language model (LLM) inference workloads.

Background

The paper contrasts Processing-in-Memory (PIM), which integrates compute logic on the same die as memory, with Processing-Near-Memory (PNM), which places compute logic on separate, nearby dies. While PIM can offer very high bandwidth and extremely low data-movement power, the authors note significant practical constraints on placing logic in DRAM process nodes, including limited power and thermal budgets, reduced logic performance and efficiency, and the difficulty of software sharding at fine memory-bank granularity.

In evaluating suitability for datacenter LLM inference, the authors highlight that PNM may be preferable due to fewer constraints on compute logic and easier software partitioning. However, they explicitly state uncertainty about whether PIM can provide enough compute under DRAM process limitations, making this a key unresolved question for hardware-software co-design targeting LLM inference.
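
To make the tension concrete, the following back-of-envelope sketch checks whether matching PIM's internal bandwidth with multiply-accumulate (MAC) throughput fits in a notional DRAM-die power budget for the memory-bound GEMV work that dominates LLM decoding. All numbers here (bandwidth, per-MAC energy, die budget) are hypothetical assumptions chosen only to illustrate the calculation; they are not taken from the paper.

    # Back-of-envelope check: can PIM compute keep pace with its own internal
    # bandwidth within a DRAM die's power budget? Every number below is an
    # illustrative assumption, not a figure from the paper.

    internal_bw_bytes_s = 8.0e12   # assumed aggregate in-bank bandwidth: 8 TB/s
    bytes_per_mac = 2.0            # FP16 weights: one MAC consumes ~2 bytes
    pj_per_mac = 2.0               # assumed MAC energy in a DRAM-node process, pJ
                                   # (DRAM-node logic is typically several times
                                   # less efficient than a leading logic node)
    die_power_budget_w = 5.0       # assumed thermal/power headroom of a DRAM die

    # LLM decode is dominated by GEMV, whose arithmetic intensity is roughly
    # one MAC per weight byte, so required compute scales with bandwidth.
    required_macs_s = internal_bw_bytes_s / bytes_per_mac
    required_tops = 2 * required_macs_s / 1e12      # mul + add = 2 ops per MAC

    # Power the MAC arrays alone would draw at that rate.
    compute_power_w = required_macs_s * pj_per_mac * 1e-12

    print(f"required throughput : {required_tops:.0f} TOPS")
    print(f"compute power needed: {compute_power_w:.1f} W")
    print(f"die power budget    : {die_power_budget_w:.1f} W")
    print("sufficient" if compute_power_w <= die_power_budget_w
          else "insufficient: compute power exceeds the die budget")

Under these assumed figures the MAC power alone exceeds the die budget, which is precisely the failure mode the authors worry about; more favorable assumptions (e.g., lower-precision arithmetic or lower per-MAC energy) flip the verdict, which is why the question remains open.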

References

"It is also unclear if the compute can be sufficient in PIM given the very limited budget for power and thermal of a DRAM technology process node."

Source: Ma et al., "Challenges and Research Directions for Large Language Model Inference Hardware" (arXiv:2601.05047, 8 Jan 2026), Section 2: Processing-Near-Memory for high bandwidth, preceding Table 4.