
Direct PIM Access: Secure Memory Computation

Updated 30 January 2026
  • Direct PIM Access (DPA) is a secure computational model that integrates PIM enclaves within memory for confidential, accelerated data processing.
  • DPA employs cryptographic protocols such as AES-GCM encryption, arithmetic secret sharing, and remote attestation to safeguard data integrity and confidentiality.
  • DPA architectures demonstrate significant performance benefits, achieving up to 14.66× speedups over secure CPU computations in real-world machine learning and analytics tasks.

PIM-Enclave is a secure computational framework that integrates confidential execution environments directly into the memory subsystem. By leveraging Processing-In-Memory (PIM) architectures, it aims to accelerate data-intensive workloads while ensuring robust confidentiality and integrity guarantees—circumventing classical CPU-centric side channels and bus-level leakage. The term ’enclave’ here denotes an isolated, attestable execution environment within either the PIM logic layers (Duy et al., 2021) or via cryptographically hardened offloading protocols when PIM modules themselves are untrusted (Ghinani et al., 28 Jan 2025). PIM-Enclave mechanisms have demonstrated negligible performance overhead relative to the PIM baseline and achieve up to 14.66× speedups over secure CPU computation for real-world ML tasks.

1. Architectural Design and Hardware Integration

PIM-Enclave manifests in two architectural variants: (a) trusted, logic-layer-integrated PIM cores (Duy et al., 2021), and (b) secure software/hardware protocols atop untrusted PIM modules compatible with commodity DPUs (Ghinani et al., 28 Jan 2025). The core principle is minimizing data movement by merging computation and data residence in memory:

  • Trusted Logic-Layer Variant: Each DRAM bank incorporates a low-power PIM core, local memory, and specialized blocks:
    • Key & ROM Storage: Houses endorsement keys and immutable attestation images.
    • AES-GCM DMA Engines: Encrypt, decrypt, and authenticate all DMA transfers between DRAM and scratchpads.
    • Access-Control Logic: Locks protected address ranges, blocking host-side reads/writes during sensitive computations.
  • Untrusted PIM Variant (Ghinani et al., 28 Jan 2025): Relies on the host CPU’s Trusted Execution Environment (TEE) for trust anchoring. Computation is split using arithmetic secret sharing and garbled circuits; the PIM modules only process masked/garbled data.

All variants ensure that neither host buses nor external interfaces expose plaintext or sensitive address patterns, and secure DMA or command/MMIO channels are established via remote attestation and session key exchange.
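
As a toy illustration of such an authenticated DMA channel, the sketch below seals and opens a 64-byte DRAM line. This is a hedged stand-in, not the papers' hardware: a real engine would use AES-GCM, whereas this sketch substitutes a SHA-256 counter-mode keystream plus an HMAC tag so it runs with the Python standard library only; the names `seal`/`open_` and the address-binding AAD are illustrative assumptions.

```python
import hashlib, hmac, os

def keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """SHA-256 counter-mode keystream (stand-in for the AES-CTR core of AES-GCM)."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def seal(key, nonce, plaintext, aad=b""):
    """Encrypt one DRAM line and tag it; aad binds the line to its address."""
    ct = bytes(p ^ k for p, k in zip(plaintext, keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + aad + ct, hashlib.sha256).digest()
    return ct, tag

def open_(key, nonce, ct, tag, aad=b""):
    """Verify the tag before decrypting; rejects tampered or replayed lines."""
    expect = hmac.new(key, nonce + aad + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expect):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in zip(ct, keystream(key, nonce, len(ct))))

# Host-to-PIM DMA of one 64-byte line; the nonce doubles as a freshness counter.
key, nonce = os.urandom(32), os.urandom(12)
line = os.urandom(64)
ct, tag = seal(key, nonce, line, aad=b"addr:0x1000")
assert open_(key, nonce, ct, tag, aad=b"addr:0x1000") == line
```

Binding the line address into the tag is what prevents an adversary from swapping valid ciphertexts between protected addresses.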

2. Security Models and Adversary Assumptions

The confidentiality model strictly assumes the following trust boundaries:

  • Trusted Logic-Layer PIM (Duy et al., 2021): Only the PIM module’s logic, DRAM arrays, and local cores are trusted and tamper-resistant.
  • Untrusted PIM with TEE (Ghinani et al., 28 Jan 2025): Root of trust is within the CPU TEE; all other components (DRAM, DPUs, host buses) are adversarial.

Adversary Capabilities include:

  • Full control over host software, DMA, MMIO, and DRAM buses;
  • Physical probing of off-chip interfaces;
  • Tampering and replay of untrusted memory contents.

Out-of-scope threats include host processor microarchitectural attacks unless mitigated separately, and side-channels internal to the CPU or within the PIM that do not traverse the protected boundary.

Key security mechanisms are summarized below:

Defense Mechanism | Variant | Function
AES-GCM Encryption/Tags | Trusted PIM (Duy et al., 2021) | Confidentiality, integrity, and freshness for in-bank data
Counter-Mode Encryption | Untrusted PIM (Ghinani et al., 28 Jan 2025) | Ensures only ciphertext transits off-chip memory/bus
Remote Attestation | Both | Session key agreement, mutual challenge–response
Arithmetic Secret Sharing | Untrusted PIM (Ghinani et al., 28 Jan 2025) | Data is split; neither share reveals the secret alone
Garbled Circuits (GC) | Untrusted PIM (Ghinani et al., 28 Jan 2025) | Oblivious evaluation of non-linear functions

Formally, in (Duy et al., 2021), confidentiality is stated as I(S; O) = 0: the mutual information between the secret S and the observable bus/MMIO trace O is zero.
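
The intuition behind I(S; O) = 0 can be checked numerically: when the observable trace is the secret XOR-masked with a uniform one-time pad, the joint distribution factorizes and the mutual information vanishes exactly. The sketch below (toy 4-bit values, not from the papers) enumerates the full joint distribution:

```python
from itertools import product
from collections import Counter
from math import log2

# Exhaustively enumerate a 4-bit secret XOR-masked with a uniform 4-bit pad.
N = 16
joint = Counter()
for s, pad in product(range(N), repeat=2):
    o = s ^ pad                      # observable ciphertext on the bus
    joint[(s, o)] += 1 / (N * N)

p_s, p_o = Counter(), Counter()
for (s, o), p in joint.items():
    p_s[s] += p
    p_o[o] += p

# I(S; O) = sum p(s,o) * log2( p(s,o) / (p(s) p(o)) )
mi = sum(p * log2(p / (p_s[s] * p_o[o])) for (s, o), p in joint.items())
assert abs(mi) < 1e-12  # ciphertext reveals nothing about the secret
```

For every secret the ciphertext is uniform, so p(s, o) = p(s)·p(o) term by term and every log factor is zero.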

3. Programming Model and Offloading Semantics

Programming for PIM-Enclave entails a two-part codebase:

  • Host-Enclave Stub: Orchestrates remote attestation, session establishment, encrypted code/data loading, memory region protection (access-control toggle), offload of parameters, and kernel execution.
  • PIM Kernel: Bare-metal kernel compiled for PIM ISA, receiving parameters, performing batched encrypted DMA access, processing in private local memory, and persisting results via re-encryption.

Example host workflow (from (Duy et al., 2021)):

pe = PIMEncInit(bank_index, module_id)
attest_device(pe)
establish_session(pe)
pe.load_binaries("kernel.elf.enc")
A_ptr = pe.alloc(size)
pe.load(A_ptr, src, size)
pe.protect(A_ptr, size)
pe.offload_params({A_ptr, ...})
pe.execute()
out = pe.get_output()

On the PIM core:

get_params(&A_ptr, ...)
dma_request(A_ptr, local_buf, BATCH, DECRYPT)
compute(local_buf)
dma_request(local_buf, A_ptr, BATCH, ENCRYPT)

API primitives (allocation, protection, offload, execution) manipulate access-control, encryption, and DMA channels transparently.

For MPC-based designs, secret sharing, offline precomputation, and garbled circuit modes are exposed at the API level. Linear operations use arithmetic sharing; nonlinear stages invoke garbled-circuit-based kernels. CPU and PIM result shares are merged by modular addition: res = (res_PIM + res_CPU) mod p (Ghinani et al., 28 Jan 2025).
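
The share-merge step can be sketched for a linear operation such as matrix-vector multiplication. The names (`share`, `gemv`) and the modulus choice are illustrative, not the paper's implementation; the point is that additive sharing commutes with linear maps mod p, so each party computes on its share alone:

```python
import random

P = 2**61 - 1  # public prime modulus (illustrative choice)

def share(x):
    """Split each value into two additive shares mod P; either share alone is uniform."""
    r = [random.randrange(P) for _ in x]
    return r, [(xi - ri) % P for xi, ri in zip(x, r)]

def gemv(W, v):
    """Matrix-vector product mod P."""
    return [sum(wij * vj for wij, vj in zip(row, v)) % P for row in W]

W = [[3, 1], [4, 1]]        # public model weights
x = [5, 9]                  # private input
x_pim, x_cpu = share(x)     # PIM and CPU each see one share only

# Linearity: W.x = W.x_pim + W.x_cpu (mod P), so shares merge after the fact.
res = [(a + b) % P for a, b in zip(gemv(W, x_pim), gemv(W, x_cpu))]
assert res == gemv(W, x)    # equals [24, 29]
```

Because the merge is a single modular addition per output element, the recombination cost is negligible next to the GEMV itself.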

4. Cryptographic Protocols and Integrity Verification

PIM-Enclave achieves confidentiality and integrity via layered cryptographic protocols:

  • Data-at-Rest Protection (Duy et al., 2021): AES-GCM encrypts each DRAM line; integrity and freshness tags prevent cold-boot and replay.
  • Arithmetic Secret Sharing and Counter-Mode Encryption (Ghinani et al., 28 Jan 2025): The TEE produces per-block OTPs and splits input data into masked shares; the PIM operates exclusively on encrypted shares.
  • Garbled Circuits: For nonlinear functions (e.g., sigmoids, ReLUs), the TEE garbles Boolean circuits and transmits descriptions; the PIM evaluates with oblivious input labels.
  • Integrity MACs: Linear modular tags (e.g., Tag_j = (Σ_i P_{i,j} · s^{m−i}) mod q) are stored encrypted and verified after decryption and share merging.
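
As a toy illustration of garbled-circuit evaluation, the sketch below garbles a single AND gate using hash-based row encryption and trial decryption (a zero-byte tag stands in for the point-and-permute optimization of real schemes). This is a pedagogical sketch, not the papers' protocol:

```python
import hashlib, os, random

LBL = 16  # wire-label length in bytes

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def garble_and_gate():
    """Garbler: pick two random labels per wire (index = bit), encrypt the table."""
    labels = {w: [os.urandom(LBL) for _ in range(2)] for w in ("a", "b", "out")}
    table = []
    for ba in (0, 1):
        for bb in (0, 1):
            pad = hashlib.sha256(labels["a"][ba] + labels["b"][bb]).digest()
            # plaintext = output label || 16 zero bytes (decryption-check tag)
            table.append(xor(pad, labels["out"][ba & bb] + b"\x00" * LBL))
    random.shuffle(table)  # hide which row encodes which input pair
    return labels, table

def evaluate(table, la, lb):
    """Evaluator: holds one label per input wire, learns only the output label."""
    pad = hashlib.sha256(la + lb).digest()
    for row in table:
        pt = xor(pad, row)
        if pt[LBL:] == b"\x00" * LBL:   # correct row w.h.p.
            return pt[:LBL]
    raise ValueError("no row decrypted")

labels, table = garble_and_gate()
for a in (0, 1):
    for b in (0, 1):
        assert evaluate(table, labels["a"][a], labels["b"][b]) == labels["out"][a & b]
```

The evaluator never sees the bit semantics of any label, which is what makes the evaluation oblivious; nonlinear kernels (sigmoids, ReLUs) are built from many such gates.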

Correctness guarantees are enforced for both linear and nonlinear computations via share recombination and staged verification. Algorithms such as tag recomputation, verification stages, and circuit unwrapping draw on secure multiparty computation and TEE literature (e.g., SecNDP, Slalom, SecureML).
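
The linear tag's key property, that it can be recomputed on recombined shares and checked against the stored value, follows from its linearity in P. A small sketch under assumed parameters (the values of s and q are illustrative, not from the papers):

```python
import random

q, s = 2**61 - 1, 123456789   # public modulus q, secret MAC key s (illustrative)

def mac_tag(col, s, q):
    """Tag = (sum_i P_i * s^(m-i)) mod q over one data column of length m."""
    m = len(col)
    return sum(p * pow(s, m - i, q) for i, p in enumerate(col, start=1)) % q

data = [7, 2, 9, 4]
tag = mac_tag(data, s, q)

# Linearity: the tag of the merged data equals the sum of per-share tags,
# so verification can run after share recombination.
r = [random.randrange(q) for _ in data]
sh1, sh2 = r, [(d - ri) % q for d, ri in zip(data, r)]
assert (mac_tag(sh1, s, q) + mac_tag(sh2, s, q)) % q == tag

# Tampering with any element shifts the tag by a nonzero power of s (w.h.p.).
bad = data[:]
bad[2] = (bad[2] + 1) % q
assert mac_tag(bad, s, q) != tag
```

Because the tag is a polynomial in the secret point s, an adversary who never learns s cannot forge a consistent tag for modified data except with negligible probability.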

5. Performance Characteristics and Quantitative Results

PIM-Enclave’s impact is measured across encryption latency, bandwidth, execution overhead, and application speedup:

  • Trusted PIM Variant ((Duy et al., 2021), gem5-based, 8 banks):
    • AES-DMA incurs ≈1 cycle/16 bytes; overall DMA latency increases by 22.3%.
    • Peak bandwidth declines by 17.85% (3.53 GB/s → 2.90 GB/s).
    • Secure k-means clustering (20 iterations, 640 MB dataset) shows only 3.7% added runtime.
    • With ≥6 banks, PIM-Enclave doubles throughput vs host-only execution for large datasets.
  • Untrusted PIM MPC Variant ((Ghinani et al., 28 Jan 2025), UPMEM, 2,560 DPUs):
    • MLP inference sees up to 14.66× speedup over secure CPU baseline for 40 KB inputs/layer.
    • DLRM lookup: 9.80× speedup (24 GB tables).
    • Logistic regression: 5.85× speedup at large sample counts.
    • Linear regression: 2.64× over secure CPU at >800K samples.
    • Full MPC security overhead (vs insecure PIM): <4% runtime penalty.

Offline precomputation amortizes the CPU’s GEMV workload, reducing the linear-algebra stages by up to 92% when model weights are static. Online latency drops accordingly, with the CPU’s share of online runtime falling below 8% (Ghinani et al., 28 Jan 2025).

6. Application Domains and Limitations

PIM-Enclave demonstrates optimal performance and security for:

  • Large-scale data analytics (map-reduce, clustering, sorting)
  • Machine-learning primitives (MLP inference, DLRM, logistic/linear regression)
  • Graph algorithms with irregular access patterns

Advantages are maximal when bulk kernel execution eliminates bus traffic and exploits high internal DRAM bandwidth via TSVs and bank concurrency.

Limitations:

  • SPMD-only model; coordination among banks requires host mediation.
  • PIM cores are in-order, cacheless, and power-constrained; not suited for control-flow–intensive or FP-heavy workloads.
  • Limited host memory availability while bank protection is engaged.
  • MPC-based offloading: Precomputation only applies when public weights are static. No defense against timing/power side-channels or colluding TEEs.
  • Verification for nonlinear computations necessitates two-stage checks.

A plausible implication is that future extensions could further mitigate host CPU side-channels (e.g., via ORAM) or provide on-PIM inter-bank networks.

7. Significance and Prospective Directions

PIM-Enclave establishes a concrete template for confidential computing by tightly integrating secure enclaves into the memory architecture, achieving theoretically zero address-pattern leakage (I(S; O) = 0) and high throughput for large datasets. Its cryptographic and architectural resilience is validated by low overheads and strong empirical speedups on both simulated and real platforms.

The duality of trusted logic-layer PIM and MPC-hardened offloading to untrusted DPUs accommodates a range of deployment models, from custom cloud infrastructure to commodity PIM systems. This suggests prospective research directions around extending SPMD coordination, addressing microarchitectural and physical side-channels, and generalizing enclave support to broader computational primitives.
