Direct PIM Access: Secure Memory Computation
- Direct PIM Access (DPA) is a secure computational model that integrates PIM enclaves within memory for confidential, accelerated data processing.
- DPA employs cryptographic protocols such as AES-GCM encryption, arithmetic secret sharing, and remote attestation to safeguard data integrity and confidentiality.
- DPA architectures demonstrate significant performance benefits, achieving up to 14.66× speedups over secure CPU computations in real-world machine learning and analytics tasks.
PIM-Enclave is a secure computational framework that integrates confidential execution environments directly into the memory subsystem. By leveraging Processing-In-Memory (PIM) architectures, it aims to accelerate data-intensive workloads while ensuring robust confidentiality and integrity guarantees, circumventing classical CPU-centric side channels and bus-level leakage. The term 'enclave' here denotes an isolated, attestable execution environment realized either within the PIM logic layers (Duy et al., 2021) or via cryptographically hardened offloading protocols when PIM modules themselves are untrusted (Ghinani et al., 28 Jan 2025). PIM-Enclave mechanisms have demonstrated negligible performance overhead relative to the PIM baseline and achieve up to 14.66× speedups over secure CPU computation for real-world ML tasks.
1. Architectural Design and Hardware Integration
PIM-Enclave manifests in two architectural variants: (a) trusted, logic-layer-integrated PIM cores (Duy et al., 2021), and (b) secure software/hardware protocols atop untrusted PIM modules compatible with commodity DPUs (Ghinani et al., 28 Jan 2025). The core principle is minimizing data movement by merging computation and data residence in memory:
- Trusted Logic-Layer Variant (Duy et al., 2021): Each DRAM bank incorporates a low-power PIM core, local memory, and specialized blocks such as an AES-GCM engine for in-bank encryption and integrity tagging.
- Untrusted PIM Variant (Ghinani et al., 28 Jan 2025): Relies on the host CPU’s Trusted Execution Environment (TEE) for trust anchoring. Computation is split using arithmetic secret sharing and garbled circuits; the PIM modules only process masked/garbled data.
All variants ensure that neither host buses nor external interfaces expose plaintext or sensitive address patterns, and secure DMA or command/MMIO channels are established via remote attestation and session key exchange.
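The attestation and session-establishment step can be sketched in miniature. The following is an illustrative symmetric-key sketch, not either paper's protocol: real designs use asymmetric remote attestation, and `device_key` and `attest_and_establish` are hypothetical names standing in for the attestation root and handshake.

```python
import hashlib
import hmac
import os

def attest_and_establish(device_key: bytes) -> bytes:
    """Symmetric-key sketch of a challenge-response attestation handshake.

    A pre-provisioned device key stands in for the real attestation root;
    production designs use asymmetric remote attestation instead.
    """
    nonce = os.urandom(16)  # host-chosen freshness challenge
    # Device side: prove possession of the key by tagging the nonce.
    response = hmac.new(device_key, nonce, hashlib.sha256).digest()
    # Host side: verify the response against its own copy of the key.
    expected = hmac.new(device_key, nonce, hashlib.sha256).digest()
    if not hmac.compare_digest(response, expected):
        raise RuntimeError("attestation failed")
    # Both sides derive the same session key from the key and fresh nonce.
    return hashlib.sha256(device_key + nonce).digest()

session_key = attest_and_establish(os.urandom(32))
assert len(session_key) == 32
```

Once the session key is in place, all subsequent DMA and command/MMIO traffic can be encrypted and authenticated under it.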
2. Security Models and Adversary Assumptions
The confidentiality model strictly assumes the following trust boundaries:
- Trusted Logic-Layer PIM (Duy et al., 2021): Only the PIM module’s logic, DRAM arrays, and local cores are trusted and tamper-resistant.
- Untrusted PIM with TEE (Ghinani et al., 28 Jan 2025): The root of trust is within the CPU TEE; all other components (DRAM, DPUs, host buses) are treated as adversarial.
Adversary Capabilities include:
- Full control over host software, DMA, MMIO, and DRAM buses;
- Physical probing of off-chip interfaces;
- Tampering and replay of untrusted memory contents.
Out-of-scope threats include host processor microarchitectural attacks unless mitigated separately, and side-channels internal to the CPU or within the PIM that do not traverse the protected boundary.
Key security mechanisms are summarized below:
| Defense Mechanism | Variant | Function |
|---|---|---|
| AES-GCM Encryption/Tags | Trusted PIM (Duy et al., 2021) | Confidentiality, integrity and freshness for in-bank data |
| Counter-Mode Encryption | Untrusted PIM (Ghinani et al., 28 Jan 2025) | Ensures only ciphertext transits off-chip memory/bus |
| Remote Attestation | Both | Session key agreement, mutual challenge–response |
| Arithmetic Secret Sharing | Untrusted PIM (Ghinani et al., 28 Jan 2025) | Data is split; neither share reveals secret alone |
| Garbled Circuits (GC) | Untrusted PIM (Ghinani et al., 28 Jan 2025) | Non-linear function security and oblivious evaluation |
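The secret-sharing row of the table can be made concrete with a minimal sketch of two-party additive sharing; the 64-bit ring size is an illustrative assumption:

```python
import secrets

MOD = 2**64  # share arithmetic over a 64-bit ring (illustrative choice)

def share(x: int):
    """Split x into two additive shares; each alone is uniformly random."""
    r = secrets.randbelow(MOD)   # share kept by the trusted party
    return r, (x - r) % MOD      # share sent to the untrusted PIM

def reconstruct(s0: int, s1: int) -> int:
    """Recombine the two shares to recover the secret."""
    return (s0 + s1) % MOD

s_cpu, s_pim = share(42)
assert reconstruct(s_cpu, s_pim) == 42
```

Neither share reveals anything about the secret on its own, which is exactly the property the untrusted-PIM variant relies on.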
Formally, in (Duy et al., 2021), confidentiality is stated as I(S; T) = 0: the mutual information between the secret S and the observable bus/MMIO trace T is zero.
3. Programming Model and Offloading Semantics
Programming for PIM-Enclave entails a two-part codebase:
- Host-Enclave Stub: Orchestrates remote attestation, session establishment, encrypted code/data loading, memory region protection (access-control toggle), offload of parameters, and kernel execution.
- PIM Kernel: Bare-metal kernel compiled for PIM ISA, receiving parameters, performing batched encrypted DMA access, processing in private local memory, and persisting results via re-encryption.
Example host workflow (from (Duy et al., 2021)):
```
pe = PIMEncInit(bank_index, module_id)
attest_device(pe)
establish_session(pe)
pe.load_binaries("kernel.elf.enc")
A_ptr = pe.alloc(size)
pe.load(A_ptr, src, size)
pe.protect(A_ptr, size)
pe.offload_params({A_ptr, ...})
pe.execute()
out = pe.get_output()
```
The corresponding PIM-side kernel (pseudocode):
```
get_params(&A_ptr, ...)
dma_request(A_ptr, local_buf, BATCH, DECRYPT)
compute(local_buf)
dma_request(local_buf, A_ptr, BATCH, ENCRYPT)
```
For MPC-based designs, secret sharing, offline precomputation, and garbled circuit modes are exposed at API level. Linear operations use arithmetic sharing; nonlinear stages invoke garbled-circuit-based kernels. The merge of CPU and PIM result shares is modular addition, y = (y_CPU + y_PIM) mod 2^ℓ (Ghinani et al., 28 Jan 2025).
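Because additive sharing is linear, each party can apply the public weights to its own share and the modular merge recovers the true result. A minimal sketch, assuming additive shares over Z_{2^32} and small illustrative shapes:

```python
import secrets

MOD = 2**32

def gemv(W, x):
    """Plain matrix-vector product over the ring Z_{2^32}."""
    return [sum(w * v for w, v in zip(row, x)) % MOD for row in W]

def share_vec(x):
    """Split a vector into two additive shares, element-wise."""
    r = [secrets.randbelow(MOD) for _ in x]
    return r, [(v - s) % MOD for v, s in zip(x, r)]

W = [[1, 2], [3, 4]]  # public model weights (illustrative)
x = [10, 20]
x_cpu, x_pim = share_vec(x)
# CPU TEE and PIM each multiply their share by the public weights...
y_cpu, y_pim = gemv(W, x_cpu), gemv(W, x_pim)
# ...and the modular merge of the result shares recovers W·x.
merged = [(a + b) % MOD for a, b in zip(y_cpu, y_pim)]
assert merged == gemv(W, x)  # == [50, 110]
```

Nonlinear stages cannot be handled this way, which is why the designs fall back to garbled circuits there.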
4. Cryptographic Protocols and Integrity Verification
PIM-Enclave achieves confidentiality and integrity via layered cryptographic protocols:
- Data-at-Rest Protection (Duy et al., 2021): AES-GCM encrypts each DRAM line; integrity and freshness tags prevent cold-boot and replay.
- Arithmetic Secret Sharing and Counter-Mode Encryption (Ghinani et al., 28 Jan 2025): The TEE produces per-block OTPs and splits input data into masked shares; the PIM operates exclusively on encrypted shares.
- Garbled Circuits: For nonlinear functions (e.g., sigmoids, ReLUs), the TEE garbles Boolean circuits and transmits descriptions; the PIM evaluates with oblivious input labels.
- Integrity MACs: Linear modular hashes (e.g., tag(x) = Σᵢ aᵢ·xᵢ mod p) are stored encrypted and verified post-decryption and share merge.
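The counter-mode masking that keeps plaintext off the bus can be illustrated as follows. Python's standard library has no AES, so a SHA-256 keystream stands in for the AES-CTR pad; this is a toy sketch of the per-block OTP idea, not the deployed cipher:

```python
import hashlib

def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for one AES-CTR block: SHA-256 of key||counter serves
    here only to illustrate the per-block one-time-pad construction."""
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()

def ctr_xor(key: bytes, data: bytes, start_block: int = 0) -> bytes:
    """XOR data with a counter-derived keystream, 32 bytes per block."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = keystream_block(key, start_block + i // 32)
        block = data[i:i + 32]
        out += bytes(a ^ b for a, b in zip(block, pad))
    return bytes(out)

key = b"k" * 32
pt = b"sensitive in-memory operand bytes"
ct = ctr_xor(key, pt)          # only this ciphertext crosses the bus
assert ctr_xor(key, ct) == pt  # XOR with the same pads decrypts
```

Because the pad depends only on the key and block counter, the TEE can generate pads ahead of time while the PIM sees ciphertext exclusively.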
Correctness guarantees are enforced for both linear and nonlinear computations via share recombination and staged verification. Algorithms such as tag recomputation, verification stages, and circuit unwrapping draw on secure multiparty computation and TEE literature (e.g., SecNDP, Slalom, SecureML).
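Staged verification leans on the linearity of the MAC: the tag of a merged vector equals the modular sum of its shares' tags, so integrity can be checked after recombination without extra interaction. A hedged sketch, where the Mersenne-prime modulus and key layout are illustrative assumptions:

```python
import secrets

P = 2**61 - 1  # Mersenne prime; illustrative modulus

def mac(alpha, x):
    """Linear modular hash: tag = sum(a_i * x_i) mod p."""
    return sum(a * v for a, v in zip(alpha, x)) % P

alpha = [secrets.randbelow(P) for _ in range(3)]  # secret MAC key in the TEE
x = [5, 7, 11]
x0 = [secrets.randbelow(P) for _ in x]            # one additive share
x1 = [(v - s) % P for v, s in zip(x, x0)]         # the other share

# Linearity: the tag of the merged vector equals the merged tags.
assert mac(alpha, x) == (mac(alpha, x0) + mac(alpha, x1)) % P
```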
5. Performance Characteristics and Quantitative Results
PIM-Enclave’s impact is measured across encryption latency, bandwidth, execution overhead, and application speedup:
- Trusted PIM Variant (Duy et al., 2021; gem5-based, 8 banks):
- AES-DMA incurs ≈1 cycle/16 bytes; overall DMA latency increases by 22.3%.
- Peak bandwidth declines by 17.85% (3.53 GB/s → 2.90 GB/s).
- Secure k-means clustering (20 iterations, 640 MB dataset) shows only 3.7% added runtime.
- With ≥6 banks, PIM-Enclave doubles throughput vs host-only execution for large datasets.
- Untrusted PIM MPC Variant (Ghinani et al., 28 Jan 2025; UPMEM, 2,560 DPUs):
- MLP inference sees up to 14.66× speedup over secure CPU baseline for 40 KB inputs/layer.
- DLRM lookup: 9.80× speedup (24 GB tables).
- Logistic regression: 5.85× speedup at large sample counts.
- Linear regression: 2.64× over secure CPU at >800K samples.
- Full MPC security overhead (vs insecure PIM): <4% runtime penalty.
Offline precomputation optimization amortizes the CPU’s GEMV workload, achieving up to 92% reduction in linear algebra stages when model weights are static. Online latency drops, with CPU share becoming negligible (<8%) (Ghinani et al., 28 Jan 2025).
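The amortization works because, with static weights, the mask-dependent GEMV can be computed before any input arrives: offline the CPU samples a mask r and precomputes W·r; online only the PIM multiplies, and the merge uses the cached product. A minimal Slalom-style sketch (ring size, shapes, and names are illustrative):

```python
import secrets

MOD = 2**32

def gemv(W, x):
    """Matrix-vector product over the ring Z_{2^32}."""
    return [sum(w * v for w, v in zip(row, x)) % MOD for row in W]

W = [[2, 1], [0, 3]]  # static public weights

# Offline (no input yet): the CPU TEE samples a mask and precomputes W·r.
r = [secrets.randbelow(MOD) for _ in range(2)]
u = gemv(W, r)

# Online: the input is shared as (r, x - r); the PIM does the only GEMV.
x = [8, 9]
x_pim = [(v - s) % MOD for v, s in zip(x, r)]
y_pim = gemv(W, x_pim)

# Merge: W·x = W·(x - r) + W·r, with the CPU contribution precomputed.
y = [(a + b) % MOD for a, b in zip(y_pim, u)]
assert y == gemv(W, x)  # == [25, 27]
```

The online CPU cost is then only the cheap element-wise merge, consistent with the reported drop in CPU share.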
6. Application Domains and Limitations
PIM-Enclave demonstrates optimal performance and security for:
- Large-scale data analytics (map-reduce, clustering, sorting)
- Machine-learning primitives (MLP inference, DLRM, logistic/linear regression)
- Graph algorithms with irregular access patterns
Advantages are maximal when bulk kernel execution eliminates bus traffic and exploits high internal DRAM bandwidth via TSVs and bank concurrency.
Limitations:
- SPMD-only model; coordination among banks requires host mediation.
- PIM cores are in-order, cacheless, and power-constrained; not suited for control-flow–intensive or FP-heavy workloads.
- Limited host memory availability while bank protection is engaged.
- MPC-based offloading: Precomputation only applies when public weights are static. No defense against timing/power side-channels or colluding TEEs.
- Verification for nonlinear computations necessitates two-stage checks.
A plausible implication is that future extensions could further mitigate host CPU side-channels (e.g., via ORAM) or provide on-PIM inter-bank networks.
7. Significance and Prospective Directions
PIM-Enclave establishes a concrete template for confidential computing by tightly integrating secure enclaves into the memory architecture, achieving theoretically zero address-pattern leakage and high throughput for large datasets. Its cryptographic and architectural resilience is validated by low overheads and strong empirical speedups on both simulated and real platforms.
The duality of trusted logic-layer PIM and MPC-hardened offloading to untrusted DPUs accommodates a range of deployment models, from custom cloud infrastructure to commodity PIM systems. This suggests prospective research directions around extending SPMD coordination, addressing microarchitectural and physical side-channels, and generalizing enclave support to broader computational primitives.
References:
- “PIM-Enclave: Bringing Confidential Computation Inside Memory” (Duy et al., 2021)
- “Enabling Low-Cost Secure Computing on Untrusted In-Memory Architectures” (Ghinani et al., 28 Jan 2025)