
Analog CiM (ACIM): Efficient DNN Acceleration

Updated 8 February 2026
  • Analog CiM (ACIM) is a class of in-memory computing architectures that perform MAC operations through analog primitives such as current summing, time delays, and charge accumulation.
  • ACIM implementations, including current-, time-, and charge-based methods, achieve high energy efficiency (50–400 TOPS/W) and area efficiency for medium-precision DNN inference.
  • Hybrid analog-digital designs and noise-aware training strategies improve robustness and balance the trade-offs between precision and energy in advanced DNN accelerators.

Analog Compute-in-Memory (ACIM) is a class of in-memory computing architectures that perform the multiply-accumulate (MAC) operation entirely in the analog domain by mapping digital inputs and weights stored in on-chip memory (usually SRAM or non-volatile memory) into analog voltages, currents, or charges. ACIM replaces conventional digital multipliers and adder trees with analog accumulators, enabling superior area and energy efficiency, particularly for medium-precision deep neural network (DNN) inference workloads. Compared to digital compute-in-memory (DCIM), ACIM offers a trade-off: reduced computational precision (generally 3–8 bits) in exchange for up to order-of-magnitude improvements in TOPS/W and area efficiency for DNN MAC operations (Yoshioka et al., 2024).

1. Operational Principles and Fundamental Categories

ACIM operates by directly exploiting analog circuit primitives (current summing, charge redistribution, or timing delays) to compute $y = \sum_{i=1}^{n} \mathrm{IN}_i \cdot W_i$ on memory crossbar arrays. ACIM implementations are classically divided into three architectural categories (Yoshioka et al., 2024):

  • Current-Based ACIM: Digitally encoded input is applied as an analog voltage (or pulse), modulating a pass-gate or transconductance element parameterized by the stored weight, such that each bitcell sources a current $I_i = g_m(W_i)\, V_{\mathrm{IN}_i}$. All $I_i$ sum on a shared bitline, and the total analog current is digitized by a sense amplifier or ADC. This approach achieves excellent area and power efficiency (bitcells as small as 7T), but is limited by I–V nonlinearity and sensitivity to process, voltage, and temperature (PVT) variation, constraining precision to approximately 4–6 bits.
  • Time-Based ACIM: Each input-weight pair is mapped to an analog delay $D_i$ (via pulse gating or voltage-to-delay conversion). Shared time-domain accumulators (e.g., OR-trees) combine these delays, and a time-to-digital converter (TDC) digitizes the total. This method is highly compact and compatible with advanced technology scaling, but exhibits nonlinear behavior and PVT sensitivity, again limiting accuracy to 4–6 bits.
  • Charge-Based ACIM: Employs bitcells composed of an SRAM cell plus a metal-oxide-metal (MOM) capacitor, enabling highly linear, PVT-stable charge storage and transfer. Operations accumulate charges proportional to $Q_i = C \cdot V_{\mathrm{IN}_i} \cdot W_i$ onto a shared column node. The final analog voltage is digitized by a SAR ADC, providing robust analog accumulation, high linearity, and the potential for substantially higher precision (up to 10–12 bits with advanced ADCs), but at the cost of increased ADC area and energy. Charge-based ACIM currently dominates state-of-the-art precision-per-energy trade-offs for DNN workloads.
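All three categories implement the same MAC; what differs is the analog medium and its error sources. A minimal NumPy sketch (all parameter values are illustrative assumptions, not taken from any cited paper) contrasting an ideal digital MAC with an analog MAC corrupted by additive noise and a finite-resolution readout:

```python
import numpy as np

rng = np.random.default_rng(0)

def ideal_mac(inputs, weights):
    """Ideal digital MAC: y = sum(IN_i * W_i)."""
    return np.dot(inputs, weights)

def analog_mac(inputs, weights, noise_sigma=0.5, adc_bits=6):
    """Analog MAC model: accumulate in the analog domain, add lumped
    thermal/mismatch noise, then digitize with an ideal ADC of
    `adc_bits` resolution. noise_sigma and adc_bits are illustrative."""
    y = np.dot(inputs, weights) + rng.normal(0.0, noise_sigma)
    full_scale = len(inputs) * np.max(np.abs(weights)) * np.max(np.abs(inputs))
    lsb = 2 * full_scale / (2 ** adc_bits)   # quantization step of the readout
    return np.round(y / lsb) * lsb           # ADC quantization

inputs = rng.integers(0, 16, size=64).astype(float)   # 4b activations
weights = rng.integers(-8, 8, size=64).astype(float)  # 4b signed weights

y_d = ideal_mac(inputs, weights)
y_a = analog_mac(inputs, weights)
print(y_d, y_a)  # analog result tracks the digital one within noise + one LSB
```

The lumped-noise model glosses over the category-specific non-idealities (I–V nonlinearity, delay nonlinearity, capacitor mismatch) discussed below, but captures why the readout ADC resolution bounds the effective compute precision.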

2. Performance Metrics, Advantages, and Limitations

ACIM implementations are benchmarked by TOPS/W (trillion operations per second per watt), area efficiency (GOPS/mm²), computational precision, and compute SNR (CSNR):

  • Efficiency and Throughput: Current-, time-, and charge-based ACIMs sustain 50–400 TOPS/W and >500 GOPS/mm² in leading-edge SRAM macros, exceeding DCIM (20–90 TOPS/W, 100–200 GOPS/mm²) at medium-precision bit depths.
  • Analog Precision: DCIM achieves >10b deterministic precision, while ACIM's practical precision ranges from 3–8b (current/time-based) to 6–10b (charge-based with advanced SAR ADCs). Transformers require CSNR >30 dB; CNNs can tolerate 15–30 dB (Yoshioka et al., 2024). Analog noise floors, PVT variation, IR drop, and device mismatch limit the achievable resolution.
  • Energy and Area: Charge-based ACIM delivers 5–20 fJ/MAC (energy dominated by the SAR ADC); current- and time-based ACIMs reach lower fJ/MAC in lower-precision designs. Area per cell varies (6T+capacitor for charge-based, 7T for current-based, delay circuits for time-based).

Limitations: Analog accumulation is susceptible to non-idealities—ADC quantization noise, DAC nonlinearity, device mismatch, thermal drift, retention degradation, and IR-drop. These introduce stochastic and systematic computation errors, constraining precision and requiring error-robust training and architectural compensation techniques (Yoshioka et al., 2024, Feng et al., 16 Aug 2025).

3. Detailed Circuit Mechanisms and Mathematical Models

Current-Based ACIM:

  • Compute: $I_i = g_m \cdot W_i \cdot V_{\mathrm{IN}_i}$
  • Accumulation: $I_{\mathrm{BL}} = \sum_i I_i$
  • Readout: Sense amplifier/ADC

Time-Based ACIM:

  • Compute: $D_i = k_D \cdot \mathrm{IN}_i \cdot W_i$
  • Accumulation: $T_{\mathrm{total}} = f^{-1}\left(\sum_i D_i\right)$
  • Readout: Time-to-digital converter

Charge-Based ACIM:

  • Compute/accumulate:
    • $Q_i = C \cdot V_{\mathrm{pre}} \cdot W_i$
    • $Q_{\mathrm{total}} = \sum_i Q_i \cdot (\Delta V_{\mathrm{IN}_i}/V_{\mathrm{pre}})$
    • $V_{\mathrm{col}} = Q_{\mathrm{total}}/\left(\sum_i C\right)$
  • Readout: A SAR ADC digitizes $V_{\mathrm{col}}$; precision is set by the ADC ENOB. PVT robustness stems from MOM capacitor variation below 1%.

Precision Metric: Compute SNR (CSNR): $\mathrm{CSNR} = 10\log_{10}\left( \frac{\mathrm{E}[y_D^2]}{\mathrm{E}[(y_D - y_A)^2]} \right)$, where $y_D$ is the ideal digital MAC result and $y_A$ the analog result.
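The CSNR metric can be evaluated numerically. A NumPy sketch (parameters illustrative) that models charge-domain accumulation with static MOM-capacitor mismatch and SAR quantization, then estimates CSNR over many random MACs:

```python
import numpy as np

rng = np.random.default_rng(1)

N = 64
# Static per-cell capacitor mismatch, ~1% sigma (matching the MOM-cap figure)
CAPS = 1.0 + rng.normal(0.0, 0.01, size=N)

def charge_mac(inputs, weights, adc_bits=8, vpre=1.0):
    """Charge-domain MAC: Q_i = C_i * V_pre * W_i * IN_i, charge-shared onto
    a column node, then digitized by an ideal SAR ADC of `adc_bits`.
    Inputs/weights are normalized; full scale assumed +/- V_pre."""
    q = CAPS * vpre * weights * inputs     # per-cell charge
    v_col = q.sum() / CAPS.sum()           # charge sharing -> column voltage
    lsb = 2 * vpre / 2 ** adc_bits
    return np.round(v_col / lsb) * lsb     # SAR quantization

def csnr_db(y_digital, y_analog):
    """CSNR = 10 log10(E[y_D^2] / E[(y_D - y_A)^2])."""
    y_d, y_a = np.asarray(y_digital), np.asarray(y_analog)
    return 10 * np.log10(np.mean(y_d ** 2) / np.mean((y_d - y_a) ** 2))

y_d, y_a = [], []
for _ in range(2000):
    x = rng.uniform(0, 1, N)          # normalized activations
    w = rng.uniform(-1, 1, N)         # normalized weights
    y_d.append(np.dot(x, w) / N)      # ideal result, scaled like v_col
    y_a.append(charge_mac(x, w))
print(f"CSNR = {csnr_db(y_d, y_a):.1f} dB")
```

With these settings the error is dominated by the 8b quantizer rather than the 1% mismatch, consistent with the text's point that ADC ENOB sets the precision of charge-based ACIM.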

4. Hybrid Analog-Digital CIM Architectures

Hybrid CIM architectures partition the MAC bits so that MSBs are computed digitally (high precision, deterministic) and LSBs in the analog domain (efficient, low precision), realizing fine-grained energy/precision trade-offs and extending achievable precision beyond the analog noise floor (Yoshioka et al., 2024):

  • MSB/LSB Split: Several split architectures implement an $n$-bit MAC with the upper $k$ bits in DCIM and the lower $n-k$ bits in ACIM, achieving up to 2× energy reduction without accuracy loss (e.g., a 4b ADC for the ACIM LSBs).
  • Saliency-Aware Boundaries: Adaptive schemes dynamically shift the digital/analog boundary $B_{D/A}$ (per MAC) according to data saliency (importance), e.g., the On-the-Fly Saliency Evaluator (OSE) for DNN vision tasks, yielding 20–30% power savings at <0.5% accuracy degradation [(Yoshioka et al., 2024); see also (Chen et al., 2023)].
  • Rationale: Most DNNs can tolerate variable precision, reserving high-CSNR digital compute for critical front-end/high-saliency features while deploying analog compute for error-tolerant portions of the network.
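The MSB/LSB split can be sketched at the bit level. In this NumPy illustration (hypothetical 8b unsigned weights, split point k = 4, illustrative analog noise), the upper bits take an exact digital path, the lower bits a noisy analog path, and the two partial sums are recombined with a shift:

```python
import numpy as np

rng = np.random.default_rng(2)

def hybrid_mac(inputs, weights, n_bits=8, k=4, analog_sigma=2.0):
    """Split each n-bit unsigned weight into upper k bits (DCIM path, exact)
    and lower n-k bits (ACIM path, modeled as noisy). analog_sigma is an
    illustrative stand-in for the analog LSB path's error."""
    w = weights.astype(np.int64)
    w_msb = w >> (n_bits - k)                # upper k bits
    w_lsb = w & ((1 << (n_bits - k)) - 1)    # lower n-k bits
    y_msb = np.dot(inputs, w_msb)                                # exact
    y_lsb = np.dot(inputs, w_lsb) + rng.normal(0.0, analog_sigma)  # noisy
    return y_msb * (1 << (n_bits - k)) + y_lsb   # recombine with shift

x = rng.integers(0, 16, 64).astype(float)   # 4b activations
w = rng.integers(0, 256, 64)                # 8b unsigned weights
exact = float(np.dot(x, w))
approx = hybrid_mac(x, w)
# analog noise only perturbs the low-order partial sum
print(exact, approx)
```

Because the noise enters only the LSB partial sum, its impact on the recombined result is bounded well below the MSB weight of $2^{n-k}$, which is the rationale for spending the cheap analog path on the low-order bits.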

Charge-based ACIM can be tightly integrated for LSB computation. Hybrid designs achieve state-of-the-art energy efficiency (>300 TOPS/W) while realizing virtually full DNN inference accuracy.

5. Design Trade-offs: Noise, Precision, and ADCs

Analog noise and quantization set practical limits for ACIM. Charge-based designs, although highly linear, require high-resolution SAR ADCs whose area and energy scale exponentially with ENOB. For CNNs and similar DNNs, medium-precision (6–8b) ADCs yield the best energy-accuracy balance; Transformers and other precision-sensitive models may require higher ENOB or hybrid architectures for the MSBs.

Extensive evaluations show that ACIM energy per MAC is minimized at 3–8 bits, aligning with the quantized-DNN workload domain (Yoshioka et al., 2024). Increasing ENOB beyond ~8b rapidly increases ADC area and energy, often overshooting DNN precision requirements. Leading ACIM macros now demonstrate:

  • Energy efficiency: 50–400 TOPS/W (macro level)
  • Precision: 3–12 bits, with 6–8 bits optimal for most vision DNNs; up to 10–12 bits with advanced ADCs in 3 nm nodes
  • Throughput: >500 GOPS/mm² in charge-based bit-parallel implementations
  • Robustness: CSNR ≈ 15–30 dB (CNNs), >30 dB (Transformers); PVT-hardened metal capacitors (charge-based) keep capacitor variation <1%
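The "beyond ~8b" cliff follows directly from how conversion energy scales with resolution. A tiny sketch using the standard Walden figure of merit, E ≈ FoM · 2^ENOB per conversion (the FoM value of 10 fJ/conversion-step is an illustrative assumption, not a figure from the cited survey):

```python
# Energy per ADC conversion under the Walden figure of merit:
#   E_conv ~= FoM * 2**ENOB
# FOM_FJ = 10 fJ/conversion-step is an illustrative assumption.
FOM_FJ = 10.0

def adc_energy_fj(enob):
    """Approximate ADC energy per conversion in femtojoules."""
    return FOM_FJ * 2 ** enob

for enob in (4, 6, 8, 10, 12):
    print(f"ENOB {enob:2d}: {adc_energy_fj(enob):8.0f} fJ/conversion")
```

Each extra bit of ENOB doubles the conversion energy, so pushing from 8b to 12b costs 16× at the ADC while most quantized vision DNNs gain nothing, which is why 6–8b sits at the energy-accuracy optimum above.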

Papers report that DNN accuracy degrades rapidly when CSNR or analog noise exceeds these thresholds, especially for high-complexity networks (e.g., ViTs) (Yoshioka et al., 2024).

6. Robustness, Noise-Aware Training, and Future Directions

Robustness to analog non-idealities is critical for ACIM circuit deployment. As analog noise and device variation are unavoidable, algorithm-hardware co-design strategies are essential:

  • Noise-Aware Training: Techniques such as gradient straight-through estimation (STE) decouple complex analog-noise simulation in the forward pass from tractable surrogate gradients during backpropagation, effectively training models that are robust to non-differentiable, non-Gaussian hardware noise (Feng et al., 16 Aug 2025). This yields substantial accuracy resilience, with up to 2.2× training speedup and 37.9% less memory than full-gradient noise simulation.
  • Variation-Aware Methods: Systematically injecting process/mismatch distributions into training and defining precision/CSNR-aware boundaries ensures binary neural networks (BNNs) and quantized DNNs retain accuracy under PVT variation and mismatch (Le et al., 2021).
  • Hybrid/Adaptive Precision: Dynamically shifting at runtime between analog and digital computation according to model demand, saliency, and noise budget supports high energy efficiency without compromising application accuracy (Yoshioka et al., 2024).
  • PVT and EDA Automation: Designs increasingly target PVT insensitivity through pure charge-domain architectures and are now amenable to end-to-end automated design and design-space exploration (DSE) (Chen et al., 2024, Zhang et al., 2024).
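The STE idea above can be sketched in a few lines of NumPy: the forward pass injects hardware-style noise and a non-differentiable quantizer into the MAC, while the backward pass uses the clean linear layer's gradient as a surrogate. The noise model, quantizer step, and learning rate are all illustrative assumptions, not the cited paper's exact scheme:

```python
import numpy as np

rng = np.random.default_rng(3)

def noisy_forward(x, w, sigma=0.1):
    """'Hardware' MAC: additive analog noise plus a coarse,
    non-differentiable quantizer (both illustrative)."""
    y = x @ w
    y = y + rng.normal(0.0, sigma, size=y.shape)  # analog noise per MAC
    return np.round(y * 8) / 8                    # non-differentiable quantizer

def ste_train_step(x, w, target, lr=0.1):
    """One SGD step with a straight-through estimator: the error is measured
    on the noisy output, but the gradient is that of the clean linear layer
    (the noise/quantizer are treated as identity in the backward pass)."""
    y = noisy_forward(x, w)
    err = y - target                 # dL/dy for L = 0.5 * ||y - t||^2
    grad_w = x.T @ err / len(x)      # surrogate gradient: ignores noise/quant
    return w - lr * grad_w

# Toy regression: recover w_true despite the noisy, quantized forward pass
w_true = rng.normal(size=(8, 1))
w = np.zeros((8, 1))
for _ in range(500):
    x = rng.normal(size=(16, 8))
    w = ste_train_step(x, w, x @ w_true)
print(np.abs(w - w_true).max())  # shrinks toward the noise floor
```

The key point the sketch captures is that the forward model can be arbitrarily ugly (non-differentiable, non-Gaussian) because gradients never flow through it, which is what makes STE-style noise-aware training cheap relative to full-gradient noise simulation.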

7. Outlook and Significance

Analog CIM (ACIM) has emerged as a leading architecture for area- and energy-efficient DNN acceleration, especially for quantized and medium-precision (3–8b) models. Cutting-edge charge-based ACIMs are already being deployed in vision and language accelerators, delivering 50–400 TOPS/W and >500 GOPS/mm² with robust operation across PVT corners. Progress in hybrid ACIM-DCIM, adaptive/noise-aware training, and PVT-hardened analog design continues to push practical accuracy, robustness, and system scaling.

While the trade-off space between precision, energy, and area remains, ACIM's ability to deliver dramatic efficiency gains in the DNN "sweet spot" ensures its relevance for edge AI, IoT, and data-center accelerators targeting quantized models (Yoshioka et al., 2024).


Major Reference:

A Review of SRAM-based Compute-in-Memory Circuits (Yoshioka et al., 2024).
