Analog CiM (ACIM): Efficient DNN Acceleration
- Analog CiM (ACIM) is a class of in-memory computing architectures that perform MAC operations through analog primitives such as current summing, time delays, and charge accumulation.
- ACIM implementations, including current-, time-, and charge-based methods, achieve impressive energy efficiency (50–400 TOPS/W) and area efficiency for medium-precision DNN inference.
- Hybrid analog-digital designs and noise-aware training strategies improve robustness and balance the trade-offs between precision and energy in advanced DNN accelerators.
Analog Compute-in-Memory (ACIM) is a class of in-memory computing architectures that perform the multiply-accumulate (MAC) operation entirely in the analog domain by mapping digital inputs and weights stored in on-chip memory (usually SRAM or non-volatile memory) into analog voltages, currents, or charges. ACIM replaces conventional digital multipliers and adders with analog accumulators, enabling superior area and energy efficiency, particularly for medium-precision deep neural network (DNN) inference workloads. Compared to digital compute-in-memory (DCIM), ACIM offers a trade-off: reduced computational precision (generally 3–8 bits) in exchange for up to order-of-magnitude improvements in TOPS/W and area efficiency for DNN MAC operations (Yoshioka et al., 2024).
1. Operational Principles and Fundamental Categories
ACIM operates by directly exploiting analog circuit primitives—current summing, charge redistribution, or timing delays—to compute on memory crossbar arrays. ACIM implementations are classically divided into three architectural categories (Yoshioka et al., 2024):
- Current-Based ACIM: The digitally encoded input is applied as an analog voltage (or pulse), modulating a pass-gate or transconductance element parameterized by the stored weight, such that each bitcell sources a current $I_i \propto x_i \cdot w_i$. All bitcell currents sum on a shared bitline, with the total analog current digitized by a sense amplifier or ADC. This approach achieves excellent area and power efficiency (bitcells as small as 7T), but is limited by I–V nonlinearity and sensitivity to process, voltage, and temperature (PVT) variation, constraining precision to approx. 4–6 bits.
- Time-Based ACIM: Each input-weight pair is mapped to an analog delay (via pulse gating or voltage-to-delay conversion). Shared time-domain accumulators (e.g., OR-trees) combine these delays, with a time-to-digital converter (TDC) digitizing the total. This method is highly compact and compatible with advanced technology scaling, but exhibits nonlinear behavior and PVT sensitivity, again limiting accuracy to 4–6 bits.
- Charge-Based ACIM: Employs bitcells composed of SRAM plus a metal-oxide-metal (MOM) capacitor, enabling highly linear, PVT-stable charge storage and transfer. Operations accumulate charges proportional to $x_i \cdot w_i$ onto a shared column node. The final analog voltage is digitized via a SAR ADC—providing robust analog accumulation, high linearity, and the potential for substantially higher precision (up to 10–12 bits with advanced ADCs), but at the cost of increased ADC circuit area and energy. Charge-based ACIM currently dominates state-of-the-art precision-per-energy trade-offs for DNN workloads.
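The charge-based mechanism above can be illustrated with a small behavioral model: each active bitcell dumps a charge proportional to $x_i \cdot w_i$ onto a shared column node, and an ideal SAR ADC quantizes the shared-node voltage. This is a hedged sketch, not a specific published design; the function names, the Gaussian capacitor-mismatch model, and the normalization are illustrative assumptions.

```python
import random

def charge_based_mac(x_bits, w_bits, enob=8, cap_sigma=0.0, seed=0):
    """Behavioral sketch of one charge-based ACIM column (hypothetical model).

    Each bitcell contributes a charge proportional to x_i * w_i through its
    MOM capacitor; `cap_sigma` models relative capacitor mismatch (charge-based
    designs keep this below ~1%). The shared-node voltage is digitized by an
    ideal SAR ADC with `enob` effective bits, then rescaled to MAC units.
    """
    rng = random.Random(seed)
    n = len(x_bits)
    # Charge accumulation: Q = sum_i x_i * w_i * C_i (mismatch perturbs each C_i)
    q = sum(x * w * (1.0 + rng.gauss(0.0, cap_sigma))
            for x, w in zip(x_bits, w_bits))
    # Charge sharing over the total column capacitance -> voltage in [0, 1]
    v = q / n
    # Ideal SAR ADC: quantize to 2^enob levels, then map back to a MAC count
    code = round(v * (2 ** enob - 1))
    return code * n / (2 ** enob - 1)

# 1-bit example: the ideal MAC is the popcount of x AND w
x = [1, 0, 1, 1, 0, 1, 0, 1]
w = [1, 1, 0, 1, 0, 1, 1, 0]
ideal = sum(a * b for a, b in zip(x, w))
approx = charge_based_mac(x, w, enob=8, cap_sigma=0.01)
```

With sub-1% capacitor mismatch and an 8b ADC, the analog result stays within a fraction of one MAC unit of the ideal value, which is why the charge domain supports the highest ACIM precision.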
2. Performance Metrics, Advantages, and Limitations
ACIM implementations are benchmarked by TOPS/W (trillion operations per second per watt), area efficiency (GOPS/mm²), computational precision, and compute SNR (CSNR):
- Efficiency and Throughput: Current-, time-, and charge-based ACIMs sustain 50–400 TOPS/W and ~500 GOPS/mm² in leading-edge SRAM macros, exceeding DCIM (20–90 TOPS/W, 100–200 GOPS/mm²) at medium-precision bit-depths.
- Analog Precision: DCIM delivers deterministic precision at 10b and beyond, while ACIM's practical precision ranges from 3–8b (current/time-based) to 6–10b (charge-based with advanced SAR ADCs). Transformers require CSNR ≥ 30 dB; CNNs can tolerate 15–30 dB (Yoshioka et al., 2024). Analog noise floors, PVT variation, IR-drop, and device mismatch limit achievable resolution.
- Energy and Area: Charge-based ACIM delivers 5–20 fJ/MAC (energy dominated by the SAR ADC), while current- and time-based ACIMs reach lower fJ/MAC in lower-precision designs. Area per cell varies (6T+capacitor for charge-based, 7T for current-based, delay circuits for time-based).
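The energy and efficiency figures above are two views of the same quantity: energy per MAC converts directly to TOPS/W once an ops-per-MAC convention is fixed (the common convention counts multiply + add = 2 ops). A small sanity-check of the quoted numbers:

```python
def tops_per_watt(energy_fj_per_mac, ops_per_mac=2):
    """Convert energy per MAC (in femtojoules) to TOPS/W.

    TOPS/W = (ops per joule) / 1e12
           = ops_per_mac / (E_fJ * 1e-15) / 1e12
           = ops_per_mac * 1e3 / E_fJ
    """
    return ops_per_mac * 1e3 / energy_fj_per_mac

# The 5-20 fJ/MAC range quoted for charge-based ACIM maps onto:
hi = tops_per_watt(5.0)    # 400.0 TOPS/W
lo = tops_per_watt(20.0)   # 100.0 TOPS/W
```

This reproduces the upper half of the 50–400 TOPS/W macro-level range reported for charge-based designs.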
Limitations: Analog accumulation is susceptible to non-idealities: ADC quantization noise, DAC nonlinearity, device mismatch, thermal drift, retention degradation, and IR-drop. These introduce both stochastic and systematic computation errors, constraining precision and requiring error-robust training and architectural compensation techniques (Yoshioka et al., 2024; Feng et al., 16 Aug 2025).
3. Detailed Circuit Mechanisms and Mathematical Models
Current-Based ACIM:
- Compute: each bitcell sources a current $I_i \propto x_i \cdot w_i$
- Accumulation: bitline current $I_{BL} = \sum_i I_i$
- Readout: sense amplifier/ADC digitizes $I_{BL}$
Time-Based ACIM:
- Compute: each input-weight pair maps to a delay $t_i \propto x_i \cdot w_i$
- Accumulation: total delay $T = \sum_i t_i$
- Readout: time-to-digital converter (TDC) digitizes $T$
Charge-Based ACIM:
- Compute/accumulate: charge $Q = \sum_i x_i \cdot w_i \cdot C_u V_{DD}$ shared onto the column node, yielding $V_{col} = Q / C_{total}$
- Readout: SAR ADC digitizes $V_{col}$; precision set by ADC ENOB. PVT-robustness stems from MOM capacitor variation below 1%.
Precision Metric: Compute-SNR (CSNR): $\mathrm{CSNR} = 10 \log_{10}\left( \|y_{ideal}\|^2 / \|y_{ideal} - y_{analog}\|^2 \right)$ dB, with $y_{ideal}$ the ideal digital MAC result and $y_{analog}$ the analog result.
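The CSNR definition is straightforward to compute directly; a minimal sketch (the example vectors are illustrative, not measured data):

```python
import math

def csnr_db(y_ideal, y_analog):
    """Compute-SNR in dB: power of the ideal digital MAC outputs over the
    power of the analog error, CSNR = 10*log10(||y||^2 / ||y - y_hat||^2)."""
    sig = sum(y * y for y in y_ideal)
    err = sum((a - b) ** 2 for a, b in zip(y_ideal, y_analog))
    return 10.0 * math.log10(sig / err)

# Analog outputs within a few percent of ideal land above the ~30 dB
# threshold quoted for Transformers:
y_ref = [10.0, -4.0, 7.0, 2.0]
y_hw  = [10.2, -4.1, 6.9, 2.1]
snr = csnr_db(y_ref, y_hw)
```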
4. Hybrid Analog-Digital CIM Architectures
Hybrid CIM architectures partition the MAC bits so that MSBs are computed digitally (high precision, deterministic), and LSBs analogly (efficient, low-precision), realizing fine-grained energy/precision tradeoffs and extending achievable precision beyond the analog noise floor (Yoshioka et al., 2024):
- MSB/LSB Split: Several split architectures implement an $N$-bit MAC with the upper bits in DCIM and the lower bits in ACIM, achieving up to 2× energy reduction without accuracy loss (e.g., a 4b ADC for the ACIM LSBs).
- Saliency-Aware Boundaries: Adaptive schemes dynamically shift the digital/analog boundary (per MAC) according to data saliency (importance), e.g., the On-the-Fly Saliency Evaluator (OSE) for DNN vision tasks, yielding 20–30% power savings at <0.5% accuracy degradation [(Yoshioka et al., 2024); see also (Chen et al., 2023)].
- Rationale: Most DNNs can tolerate variable precision, reserving high-CSNR digital compute for critical front-end/high-saliency features while deploying analog compute for error-tolerant portions of the network.
Charge-based ACIM can be tightly integrated for LSB computation. Hybrid designs achieve state-of-the-art energy efficiency (~300 TOPS/W) while retaining virtually full DNN inference accuracy.
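The MSB/LSB partition can be sketched numerically: split each weight into high and low nibbles, accumulate the MSB partial sum exactly (the DCIM path) and the LSB partial sum through a noisy model (the ACIM path), then shift and recombine. The 4b/4b split, the additive-Gaussian noise model, and all names here are illustrative assumptions, not a specific published macro.

```python
import random

def hybrid_mac(x, w, lsb_bits=4, analog_sigma=0.5, seed=0):
    """Sketch of an MSB/LSB-split hybrid MAC over unsigned 8b weights.

    Upper bits of each weight go through an exact digital (DCIM) path;
    the lower `lsb_bits` go through an analog (ACIM) path modeled as the
    ideal partial sum plus Gaussian error. Because the MSB result is
    shifted up on recombination, analog noise only perturbs low-order bits.
    """
    rng = random.Random(seed)
    shift = 2 ** lsb_bits
    w_msb = [wi // shift for wi in w]
    w_lsb = [wi % shift for wi in w]
    y_msb = sum(xi * wi for xi, wi in zip(x, w_msb))      # exact DCIM path
    y_lsb = sum(xi * wi for xi, wi in zip(x, w_lsb))      # ideal ACIM value
    y_lsb_analog = y_lsb + rng.gauss(0.0, analog_sigma)   # modeled analog noise
    return y_msb * shift + y_lsb_analog

x = [1, 2, 3]
w = [200, 17, 99]                               # 8b weights
exact = sum(a * b for a, b in zip(x, w))        # 531
approx = hybrid_mac(x, w)
```

Even with substantial noise on the analog partial sum, the recombined result stays within a few LSBs of the exact MAC, which is the mechanism behind "2× energy reduction without accuracy loss."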
5. Design Trade-offs: Noise, Precision, and ADCs
Analog noise and quantization set practical limits for ACIM. Charge-based designs, although highly linear, require high-resolution SAR ADCs whose area and energy scale exponentially with ENOB. For CNNs and most vision DNNs, medium-precision (6–8b) ADCs yield the optimal energy-accuracy balance; Transformers and other precision-sensitive models may require higher ENOB or hybrid architectures for the MSBs.
Extensive evaluations highlight that, in ACIM, energy per MAC is minimized at 3–8 bits, aligning with the “quantized DNN” workload domain (Yoshioka et al., 2024). Increasing ENOB beyond 8b rapidly increases ADC area and energy, often overshooting DNN precision requirements. The leading ACIM macros now demonstrate:
- Energy efficiency: 50–400 TOPS/W (macro level)
- Precision: 3–12 bits, with 6–8 bits optimal for most vision DNNs; up to 10–12 bits with advanced ADCs in 3 nm
- Throughput: ~500 GOPS/mm² in charge-based bit-parallel implementations
- Robustness: CSNR of 15–30 dB suffices for CNNs, ≥30 dB for Transformers. PVT-hardened metal capacitors (charge-based) yield capacitor variation below 1%
Papers report that DNN accuracy degrades rapidly once CSNR falls below these thresholds, especially for high-complexity networks (e.g., ViTs) (Yoshioka et al., 2024).
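The "exponential with ENOB" scaling can be made concrete with a standard first-order model: in the thermal-noise-limited regime, ADC conversion energy roughly quadruples per additional effective bit. The reference point and the pure-4× rule are simplifying assumptions for illustration.

```python
def adc_energy_scale(enob, ref_enob=6):
    """First-order ADC energy model (assumption: thermal-noise-limited SAR,
    energy ~4x per additional effective bit). Returns energy relative to a
    reference ADC with `ref_enob` effective bits."""
    return 4.0 ** (enob - ref_enob)

# Relative conversion energy vs. a 6b reference:
ratio_8b = adc_energy_scale(8)    # 16.0
ratio_10b = adc_energy_scale(10)  # 256.0
```

Under this model, moving from 6b to 10b costs ~256× the conversion energy, which is why pushing ACIM precision past ~8b quickly erodes its fJ/MAC advantage over digital designs.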
6. Robustness, Noise-Aware Training, and Future Directions
Robustness to analog non-idealities is critical for ACIM circuit deployment. As analog noise and device variation are unavoidable, algorithm-hardware co-design strategies are essential:
- Noise-Aware Training: Techniques based on the straight-through estimator (STE) decouple complex analog noise simulation in the forward pass from tractable surrogate gradients during backpropagation, effectively training models robust to non-differentiable, non-Gaussian hardware noise (Feng et al., 16 Aug 2025). This yields substantial accuracy resilience, with up to 2.2× training speedup and 37.9% less memory versus full-gradient noise simulation.
- Variation-Aware Methods: Systematically injecting process/mismatch distributions into training and defining precision/CSNR-aware boundaries ensures BNNs and quantized DNNs retain accuracy under PVT and mismatch (Le et al., 2021).
- Hybrid/Adaptive Precision: Dynamically shifting between analog and digital computation at runtime according to model demand, saliency, and noise budget supports high energy efficiency without compromising application accuracy (Yoshioka et al., 2024).
- PVT and EDA Automation: Designs increasingly target PVT-insensitivity through pure charge-domain architectures and are now amenable to end-to-end automated design and DSE (Chen et al., 2024, Zhang et al., 2024).
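The STE idea in noise-aware training can be sketched with a toy linear model: the loss is evaluated through a non-differentiable "hardware" forward path (noise plus quantization), but the backward pass treats that stage as the identity, so ordinary gradients flow to the weights. This is a minimal illustrative sketch, not the cited paper's actual training pipeline; the noise model and learning setup are assumptions.

```python
import random

def analog_forward(y_ideal, sigma, rng):
    """Non-differentiable 'hardware' forward path (illustrative model):
    additive noise followed by coarse quantization, standing in for the
    analog MAC plus ADC. round() makes this path non-differentiable."""
    return round(y_ideal + rng.gauss(0.0, sigma))

def ste_step(w, x, target, rng, lr=0.01, sigma=0.05):
    """One SGD step with a straight-through estimator (STE).

    Forward: the loss 'sees' the noisy hardware output y_hw.
    Backward: the analog stage is treated as identity, so for the loss
    0.5*(y_hw - target)^2 the surrogate gradient is (y_hw - target) * x_i.
    """
    y_ideal = sum(wi * xi for wi, xi in zip(w, x))
    y_hw = analog_forward(y_ideal, sigma, rng)   # non-differentiable forward
    err = y_hw - target                          # STE surrogate gradient
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

rng = random.Random(42)
w, x = [0.5, -0.2, 0.1], [1.0, 2.0, 3.0]
for _ in range(300):
    w = ste_step(w, x, target=2.0, rng=rng)
fitted = sum(wi * xi for wi, xi in zip(w, x))  # settles in the quantization
                                               # band around the target
```

The trained output converges to within the hardware quantization band of the target despite the model never seeing a differentiable forward path, which is the core of STE-based noise-aware training.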
7. Outlook and Significance
Analog CIM (ACIM) has emerged as a dominant architecture for area- and energy-efficient DNN acceleration, especially well suited to quantized and medium-precision (3–8b) models. Cutting-edge charge-based ACIMs are already being deployed in vision and language accelerators, delivering 50–400 TOPS/W, ~500 GOPS/mm², and robust operation across PVT corners. Progress in hybrid ACIM-DCIM, adaptive/noise-aware training, and PVT-hardened analog design continues to push practical accuracy, robustness, and system scaling.
While the trade-off space between precision, energy, and area remains, ACIM's ability to deliver dramatic efficiency gains in the DNN "sweet spot" ensures its relevance for edge AI, IoT, and data-center accelerators targeting quantized models (Yoshioka et al., 2024).
Major Reference:
A Review of SRAM-based Compute-in-Memory Circuits (Yoshioka et al., 2024).