Processing-in-Pixel CMOS Circuits
- Processing-in-Pixel CMOS circuits are integrated sensors that perform in-pixel computing tasks such as amplification, thresholding, and in-situ convolution.
- They merge analog and mixed-signal processing with photodetection to achieve low noise (e.g., <15e⁻ ENC), reduced energy consumption, and high throughput.
- Advanced designs leverage in-pixel MAC units and reconfigurable architectures for real-time feature extraction in imaging, AI, and neuromorphic vision.
Processing-in-Pixel (PIP) CMOS Circuits are integrated circuit architectures wherein significant data processing—for example, amplification, feature extraction, thresholding, or neural computations—occurs within each individual pixel or immediately adjacent circuitry, as opposed to being performed at the periphery of the sensor array or in an external processor. By tightly coupling analog or mixed-signal computational functions with physical sensing, PIP circuits enable substantial reductions in bandwidth, energy consumption, data movement, and system latency, and can provide a platform for in-situ real-time feature extraction for applications ranging from low-noise charge sensing to embedded deep learning.
1. Architectural Principles and PIP Taxonomy
Contemporary PIP CMOS circuits exhibit architectural heterogeneity depending on application domain and signal modality. Foundational motifs include:
- Analog signal chain integration: Direct charge or photo-current collection is immediately followed by a charge-sensitive preamplifier (CSA), local analog thresholding and discrimination, and in some designs, analog readout multiplexing. The "Topmetal-II−" array is exemplary: each 83 μm pitch pixel integrates an exposed metal collector (C_det ≈ 23 fF), a folded-cascode CSA (C_f ≈ 5 fF), a two-stage source-follower chain, and a local dynamic comparator with a 4-bit threshold DAC; the split chain supports multiplexed analog scanning and parallel digital hit logic (An et al., 2015).
- In-pixel multiply–accumulate (MAC): In vision and neuromorphic sensors, PIP MAC units combine local analog photodiode outputs with embedded weights, supporting convolutions, filtering, or event-based operations. Circuits span simple current-multiplier topologies (weight transistor plus accumulator capacitor (Kaiser et al., 2023, Kaiser et al., 2023)), time-encoded PWM schemes with current-modulated switched capacitors (Zhang et al., 2022), and crossbar-backed architectures with NVM-based weight stores (Yin et al., 2024). Event-driven MACs are prevalent in DVS/dynamic sensors.
- Digital/mixed-signal integration: Many platforms combine analog preprocessing with flash or SAR-ADC digitization and local embedded logic. For instance, "Smart Pixels" in 28 nm CMOS incorporate per-pixel CSAs, 2-bit flash ADCs, and cluster-level combinational neural network logic in a compact, analog-noise-robust fashion (Parpillon et al., 2024).
- Programmable and reconfigurable pipelines: Advanced PIP arrays now leverage 3D stacking and on- or off-pixel NVM for kernel programmability, as in the FPCA architecture, which enables runtime reconfiguration of kernel size, channel multiplicity, or stride without pixel-pitch overhead (Yin et al., 2024), and P2M-style split-die designs for convolution in memory (Datta et al., 2022).
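The in-pixel MAC motif above amounts to evaluating a strided convolution at the sensor, so that only one value per kernel window leaves the array. The following is a minimal behavioural sketch in Python, not a circuit model: the function names, the 4×4 test image, and the 2×2 kernel are illustrative assumptions and are not taken from any of the cited designs.

```python
# Behavioural sketch only: idealised, noise-free in-pixel MAC.
# All names and values are hypothetical, chosen for illustration.

def in_pixel_mac(patch, weights):
    """Weighted sum of pixel outputs over one kernel window."""
    return sum(p * w
               for patch_row, weight_row in zip(patch, weights)
               for p, w in zip(patch_row, weight_row))


def pixel_array_conv(image, kernel, stride):
    """Evaluate one MAC per kernel window, stepping by `stride`."""
    k = len(kernel)
    ys = range(0, len(image) - k + 1, stride)
    xs = range(0, len(image[0]) - k + 1, stride)
    return [[in_pixel_mac([row[x:x + k] for row in image[y:y + k]], kernel)
             for x in xs]
            for y in ys]


image = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
kernel = [[1, 0],
          [0, -1]]          # diagonal-difference kernel (assumed)

features = pixel_array_conv(image, kernel, stride=2)
n_in = len(image) * len(image[0])
n_out = len(features) * len(features[0])
print(features, f"{n_in} pixel values -> {n_out} outputs")
```

With the stride equal to the kernel size, 16 pixel values collapse to 4 outputs. Real designs add nonlinearity, mismatch, and noise that this sketch deliberately ignores.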
2. Core Circuit Topologies and Signal Flows
PIP circuit details vary by implementation, but the following principles recur across designs:
- Charge and current domain processing: Input charge (from direct ionization or photogeneration) is integrated onto high-fidelity node capacitances, such as the Topmetal node (C_det ≈ 23 fF) or CMOS photodiode capacitances (<10 fF). The CSA output yields a voltage proportional to net charge (V_out(Q_in) ≈ –Q_in/C_f), which decays with the feedback time constant τ_f = R_f·C_f. Because τ_f ≫ scan/readout time, these signals are effectively held, enabling quasi-DC operation and low noise (An et al., 2015).
- Weight embedding: Analog weights—crucial for MAC operations—are physically encoded as the width/length ratios of per-pixel transistors, temporal PWM duty cycles, or as resistance states of NVM cells in stacked die. For example, in the P2M paradigm, each pixel instantiates C_out weight transistors, selectively enabled based on kernel, with their aggregate conductance programming the kernel dot-product (Datta et al., 2022). NVM schemes (e.g., RRAM/PCM, as in FPCA) permit field-programmable weights without increasing per-pixel footprint (Yin et al., 2024).
- Thresholding and discrimination: Comparators—commonly with 4- to 5-bit in-pixel DACs—permit tunable, per-pixel threshold adjustment to accommodate process variation. Topmetal-II−, for instance, programs each pixel's DAC code to align with a global threshold, resulting in <15 e⁻ noise-aligned occupancy (An et al., 2015). Flash ADCs (e.g., 2-bit per pixel in smart pixels (Parpillon et al., 2024)) or local piecewise transfer implementations using phase-transition devices (Udoy et al., 2024) are also observed.
- Analog and digital readout: Signal flows include time-shared, multiplexed analog readouts (row-major scan modules), priority-encoded sparse digital logic (asynchronous column token chains in Topmetal-II−), and region-sparse digitization (selective ADC on salient patches in transformer-optimised PIP (Zhang et al., 2022)). Integration with periphery single-slope or SAR ADCs is common, but true in-pixel analog–digital conversion is increasingly viable in larger pixels (e.g., Skipper-in-CMOS and SPROCKET2 (Quinn et al., 2024, Lapi et al., 2024)).
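The charge-to-voltage relation V_out ≈ –Q_in/C_f and the per-pixel threshold trimming described above can be illustrated numerically. This is a sketch under stated assumptions rather than the Topmetal-II− procedure: the 3.2 mV pixel offset, the 1 mV DAC step, the 5 mV threshold, and the mid-scale-centred DAC coding are all hypothetical, as are the function names.

```python
# Sketch under stated assumptions; not the Topmetal-II- trimming procedure.
Q_E = 1.602176634e-19  # elementary charge in coulombs


def csa_output_volts(n_electrons, c_f):
    """Ideal CSA step: V_out = -Q_in / C_f."""
    return -(n_electrons * Q_E) / c_f


def best_trim_code(offset_v, lsb_v, bits=4):
    """Pick the DAC code whose correction best cancels a pixel offset.

    Assumes a mid-scale-centred DAC: correction = (code - mid) * lsb_v.
    """
    mid = 2 ** (bits - 1)
    return min(range(2 ** bits),
               key=lambda code: abs(offset_v + (code - mid) * lsb_v))


def is_hit(n_electrons, c_f, threshold_v):
    """Comparator fires when the CSA step magnitude exceeds the threshold."""
    return abs(csa_output_volts(n_electrons, c_f)) > threshold_v


step = csa_output_volts(200, 5e-15)     # 200 e- on C_f = 5 fF
code = best_trim_code(3.2e-3, 1e-3)     # hypothetical 3.2 mV offset, 1 mV LSB
print(f"step = {step * 1e3:.2f} mV, trim code = {code}")
```

For C_f = 5 fF, a 200 e⁻ input gives a step of roughly 6.4 mV in magnitude, which shows why millivolt-scale pixel-to-pixel offsets need to be trimmed before a common threshold is meaningful.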
3. Performance Metrics, Trade-offs, and Noise Analysis
Quantitative analysis of PIP circuits reveals fundamental architectural trade-offs:
- Noise, linearity, and speed: For analog front-ends, the equivalent input-referred noise charge (ENC) is typically modeled as a quadrature sum of three contributions,
$$\mathrm{ENC}^2 = \mathrm{ENC}_{s}^2 + \mathrm{ENC}_{1/f}^2 + \mathrm{ENC}_{p}^2,$$
with $\mathrm{ENC}_{s}$ for series (thermal), $\mathrm{ENC}_{1/f}$ for $1/f$, and $\mathrm{ENC}_{p}$ for parallel (feedback/leakage) noise sources (An et al., 2015). Topmetal-II− attains <15 e⁻ ENC (measured 13.9 e⁻), with C_f ∼ 5 fF, a low bias current, and sub-MHz bandwidth, trading speed for retention. Neuromorphic PIP MACs in 22 nm feature sub-femtojoule event energy with retention limited by leakage (e.g., config-c achieving <1% error over T_INTG = 10 ms at <20 fF C_K) (Kaiser et al., 2023).
- Area and fill-factor: PIP circuits must balance computational density with photodiode area. Low-transistor-count schemes (2.5T/pixel) can achieve fill-factors >40% even with in-pixel convolution (Song et al., 2021), whereas advanced PIP cells (Skipper, smart pixels) devote 30–60% of pixel area to analog and/or digital blocks (Parpillon et al., 2024, Lapi et al., 2024).
- Bandwidth and energy compression: By performing early feature extraction or convolution, many PIP schemes reduce downstream data by factors of 10–30×, with energy per operation dropping to sub-pJ/MAC (P2M, FPCA, transformer-integrated PWM PIP) (Datta et al., 2022, Yin et al., 2024, Zhang et al., 2022). Typical reported front-end power is 5–10 μW/pixel at 3.3 V for full analog/digital stacks (Topmetal-II−), and <30 mW/MP for highly parallel transformer-optimized arrays (An et al., 2015, Zhang et al., 2022).
- Processing speed: Frame rates are bounded by analog scan rates (e.g., 0.66 ms full-frame for Topmetal-II− (An et al., 2015)) or ADC throughput. Multi-sample parallel readouts (Skipper-in-CMOS, SPROCKET2) can reach 4 kfps for large areas (Quinn et al., 2024), but integration times in analog MACs (DVS, neuromorphic) are limited (≲10 ms optimal for weak-leakage circuits (Kaiser et al., 2023)).
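The link between leakage and usable integration time can be made concrete with a first-order droop estimate, ΔV = I_leak·T/C. The sketch below assumes a constant leakage current and a hypothetical 0.5 V signal swing; only the 20 fF capacitance, the 10 ms window, and the 1% error target echo figures quoted above, and the function names are illustrative.

```python
# First-order leakage budget; constant leakage current is an assumption.

def droop_volts(i_leak, t_intg, c_k):
    """Voltage lost from a capacitor c_k after t_intg at constant i_leak."""
    return i_leak * t_intg / c_k


def max_leakage_amps(c_k, t_intg, v_swing, rel_error):
    """Largest constant leakage that keeps droop below rel_error * v_swing."""
    return rel_error * v_swing * c_k / t_intg


# 20 fF, 10 ms, 1 % error; the 0.5 V swing is a hypothetical value.
i_max = max_leakage_amps(c_k=20e-15, t_intg=10e-3, v_swing=0.5, rel_error=0.01)
print(f"max leakage ~ {i_max * 1e15:.1f} fA")
print(f"droop at that leakage = {droop_volts(i_max, 10e-3, 20e-15) * 1e3:.2f} mV")
```

Under these assumptions the budget is on the order of 10 fA, which illustrates why leakage tends to dominate the design of long-integration analog MACs.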
4. Algorithm-Hardware Co-Design and Non-Idealities
Rigorous co-design strategies are essential to exploit PIP capabilities:
- Noise and nonlinearity compensation: Circuit-in-the-loop modeling is employed, embedding device-level transfer curves (e.g., ΔV = f(w×N_spike) with statistical noise) directly into the training pipeline of CNNs and SNNs to ensure algorithmic robustness against analog mismatch, process spread, and quantization effects (Kaiser et al., 2023, Kaiser et al., 2023).
- Leakage mitigation: For long analog accumulation (multi-ms), leakage currents from weight FET stacks are suppressed via stacked high-V_th switches, subthreshold biasing, and current-nulling techniques. Physical design choices (MIM capacitor size, high-Vth devices, programmable leakage current sources) extend practical integration windows by orders of magnitude (Kaiser et al., 2023).
- Programmability and reconfigurability: NVM-backed PIP architectures (FPCA) and multi-channel weight stacks permit kernel, stride, and channel reconfiguration dynamically; bucket-select curve-fits model non-ideal analog outputs for algorithm-hardware mapping (Yin et al., 2024). Physical weight encoding (transistor width) yields highest energy-efficiency but lacks field-programmability unless hybridized with FeFET/RRAM inserts (Datta et al., 2022).
- Sparse and event-driven readout: Exploiting sparsity at the sensor level (attention-based patch selection, per-column token arbitration) minimizes ADC and I/O overhead, critical for transformer-based pipelines (Zhang et al., 2022). Event- and analog-domain prioritization (asynchronous digital core in Topmetal-II−) sustains low-latency, high-throughput scenarios.
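Circuit-in-the-loop modeling, as described in the first item above, replaces the ideal multiply with a device-level transfer curve plus noise during training. The sketch below shows only the forward pass in plain Python. The tanh-shaped curve, the gain, and the Gaussian noise level are hypothetical stand-ins for a fitted circuit model, not values from the cited work.

```python
import math
import random

# Hypothetical stand-in for a fitted circuit transfer curve dV = f(w * N_spike).


def pixel_transfer(w, n_spike, v_max=1.0, gain=0.05, sigma=0.0, rng=None):
    """Compressive (saturating) response with optional additive Gaussian noise."""
    ideal = v_max * math.tanh(gain * w * n_spike / v_max)
    noise = rng.gauss(0.0, sigma) if rng is not None and sigma > 0 else 0.0
    return ideal + noise


def noisy_forward(weights, spike_counts, sigma, seed=0):
    """Forward pass over a set of pixels with a seeded noise source."""
    rng = random.Random(seed)
    return [pixel_transfer(w, n, sigma=sigma, rng=rng)
            for w, n in zip(weights, spike_counts)]


clean = noisy_forward([1, 2, 3], [4, 4, 4], sigma=0.0)
noisy = noisy_forward([1, 2, 3], [4, 4, 4], sigma=0.01)
print([round(v, 4) for v in clean])
print([round(v, 4) for v in noisy])
```

In a training framework the same curve would be placed inside the network's first layer so that gradients account for saturation and noise; the seeded generator here simply keeps the example reproducible.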
5. Application Domains and Technology Demonstrations
PIP CMOS circuits are now deployed or demonstrated for multiple domains:
| Application | Example Architectures | Noted Performance Advantages |
|---|---|---|
| Ionization/Charge Sensing | Topmetal-II− (An et al., 2015) | <15e⁻ noise; 200e⁻ digital threshold; ∼0.66ms full-frame; <10μW/pix |
| In-sensor CNN | P2M (Datta et al., 2022); FPCA (Yin et al., 2024); Patch-PIP (Zhang et al., 2022) | 10–30× bandwidth/energy reduction; sub-pJ/MAC; programmable kernels |
| Neuromorphic vision | Neuromorphic-P2M (Kaiser et al., 2023); P2M-DVS (Kaiser et al., 2023) | Asynchronous MAC, <2× backend energy; 88–95% accuracy maintained |
| In-pixel AI filtering | Smart Pixels (Parpillon et al., 2024) | 6μW/pix; in-pixel 2-bit ADC + NN logic per-256 cluster; 54–75% data rejection |
| Ultra-low-noise photon counting | Skipper-in-CMOS (Lapi et al., 2024); SPROCKET2 (Quinn et al., 2024) | Single-photon sensitivity; sub-electron noise (σ=0.15 e⁻); 4 kfps |
- Ion-imaging and TPC readouts benefit from deeply-integrated low-noise processing with tunable digital thresholding (An et al., 2015).
- Mobile and edge AI platforms harness PIP for embedded feature extraction, convolution, and transformer-ready tokenization, reducing energy and computation at the front-end (Zhang et al., 2022, Datta et al., 2022, Yin et al., 2024).
- Neuromorphic event-based vision uses leakage-robust analog MAC schemes for low-latency, high-throughput spiking network computation (Kaiser et al., 2023, Kaiser et al., 2023).
- Scientific imaging extends to Skipper-in-CMOS and SPROCKET2, integrating in-pixel multi-sample averaging and ADC for sub-electron readout at high frame rates (Lapi et al., 2024, Quinn et al., 2024).
6. Comparative Analysis and Future Directions
Processing-in-Pixel architectures are compared along these axes:
- Integration scale and cost: CMOS-compatible 3T and 4T architectures remain standard, but emergent 3D stacking and NVM methodologies (FPCA, P2M) shift computational burden to backend-of-line processes or a second die, maximizing fill-factor and reconfigurability (Yin et al., 2024). However, increased layer count and interposer requirements challenge yield and matching.
- Functional flexibility vs. circuit simplicity: The area and power required for in-pixel ADC or neural cores preclude their use in sub-10μm pitches; analog thresholding, modest weight stacks, and analog MACs scale more efficiently but limit digital complexity.
- Noise-power-bandwidth trade-offs: Many PIP circuits maximize retention and SNR at the cost of frame rate (long τ_f, low bias), whereas others (e.g., PWM-based matrix multiplication) prioritize high throughput (An et al., 2015, Zhang et al., 2022). Structures with flexible integration depth (e.g., Smart Pixels with event-driven activation of digital logic (Parpillon et al., 2024)) approach the Pareto frontier for energy efficiency at moderate digital depth.
- Programmability and security: PIP can provide strong on-chip data security by confining raw analog data to the sensor (enhancement circuits (Udoy et al., 2024)), and programmable mappings (HyperFETs with phase transition elements) permit application-specific adaptation at runtime or during device provisioning.
Future directions include full-stack neural-network integration (multi-layer, convolutional and transformer logic), intensification of 3D or hybrid integration with high-density NVM, and adaptive analog-digital co-design for event-dense and ultra-low-noise sensing environments.
References:
- "A Low-Noise CMOS Pixel Direct Charge Sensor, Topmetal-II-" (An et al., 2015)
- "Hardware-Algorithm Co-design Enabling Processing-in-Pixel-in-Memory (P2M) for Neuromorphic Vision Sensors" (Kaiser et al., 2023)
- "Low-power In-pixel Computing with Current-modulated Switched Capacitors" (Zhang et al., 2022)
- "In-Pixel Foreground and Contrast Enhancement Circuits with Customizable Mapping" (Udoy et al., 2024)
- "FPCA: Field-Programmable Pixel Convolutional Array for Extreme-Edge Intelligence" (Yin et al., 2024)
- "P2M: A Processing-in-Pixel-in-Memory Paradigm for Resource-Constrained TinyML Applications" (Datta et al., 2022)
- "Neuromorphic-P2M: Processing-in-Pixel-in-Memory Paradigm for Neuromorphic Image Sensors" (Kaiser et al., 2023)
- "A Reconfigurable Convolution-in-Pixel CMOS Image Sensor Architecture" (Song et al., 2021)
- "Real-time Analog Pixel-to-pixel Dynamic Frame Differencing with Memristive Sensing Circuits" (Krestinskaya et al., 2018)
- "Toward Efficient Hyperspectral Image Processing inside Camera Pixels" (Datta et al., 2022)
- "A Cryogenic readout integrated circuit with analog pile-up and in-Pixel ADC for high frame rate Skipper CCD-in-CMOS Sensors" (Quinn et al., 2024)
- "Skipper-in-CMOS: Non-Destructive Readout with Sub-Electron Noise Performance for Pixel Detectors" (Lapi et al., 2024)
- "Smart Pixels: In-pixel AI for on-sensor data filtering" (Parpillon et al., 2024)