Stacked Intelligent Metasurface (SIM)
- SIM is a multilayer programmable electromagnetic structure that performs analog computations on propagating waves via cascaded, subwavelength-controlled metasurfaces.
- It enables real-time beamforming, spatial transforms, and wave-based computing for applications such as multi-modal communications and integrated sensing.
- Advanced wave propagation models and gradient-based optimization techniques are used to achieve high energy efficiency and significant performance gains over single-layer designs.
A stacked intelligent metasurface (SIM) is a multilayer, programmable electromagnetic structure that performs linear (or, more generally, analog computational) operations on propagating electromagnetic (EM) waves. By engineering the transmission (and possibly reflection) properties of each constituent metasurface layer at subwavelength resolution, SIMs enable ultra-fast, energy-efficient, and highly reconfigurable manipulation of EM fields for advanced tasks in communications, sensing, and computation. SIMs generalize the concept of single-layer reconfigurable intelligent surfaces (RIS). They overcome the functional limitations of RIS by leveraging cascaded free-space propagation and programmable meta-atom responses across multiple layers. The result is a device that can realize complex transfer matrices for beamforming, spatial transforms, and direct analog information processing, at the physical layer and in real time.
1. Multilayer Physical Architecture and Wave-Domain Computing
A canonical SIM is realized as a serial stack of $L$ programmable metasurface layers. Each layer consists of an $N$-element subwavelength array of meta-atoms aligned in the transverse ($x$–$y$) plane. Meta-atom $n$ in layer $l$ provides an electronically tunable transmission coefficient $\phi_n^{(l)} = a_n^{(l)} e^{j\theta_n^{(l)}}$, with amplitude $a_n^{(l)} \in [0, 1]$ and phase $\theta_n^{(l)} \in [0, 2\pi)$. These coefficients are typically adjusted via varactor diodes or MEMS (Huang et al., 14 Jun 2025). The inter-layer separation $d$ is a fraction of the carrier wavelength $\lambda$, typically $d = T/L$ for a total stack thickness $T$; designs operating at 28 GHz are common.
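When an input electromagnetic field illuminates the structure, it is processed layer by layer. Each layer’s diagonal transmission matrix $\boldsymbol{\Phi}^{(l)} = \operatorname{diag}(\phi_1^{(l)}, \dots, \phi_N^{(l)})$ modulates the field. The free-space Green’s function matrices $\mathbf{W}^{(l)} \in \mathbb{C}^{N \times N}$, which couple layer $l-1$ to layer $l$, then diffract it (Huang et al., 14 Jun 2025, An et al., 2023). The field vector $\mathbf{x}^{(l)}$ at the $l$-th layer evolves as

$$\mathbf{x}^{(1)} = \boldsymbol{\Phi}^{(1)} \mathbf{x}^{(0)}, \qquad \mathbf{x}^{(l)} = \boldsymbol{\Phi}^{(l)} \mathbf{W}^{(l)} \mathbf{x}^{(l-1)}, \quad l = 2, \dots, L,$$

with the overall transfer operator for the $L$-layer SIM given by

$$\mathbf{G} = \boldsymbol{\Phi}^{(L)} \mathbf{W}^{(L)} \boldsymbol{\Phi}^{(L-1)} \cdots \boldsymbol{\Phi}^{(2)} \mathbf{W}^{(2)} \boldsymbol{\Phi}^{(1)}.$$

The output field at receiver point $k$ is $y_k = \mathbf{h}_k^{H} \mathbf{G} \mathbf{w}$. Here $\mathbf{w} = \mathbf{x}^{(0)}$ encodes the feed coupling into the first layer, and $\mathbf{h}_k$ models the channel from the output layer to receiver $k$. The full architecture thus realizes a large-dimensional, trainable linear mapping. It is functionally analogous to a diffractive neural network, but implemented intrinsically by EM physics at the speed of light (An et al., 22 Jan 2026).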
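The cascaded model can be checked numerically with a minimal NumPy sketch. It assumes phase-only meta-atoms (unit amplitude). Purely for illustration, random complex matrices stand in for physically computed propagation matrices. The names `sim_transfer` and `propagate` are hypothetical helpers, not taken from the cited works. The sketch forms the closed-form product for the transfer operator and runs the layer-by-layer recursion, then confirms that both give the same received field.

```python
import numpy as np

def sim_transfer(phis, Ws):
    """Dense N x N operator G = Phi^(L) W^(L) ... Phi^(2) W^(2) Phi^(1)."""
    G = np.diag(phis[0])
    for phi, W in zip(phis[1:], Ws):
        G = np.diag(phi) @ W @ G
    return G

def propagate(phis, Ws, x0):
    """Recursion x^(1) = Phi^(1) x^(0), x^(l) = Phi^(l) W^(l) x^(l-1)."""
    x = phis[0] * x0                      # diagonal matrix times vector = elementwise product
    for phi, W in zip(phis[1:], Ws):
        x = phi * (W @ x)
    return x

rng = np.random.default_rng(0)
L, N = 4, 16
# Phase-only transmission coefficients for each of the L layers (assumption)
phis = [np.exp(1j * rng.uniform(0, 2 * np.pi, N)) for _ in range(L)]
# Random stand-ins for W^(2), ..., W^(L) (NOT a physical diffraction model)
Ws = [(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
      for _ in range(L - 1)]
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # feed field x^(0)
h = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # output-layer-to-receiver channel

G = sim_transfer(phis, Ws)
y_closed_form = np.vdot(h, G @ w)              # np.vdot conjugates h, giving h^H G w
y_recursive = np.vdot(h, propagate(phis, Ws, w))
```

Forming $\mathbf{G}$ explicitly is convenient for analysis, such as comparing it with a target matrix. The recursion avoids dense matrix-matrix products when only the output field is needed.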
2. Mathematical Modeling and Optimization of SIMs
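The design and configuration of SIMs are governed by a combination of analytical wave propagation models and gradient-based optimization techniques. Propagation between meta-atoms in adjacent layers is modeled by the Rayleigh–Sommerfeld diffraction kernel:

$$w_{m,n}^{(l)} = \frac{A \cos\chi_{m,n}^{(l)}}{r_{m,n}^{(l)}} \left( \frac{1}{2\pi r_{m,n}^{(l)}} - \frac{j}{\lambda} \right) e^{j 2\pi r_{m,n}^{(l)} / \lambda}.$$

Here $r_{m,n}^{(l)}$ is the Euclidean distance between atom $n$ of layer $l-1$ and atom $m$ of layer $l$. The angle $\chi_{m,n}^{(l)}$ is measured between the propagation direction and the normal of layer $l-1$, and $A$ is the atom area. These coefficients are the entries of the free-space propagation matrices, so free-space propagation and in-layer modulation are cascaded.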
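The kernel can be evaluated for two parallel layers with the sketch below. It computes the Rayleigh–Sommerfeld coefficient for every pair of atoms, using the obliquity factor $\cos\chi = d/r$ for layer spacing $d$. The geometry is an assumption chosen for illustration:

- square 8×8 layers with half-wavelength pitch;
- atom area equal to the pitch squared;
- a 28 GHz carrier;
- a layer spacing of one wavelength.

`uniform_grid` and `rs_kernel` are hypothetical helper names. Row $m$ of the resulting matrix collects the contributions of all source atoms to destination atom $m$. The field incident on the next layer is therefore `W @ x`.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum [m/s]

def uniform_grid(n_side, spacing):
    """Centred n_side x n_side grid of meta-atom (x, y) positions, shape (N, 2)."""
    offsets = (np.arange(n_side) - (n_side - 1) / 2) * spacing
    xx, yy = np.meshgrid(offsets, offsets, indexing="ij")
    return np.stack([xx.ravel(), yy.ravel()], axis=1)

def rs_kernel(src_xy, dst_xy, d, wavelength, area):
    """W[m, n]: Rayleigh-Sommerfeld coefficient from source atom n to destination atom m."""
    diff = dst_xy[:, None, :] - src_xy[None, :, :]        # (N_dst, N_src, 2) lateral offsets
    r = np.sqrt(np.sum(diff ** 2, axis=-1) + d ** 2)       # Euclidean distance between atoms
    cos_chi = d / r                                        # obliquity factor for parallel layers
    return (area * cos_chi / r
            * (1.0 / (2 * np.pi * r) - 1j / wavelength)
            * np.exp(1j * 2 * np.pi * r / wavelength))

lam = C / 28e9                  # carrier wavelength at 28 GHz (about 10.7 mm)
pitch = lam / 2                 # assumed half-wavelength meta-atom spacing
xy = uniform_grid(8, pitch)     # N = 64 atoms per layer
W = rs_kernel(xy, xy, d=lam, wavelength=lam, area=pitch ** 2)
x_next = W @ np.ones(len(xy))   # field incident on the next layer for a uniform excitation
```

Because both layers share the same grid, the distance matrix is symmetric, and so is `W`. This gives a quick sanity check on the implementation.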
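The SIM is programmed for a target functionality, such as generating a spatial energy distribution at a receiver array, by minimizing a loss function. One example is the mean-squared error

$$\mathcal{L} = \frac{1}{K} \sum_{k=1}^{K} \left( \beta \, |y_k|^2 - g_k \right)^2,$$

where $y_k$ is the field at receiver $k$, $g_k$ is the desired (e.g., binary edge-map) pattern, and $\beta$ is a normalization factor. Gradients of $\mathcal{L}$ with respect to each meta-atom’s parameters are computed via backpropagation through the linear chain of layer matrices $\boldsymbol{\Phi}^{(l)}$ and propagation matrices $\mathbf{W}^{(l)}$. Training employs projected (mini-batch) gradient descent. After each step, the parameters are projected back onto the feasible set $a_n^{(l)} \in [0, 1]$, $\theta_n^{(l)} \in [0, 2\pi)$ (Huang et al., 14 Jun 2025, Huang et al., 2024).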
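The training loop can be sketched for a phase-only SIM fed by a single input field. With amplitudes fixed at 1, the projection reduces to wrapping phases into $[0, 2\pi)$; an amplitude projection would add `np.clip(a, 0, 1)`. Several choices are assumptions rather than the cited authors' settings:

- the MSE loss form;
- full-batch updates instead of mini-batches;
- a backtracking step-size rule;
- random matrices standing in for the propagation and receiver channels.

All function names are hypothetical. The gradient exploits the fact that the received field $y_k$ is linear in each layer's transmission vector. Write the phase of atom $n$ in layer $l$ as $\theta_n^{(l)}$, with $\phi_n^{(l)} = e^{j\theta_n^{(l)}}$. Then $\partial y_k / \partial \theta_n^{(l)} = j\,[\mathbf{A}_l]_{k,n}\,\phi_n^{(l)}\,b_n^{(l)}$. Here $\mathbf{b}^{(l)}$ is the field incident on layer $l$, and $\mathbf{A}_l$ maps that layer's output to the receivers.

```python
import numpy as np

def forward(thetas, Ws, w):
    """Return the field incident on each layer and the SIM output field."""
    incident, x = [], w
    for l, theta in enumerate(thetas):
        b = x if l == 0 else Ws[l - 1] @ x      # Ws[l-1] propagates into layer l (0-based)
        incident.append(b)
        x = np.exp(1j * theta) * b               # phase-only transmission
    return incident, x

def loss_and_grad(thetas, Ws, w, H, g, beta):
    """MSE between beta*|y|^2 and target g, plus its gradient w.r.t. every phase."""
    incident, x_out = forward(thetas, Ws, w)
    y = H @ x_out                                # receiver fields y_k
    e = beta * np.abs(y) ** 2 - g
    loss = np.mean(e ** 2)
    grads = [None] * len(thetas)
    A = H                                        # maps the last layer's output to receivers
    for l in range(len(thetas) - 1, -1, -1):
        phi = np.exp(1j * thetas[l])
        dy = 1j * A * (phi * incident[l])[None, :]        # dy_k / dtheta_n, shape (K, N)
        dpow = 2 * np.real(np.conj(y)[:, None] * dy)      # d|y_k|^2 / dtheta_n
        grads[l] = (2 * beta / len(g)) * (e @ dpow)
        if l > 0:
            A = A @ np.diag(phi) @ Ws[l - 1]              # now maps layer l-1's output to receivers
    return loss, grads

def train(thetas, Ws, w, H, g, beta, lr=1.0, iters=50):
    """Projected gradient descent with backtracking; projection = phase wrapping."""
    loss, grads = loss_and_grad(thetas, Ws, w, H, g, beta)
    history = [loss]
    for _ in range(iters):
        step = lr
        while step > 1e-9:
            cand = [np.mod(t - step * gr, 2 * np.pi) for t, gr in zip(thetas, grads)]
            new_loss, new_grads = loss_and_grad(cand, Ws, w, H, g, beta)
            if new_loss < loss:
                break
            step *= 0.5
        else:
            break                                # no descent step found: stop early
        thetas, loss, grads = cand, new_loss, new_grads
        history.append(loss)
    return thetas, history

rng = np.random.default_rng(1)
L, N, K = 3, 16, 4

def cn(*shape):
    """Circularly symmetric complex Gaussian samples (stand-ins for real channels)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

Ws = [cn(N, N) / np.sqrt(N) for _ in range(L - 1)]
w = cn(N)
H = cn(K, N) / np.sqrt(N)                      # rows are h_k^H
g = np.array([1.0, 0.0, 1.0, 0.0])             # toy binary target pattern
thetas0 = [rng.uniform(0, 2 * np.pi, N) for _ in range(L)]
thetas, history = train(thetas0, Ws, w, H, g, beta=1.0)
```

A step is accepted only if it lowers the loss, so `history` is strictly decreasing. Wrapping the phases never changes the loss, because the model is $2\pi$-periodic in every $\theta$.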
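For multiuser beamforming, the SIM’s phase profiles are optimized to synthesize user-orthogonal beams. Alternating optimization over the power allocation $\{p_k\}$ and the phase settings $\{\theta_n^{(l)}\}$ yields locally optimal sum-rate performance:

$$\max_{\{p_k\}, \{\theta_n^{(l)}\}} \; R = \sum_{k=1}^{K} \log_2 \left(1 + \gamma_k\right) \quad \text{s.t.} \quad \sum_{k=1}^{K} p_k \le P,$$

where $\gamma_k$ is the SINR at user $k$ and $P$ is the total transmit power (An et al., 2023, An et al., 2023). Both analytical and deep reinforcement learning (DRL)-based strategies (e.g., DDPG actor-critic) have been employed for these highly nonconvex settings (Liu et al., 2024).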
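For a fixed SIM configuration, the sum-rate objective reduces to evaluating per-user SINRs. The sketch below assumes the SIM-processed downlink has been collapsed into a hypothetical $K \times K$ effective gain matrix `Heff`. Entry $(k, j)$ is the gain from data stream $j$ to user $k$, stream $k$ is intended for user $k$, and $p_j$ and $\sigma^2$ are the stream powers and noise power. An alternating optimizer would call such a function repeatedly while updating phases and powers.

```python
import numpy as np

def sum_rate(Heff, p, noise_power):
    """Return (sum rate in bit/s/Hz, per-user SINR), with stream k intended for user k."""
    rx_power = np.abs(Heff) ** 2 * p[None, :]       # power of stream j received by user k
    signal = np.diag(rx_power)                       # desired stream at each user
    interference = rx_power.sum(axis=1) - signal     # all other streams
    sinr = signal / (interference + noise_power)
    return float(np.sum(np.log2(1.0 + sinr))), sinr

# Perfectly orthogonal beams: each user sees only its own stream
rate_ideal, _ = sum_rate(np.eye(2), np.array([1.0, 1.0]), noise_power=1.0)
# Imperfect orthogonality: 0.5 amplitude leakage between the two users
rate_leaky, sinr_leaky = sum_rate(np.array([[1.0, 0.5], [0.5, 1.0]]),
                                  np.array([1.0, 1.0]), noise_power=1.0)
```

In this toy example the sum rate drops from 2 to about 1.70 bit/s/Hz once the beams leak into each other. This is why the phase optimization targets user-orthogonal beams.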
3. Communication, Sensing, and In-Wave Computing Applications
SIMs have been adopted for a variety of EM-domain tasks:
- Multi-Modal Semantic Communications: SIMs enable direct wave-domain imaging of visual semantic maps (such as edge patterns) while simultaneously transmitting textual semantic metadata via amplitude-phase modulations. A generative-adversarial model at the receiver fuses the SIM-imaged pattern and the textual description for scene reconstruction, yielding high-fidelity output with drastically reduced bandwidth compared to bitstream-based schemes (Huang et al., 14 Jun 2025).
- Multiuser Beamforming and Holographic MIMO: SIMs perform multiuser MISO/MIMO beamforming and HMIMO channel diagonalization entirely in the analog domain. This eliminates digital baseband computation and dramatically reduces the RF hardware count. System-level evaluations show sum-rate improvements of up to $2$–$3\times$ over conventional hybrid schemes with similar hardware budgets (An et al., 2023, An et al., 2023, An et al., 2023).
- Integrated Sensing and Communications (ISAC): Joint optimization serves both communication tasks (multiuser downlink) and radar tasks (e.g., beampattern gain in a specified direction). It uses penalty terms to enforce sensing constraints while maximizing spectral efficiency (SE) (Niu et al., 2024, Ranasinghe et al., 29 Apr 2025). These works observe design trade-offs among beamforming degrees of freedom (DoF), computational complexity, and the regularization of the joint objective.
- Wave-Based Computing (e.g., 2D DFT for DOA Estimation): SIMs physically implement spatial transforms such as the 2D discrete Fourier transform (DFT) for direction-of-arrival (DOA) estimation. The meta-atom phase shifts in each layer are adjusted so that the SIM’s end-to-end transfer function fits the 2D DFT matrix. This enables real-time spatial spectrum computation at the speed of light, with sub-dB mean-square error (An et al., 2023, An et al., 2024).
- Task-Oriented Semantic Communications: An electromagnetic neural network (EMNN) realized via SIM performs source and semantic encoding jointly, entirely through diffractive propagation. This enables direct physical-layer image recognition with roughly $90\%$ test accuracy while omitting digital compression and baseband inference (Huang et al., 2024).
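For the DOA application, the operation the SIM is trained to emulate can be sketched directly. Applying a 2D DFT to the field on a uniform planar array concentrates a plane wave into a single spectral bin, and the bin index gives the direction cosines $(u, v)$. The sketch assumes an 8×8 array with half-wavelength spacing, on-grid directions, and a unitary DFT normalization; the helper names are hypothetical. `fit_nmse` shows one way to score how closely a SIM's end-to-end matrix matches the DFT. Allowing a free complex scale factor in that comparison is an assumed fitting convention.

```python
import numpy as np

def dft_matrix(M):
    """Unitary DFT matrix F[p, m] = exp(-j 2 pi p m / M) / sqrt(M)."""
    idx = np.arange(M)
    return np.exp(-2j * np.pi * np.outer(idx, idx) / M) / np.sqrt(M)

def upa_response(Mx, My, u, v):
    """Half-wavelength UPA response to a plane wave with direction cosines (u, v)."""
    m = np.arange(Mx)[:, None]
    n = np.arange(My)[None, :]
    return np.exp(1j * np.pi * (m * u + n * v))

def bin_to_direction_cosine(p, M):
    """Invert p = (u M / 2) mod M, mapping a DFT bin to u in [-1, 1)."""
    u = 2 * p / M
    return u - 2 if u >= 1 else u

def fit_nmse(G, F):
    """Normalized error of G against target F after the best complex scaling."""
    beta = np.vdot(G, F) / np.vdot(G, G)       # least-squares scale factor
    return np.linalg.norm(beta * G - F) ** 2 / np.linalg.norm(F) ** 2

Mx = My = 8
F2 = np.kron(dft_matrix(Mx), dft_matrix(My))   # 2D DFT acting on the row-major vectorized field
A = upa_response(Mx, My, u=0.25, v=-0.5)
spectrum = np.abs(F2 @ A.ravel()) ** 2         # what an ideal SIM would deliver to the detectors
p, q = np.unravel_index(np.argmax(spectrum), (Mx, My))
u_hat = bin_to_direction_cosine(p, Mx)
v_hat = bin_to_direction_cosine(q, My)
```

Here the peak lands in bin $(1, 6)$, which maps back to $(u, v) = (0.25, -0.5)$. Off-grid directions would spread energy across neighbouring bins and call for interpolation.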
4. Performance Analysis and Practical Considerations
Quantitative Metrics
- Pattern fidelity (MSE, SSIM): As the number of SIM layers increases (e.g., to $10$), the pattern-generation error drops rapidly, with MSE falling from $0.15$ to $0.02$ (Huang et al., 14 Jun 2025).
- Sum-rate/channel capacity: SIMs routinely achieve $30$– higher sum-rate than single-layer metasurface or digital-only precoders with matched hardware, and converge to within $1$ dB of fully-digital massive MIMO for (An et al., 2023, An et al., 2023, An et al., 2023).
- Convergence: Custom gradient or alternating optimization algorithms converge in $10$–$50$ iterations (phases, powers), with joint DRL-based methods stabilizing reward within steps (Liu et al., 2024, Liu et al., 2024).
- Hardware reduction: SIM-based transceivers use only low-resolution RF chains and DACs for users, versus in conventional architectures (An et al., 2023).
System and Implementation Insights
- Aperture and layer design: There is a trade-off between the numbers of meta-atoms and layers ($N$ and $L$) and the achievable DoF, with saturation observed due to hardware and mutual-coupling limits. Li et al. (1 Mar 2025) report typical peak values of $N$ and $L$ for robust FDD or OFDM wideband communication.
- Calibration and modeling: Accurate electromagnetic models (including inter-atom coupling and back-reflection) are essential for large-aperture, high-fidelity computing; multi-port network approaches outperform cascade approximations in the presence of strong coupling (Abrardo et al., 5 Jan 2025).
- Robustness: SIMs have inherent resilience to moderate channel estimation errors and quantized phase control when properly normalized and trained (Huang et al., 14 Jun 2025, An et al., 2023).
- Energy and hardware efficiency: Wave-domain analog processing at the speed of light replaces digital processing latency with nanosecond-scale propagation delay. It also lowers total power and thermal footprint and allows for highly scalable architectures (Renzo, 2024, An et al., 22 Jan 2026). Hybrid active/passive partitioning further boosts gain (Iudice et al., 28 Jan 2026).
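The robustness point above mentions quantized phase control. The following minimal sketch assumes uniform $b$-bit phase quantization through a hypothetical `quantize_phase` helper. Rounding each phase to the nearest of $2^b$ levels leaves an error of at most $\pi/2^b$. For uniformly distributed phases, the coherent combining gain then shrinks by roughly $\operatorname{sinc}^2(\pi/2^b)$, with $\operatorname{sinc}(x) = \sin x / x$. For $b = 2$ this is about $0.81$, or roughly $-0.9$ dB.

```python
import numpy as np

def quantize_phase(theta, bits):
    """Round phases to the nearest of 2**bits uniformly spaced levels in [0, 2*pi)."""
    step = 2 * np.pi / 2 ** bits
    return np.mod(np.round(theta / step) * step, 2 * np.pi)

def wrapped_error(a, b):
    """Phase difference a - b wrapped into (-pi, pi]."""
    return np.angle(np.exp(1j * (a - b)))

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200_000)    # ideal continuous phases
bits = 2
theta_q = quantize_phase(theta, bits)
err = wrapped_error(theta, theta_q)

# Coherent combining gain relative to ideal phases (1.0 means no loss)
gain_ratio = np.abs(np.mean(np.exp(1j * err))) ** 2
x = np.pi / 2 ** bits
predicted = (np.sin(x) / x) ** 2              # sinc^2 prediction for a uniform error
```

This captures only the per-atom quantization loss. In a trained SIM, the optimizer can partly compensate for quantization across layers, so the sketch is a rough reference point rather than a performance bound.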
5. Comparison with Single-Layer Metasurfaces and Other Analog Devices
SIMs extend RIS and metasurface lens principles in both spatial depth (number of programmable interfaces) and computational capability. Whereas a single-layer metasurface can implement only a fixed phase-amplitude mask, an $L$-layer SIM enables cascaded, neural-network-like analog processing. For equal surface area, an $L$-layer SIM outperforms a one-layer device by up to $200$– in communication and sensing benchmarks (Li et al., 1 Mar 2025).
In contrast to multi-layer dielectric or fixed-phase lenses, all layers in an SIM are reconfigurable, offering dynamic and context-aware adaptation for evolving wireless tasks (Renzo, 2024). Modern implementations support active (amplitude-controlled) and passive (phase-only) partitioning for site-adapted gain-vs-noise optimization (Darsena et al., 27 Oct 2025, Iudice et al., 28 Jan 2026).
6. Key Challenges and Research Directions
- Electromagnetic modeling: Scaling to ultra-large apertures ( meta-atoms) requires precise calibration and advanced modeling of mutual coupling and nonparaxial propagation (Abrardo et al., 5 Jan 2025).
- Control and integration: Managing per-atom control signals (via FPGA/ASIC), power supply, and thermal dissipation is nontrivial when the total number of meta-atoms $LN$ is large.
- Fabrication tolerances: Sub-mm layer alignment and phase precision must be maintained for high-resolution tasks; on-line calibration and in situ optimization are essential (An et al., 22 Jan 2026).
- Learning and optimization: Data-driven, hardware-in-the-loop, and AI-native design approaches (as realized in NVIDIA Sionna/TensorFlow) are now available for end-to-end, differentiable training, enabling practical system deployment in complex, time-varying environments (Iudice et al., 28 Jan 2026).
- Future applications: Emerging directions include joint wave-based communications and radar (ISAC), semantic-aware physical-layer designs, direct-wavespace AI, and the integration of nonlinear and active meta-atoms for universal analog computation (Renzo, 2024, An et al., 22 Jan 2026).
7. Summary Table: Representative SIM Application Domains
| Application | Modeled/Emulated Function | Key Performance Gains |
|---|---|---|
| Multi-modal SemCom | Edge imaging + text fusion | Bandwidth savings, SSIM↑ |
| MIMO/HMIMO | Analog channel diagonalization | 2–3× sum-rate, hardware↓ |
| ISAC | Communication + beampattern | SE↑, sensing MSE↓ |
| DOA Estimation | Physical 2D DFT engine | MSE , no RF chains |
| Task-oriented SemCom | Direct analog image recognition | 90% accuracy, latency↓ |
These summarized outcomes highlight the capacity of SIMs to unify communications, analog computing, and sensing with high speed, energy efficiency, and functional versatility. Systematic advances in EM modeling, control integration, and optimization will be crucial for large-scale deployment in 6G and beyond (Huang et al., 14 Jun 2025, An et al., 2023, An et al., 2023, Niu et al., 2024, Huang et al., 2024, Renzo, 2024, Abrardo et al., 5 Jan 2025).