Self-Modulated Convolution (SMC)
- Self-Modulated Convolution (SMC) is a dynamic mechanism that adaptively alters convolution parameters using local spectral context, enabling robust feature processing.
- It fuses spatial and spectral information through a spectral self-modulation module, generating locally adaptive scaling and shifting coefficients.
- The SM-CNN architecture, incorporating SSMRBs and self-modulation, outperforms baselines with higher MPSNR and MSSIM and lower SAM in challenging denoising scenarios.
Self-Modulated Convolution (SMC) refers to a dynamic, input-adaptive convolutional mechanism in which the parameters governing channel-wise scaling and shifting of feature activations are generated on-the-fly from auxiliary, data-driven sources. In the context of hyperspectral image (HSI) denoising, SMC enables convolutional neural networks to modulate features based on local spectral context, facilitating robust denoising under complex, heterogeneous noise by integrating both spatial and spectral information. The spectral self-modulating convolution framework has been operationalized in architectures such as the Self-Modulating Convolutional Neural Network (SM-CNN), which employs Spectral Self-Modulating Residual Blocks (SSMRBs) as its principal adaptive units (Torun et al., 2023).
1. Mathematical Formulation of Self-Modulated Convolution
A standard 2D convolution on an intermediate feature map $\mathbf{X}$ is given by

$$\mathbf{Y} = \mathbf{W} * \mathbf{X} + \mathbf{b},$$

where $\mathbf{W}$ and $\mathbf{b}$ denote static kernel weights and biases.
In SMC, an additional feature-wise affine normalization step modulates each activation adaptively using parameters derived from the adjacent-band spectral cube $\mathbf{C}$. The convolution output is first normalized per channel:

$$\hat{\mathbf{Y}} = \frac{\mathbf{Y} - \mu(\mathbf{Y})}{\sigma(\mathbf{Y})},$$

with $\mu(\cdot)$ and $\sigma(\cdot)$ the per-channel mean and standard deviation. The modulation coefficients are

$$[\boldsymbol{\gamma}, \boldsymbol{\beta}] = f_{\theta}(\mathbf{C}),$$

where $f_{\theta}$ is a small learned network, and the modulated output

$$\mathbf{Y}' = \boldsymbol{\gamma} \odot \hat{\mathbf{Y}} + \boldsymbol{\beta}$$

can be folded into a locally-adaptive convolution form:

$$\mathbf{Y}' = \frac{\boldsymbol{\gamma}}{\sigma(\mathbf{Y})} \odot \left( \mathbf{W} * \mathbf{X} + \mathbf{b} \right) + \left( \boldsymbol{\beta} - \frac{\boldsymbol{\gamma}\,\mu(\mathbf{Y})}{\sigma(\mathbf{Y})} \right).$$

Here, both the modulation $\boldsymbol{\gamma}$ and shift $\boldsymbol{\beta}$ are spatially and spectrally adaptive, controlled by the local spectral context.
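The channel-wise modulation above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the shapes, the seed, and the linear map `W_mod` standing in for the learned spectral side-network are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: C feature channels over an H x W patch,
# and a neighbour-band spectral cube with K adjacent bands.
C, H, W, K = 4, 8, 8, 5
features = rng.normal(size=(C, H, W))          # conv activations Y
spectral_cube = rng.normal(size=(K, H, W))     # adjacent-band cube

# Per-channel normalization of the activations.
mu = features.mean(axis=(1, 2), keepdims=True)
sigma = features.std(axis=(1, 2), keepdims=True) + 1e-5
y_hat = (features - mu) / sigma

# Stand-in for the learned spectral side-network: a fixed linear map
# from the cube to 2*C modulation maps (gamma and beta), per pixel.
W_mod = rng.normal(size=(2 * C, K)) * 0.1
mod = np.einsum('ck,khw->chw', W_mod, spectral_cube)
gamma, beta = 1.0 + mod[:C], mod[C:]           # scale around identity

# Self-modulated output: channel- and pixel-wise affine transform.
out = gamma * y_hat + beta
```

Because `gamma` and `beta` are computed per pixel from the spectral cube, the effective convolution differs at every spatial location, which is the "locally adaptive" property the formulation describes.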
2. Spectral Self-Modulating Residual Block (SSMRB)
Each SSMRB is composed of two parallel computational streams:
- A conventional 2D-convolutional residual pathway.
- A Spectral Self-Modulation Module (SSMM).
The block follows the sequence

$$\mathbf{X}_{\mathrm{out}} = \mathbf{X}_{\mathrm{in}} + \boldsymbol{\gamma}(\mathbf{C}) \odot \widehat{\mathrm{Conv}}(\mathbf{X}_{\mathrm{in}}) + \boldsymbol{\beta}(\mathbf{C}),$$

where $\widehat{\mathrm{Conv}}(\cdot)$ denotes the channel-normalized output of the convolutional pathway.
Within each SSMM, the neighbour-band cube $\mathbf{C}$ is first processed by a small 2-layer CNN:

$$\mathbf{M} = \mathrm{Conv}_2(\mathrm{ReLU}(\mathrm{Conv}_1(\mathbf{C}))).$$

The output $\mathbf{M}$ is then split into two branches to yield $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$. These coefficients affinely modulate the normalized features channel-wise as described in Section 1.
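A full SSMRB forward pass can be sketched as below. For brevity the convolutions are reduced to 1x1 (pure channel-mixing) maps, and all weights, shapes, and the `conv1x1` helper are illustrative assumptions rather than the paper's layers.

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W, K = 4, 8, 8, 5
x = rng.normal(size=(C, H, W))                  # block input
cube = rng.normal(size=(K, H, W))               # neighbour-band cube

def conv1x1(inp, weight):
    """1x1 convolution as a channel-mixing linear map (einsum)."""
    return np.einsum('ok,khw->ohw', weight, inp)

# SSMM: two-layer CNN on the neighbour cube, output split into gamma/beta.
w1 = rng.normal(size=(8, K)) * 0.1
w2 = rng.normal(size=(2 * C, 8)) * 0.1
hidden = np.maximum(conv1x1(cube, w1), 0.0)     # ReLU
gamma_beta = conv1x1(hidden, w2)
gamma, beta = 1.0 + gamma_beta[:C], gamma_beta[C:]

# Residual pathway: convolution, per-channel normalization, modulation,
# then the skip connection back to the block input.
w_res = rng.normal(size=(C, C)) * 0.1
h = conv1x1(x, w_res)
h_hat = (h - h.mean(axis=(1, 2), keepdims=True)) / (
    h.std(axis=(1, 2), keepdims=True) + 1e-5)
out = x + gamma * h_hat + beta                   # SSMRB output
```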
3. Fusion of Spatial and Spectral Information
SMC fuses spatial and spectral features by combining standard 2D convolutions over the target (noisy) band $\mathbf{X}$ with spectral modulation conditioned on the neighbour-band cube $\mathbf{C}$. At each deep layer, the feature map $\mathbf{F}$ is normalized per channel:

$$\hat{\mathbf{F}} = \frac{\mathbf{F} - \mu(\mathbf{F})}{\sigma(\mathbf{F})}.$$

Concurrently, the spectral side-network provides spatially-varying $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$, and the features are fused:

$$\mathbf{F}' = \boldsymbol{\gamma} \odot \hat{\mathbf{F}} + \boldsymbol{\beta}.$$

A plausible implication is that SMC realizes dynamic feature processing, in which feature statistics are adaptively aligned to local spectral context, providing better handling of instance-specific noise patterns.
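The claim that modulated normalization folds into a single locally-adaptive affine operation can be checked numerically. The sketch below verifies the algebraic identity on random tensors; all names and shapes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
C, H, W = 3, 6, 6
f = rng.normal(size=(C, H, W))                  # layer feature map F
gamma = rng.normal(size=(C, H, W))              # spatially-varying scale
beta = rng.normal(size=(C, H, W))               # spatially-varying shift
mu = f.mean(axis=(1, 2), keepdims=True)
sigma = f.std(axis=(1, 2), keepdims=True) + 1e-5

# Modulated-normalization form ...
modulated = gamma * (f - mu) / sigma + beta
# ... equals a single affine map with folded scale and shift.
folded = (gamma / sigma) * f + (beta - gamma * mu / sigma)
assert np.allclose(modulated, folded)
```

This confirms that normalization followed by modulation is equivalent to one locally-adaptive scale-and-shift, which is why it can be absorbed into the convolution itself.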
4. Architecture and Training Protocol of SM-CNN
The SM-CNN architecture for HSI denoising incorporates SMC and SSMRBs in both the stem and backbone:
- Input: single-band noisy patch $\mathbf{X}$ and its neighbour-band cube $\mathbf{C}$.
- Stem: three parallel 2D convolutions with distinct kernel sizes on $\mathbf{X}$ and three parallel 3D convolutions on $\mathbf{C}$; all are ReLU-activated and concatenated.
- Backbone: a sequence of Conv2D+ReLU layers interleaved with two SSMRBs.
- Skip Connections: four lateral skip links from intermediate layers to the final feature merging stage.
- Output: $1$-channel Conv2D produces a residual image; the final clean estimate is the sum of the input and predicted residual.
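The multi-scale stem can be sketched as follows. The kernel sizes (3, 5, 7) are assumptions for illustration only, as is the hand-rolled `conv2d_same` helper; only the target-band 2D branch is shown.

```python
import numpy as np

rng = np.random.default_rng(4)
H, W = 8, 8
band = rng.normal(size=(H, W))                  # noisy target band

def conv2d_same(img, kernel):
    """'Same'-padded 2D correlation via explicit loops (for clarity)."""
    k = kernel.shape[0]
    p = k // 2
    padded = np.pad(img, p)
    h, w = img.shape
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = (padded[i:i + k, j:j + k] * kernel).sum()
    return out

# Parallel multi-scale 2D convolutions on the target band (kernel sizes
# illustrative), ReLU-activated and concatenated channel-wise.
outs = [np.maximum(conv2d_same(band, rng.normal(size=(k, k)) / k**2), 0.0)
        for k in (3, 5, 7)]
stem = np.stack(outs)                            # shape (3, H, W)
```

Parallel kernels at several scales let the stem capture both fine texture and broader spatial context before the SSMRB backbone.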
Training Details:
- Loss: mean-absolute error ($\ell_1$) between output and clean patch: $\mathcal{L} = \frac{1}{N}\sum_{i=1}^{N} \left\| \hat{\mathbf{X}}_i - \mathbf{X}_i^{\mathrm{clean}} \right\|_1$.
- Optimizer: Adam with a fixed initial learning rate and no decay schedule.
- Batch size: $128$; epochs: $100$ (using best validation checkpoint).
- Data augmentation: random flips, random rotations, and random spectral scans with band mirroring at cube edges.
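Band mirroring at cube edges handles target bands near the spectral boundary, where some neighbour indices fall outside the valid band range. A minimal sketch, assuming a reflect-style mirroring policy; the `neighbour_cube` helper is hypothetical.

```python
import numpy as np

def neighbour_cube(hsi, band, k):
    """Return the (2k+1)-band neighbour cube centred on `band`,
    reflecting out-of-range indices back into the valid range
    (hypothetical helper, assumed mirroring policy)."""
    n_bands = hsi.shape[0]
    idx = np.arange(band - k, band + k + 1)
    idx = np.abs(idx)                                        # mirror below band 0
    idx = np.where(idx >= n_bands, 2 * (n_bands - 1) - idx, idx)  # mirror above top
    return hsi[idx]

# 6-band toy HSI whose pixel values equal the band index.
hsi = np.arange(6)[:, None, None] * np.ones((6, 2, 2))
cube = neighbour_cube(hsi, band=0, k=2)
# Mirrored band order at the lower edge: 2, 1, 0, 1, 2
assert list(cube[:, 0, 0]) == [2.0, 1.0, 0.0, 1.0, 2.0]
```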
5. Empirical Performance and Ablation Results
SM-CNN with SMC achieves state-of-the-art performance on HSI denoising benchmarks. On five challenging synthetic-noise scenarios (GN, GN+SN, GN+DN, GN+IN, mixture) over Washington DC data, SM-CNN attains the highest mean peak signal-to-noise ratio (MPSNR) and mean structural similarity index (MSSIM) in 4 out of 5 cases, and the lowest spectral angle mapper (SAM) in all cases. In the mixture-noise scenario, for example, SM-CNN reaches MPSNR $29.83$, MSSIM $0.973$, and SAM $0.066$.
On cross-dataset transfer to Pavia University, a single pretrained SM-CNN again attains the best MPSNR, MSSIM, and SAM, outperforming classical and contemporary deep learning approaches.
In real-noise experiments on Indian Pines scene classification, SM-CNN denoising yields the highest overall accuracy among all single-model denoisers (kappa $0.878$).
Ablation on mixture-noise shows:
- Without self-modulation (WM-CNN), MPSNR drops to $27.46$.
- Using only a single SSMM layer (“Lite”) yields MPSNR $28.57$.
- The full SM-CNN (two SSMMs) reaches MPSNR $29.83$, demonstrating maximal benefit from self-modulation.
| Model Variant | MPSNR | MSSIM | SAM |
|---|---|---|---|
| WM-CNN (no SMC) | 27.46 | — | — |
| Lite (1 SSMM) | 28.57 | — | — |
| SM-CNN (2 SSMMs) | 29.83 | 0.973 | 0.066 |
Note: Dashes indicate corresponding values are not explicitly provided in the data for all columns.
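The SAM metric reported above can be computed as the mean angle between reference and estimated per-pixel spectra (lower is better). A minimal NumPy sketch; the `sam` function name and signature are illustrative, not from the paper.

```python
import numpy as np

def sam(ref, est, eps=1e-12):
    """Mean spectral angle (radians) between two (bands, H, W) images."""
    dot = (ref * est).sum(axis=0)
    denom = np.linalg.norm(ref, axis=0) * np.linalg.norm(est, axis=0) + eps
    return np.arccos(np.clip(dot / denom, -1.0, 1.0)).mean()

ref = np.ones((10, 4, 4))
assert sam(ref, ref) < 1e-6          # identical spectra: zero angle
assert sam(ref, 2.0 * ref) < 1e-6    # SAM is invariant to per-pixel scaling
```

Scale invariance is why SAM complements MPSNR/MSSIM: it isolates spectral-shape distortion from intensity error.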
6. Interpretation and Context
Self-modulated convolution realizes a form of conditional computation, where the modulation parameters for each spatial location and feature channel are dynamically inferred from adjoining spectral context. In the HSI denoising setting, this enables the preservation of fine spatial details and the adaptive suppression of heterogeneous, spectrally variant noise. SM-CNN's architecture and training paradigm are distinguished by the explicit incorporation of spectral self-modulation within residual blocks, yielding both quantitative improvements and qualitative robustness compared to static convolutional baselines. The approach demonstrates particular utility in domains where local data statistics are highly variable and contextually dependent (Torun et al., 2023).