Self-Modulated Convolution (SMC)

Updated 30 January 2026
  • Self-Modulated Convolution (SMC) is a dynamic mechanism that adaptively alters convolution parameters using local spectral context, enabling robust feature processing.
  • It fuses spatial and spectral information through a spectral self-modulation module, generating locally adaptive scaling and shifting coefficients.
  • The SM-CNN architecture, incorporating SSMRBs and self-modulation, outperforms baselines with higher MPSNR and MSSIM and lower SAM in challenging denoising scenarios.

Self-Modulated Convolution (SMC) refers to a dynamic, input-adaptive convolutional mechanism in which the parameters governing channel-wise scaling and shifting of feature activations are generated on-the-fly from auxiliary, data-driven sources. In the context of hyperspectral image (HSI) denoising, SMC enables convolutional neural networks to modulate features based on local spectral context, facilitating robust denoising under complex, heterogeneous noise by integrating both spatial and spectral information. The spectral self-modulating convolution framework has been operationalized in architectures such as the Self-Modulating Convolutional Neural Network (SM-CNN), which employs Spectral Self-Modulating Residual Blocks (SSMRBs) as its principal adaptive units (Torun et al., 2023).

1. Mathematical Formulation of Self-Modulated Convolution

A standard 2D convolution on an intermediate feature map $F \in \mathbb{R}^{h \times w \times C}$ is given by

$$\widetilde F^c_{i,j} = \sum_{u=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} \sum_{v=-\lfloor k/2\rfloor}^{\lfloor k/2\rfloor} W^c_{u,v}\, F^c_{i+u,\,j+v} + b^c,$$

where $W^c_{u,v}$ and $b^c$ denote static kernel weights and biases.

In SMC, an additional feature-wise affine normalization step modulates each activation adaptively using parameters derived from the adjacent-band spectral cube $Y^\lambda \in \mathbb{R}^{h \times w \times K}$:

$$F^{c}_{\mathrm{out},\,i,j} = \gamma_c\bigl(Y^\lambda_{i,j}\bigr)\, \frac{\widetilde F^c_{i,j}-\mu_c}{\sigma_c} + \beta_c\bigl(Y^\lambda_{i,j}\bigr),$$

where

μc=1hwi,jF~i,jc,σc=1hwi,j(F~i,jcμc)2+δ\mu_c = \frac{1}{h\,w} \sum_{i,j} \widetilde F^c_{i,j}, \quad \sigma_c = \sqrt{ \frac{1}{h\,w} \sum_{i,j}(\widetilde F^c_{i,j} - \mu_c)^2 + \delta }

with $\delta = 10^{-5}$. The modulation coefficients are

$$m_{i,j}^c = \gamma_c(Y^\lambda_{i,j}), \qquad b_{i,j}^c = \beta_c(Y^\lambda_{i,j}),$$

and the operation can be folded into a locally adaptive convolution form:

$$F^c_{\mathrm{out},\,i,j} = \sum_{u,v} \bigl(m^c_{i,j} \odot W^c_{u,v}\bigr)\, F^c_{i+u,\,j+v} + b^c_{i,j}.$$

Here, both the scaling and the shift are spatially and spectrally adaptive, controlled by the local spectral context.
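For concreteness, the normalization-and-modulation steps above can be sketched for a single channel in NumPy. This is an illustrative sketch only: the helper name `self_modulated_conv`, the edge padding at borders, and the restriction to one channel are assumptions for the example, not details from Torun et al. (2023).

```python
import numpy as np

def self_modulated_conv(F, W, b, gamma, beta, delta=1e-5):
    """Single-channel sketch of Self-Modulated Convolution.

    F     : (h, w) input feature map for one channel c
    W     : (k, k) static kernel; b : scalar static bias
    gamma : (h, w) scaling coefficients gamma_c(Y^lambda_{i,j})
    beta  : (h, w) shifting coefficients beta_c(Y^lambda_{i,j})
    """
    h, w = F.shape
    k = W.shape[0]
    r = k // 2
    # Boundary handling is unspecified in the formulation; edge padding assumed.
    Fp = np.pad(F, r, mode="edge")
    # Standard (static) convolution: F~ = W * F + b
    Ft = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            Ft[i, j] = np.sum(W * Fp[i:i + k, j:j + k]) + b
    # Per-channel normalization statistics mu_c, sigma_c
    mu = Ft.mean()
    sigma = np.sqrt(((Ft - mu) ** 2).mean() + delta)
    # Locally adaptive affine modulation by the spectral side-network outputs
    return gamma * (Ft - mu) / sigma + beta
```

With $\gamma \equiv 1$ and $\beta \equiv 0$ the operation reduces to plain per-channel normalization of the convolved features.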

2. Spectral Self-Modulating Residual Block (SSMRB)

Each SSMRB is composed of two parallel computational streams:

  • A conventional 2D-convolutional residual pathway.
  • A Spectral Self-Modulation Module (SSMM).

The block follows the sequence:

$$\text{Input} \rightarrow [\text{Conv2D} \rightarrow \text{ReLU}] \rightarrow \text{SSMM} \rightarrow [\text{Conv2D} \rightarrow \text{ReLU}] \rightarrow \text{SSMM} \rightarrow \text{add(Input)}$$
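The data flow of one SSMRB can be sketched as follows. This is a structural illustration only: the Conv2D layers are stood in by identity-like stubs, and `ssmm` is a placeholder callable returning the $(\gamma, \beta)$ coefficients; neither stands for the learned layers of the actual network.

```python
import numpy as np

def ssmrb(x, ssmm):
    """Data-flow sketch of one Spectral Self-Modulating Residual Block.

    x    : (h, w, C) feature map
    ssmm : callable returning (gamma, beta), each (h, w, C), computed
           elsewhere from the neighbour-band cube Y^lambda
    """
    def conv_relu(t):
        # Placeholder for a learned Conv2D + ReLU layer
        return np.maximum(t, 0.0)

    def modulate(t):
        # SSMM step: normalize per channel, then affine-modulate
        gamma, beta = ssmm()
        mu = t.mean(axis=(0, 1), keepdims=True)
        sigma = np.sqrt(t.var(axis=(0, 1), keepdims=True) + 1e-5)
        return gamma * (t - mu) / sigma + beta

    h = modulate(conv_relu(x))   # [Conv2D -> ReLU] -> SSMM
    h = modulate(conv_relu(h))   # [Conv2D -> ReLU] -> SSMM
    return x + h                 # residual addition of the block input
```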

Within each SSMM, the neighbour-band cube $Y^\lambda$ is first processed by a small two-layer CNN:

$$\Phi_1(Y^\lambda) = \mathrm{ReLU}\bigl(\mathrm{Conv}_{5\times5}(Y^\lambda)\bigr), \qquad \Phi_2(Y^\lambda) = \mathrm{Conv}_{1\times1}\bigl(\Phi_1(Y^\lambda)\bigr).$$

The output $\Phi_2(Y^\lambda)$ is then split into two branches to yield $\gamma(Y^\lambda) \in \mathbb{R}^{h\times w\times C}$ and $\beta(Y^\lambda) \in \mathbb{R}^{h\times w\times C}$. These coefficients affinely modulate the normalized features channel-wise as described in Section 1.
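A minimal NumPy sketch of this two-layer coefficient generator is given below. The random weight tensors are placeholders for learned parameters, and the `conv2d` helper and its same-padding behaviour are assumptions for the example, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(X, W):
    """'Same'-padded 2D convolution: X is (h, w, Cin), W is (k, k, Cin, Cout)."""
    h, w, _ = X.shape
    k = W.shape[0]
    r = k // 2
    Xp = np.pad(X, ((r, r), (r, r), (0, 0)), mode="edge")
    out = np.empty((h, w, W.shape[-1]))
    for i in range(h):
        for j in range(w):
            patch = Xp[i:i + k, j:j + k, :]           # (k, k, Cin)
            out[i, j] = np.einsum("klc,klcd->d", patch, W)
    return out

def ssmm_coefficients(Y, C=8):
    """SSMM sketch: neighbour-band cube -> (gamma, beta), each (h, w, C).

    Y : (h, w, K) adjacent-band spectral cube. Random weights stand in
    for the learned Conv_{5x5} and Conv_{1x1} parameters.
    """
    K = Y.shape[-1]
    W1 = rng.standard_normal((5, 5, K, 2 * C)) * 0.1      # Conv_{5x5}
    W2 = rng.standard_normal((1, 1, 2 * C, 2 * C)) * 0.1  # Conv_{1x1}
    phi1 = np.maximum(conv2d(Y, W1), 0.0)   # Phi_1 = ReLU(Conv_{5x5}(Y))
    phi2 = conv2d(phi1, W2)                 # Phi_2 = Conv_{1x1}(Phi_1)
    # Channel-wise split into scaling and shifting coefficient maps
    return phi2[..., :C], phi2[..., C:]
```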

3. Fusion of Spatial and Spectral Information

SMC fuses spatial and spectral features by combining standard 2D convolutions over the target (noisy) band with spectral modulation conditioned on the neighbour-band cube $Y^\lambda$. At each deep layer, the feature map is normalized per channel:

$$\hat F^c_{i,j} = \bigl(F^c_{\mathrm{pre},\,i,j} - \mu_c\bigr)/\sigma_c.$$

Concurrently, the spectral side-network provides spatially varying $\gamma_c(Y^\lambda_{i,j})$ and $\beta_c(Y^\lambda_{i,j})$, and the features are fused:

$$F^c_{\mathrm{next},\,i,j} = \gamma_c(Y^\lambda_{i,j})\, \hat F^c_{i,j} + \beta_c(Y^\lambda_{i,j}).$$

A plausible implication is that SMC realizes dynamic feature processing, in which feature statistics are adaptively aligned to local spectral context, providing better handling of instance-specific noise patterns.

4. Architecture and Training Protocol of SM-CNN

The SM-CNN architecture for HSI denoising incorporates SMC and SSMRBs in both the stem and backbone:

  • Input: single-band patch $\mathbf{y}^s_i \in \mathbb{R}^{h\times w\times 1}$ and neighbour cube $\mathbf{y}^\lambda_i \in \mathbb{R}^{h\times w\times K}$.
  • Stem: three parallel 2D convolutions ($3\times3$, $5\times5$, $7\times7$) on $\mathbf{y}^s$ and three parallel 3D convolutions on $\mathbf{y}^\lambda$; all are ReLU-activated and concatenated.
  • Backbone: a sequence of Conv2D+ReLU layers interleaved with two SSMRBs.
  • Skip Connections: four lateral skip links from intermediate layers to the final feature merging stage.
  • Output: a single-channel $3\times3$ Conv2D produces a residual image; the final clean estimate is the sum of the input and the predicted residual.

Training Details:

  • Loss: mean absolute error ($L_1$) between output and clean patch:

$$\mathcal{L}(\theta) = \frac{1}{2N} \sum_i \bigl\| D_\theta(\mathbf{y}^s_i, \mathbf{y}^\lambda_i) - \mathbf{x}^s_i \bigr\|_1$$
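This loss translates directly into code. The sketch below assumes `pred_batch` holds the network outputs $D_\theta(\mathbf{y}^s_i, \mathbf{y}^\lambda_i)$ and `clean_batch` the clean patches $\mathbf{x}^s_i$; the function name is illustrative.

```python
import numpy as np

def l1_loss(pred_batch, clean_batch):
    """L(theta) = 1/(2N) * sum_i || pred_i - x_i ||_1 over a batch of N patches."""
    N = len(pred_batch)
    total = sum(np.abs(p - x).sum() for p, x in zip(pred_batch, clean_batch))
    return total / (2 * N)
```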

  • Optimizer: Adam, initial learning rate $10^{-4}$, no decay schedule.
  • Batch size: $128$; epochs: $100$ (using best validation checkpoint).
  • Data augmentation: random flips, $90^\circ$ rotations, random spectral scans with $K/2$-band mirroring at cube edges.
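One reading of the band-mirroring scheme is sketched below: when the target band sits near either end of the spectrum, the missing neighbour indices are reflected back into range so that every target band still receives a full $K$-band cube. The helper name and the exact reflection rule are assumptions for illustration, not confirmed details of the paper.

```python
import numpy as np

def neighbour_cube(hsi, band, K):
    """Build the K-band neighbour cube for a target band, mirroring
    out-of-range band indices at the spectral edges (assumed scheme).

    hsi  : (h, w, B) hyperspectral image
    band : index of the target band
    K    : number of neighbour bands (even), K/2 on each side
    """
    B = hsi.shape[-1]
    half = K // 2
    # Candidate neighbours on both sides, excluding the target band itself
    idx = [i for i in range(band - half, band + half + 1) if i != band]
    # Reflect indices that fall outside [0, B-1] back into range
    idx = [-i if i < 0 else (2 * (B - 1) - i if i >= B else i) for i in idx]
    return hsi[..., idx]
```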

5. Empirical Performance and Ablation Results

SM-CNN with SMC achieves state-of-the-art performance on HSI denoising benchmarks. On five challenging synthetic-noise scenarios (GN, GN+SN, GN+DN, GN+IN, mixture) over Washington DC data, SM-CNN attains the highest mean peak signal-to-noise ratio (MPSNR) and mean structural similarity index (MSSIM) in 4 out of 5 cases, and the lowest spectral angle mapper (SAM) in all cases. For example, in the mixture-noise scenario:

  • $\mathrm{MPSNR} = 29.83$
  • $\mathrm{MSSIM} = 0.973$
  • $\mathrm{SAM} = 0.066$

On cross-dataset transfer to Pavia University, a single pretrained SM-CNN achieves $\mathrm{MPSNR} = 31.36$, $\mathrm{MSSIM} = 0.923$, and $\mathrm{SAM} = 0.124$, outperforming classical and contemporary deep learning approaches.

In real-noise experiments for Indian Pines scene classification, SM-CNN denoising raises overall accuracy from $75.8\%$ to $89.3\%$ (kappa $0.878$), the best among all single-model denoisers.

Ablation on mixture-noise shows:

  • Without self-modulation (WM-CNN), MPSNR drops to $27.46$.
  • Using only a single SSMM layer (“Lite”) yields MPSNR $28.57$.
  • The full SM-CNN (two SSMMs) reaches MPSNR $29.83$, demonstrating maximal benefit from self-modulation.
| Model Variant | MPSNR | MSSIM | SAM |
|---|---|---|---|
| WM-CNN (no SMC) | 27.46 | – | – |
| Lite (1 SSMM) | 28.57 | – | – |
| SM-CNN (2 SSMMs) | 29.83 | 0.973 | 0.066 |

Note: dashes indicate values not explicitly reported for that variant.

6. Interpretation and Context

Self-modulated convolution realizes a form of conditional computation, where the modulation parameters for each spatial location and feature channel are dynamically inferred from adjoining spectral context. In the HSI denoising setting, this enables the preservation of fine spatial details and the adaptive suppression of heterogeneous, spectrally variant noise. SM-CNN's architecture and training paradigm are distinguished by the explicit incorporation of spectral self-modulation within residual blocks, yielding both quantitative improvements and qualitative robustness compared to static convolutional baselines. The approach demonstrates particular utility in domains where local data statistics are highly variable and contextually dependent (Torun et al., 2023).
