Illumination Guided Modulation Block
- IGM Block is a spatially adaptive modulation unit that dynamically gates feature activations using per-pixel illumination maps to improve restoration in underexposed regions.
- It integrates self-attention and guided-attention mechanisms to refine features and preserve details in well-lit areas within a dual-stream architecture.
- Empirical studies show that increasing block depth enhances metrics like PSNR and SSIM, highlighting the modular design’s impact on low-light super-resolution performance.
The Illumination Guided Modulation Block (IGM Block) is a spatially adaptive feature modulation unit designed to couple explicit illumination priors with data-driven attention for image enhancement under severe illumination degradations. Introduced in the context of low-light image super-resolution as part of the Guided Texture and Feature Modulation Network (GTFMN), the IGM Block dynamically gates and refines feature activations based on a dense per-pixel illumination map, achieving targeted intensification in underexposed regions and detail preservation in well-lit areas. This architecture provides a modular and mathematically precise solution for joint illumination enhancement and super-resolution, validated with state-of-the-art quantitative and qualitative performance on established benchmarks (Huang et al., 27 Jan 2026).
1. Architectural Context and Motivation
The IGM Block resides within a dual-stream framework comprising an Illumination Stream and a Texture Stream. The Illumination Stream predicts a spatially varying illumination map from the low-light input, leveraging a structure decoder (which produces the per-pixel spatial map) and a global brightness predictor. The Texture Stream consists of a deep cascade of IGM Blocks, each modulating the current feature tensor under the guidance of the predicted illumination map.
The central objective is to realize spatially adaptive restoration, especially benefitting images with highly nonuniform illumination.
2. Illumination Map Generation and Normalization
The Illumination Stream produces two outputs:
- A per-pixel spatial illumination map.
- A global scalar brightness estimate.
The spatial map is normalized by its mean over the height and width dimensions (with a small constant in the denominator), ensuring stable scale adaptation and numerical stability across inputs of differing overall brightness. This normalized map steers the downstream modulation process.
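The mean-normalization step can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation; the exact formula (mean division, epsilon placement) is an assumption consistent with the description above:

```python
import numpy as np

def normalize_illumination(m, eps=1e-6):
    """Normalize a per-pixel illumination map by its spatial mean.

    Dividing by the mean over H and W (plus a small epsilon for
    numerical stability) makes the guidance scale-adaptive: uniformly
    brighter or darker inputs yield the same normalized map.
    """
    spatial_mean = m.mean(axis=(-2, -1), keepdims=True)  # mean over H and W
    return m / (spatial_mean + eps)

# A map with spatial mean 0.5 is rescaled so its mean becomes ~1.
m = np.array([[0.25, 0.75], [0.25, 0.75]])
m_norm = normalize_illumination(m)
```

After normalization, values above 1 mark regions brighter than the image average and values below 1 mark underexposed regions, which is what the downstream gating responds to.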
3. IGM Block: Mathematical Formulation and Layerwise Design
The IGM Block takes two inputs: the incoming feature tensor and the normalized illumination map. It processes them as follows.
Attention Maps
- Self-attention: computed by a multi-scale attention (MSAttn) layer, which comprises parallel convolutions (kernel sizes 1×1, 3×3, 5×5), a ReLU, and a sigmoid, yielding a per-pixel self-attention map.
- Guided-attention: computed by a two-layer adapter (Conv 1×1 → ReLU → Conv 1×1) applied to the illumination map, followed by a sigmoid; the resulting single-channel map is broadcast across the feature channels.
The two attention maps are fused additively into a single modulation map.
Feature Gating and Update
After channelwise normalization (e.g., LayerNorm or BN), the features are multiplicatively modulated by the fused attention map.
A two-layer feed-forward network (FFN; Conv 1×1 → ReLU → Conv 1×1) refines the gated features, and a residual skip connection adds the block input to produce the output.
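The dataflow of fusion, gating, and residual update can be sketched as a minimal NumPy illustration. The convolutional layers are omitted and the channelwise normalization and FFN are stand-ins, so this shows only the structure of the computation, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def igm_block(f, a_self, a_guided):
    """Sketch of the IGM Block update: fuse -> normalize -> gate -> FFN -> skip.

    f        : feature tensor, shape (C, H, W)
    a_self   : self-attention map from the MSAttn branch (post-sigmoid)
    a_guided : guided-attention map from the illumination adapter
               (post-sigmoid, single channel broadcast across C)
    """
    a = a_self + a_guided                       # additive fusion of attention maps
    mu = f.mean(axis=(1, 2), keepdims=True)     # stand-in channelwise normalization
    sd = f.std(axis=(1, 2), keepdims=True) + 1e-6
    f_norm = (f - mu) / sd
    gated = f_norm * a                          # multiplicative modulation
    ffn = np.maximum(gated, 0.0)                # stand-in for Conv1x1 -> ReLU -> Conv1x1
    return f + ffn                              # residual skip connection

# Usage: a (1, H, W) guided map broadcasts across the C feature channels.
rng = np.random.default_rng(0)
f = rng.standard_normal((4, 8, 8))
a_s = sigmoid(rng.standard_normal((4, 8, 8)))
a_g = sigmoid(rng.standard_normal((1, 8, 8)))
out = igm_block(f, a_s, a_g)
```

Because the guided map has one channel, NumPy-style broadcasting applies the same spatial gate to every channel, mirroring the broadcast described above.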
Layerwise Structure
| Branch | Layer Sequence |
|---|---|
| Self-attention | Conv 1×1 → ReLU → Conv 3×3 (multi-scale) → Sigmoid |
| Guided-attention | Conv 1×1 → ReLU → Conv 1×1 → Sigmoid (adapter on illumination map) |
| Fusion & Gating | Add, normalize, element-wise multiply |
| FFN + Residual | Conv 1×1 → ReLU → Conv 1×1; add skip connection |
The channel width and block count are fixed hyperparameters of the Texture Stream; their values are given in the implementation details of the original publication.
4. Spatial Adaptivity and Feature Dynamics
The per-pixel variation in the illumination map yields spatially adaptive guided attention, which, fused with the data-driven self-attention, enables the block to amplify features where required. In regions with low predicted illumination, the gating value increases, intensifying the enhancement; in well-illuminated regions, the effect is attenuated, preserving detail and avoiding over-enhancement. This targeted behavior is a direct consequence of the per-pixel guidance and normalization protocol (Huang et al., 27 Jan 2026).
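The qualitative behavior described above (darker pixels receive stronger gates) can be demonstrated with a toy stand-in for the learned adapter. The function `guided_gate` and its weights are hand-picked assumptions, not the paper's learned parameters; they merely fix a monotone-decreasing illumination-to-gate mapping:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def guided_gate(illum, w=-4.0, b=2.0):
    """Toy adapter: monotone-decreasing map from illumination to gate.

    With a negative weight, low illumination (dark regions) produces a
    gate close to 1 (strong enhancement), while high illumination
    (well-lit regions) produces a gate close to 0 (attenuated effect).
    """
    return sigmoid(w * illum + b)

dark, bright = 0.1, 0.9
# The gate is larger in the underexposed region than in the well-lit one.
assert guided_gate(dark) > guided_gate(bright)
```

In the actual network this monotone relationship is not hard-coded; it emerges from training, since amplifying underexposed regions is what reduces the reconstruction loss.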
5. Empirical Evidence and Ablation Findings
Ablation studies have demonstrated the criticality of both the guidance mechanism and block depth:
- Block Depth: Increasing the number of IGM Blocks in the Texture Stream raises PSNR on OmniNormal5 from 38.005 dB to 38.106 dB and SSIM from 0.9820 to 0.9824, indicating monotonic quality improvements with depth.
- Guidance Removal: Eliminating the illumination guidance (reducing the block to a plain residual block) diminishes spatial adaptivity and final image quality; for super-resolution on OmniNormal15, PSNR drops from 30.3 dB (with guidance) to 30.2 dB (without), and SSIM from 0.919 to 0.916.
A plausible implication is that illumination-guided attention is essential for localization of enhancement and for maximizing both quantitative and perceptual quality, especially in non-uniform illumination scenarios (Huang et al., 27 Jan 2026).
6. Implementation Details and Training Protocol
Key implementation parameters are as follows (the exact block count, channel width, learning rate, and total parameter count are reported in the original publication):
- Optimizer: Adam
- Loss: pixel-wise loss on the luminance (Y) channel between the super-resolved output and the ground-truth HR image
- Batch size: 16
- Hardware: trained on an NVIDIA RTX A6000
- Parameter count: in the millions, covering the Texture Stream, the Illumination Stream, and all IGM Blocks
- Framework: PyTorch with BasicSR for data I/O and PixelShuffle
All architectural and training details are reproducible from the original publication and can be adapted in new architectures exploiting the modularity of the IGM Block (Huang et al., 27 Jan 2026).
7. Modularity and Generalization
The IGM Block’s structural decomposition—comprising self-attention, guided-attention, additive fusion, channelwise normalization, and lightweight FFN—renders it modular. This division enables straightforward replacement of attention modules or integration into alternative architectures. The design supports further augmentation for other spatially adaptive restoration tasks beyond low-light super-resolution, as the guidance interface is agnostic to the specific prior used (illumination or otherwise) (Huang et al., 27 Jan 2026).