RGAM: Reflection Guided Attention Module
- RGAM is a reflection-aware neural attention module that dynamically fuses features for robust reflection removal and glass detection.
- It employs learned spatial- and channel-wise masks to guide feature selection, adapting U-Net skip connections for improved reconstruction.
- Empirical results show significant boosts in performance, with notable improvements in PSNR for reflection removal and IoU for glass detection.
The Reflection Guided Attention Module (RGAM) is a distinct class of neural attention modules that leverage explicit reflection-aware signals to guide feature selection for spatial reconstruction and semantic segmentation in vision tasks involving reflections, notably single image reflection removal (SIRR) and glass surface detection. RGAM achieves effective feature fusion and dynamic gating by learning to selectively emphasize or suppress particular features based on local reflection properties, thus improving robustness in challenging domains where linear superposition or global cues often become insufficient.
1. Core Motivation and Conceptual Foundations
RGAM addresses the inherent challenges in distinguishing between transmission (clean scene) and reflection layers in images acquired through glass or other reflective media, as well as accurately localizing transparent surfaces in unconstrained environments. In reflection removal, the observed image is typically modeled as a sum $I = T + R$ of transmission $T$ and reflection $R$. However, strict additivity breaks down in areas of strong reflection, requiring context-driven inpainting rather than mere subtraction. In glass surface detection, differentiating glass from surrounding materials is complicated by transparency and context dependence, with reflections acting as implicit evidence for glass presence.
Key to RGAM’s motivation is the observation that:
- Reflection-heavy regions necessitate a mode switch from difference-based recovery to context encoding or inpainting.
- Weak/no-reflection regions permit direct reliance on reflection-suppressed difference features.
- For glass surface detection, regions exhibiting both high reflection and glass-like morphology are the most discriminative.
By learning spatial- and channel-wise attention masks or joint attention maps, RGAM dynamically arbitrates between locally valid linear models and context-driven semantic cues across varying reflection intensities (Li et al., 2020, Yan et al., 21 Nov 2025).
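A toy numeric sketch makes the breakdown of additivity concrete (the pixel values below are hypothetical, chosen only to expose the saturation effect): once the sensor clips, even a perfect reflection estimate cannot be subtracted away in strong-reflection regions.

```python
# I = clip(T + R) is no longer T + R where reflection is strong, so
# difference-based recovery fails exactly where context is needed.
import numpy as np

T = np.array([0.30, 0.60, 0.80])   # transmission intensities
R = np.array([0.05, 0.30, 0.70])   # reflection intensities
I = np.clip(T + R, 0.0, 1.0)       # sensor saturates at 1.0

print(I - R)                       # [0.30, 0.60, 0.30] -- last pixel should be 0.80
```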
2. Detailed Architectures in Reflection Removal and Glass Detection
2.1. Two-Stage SIRR with Reflection-Aware Guidance (Li et al., 2020)
RGAM (termed “RAG module”) is embedded at each decoder stage within a two-stage U-Net cascade:
- Stage 1: A single-input U-Net $G_R$ predicts the reflection map $\hat{R}$ from the observation $I$.
- Stage 2: A dual-encoder U-Net $G_T$ receives both the observation $I$ and the previously estimated reflection $\hat{R}$.
- At decoder level $i$, the module:
- Computes the "difference feature" $\Phi_d^i = \Phi_I^i - \Phi_{\hat{R}}^i$ from the two encoder outputs.
- Concatenates $\Phi_I^i$ and $\Phi_{\hat{R}}^i$ and passes the result through two consecutive $1\times1$ convolutions and a sigmoid to predict per-channel spatial masks $M^i$.
- Stacks the masks and reweights/combines the difference feature via partial convolution with $M^i$.
- Processes the outputs with a convolution + ReLU stack.
This “gating” adapts skip connections and decoding based on local reflection confidence.
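A minimal PyTorch sketch of such a gated skip connection follows. The class name `RAGBlock` and all tensor names are illustrative rather than the authors' identifiers, and the partial convolution is simplified here to elementwise gating plus a 3×3 convolution (Section 3.1 gives the renormalized form).

```python
import torch
import torch.nn as nn

class RAGBlock(nn.Module):
    """Gated skip connection: masks gate the encoder difference feature."""
    def __init__(self, channels: int):
        super().__init__()
        # Two consecutive 1x1 convs + sigmoid predict per-channel spatial masks.
        self.mask_pred = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Simplified stand-in for the partial convolution over the masked
        # difference feature (no mask renormalization here).
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, feat_obs: torch.Tensor, feat_ref: torch.Tensor) -> torch.Tensor:
        diff = feat_obs - feat_ref                        # "difference feature"
        mask = self.mask_pred(torch.cat([feat_obs, feat_ref], dim=1))
        return self.fuse(mask * diff)                     # gated skip feature

x, r = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
print(RAGBlock(64)(x, r).shape)                           # torch.Size([1, 64, 32, 32])
```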
2.2. Multi-Scale Fusion in NFGlassNet (Yan et al., 21 Nov 2025)
In glass detection, RGAM fuses per-scale features from two backbone streams (flash, no-flash) and a reflection feature from the Reflection Contrast Mining Module (RCMM):
For each scale $s$, RGAM receives:
- $F_g^s$: a convolutional projection of the concatenated no-flash and flash features.
- $F_r^s$: the reflection feature from RCMM.
- Both $F_g^s$ and $F_r^s$ are reshaped and normalized per head in a multi-head architecture.
- Two parallel cross-attention branches are computed:
- The "top" branch uses $F_r^s$ as query and $F_g^s$ as key/value.
- The "bottom" branch reverses the roles.
- The attention maps $A_{\mathrm{top}}$ and $A_{\mathrm{bot}}$ are shifted (minimum-zeroed), multiplicatively fused, and softmaxed to yield a shared attention map $A$.
- Final features from both sources are reweighted by $A$ and summed to produce $F_{\mathrm{out}}^s$, which is used by the decoder to generate glass masks.
This approach emphasizes those regions expressing both glass-consistent structure and flash-induced reflections.
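A single-head PyTorch sketch of this fusion follows; the function name and flattened $(B, N, C)$ layout are assumptions, the query/key/value projections are folded into the inputs for brevity, and the paper's multi-head variant additionally splits channels per head and applies LayerNorm.

```python
import torch
import torch.nn.functional as F

def shared_cross_attention(f_g: torch.Tensor, f_r: torch.Tensor) -> torch.Tensor:
    """f_g: glass-stream feature (B, N, C); f_r: reflection feature (B, N, C)."""
    d = f_g.shape[-1]
    # Top branch: reflection queries attend to glass keys; bottom branch reverses roles.
    a_top = f_r @ f_g.transpose(-2, -1) / d ** 0.5        # (B, N, N)
    a_bot = f_g @ f_r.transpose(-2, -1) / d ** 0.5        # (B, N, N)
    # Shift each map so its minimum is zero, fuse multiplicatively, softmax.
    a_top = a_top - a_top.amin(dim=(-2, -1), keepdim=True)
    a_bot = a_bot - a_bot.amin(dim=(-2, -1), keepdim=True)
    a = F.softmax(a_top * a_bot, dim=-1)                  # shared attention A
    # Reweight both sources with the shared map and sum them.
    return a @ f_g + a @ f_r

f_g = torch.randn(2, 256, 64)   # e.g. 16x16 positions flattened, 64 channels
f_r = torch.randn(2, 256, 64)
print(shared_cross_attention(f_g, f_r).shape)             # torch.Size([2, 256, 64])
```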
3. Mathematical Formalization
3.1. Reflection Removal (Li et al., 2020)
- Difference Feature: $\Phi_d^i = \Phi_I^i - \Phi_{\hat{R}}^i$, where $\Phi_I^i$ and $\Phi_{\hat{R}}^i$ denote the level-$i$ encoder features of the observation and the estimated reflection.
- Learned Masking: $M^i = \sigma\!\left(\mathrm{conv}_{1\times1}\!\left(\mathrm{conv}_{1\times1}\!\left([\Phi_I^i,\, \Phi_{\hat{R}}^i]\right)\right)\right)$, with $\sigma$ the sigmoid and $[\cdot\,,\cdot]$ channel-wise concatenation.
- Partial Convolution: $\Phi_o^i = \mathrm{PConv}(\Phi_d^i, M^i)$, where $\mathrm{PConv}(x, M) = W^{\top}\!\left(x \odot M\right)\frac{\mathrm{sum}(\mathbf{1})}{\mathrm{sum}(M)} + b$, so each output is renormalized by the mask coverage of its receptive field (a minimal implementation follows this list).
- Mask Loss: a dedicated term $\mathcal{L}_{\mathrm{mask}}$ supervises $M^i$, penalizing high activations in strong-reflection regions and low activations in weak-reflection regions (see Section 5).
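A minimal implementation of the renormalized partial convolution above, assuming a soft mask in $[0, 1]$; the helper name `partial_conv` and the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def partial_conv(x, mask, weight, bias, eps: float = 1e-8):
    """x, mask: (B, C_in, H, W); weight: (C_out, C_in, k, k); bias: (C_out,)."""
    pad = weight.shape[-1] // 2
    out = F.conv2d(x * mask, weight, bias=None, padding=pad)
    # Mask coverage of each receptive field, via an all-ones kernel.
    coverage = F.conv2d(mask, torch.ones_like(weight), bias=None, padding=pad)
    scale = weight[0].numel() / (coverage + eps)          # sum(1) / sum(M)
    return out * scale + bias.view(1, -1, 1, 1)

x = torch.randn(1, 8, 16, 16)
m = torch.rand(1, 8, 16, 16)       # soft per-channel mask M
w = torch.randn(4, 8, 3, 3)
b = torch.zeros(4)
print(partial_conv(x, m, w, b).shape)                     # torch.Size([1, 4, 16, 16])
```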
3.2. Glass Detection (Yan et al., 21 Nov 2025)
- Feature Construction: $F_g^s = \mathrm{conv}\!\left([F_{\mathrm{nf}}^s,\, F_{\mathrm{f}}^s]\right)$ from the no-flash and flash backbone features; $F_r^s$ is produced by RCMM.
- Cross-Attention (two branches): $A_{\mathrm{top}} = Q_r K_g^{\top}/\sqrt{d}$, with query $Q_r$ projected from $F_r^s$ and key/value $K_g, V_g$ projected from $F_g^s$; symmetrically, $A_{\mathrm{bot}} = Q_g K_r^{\top}/\sqrt{d}$ in the alternate branch.
- Fusing: $A = \mathrm{softmax}\!\left((A_{\mathrm{top}} - \min A_{\mathrm{top}}) \odot (A_{\mathrm{bot}} - \min A_{\mathrm{bot}})\right)$, and $F_{\mathrm{out}}^s = A V_g + A V_r$.
4. Implementation Specifics and Parameterization
| Model | Feature Preparation | Attention/Gating | Downstream Usage |
|---|---|---|---|
| RAGNet (Li et al., 2020) | Dual encoders; encoder difference | Channel/spatial masks + PConv | Decoder block gating |
| NFGlassNet (Yan et al., 21 Nov 2025) | Backbone streams + RCMM | Dual-head cross-attention fusion | Scale-wise multi-stream fusion |
- The reflection removal RGAM uses 1×1 convolutions for mask prediction, 3×3 partial convolutions, and per-channel mask splitting; no batch normalization appears in the attention submodule.
- The glass detection RGAM uses Kaiming/Xavier initialization for its projections, splits channels across multiple heads, applies LayerNorm, uses no explicit dropout in the module, and fuses feature maps at multiple scales (an initialization sketch follows below).
Both designs prioritize local adaptive gating to modulate information flow depending on reflection presence or glass context.
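A brief sketch of the initialization scheme referenced above; assigning Kaiming to convolutions and Xavier to linear projections is an assumption, since the section names both schemes without mapping them to specific layers.

```python
import torch.nn as nn

def init_projections(module: nn.Module) -> None:
    """Apply Kaiming init to convs and Xavier init to linear projections."""
    for m in module.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            nn.init.xavier_uniform_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
```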
5. Training Objectives and Losses
In reflection removal (Li et al., 2020), RGAM receives direct loss supervision on its mask outputs through a dedicated term $\mathcal{L}_{\mathrm{mask}}$, penalizing incorrect mask activations in strong- or weak-reflection regions. The full objective combines reconstruction, perceptual, exclusion, adversarial, and mask-specific losses: $\mathcal{L} = \mathcal{L}_{\mathrm{rec}} + \lambda_{\mathrm{perc}}\mathcal{L}_{\mathrm{perc}} + \lambda_{\mathrm{excl}}\mathcal{L}_{\mathrm{excl}} + \lambda_{\mathrm{adv}}\mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{mask}}\mathcal{L}_{\mathrm{mask}}$, with scalar weights $\lambda_{(\cdot)}$ balancing the terms.
In glass detection (Yan et al., 21 Nov 2025), RGAM does not receive direct supervision; it is optimized solely via the end-to-end task losses, which include IoU and binary cross-entropy terms for glass mask prediction and a supervision term for reflection estimation.
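One plausible form of the mask supervision, assuming the masks gate the difference feature down in strong-reflection regions (consistent with the mode switch in Section 1); the region indicator and threshold below are placeholders, not the paper's definition.

```python
import torch

def mask_loss(pred_mask: torch.Tensor, strong_region: torch.Tensor) -> torch.Tensor:
    """pred_mask: predicted gates in [0, 1]; strong_region: hypothetical binary
    map of strong-reflection pixels (however those regions are labeled)."""
    # Penalize high gates where reflection is strong (context/inpainting mode)
    # and low gates where it is weak (difference mode).
    return (strong_region * pred_mask
            + (1.0 - strong_region) * (1.0 - pred_mask)).mean()

m = torch.rand(1, 64, 32, 32)                       # per-channel masks M^i
s = (torch.rand(1, 1, 32, 32) > 0.7).float()        # placeholder region map
print(mask_loss(m, s))
```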
6. Empirical Ablations and Observed Effects
Ablation studies in both domains confirm RGAM’s critical role:
- In SIRR (Li et al., 2020), removing the difference feature or using naïve skip connections induces marked PSNR drops on both the Real20 set and the SIR$^2$ Wild set; disabling the learned masks or collapsing them to single-channel masks likewise measurably reduces restoration quality.
- For glass detection (Yan et al., 21 Nov 2025), ablations show that RGAM yields a clear IoU gain; omitting the shared attention reduces IoU, replacing the cross-stream querying ("alternate-Q") loses $0.9$–$1.3$ IoU points, and removing the attention-shift step likewise degrades IoU.
Qualitative outputs display sharper, artifact-free transmission predictions and precise glass mask localization in regions where both reflection and glass indicators co-occur.
7. Interpretation and Context within the Field
RGAM modules generalize the use of explicit physical reasoning—reflection detection or suppression—via learnable dynamic attention mechanisms at both feature and spatial scales. The studied variants demonstrate that channel-wise and spatially varying gating, informed by learned or mined reflection signals, outperforms global or hard-coded approaches. These results substantiate that dynamically adaptive fusion, as enabled by RGAM, respects nonuniform physical priors (e.g., breakdown of linear superposition) and enhances performance in scenarios with ambiguous or spatially varying cues.
While the specific instantiations in reflection removal and glass detection differ architecturally, RGAM consistently outperforms naïve fusion or attentionless baselines, supporting its generality as a reflection-aware feature fusion paradigm (Li et al., 2020, Yan et al., 21 Nov 2025).