Adaptive Residual Scaling Module (ARSM)
- Adaptive Residual Scaling Module (ARSM) is a neural add-on that dynamically scales residual connections at a local (per-node or per-pixel) level to boost expressivity and mitigate oversmoothing.
- It tailors residual contributions using learnable or heuristic strategies, effectively addressing feature entanglement in graph neural networks and image editing tasks.
- ARSM enhances model stability and convergence in full-waveform inversion by decoupling global predictions from local corrections under challenging conditions.
The Adaptive Residual Scaling Module (ARSM) is a lightweight neural add-on designed for dynamic and localized modulation of residual signal pathways in deep learning models. It addresses challenges associated with ineffective signal propagation, oversmoothing, and feature entanglement by adaptively scaling residual contributions on a per-node or per-pixel basis. ARSM advances model expressivity and stability in domains including graph neural networks (GNNs), reference-guided image editing, and full-waveform inversion (FWI), offering both theoretically backed regularization and empirical performance gains (Shirzadi et al., 10 Nov 2025; Zhou et al., 17 Dec 2025; Dong et al., 17 Feb 2025).
1. General Formulation and Design Principles
ARSM implements residual scaling by learning, predicting, or assigning individual weights (scalars or maps) to residual connections at each spatial or structural location. Formally, if $F(x)$ denotes a main predictive pathway and $R(x)$ a residual input, ARSM modulates their combination as

$$y = F(x) + \alpha \odot R(x),$$

where $\alpha$ is a vector, diagonal matrix, or spatial map learned or otherwise determined by network context. Distinct ARSM instantiations tailor $\alpha$ to the data structure: node-wise values for graphs (Shirzadi et al., 10 Nov 2025), per-pixel maps for images (Zhou et al., 17 Dec 2025), or spatially varying coefficients in FWI (Dong et al., 17 Feb 2025). This localized scaling is motivated by the need to selectively preserve or amplify certain features while attenuating others, increasing both the expressivity and the robustness of the residual pathway.
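As a minimal sketch, the combination rule above can be expressed with NumPy broadcasting; the function name and signature here are illustrative, not taken from any of the cited papers:

```python
import numpy as np

def arsm_combine(main, residual, alpha):
    # main, residual : feature tensors of matching shape
    # alpha          : scalar, per-location vector, or spatial map,
    #                  anything broadcastable against the features
    alpha = np.asarray(alpha)
    return main + alpha * residual
```

Because `alpha` broadcasts, the same helper covers the scalar, node-wise vector, and per-pixel map instantiations described above.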
2. ARSM in Deep Graph Neural Networks
In graph neural networks, deep message passing typically leads to oversmoothing, where node embeddings become indistinguishable due to repeated averaging (Shirzadi et al., 10 Nov 2025). ARSM counteracts this by adaptively blending neighbor aggregation with initial features through node-specific residual strengths.
Let $\hat{A}$ denote the normalized adjacency, $H^{(\ell)}$ the embedding matrix at layer $\ell$, and $H^{(0)}$ the input features:

$$H^{(\ell+1)} = \sigma\!\big(\operatorname{diag}(\mathbf{1}-\boldsymbol{\alpha})\,\hat{A} H^{(\ell)} W^{(\ell)} + \operatorname{diag}(\boldsymbol{\alpha})\, H^{(0)}\big),$$

where $\boldsymbol{\alpha}$ encodes node-wise strengths ($\alpha_i \in (0,1)$). Two ARSM variants are used:
- Learnable ARSM: $\alpha_i$ are dynamically computed via a sigmoid projection of the input features, with learnable projection parameters.
- Heuristic ARSM: $\alpha_i$ are fixed based on node centrality (e.g., PageRank percentile).
Theoretical guarantees show ARSM preserves the Dirichlet energy

$$E(H) = \tfrac{1}{2} \sum_{(i,j) \in \mathcal{E}} \left\| \frac{h_i}{\sqrt{1+d_i}} - \frac{h_j}{\sqrt{1+d_j}} \right\|^2,$$

with lower bounds that prevent feature collapse. This framework offers computational efficiency, minimal parameter overhead, and strong accuracy, especially on heterophilic graphs, with accuracy gains of up to +28.8% over GCNII on benchmarks such as Chameleon (Shirzadi et al., 10 Nov 2025).
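A compact sketch of the learnable variant, assuming a linear-plus-sigmoid projection of the input features for the node-wise strengths (the exact parameterization in Shirzadi et al. may differ, and the layer nonlinearity $\sigma$ is omitted for clarity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learnable_arsm_gnn_layer(A_hat, H, H0, W, w_alpha, b_alpha):
    # A_hat   : (n, n) normalized adjacency
    # H       : (n, d) embeddings at the current layer
    # H0      : (n, d) initial features (residual source)
    # W       : (d, d) layer weight matrix
    # w_alpha (d,), b_alpha (scalar): parameters of the sigmoid projection
    #   producing one residual strength per node (an assumed minimal form)
    alpha = sigmoid(H0 @ w_alpha + b_alpha)   # (n,) node-wise strengths in (0, 1)
    agg = A_hat @ H @ W                       # neighbor aggregation
    # Node-wise blend of aggregation and initial features
    return (1.0 - alpha)[:, None] * agg + alpha[:, None] * H0
```

Driving `alpha` toward 1 yields a pure residual (initial-feature) pathway, while `alpha` near 0 recovers plain message passing; the heuristic variant simply fixes `alpha` from a centrality score instead of computing it from `H0`.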
3. ARSM for Reference-Guided Instance Editing
In image editing tasks, ARSM is incorporated to resolve semantic entanglement between intrinsic and extrinsic reference attributes (Zhou et al., 17 Dec 2025). The module is used in frameworks such as GENIE, operating within U-Net blocks during reference injection.
Given spatially aligned reference features $F_r$ and target features $F_t$, ARSM computes a per-pixel scaling map $\alpha$ via a compact convolutional subnetwork:
- Channel concatenation: $F_c = [F_r \,\|\, F_t]$.
- Two-layer conv net: $\alpha = \sigma(\mathrm{Conv}_2(\mathrm{ReLU}(\mathrm{Conv}_1(F_c))))$.
- Scaling: $\tilde{F}_r = \alpha \odot F_r$.
The effect is to enhance intrinsic appearance cues (texture, color) while suppressing extrinsic attributes (pose, illumination). During training, ARSM is optimized end to end solely by the global diffusion objective, obviating the need for separate losses or regularization. Empirical results indicate ARSM yields a modest PSNR gain (~0.14 dB), an FID improvement (~1.19), and increased semantic alignment (CLIP score up by 0.13) on AnyInsertion tasks (Zhou et al., 17 Dec 2025).
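The three steps above can be sketched with a naive NumPy convolution; the layer widths, kernel sizes, and function names here are illustrative assumptions rather than GENIE's actual configuration:

```python
import numpy as np

def conv3x3(x, w, b):
    # Naive 3x3 'same' convolution: x (H, W, C_in), w (C_out, 3, 3, C_in)
    H, W_, _ = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros((H, W_, w.shape[0]))
    for i in range(H):
        for j in range(W_):
            patch = xp[i:i + 3, j:j + 3, :]
            out[i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2])) + b
    return out

def arsm_pixel_scale(F_ref, F_tgt, w1, b1, w2, b2):
    x = np.concatenate([F_ref, F_tgt], axis=-1)        # channel concatenation
    h = np.maximum(conv3x3(x, w1, b1), 0.0)            # Conv1 + ReLU
    alpha = 1.0 / (1.0 + np.exp(-conv3x3(h, w2, b2)))  # Conv2 + sigmoid -> (H, W, 1)
    return alpha * F_ref                               # per-pixel modulated reference
```

The sigmoid keeps each pixel's scaling in $(0,1)$, so the module can only attenuate or pass reference features, never amplify them unboundedly.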
4. ARSM in Physics-Driven Full-Waveform Inversion
For full-waveform inversion, ARSM is implemented as a non-intrusive residual correction branch on top of a pretrained U-Net (Dong et al., 17 Feb 2025). It decouples large-scale background prediction from high-frequency local corrections, aiding convergence under ill-posed or data-scarce regimes.
The workflow comprises:
- α-layer: a convolution on the U-Net output yields a scaling coefficient $\alpha(\mathbf{x})$ per spatial location.
- Zeroing: input-agnostic zero-initialization of the residual path ensures the base prediction is preserved initially.
- Parametric layer: another convolution learns a local correction $r(\mathbf{x})$. The ARSM-corrected model is

$$v(\mathbf{x}) = v_{\mathrm{base}}(\mathbf{x}) + \alpha(\mathbf{x})\, r(\mathbf{x}),$$

with all parameters co-trained via the FWI loss. Experiments confirm that ARSM improves model accuracy under missing frequencies, noise, or poor initialization; e.g., mean absolute error (MAE) decreases from ≈274 m/s (classical) to ≈125 m/s (full method) on Marmousi (Dong et al., 17 Feb 2025).
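Assuming a minimal per-pixel linear parameterization for both branches (hypothetical stand-ins for the actual convolutional layers in Dong et al.), the zero-initialization property can be sketched as:

```python
import numpy as np

def arsm_fwi_correct(v_base, feats, w_alpha, w_res):
    # v_base : (H, W) background velocity from the pretrained U-Net
    # feats  : (H, W, C) features feeding the corrective branch
    # w_alpha, w_res : (C,) weights of two per-pixel linear maps;
    #   zero-initializing both makes the initial output equal v_base exactly
    alpha = feats @ w_alpha      # scaling coefficient alpha(x)
    r = feats @ w_res            # local correction r(x)
    return v_base + alpha * r
```

With both weight vectors at zero, the correction term vanishes and training starts from the untouched base prediction, which is what makes the add-on non-intrusive.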
5. Empirical Results and Ablation Analyses
Empirical studies across domains reinforce the effectiveness of ARSM:
- GNNs: Oversmoothing is eliminated for up to 16 layers; test accuracies are robust to depth and heterophily, with large margins over baselines.
- Instance Editing: ARSM delivers additive improvements in denoising and semantic metrics, with ablation showing PSNR and FID gains on top of strong baselines.
- FWI: ARSM enables recovery of both global stratigraphy and thin beds/faults under harsh inversion conditions; ablations indicate joint pretraining and ARSM branches are critical for optimal MAE.
A summary of empirical ARSM impacts is presented below.
| Domain | ARSM Role | Representative Gain |
|---|---|---|
| GNN node classification | Prevents oversmoothing, preserves Dirichlet energy | +28.8% test accuracy (Chameleon) |
| Image instance editing | Disentangles appearance, improves fidelity | –1.19 FID, +0.13 CLIP |
| FWI | Enhances local corrections, robust to missing low-freqs | MAE reduced ≈274→125 m/s |
6. Limitations and Future Directions
ARSM in its present forms exhibits several limitations:
- In image editing, scaling is channelwise; geometric or spatial entanglements may require attention or warping mechanisms (Zhou et al., 17 Dec 2025).
- In FWI, expressivity is limited by the depth and parameterization of the corrective branch (Dong et al., 17 Feb 2025).
- Robustness relies on the adequacy of the feature inputs (e.g., the reference features $F_r$ in GENIE or the input features $H^{(0)}$ in GNNs).
Potential extensions include multi-head attention for richer modulation, incorporating explicit adversarial or perceptual losses for stronger disentanglement, or temporal regularization for video (Zhou et al., 17 Dec 2025).
7. Architectural and Computational Details
ARSM modules are designed for insertion with minimal computational and parameter overhead:
- In GNNs, the cost adds one vector projection per epoch for learnable ARSM; heuristic variants incur none (Shirzadi et al., 10 Nov 2025).
- In image models, ARSM uses two conv layers per block, with standard Kaiming initialization (Zhou et al., 17 Dec 2025).
- For FWI, the corrective branch consists of two convolutional layers applied per spatial location, with all parameters zero-initialized (Dong et al., 17 Feb 2025).
Hyperparameters (e.g., the $\alpha$ parameterization, percentile cutoffs) are tuned by validation; weight decay, dropout, and learning rates follow standard protocols.
ARSM thus offers a theoretically principled, computationally lightweight architecture for adaptive, context-sensitive scaling of residual pathways across diverse neural and scientific computing applications.