DyFusNet: Dynamic Frequency-Spatial Fusion Network

Updated 2 February 2026
  • DyFusNet is a neural network architecture that dynamically fuses frequency and spatial domain features using adaptive filtering and attention mechanisms.
  • It employs key modules like dynamic filter blocks, multi-resolution spectral decomposition, and wavelet-based methods for efficient and robust feature extraction.
  • DyFusNet enhances applications in remote sensing, object detection, image restoration, and forecasting by seamlessly integrating domain-specific processing.

A Dynamic Frequency-Spatial Unified Synergy Network (DyFusNet) is a class of deep learning architectures that jointly and adaptively exploit both frequency-domain and spatial-domain representations for feature extraction, fusion, or prediction. These networks systematically couple dynamic, input-conditioned frequency filtering with spatial modeling, often through adaptive attention or channel-mixing mechanisms, to realize synergistic processing tailored to heterogeneous, nonstationary, or multi-modal data. DyFusNet principles appear in numerous domains, including remote sensing, object detection, image restoration, and time-series forecasting. The following sections provide a detailed, mathematically rigorous exposition of DyFusNet variants spanning the state of the art.

1. Core Architectural Principles

DyFusNet architectures tightly integrate two operational motifs: (1) adaptive frequency-domain analysis using data-driven or dynamically weighted filter banks, and (2) spatial-domain modeling via convolutional, pooling, or attention modules. Synergy between the domains is ensured by explicit fusion blocks or cross-attention schemes that allow selective, content-sensitive mixing.

In multi-source remote sensing, DyFusNet utilizes a dual-branch motif, alternating frequency-adaptive kernels and cross-modal spatial–spectral attention to enable robust land cover classification under heterogeneous inputs (e.g., HSI and SAR/LiDAR) (Zhao et al., 6 Jul 2025). In real-time vision, DyFusNet acts as a generalized fusion block, replacing static convolutions with dynamic, learnable mixings of low/mid/high-frequency analogues and spatial-channel gating (Xia et al., 26 Jan 2026). In image restoration, DyFusNet frameworks feature dynamic convolution experts, wavelet-based subband fusion, and adaptive feature gating (Zhang et al., 7 Apr 2025, Gao et al., 20 Feb 2025).

2. Frequency–Spatial Fusion Mechanisms

The canonical DyFusNet module splits input tensors into parallel streams for frequency- and spatial-path feature extraction, then recombines them with learned or adaptive weights. Three prominent design patterns emerge:

  • Dynamic Filter Block (DFB): Learns input-dependent frequency-domain filter kernels by projecting input statistics through an MLP/Softmax over a set of basis filters, enabling adaptive frequency emphasis. Filtering is performed in the frequency domain with inverse FFT bringing features back to the spatial domain (Zhao et al., 6 Jul 2025).
  • Dynamic Multi-resolution Spectral Decomposition (DMSD): Implements a weighted sum of simple spatial operations—average pooling (low), identity (mid), and depthwise convolution (high)—with softmax-determined band weighting derived from global feature statistics (Xia et al., 26 Jan 2026).
  • Wavelet or Learnable Low/High-pass Decomposition: Uses either a fixed discrete wavelet transform (DWT) or content-adaptive filter banks generated on the fly via lightweight neural layers to separate and enhance global structure and detailed, edge-like information (Zhang et al., 7 Apr 2025, Gao et al., 20 Feb 2025).
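
The DFB pattern above can be prototyped in a few lines of PyTorch. The sketch below is a minimal illustration of the stated formulation (a softmax-weighted mixture of basis filters applied in the frequency domain); the class and attribute names are assumptions for illustration, not the authors' released code, and it assumes inputs of a fixed spatial size.

```python
# Minimal sketch of a Dynamic Filter Block (DFB): softmax-weighted mixture of
# learnable basis filters applied in the frequency domain, then inverse FFT.
# Names and the fixed (h, w) grid are illustrative assumptions.
import torch
import torch.nn as nn

class DynamicFilterBlock(nn.Module):
    def __init__(self, channels: int, num_basis: int = 4, h: int = 32, w: int = 32):
        super().__init__()
        # N learnable basis filters over the rFFT half-spectrum grid
        self.basis = nn.Parameter(torch.randn(num_basis, channels, h, w // 2 + 1))
        # MLP maps global-average-pooled statistics to basis weights a_i
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // 2), nn.ReLU(),
            nn.Linear(channels // 2, num_basis),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        a = torch.softmax(self.mlp(x.mean(dim=(2, 3))), dim=-1)  # GAP -> (B, N)
        k = torch.einsum('bn,nchw->bchw', a, self.basis)  # input-conditioned kernel
        xf = torch.fft.rfft2(x, norm='ortho')             # to the frequency domain
        return torch.fft.irfft2(xf * k, s=x.shape[-2:], norm='ortho')  # and back
```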

DyFusNet's spatial modules typically employ depthwise or pointwise convolutions, global context aggregation, or non-local attention for multi-scale context enrichment.
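
The DMSD pattern is similarly compact, since its "bands" are the cheap spatial analogues just mentioned (pooling, identity, depthwise convolution). A minimal sketch, assuming a GAP-to-1×1-convolution gate for the band weights; layer names are illustrative, not from any released implementation:

```python
# Minimal sketch of Dynamic Multi-resolution Spectral Decomposition (DMSD):
# DMSD(X) = sum over {low, mid, high} of alpha_i(X) * H_i(X), where the bands
# are cheap spatial analogues rather than true FFT subbands.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DMSD(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)  # high band
        self.gate = nn.Conv2d(channels, 3, kernel_size=1)  # band weights from statistics

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, C, H, W)
        low = F.avg_pool2d(x, 3, stride=1, padding=1)      # low-frequency analogue
        mid = x                                            # identity (mid band)
        high = self.dw(x)                                  # depthwise conv (high band)
        stats = x.mean(dim=(2, 3), keepdim=True)           # global feature statistics
        a_low, a_mid, a_high = torch.softmax(self.gate(stats), dim=1).chunk(3, dim=1)
        return a_low * low + a_mid * mid + a_high * high   # softmax-weighted band sum
```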

3. Fusion and Attention Strategies

After frequency and spatial features are extracted, sophisticated fusion techniques ensure these complementary cues inform the final representation:

  • Spectral–Spatial Adaptive Fusion: Applies attention mechanisms tailored separately to “spectral” (channel) and “spatial” (pixel) axes, followed by channel shuffle and residual summation for comprehensive mixing (Zhao et al., 6 Jul 2025).
  • Spatial–Frequency Cooperative Modulation (SFCM): Simultaneously aggregates context at multiple spatial scales and applies a channel-wise gating based on global pooled descriptors. This allows DyFusNet to preserve relevant object boundaries while suppressing background clutter (Xia et al., 26 Jan 2026).
  • Bidomain Information Synergy (BIS), Gated Fusion Module (GFM): Realizes synergy through concatenation, attention-based re-weighting, and cross-attention between all frequency and spatial representations, culminating in a fully fused output (see GSFFBlock) (Zhang et al., 7 Apr 2025, Gao et al., 20 Feb 2025).
  • Adaptive Attention Coupling Gate (AACG): For skip-connections or hierarchical fusion, multi-head cross-attention computes learned affinities between encoder and decoder-level features, with content-gated reweighting to select contextually salient components (Zhang et al., 7 Apr 2025).
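
To make these strategies concrete, the sketch below combines the concatenate-and-project fusion step with channel-wise gating driven by a global pooled descriptor, loosely in the spirit of the SFCM/GFM designs above. It is a simplified sketch under assumed layer names, not any paper's reference implementation.

```python
# Minimal sketch of gated frequency-spatial fusion: project the concatenated
# branch features with a 1x1 conv, then reweight channels with a sigmoid gate
# computed from global average pooling, plus a residual path.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid())

    def forward(self, f_freq: torch.Tensor, f_spat: torch.Tensor) -> torch.Tensor:
        y = self.proj(torch.cat([f_freq, f_spat], dim=1))  # Y = Conv1x1([F_freq; F_spat])
        g = self.gate(y.mean(dim=(2, 3), keepdim=True))    # channel gate from GAP descriptor
        return y * g + f_spat                              # gated fusion with residual
```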

4. Mathematical Formulation of Major DyFusNet Components

The following table summarizes principal mathematical forms of key DyFusNet submodules across applications:

| Submodule | Core Equation(s) | Reference |
| --- | --- | --- |
| DFB (frequency kernel) | $\mathcal{K}(\mathbf{X}_{\text{in}}) = \sum_{i=1}^{N} a_i F_i$, with $a = \mathrm{Softmax}(\mathrm{MLP}(\mathrm{GAP}(\mathbf{X}_{\text{in}})))$ | (Zhao et al., 6 Jul 2025) |
| DMSD | $\mathrm{DMSD}(X_1) = \sum_{i \in \{\text{low},\,\text{mid},\,\text{high}\}} \alpha_i(X_1)\, \mathcal{H}_i(X_1)$ | (Xia et al., 26 Jan 2026) |
| Wavelet path | $F_{pq}(u, v) = \sum_{m,n} F(m, n)\, p(m - 2u)\, q(n - 2v)$ (DWT); $\widetilde{F} = \mathrm{IDWT}(\hat{F}_{LL}, F_{LH}, F_{HL}, F_{HH})$ | (Zhang et al., 7 Apr 2025) |
| FDGM (frequency dynamic generation) | $F^{L} = \mathrm{Softmax}(F^{1} \otimes F^{2})$, $F^{H} = I - F^{L}$ | (Gao et al., 20 Feb 2025) |
| Fusion (1×1 conv projection) | $Y = \mathrm{Conv}_{1 \times 1}([F_{\mathrm{freq}}(X_1);\, X_2])$ | (Xia et al., 26 Jan 2026; Zhao et al., 6 Jul 2025) |
| Cross-attention fuser | $S = Q K^{\top}$; $A_s = \mathrm{Softmax}(S)$; $X_{sh}^{gs} = f_{1 \times 1}(X_s^{g}) A_h$ | (Gao et al., 20 Feb 2025) |

Each block is critical to dynamic, content-driven adaptation in both the frequency and spatial domains.
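
As one worked example, the wavelet path in the table can be prototyped with the PyWavelets library. The sketch below uses a fixed one-level Haar DWT and a placeholder scalar gain in place of the learned low-frequency (LL) enhancement branch.

```python
# Minimal sketch of the wavelet path: DWT decomposition, an LL-subband
# enhancement (here a placeholder scaling), and IDWT reconstruction.
import numpy as np
import pywt

def wavelet_path(feat: np.ndarray, gain: float = 1.2) -> np.ndarray:
    """feat: 2D feature map; gain: stand-in for a learned LL enhancement."""
    ll, (lh, hl, hh) = pywt.dwt2(feat, 'haar')            # one-level DWT -> subbands
    return pywt.idwt2((gain * ll, (lh, hl, hh)), 'haar')  # IDWT of enhanced coeffs

recon = wavelet_path(np.random.rand(64, 64).astype(np.float32))
```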

5. Applications and Benchmarks

Remote Sensing

DyFusNet achieves state-of-the-art results in multi-source remote sensing classification, outperforming baselines on HSI+SAR (Berlin: OA 75.42%, AA 64.85%, Kappa 63.22%) and HSI+LiDAR (Houston2013: OA 92.35%, AA 93.48%, Kappa 91.70%). Both the dynamic filter and adaptive fusion components are necessary for optimal performance: ablations that disable either one reduce OA by 1–2% (Zhao et al., 6 Jul 2025).

UAV/Object Detection

As a fusion module in EFSI-DETR, DyFusNet raises detection AP and small-object AP by 1.6% and 5.8%, respectively, on VisDrone, with inference rates of 188 FPS on an RTX 4090. Its plug-in efficiency derives from reliance on average pooling, depthwise convolutions, and compact MLPs, avoiding FFTs and other hard-to-deploy operations (Xia et al., 26 Jan 2026).

Image Segmentation in Adverse Conditions

In road-ponding detection under fog and low light, DyFusNet's dynamic convolution plus wavelet–spatial synergy (with BIS, MIA, and AACG) increases IoU by up to 3.51% over the prior state of the art on the Foggy-Puddle set, running at 25.5 FPS on embedded hardware (Zhang et al., 7 Apr 2025).

Image Restoration

For deblurring, DyFusNet-style architectures (SFAFNet) leverage gated spatial–frequency fusion blocks to outperform both spatial-only and frequency-only baselines by up to 1 dB PSNR, while halving FLOPs compared to prior art (Gao et al., 20 Feb 2025).

Power Systems Forecasting

Early DyFusNet variants used electrical-distance–based 2D spatial embedding and CNNs for accurate post-disturbance frequency prediction in grids, attaining RMSE as low as 0.0024 Hz on IEEE 39-bus tests—substantially outpacing all MLP and SVR baselines (Lin et al., 2019).

6. Implementation Details and Computational Considerations

DyFusNet modules prioritize hardware efficiency by minimizing global operations (e.g., FFT) in favor of primitives that fuse readily into static kernels (GAP, depthwise/pointwise convolutions, small MLPs). For real-time applications, inference speeds exceed 180 FPS on flagship GPUs and 25 FPS on edge devices, with models ranging from <2M to ~47M parameters and FLOPs scaling from 0.03G (remote sensing) to ~38G (segmentation) (Zhao et al., 6 Jul 2025, Zhang et al., 7 Apr 2025, Xia et al., 26 Jan 2026).

Training typically relies on Adam or AdamW optimizers with moderate batch sizes and learning rate schedules tailored to each task. Losses combine cross-entropy, IoU, or regression objectives depending on application.
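
A minimal sketch of such a setup, with placeholder hyperparameters and a dummy model rather than values or architectures reported in the cited papers, might look as follows:

```python
# Illustrative training step: AdamW with a cosine schedule and a combined
# cross-entropy + soft-IoU objective for a toy two-class segmentation model.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 2, 1))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
ce = nn.CrossEntropyLoss()

def soft_iou_loss(logits, target, eps=1e-6):
    """Differentiable IoU loss on foreground probabilities (binary case)."""
    prob = logits.softmax(dim=1)[:, 1]               # (B, H, W) foreground prob
    tgt = (target == 1).float()
    inter = (prob * tgt).sum(dim=(1, 2))
    union = (prob + tgt - prob * tgt).sum(dim=(1, 2))
    return 1.0 - ((inter + eps) / (union + eps)).mean()

x = torch.randn(2, 3, 64, 64)                        # dummy image batch
y = torch.randint(0, 2, (2, 64, 64))                 # dummy segmentation labels
logits = model(x)
loss = ce(logits, y) + soft_iou_loss(logits, y)      # combined objective
loss.backward()
optimizer.step()
scheduler.step()
```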

7. Synergy Rationale, Impact, and Limitations

The principal advantage of DyFusNet lies in treating spatial and frequency domains as equally first-class, dynamically harmonized via data-driven gating, attention, or cross-modal fusion. This yields enhanced robustness to distributional heterogeneity (e.g., visibility, background clutter, sensor noise), preserves both global context and local detail, and enables efficient, plug-in deployment for multi-modal or multi-scale fusion.

A plausible implication is that future DyFusNet designs will shift towards even more fine-grained, content-conditioned filter generation and hybridization with large-scale attention or foundation models, further blurring boundaries between spatial, spectral, and semantic representations. Limitations center on the potential redundancy of dual-domain processing for tasks where one modality suffices, and the need for careful tuning of architecture hyperparameters (e.g., split ratios, number of basis filters, channel grouping strategies) to avoid bottlenecks or compute/memory inflation.

References

  • "Dynamic Frequency Feature Fusion Network for Multi-Source Remote Sensing Data Classification" (Zhao et al., 6 Jul 2025)
  • "EFSI-DETR: Efficient Frequency-Semantic Integration for Real-Time Small Object Detection in UAV Imagery" (Xia et al., 26 Jan 2026)
  • "ABCDWaveNet: Advancing Robust Road Ponding Detection in Fog through Dynamic Frequency-Spatial Synergy" (Zhang et al., 7 Apr 2025)
  • "Spatial and Frequency Domain Adaptive Fusion Network for Image Deblurring" (Gao et al., 20 Feb 2025)
  • "Post-Disturbance Dynamic Frequency Features Prediction Based on Convolutional Neural Network" (Lin et al., 2019)
