
Frequency-Domain Decomposition & Re-Composition

Updated 9 February 2026
  • Frequency-Domain Decomposition and Re-Composition modules are structured methods that separate signals into frequency bands and recombine them after adaptive processing.
  • They employ transformations like Fourier, DCT, or wavelet along with learnable filters and residual techniques to isolate distinct spectral components.
  • The recomposition phase uses adaptive fusion methods such as learnable transforms and cross-attention to enhance overall signal integrity and performance.

A Frequency-Domain Decomposition and Re-Composition (FDDR) module is a structured architectural component or algorithmic strategy that explicitly decomposes input signals or features into distinct frequency bands or spectral components, processes or analyzes these components in a specialized or adaptive fashion, and then recomposes the outputs to obtain enhanced, interpretable, or task-optimized representations. This paradigm has broad applications in signal processing, computer vision, audio processing, and machine learning, enabling domain-specific handling of information and improved task performance through physically, perceptually, or semantically meaningful separation and fusion strategies.

1. Architectural Principles and Signal Flow

The FDDR philosophy is to first expose the multi-scale or multi-frequency structure of signals (audio, image, feature maps, or time series), then perform targeted transformations or manipulations in this domain, and finally recombine the results into holistic outputs. Typical architectural steps include:

  • Frequency transformation: The input is mapped to an appropriate spectral domain, such as the Fourier, discrete cosine, or wavelet transform. For example, in the Freqformer image demoiréing framework, a recursive spatial filter decomposition generates frequency-separated components $I_l$ and $I_h$ representing low-frequency (color) and high-frequency (texture and moiré) content (Liu et al., 25 May 2025).
  • Domain-specific decomposition: Frequency components are either separated by non-learnable filters (e.g., dilation-parameterized convolutional banks), learnable masks or attention (e.g., adaptive amplitude masks (Li et al., 2024)), or iterative residual methods (e.g., residual band “peeling” for high/mid/low/residual (Shen et al., 23 Sep 2025)).
  • Specialized processing: Each component is processed in a branch or path specialized for its physical, statistical, or semantic properties. For Freqformer, dual Transformer branches target high-frequency (spatially-localized, texture) and low-frequency (large-scale, color shift) artifacts, reflecting the frequency localization and source distinctions of moiré corruption.
  • Re-composition: A learnable Frequency Composition Transform (FCT) or equivalent fusion module adaptively combines the frequency-specific outputs, using spatially-varying learned weights or cross-domain post-processors (Liu et al., 25 May 2025). In other applications, recomposition may combine enhanced or denoised spectral bands into time- or image-domain signals.
  • Losses and supervision: Multi-stage or multi-scale supervision occurs both at subband (or component) output and final recomposed output, enabling both targeted denoising/boosting and overall fidelity.
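The pipeline above can be sketched in a few lines of NumPy. This is a deliberately minimal, non-learned illustration: a brick-wall FFT mask stands in for the frequency transformation and decomposition, and scalar band weights stand in for the specialized branches and the learnable fusion module.

```python
import numpy as np

def fddr_pipeline(x, cutoff=0.25, w_low=1.0, w_high=0.5):
    """Toy FDDR pass over a 1-D signal: decompose into low- and
    high-frequency bands with an ideal (brick-wall) FFT mask,
    'process' each band with a placeholder scalar weight, then
    recompose by summation and inverse transform."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x))      # normalized frequencies in [0, 0.5]
    low = X * (freqs <= cutoff)          # low-frequency band
    high = X * (freqs > cutoff)          # high-frequency band
    # Re-composition: sum of processed bands, back to the signal domain.
    return np.fft.irfft(w_low * low + w_high * high, n=len(x))
```

With `w_low = w_high = 1.0` the pipeline reduces to an identity (exact reconstruction), which is a useful sanity check when implementing any decomposition/recomposition pair.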

This modular pipeline underpins a wide range of recent systems in deep learning for vision, audio, cross-modal understanding, and generative models.

2. Mathematical Formulation and Decomposition Strategies

Firm mathematical formulations are central to FDDR effectiveness and reproducibility. The main decomposition approaches include:

  • Recursive filter-based decomposition: A repeated, possibly dilated, low-pass spatial filter iteratively produces

$$I^{(i)}_l = \mathrm{Conv}\!\left(I^{(i-1)}_l;\, k;\, \mathrm{dilation} = 2^i\right), \qquad I^{(i)}_h = I^{(i-1)}_h + I^{(i-1)}_l - I^{(i)}_l$$

with the final decomposition given by $I_l = I^{(L)}_l$ and $I_h = I^{(L)}_h$ (Liu et al., 25 May 2025).

  • Residual-based frequency band peeling: In the FDED module, a learned or fixed set of radial frequency thresholds $\tau_0 > \tau_1 > \cdots > \tau_{N+1} = 0$ is used to sequentially subtract out bands from the full 2D DFT of the feature map, yielding a decomposition into high, mid, low, and residual bands for each modality (Shen et al., 23 Sep 2025).
  • Learnable amplitude masking: Instance-Adaptive Amplitude Filters (IAF) use convolutions, pooling, and activations to generate instance- and frequency-specific masks that suppress/modulate amplitude coefficients, before recombining with preserved phase and reconstructing via inverse DFT (Li et al., 2024).
  • Dynamic kernel decomposition: Learnable, spatially-variant low-pass (and complementary high-pass) filters are generated on-the-fly for frequency subband separation in e.g. SFAFNet’s FDGM, often using split channels and softmax normalization to maintain stability and adaptability (Gao et al., 20 Feb 2025).
  • Spectral SVD or principal component decomposition: For multi-channel or spatial audio signals, frame-wise spectral SVD (e.g., on MDCT coefficients per subband) enables dimension reduction and principal component selection with inherently smooth frame transitions (Zamani et al., 2017).
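The recursive filter-based strategy can be illustrated with a small NumPy sketch. The 3-tap dilated averaging filter below is an arbitrary stand-in for the framework's actual convolution kernel, but the $I_l$/$I_h$ bookkeeping follows the recursion above directly.

```python
import numpy as np

def dilated_lowpass(x, dilation):
    """3-tap averaging filter with the given dilation (edge-padded)."""
    pad = np.pad(x, dilation, mode="edge")
    return (pad[:-2 * dilation] + pad[dilation:-dilation] + pad[2 * dilation:]) / 3.0

def recursive_decompose(x, levels=3):
    """Recursive low-pass decomposition: each level applies a filter
    with dilation 2^i and accumulates the removed detail into I_h,
    i.e. I_h^{(i)} = I_h^{(i-1)} + I_l^{(i-1)} - I_l^{(i)}."""
    I_l = x.astype(float)
    I_h = np.zeros_like(I_l)
    for i in range(1, levels + 1):
        I_l_new = dilated_lowpass(I_l, 2 ** i)
        I_h = I_h + I_l - I_l_new
        I_l = I_l_new
    return I_l, I_h
```

A telescoping argument shows the invariant $I_l^{(i)} + I_h^{(i)} = x$ at every level, so the decomposition is lossless by construction.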

3. Recomposing and Fusion Mechanisms

Recomposition modules typically enact an adaptive, trainable process for merging frequency-specific outputs into the final representation.

  • Learnable fusion transforms: The Frequency Composition Transform (FCT) in Freqformer applies two separate convolutions to the final feature maps from each branch, then sums them and applies post-fusion Transformer layers. The final image is predicted with a pixel-wise linear combination of the fused feature channels, parameterized by learned weights per spatial position (Liu et al., 25 May 2025).
  • Weighted band summation: After processing high-, mid-, low-, and residual bands, per-band scalar weights are learned for each modality, allowing the network to dynamically control the emphasis of each band in the recomposed feature space (Shen et al., 23 Sep 2025).
  • Cross-attention and gating: In SFAFNet’s GFM, channelwise gating reweights low- and high-frequency features, which are then fused with cross-attention (CAM) to exploit complementary information and preserve edge, texture, and global content (Gao et al., 20 Feb 2025).
  • Mixture-of-Experts recomposition: For complex multimodal tasks, recomposed features are routed through a sparse mixture-of-experts (MoE) structure, as in the SCMC module for cross-modal audio-visual fusion, dynamically selecting routing weights and fusing expert outputs for maximal semantic consistency (Shen et al., 23 Sep 2025).
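Two of the simpler recomposition mechanisms, weighted band summation and channel-wise gating, can be sketched as follows; the sigmoid gate here is a toy stand-in for the learned gating and cross-attention modules described above.

```python
import numpy as np

def weighted_band_recompose(bands, weights):
    """Weighted band summation: combine processed bands with per-band
    scalar weights (fixed here; trainable parameters in a network)."""
    return sum(w * b for w, b in zip(weights, bands))

def gated_fusion(low_feat, high_feat):
    """Gating sketch: a sigmoid gate reweights the low- and
    high-frequency branches before summation (a toy simplification
    of gated / cross-attention fusion)."""
    gate = 1.0 / (1.0 + np.exp(-(low_feat - high_feat)))
    return gate * low_feat + (1.0 - gate) * high_feat
```

When the two branches agree, the gate sits at 0.5 and the fusion returns their common value; learned variants replace this fixed rule with data-dependent weights.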

4. Representative Modules and Comparative Table

Several notable FDDR module instantiations appear in recent literature. The following table captures the core decomposition, recomposition, and key innovations for a selection of systems:

| Module/System | Decomposition Approach | Re-Composition/Fusion |
|---|---|---|
| Freqformer (Liu et al., 25 May 2025) | Recursive spatial low-pass filter, multi-level frequency separation | Learnable FCT: sum of convolutional branch projections, PostFusion Transformer layers |
| FDED (Shen et al., 23 Sep 2025) | Iterative radial frequency band "peeling" in DFT space | Learnable per-band weights, separate enhancement for HF bands, multimodal MoE fusion |
| FDMNet (Li et al., 2024) | Learnable amplitude masking (IAF); phase-preserving normalization (PPNorm) | Inverse FFT for image/feature recomposition |
| SFAFNet (Gao et al., 20 Feb 2025) | Learnable, spatially-variant low/high-pass dynamic filters (FDGM) | Channelwise gating and cross-attention (GFM), final sum |
| SVD-FDDR (Zamani et al., 2017) | MDCT spectral block-wise SVD, subband principal components | SVD-inverse projection, bandwise recombination, overlap-add via inverse MDCT |

5. Task-Specific Processing and Application Domains

The decomposition-recomposition paradigm is highly task-adaptive:

  • Image restoration and demoiréing: Moiré patterns and color distortions are separated and handled differently; high-frequency (texture) corruptions are refined by deep high-branch Transformer layers, whereas low-frequency color distortions are mitigated in lower-branch processing (Liu et al., 25 May 2025).
  • Audio-visual segmentation: Contradictory frequency semantics in audio (noisy HF) and visual (structural HF) domains are disentangled, processed, and fused with dynamic multimodal routing (Shen et al., 23 Sep 2025).
  • Person re-identification: Frequency-domain amplitude masking (removing modality-specific bias) and phase preservation are critical for learning shared semantic features across visual-infrared domains (Li et al., 2024).
  • Image deblurring: Dynamic filter generation and gated cross-attention fusion preserve and exploit both global (low-frequency, blur) and local (high-frequency, details) aspects for effective restoration (Gao et al., 20 Feb 2025).
  • Spatial audio coding: Frequency-dependent subband SVD delivers higher spectral compaction and smoothness than frame-domain SVD, essential for perceptually efficient compression and rendering (Zamani et al., 2017).
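The amplitude-masking-with-phase-preservation idea used for re-identification can be sketched generically: modulate only the FFT amplitude, keep the phase intact, and reconstruct by inverse FFT. The `mask_fn` below is a hypothetical placeholder for the learned instance-adaptive filter.

```python
import numpy as np

def amplitude_masked_reconstruction(x, mask_fn):
    """Phase-preserving amplitude filtering: modulate only the 2D FFT
    amplitude via mask_fn (amplitude spectrum -> mask in [0, 1]),
    keep the phase, and reconstruct with the inverse FFT."""
    X = np.fft.fft2(x)
    amp, phase = np.abs(X), np.angle(X)
    amp_masked = amp * mask_fn(amp)
    return np.real(np.fft.ifft2(amp_masked * np.exp(1j * phase)))
```

An all-ones mask recovers the input exactly, confirming that any information loss comes from the mask alone, not from the transform round-trip.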

6. Impact, Performance, and Comparative Insights

Empirical studies demonstrate that FDDR approaches outperform purely spatial, purely spectral, or undifferentiated fusion baselines. Notable results include:

  • Improved demoiréing performance and model compactness over wavelet- or CNN-only methods (Liu et al., 25 May 2025).
  • Quantitative gains in audio-visual segmentation via per-band enhancement and MoE fusion, with ablation showing substantial improvements (+1.3–3.5 mean Jaccard/F-score points with FDDR+MoE (Shen et al., 23 Sep 2025)).
  • Superior robustness to modality mismatch (VI-ReID) and significant gains in mAP and top-k retrieval rank with frequency-adaptive masking and recomposition (Li et al., 2024).
  • State-of-the-art deblurring with adaptive spectral separation and gated fusion, outperforming both frequency-only and spatial-only designs (Gao et al., 20 Feb 2025).
  • For ambisonics audio compression, FDDR yields smoother block transitions, lower bitrate, and better perceived spatial realism compared to MPEG-H and standard SVD/AAC codecs (Zamani et al., 2017).

These results underline the broad value of FDDR techniques in tasks where signal structure, artifacts, or semantic information are distributed non-uniformly across frequency bands.

7. Theoretical and Methodological Considerations

FDDR modules are grounded in classic signal processing theory (Fourier analysis, filter banks, SVD), but introduce domain-, task-, and data-driven learning and optimization for enhanced interpretability and control. Some salient points include:

  • The success of FDDR depends crucially on the correct domain and granularity of decomposition (e.g., level of recursion, band thresholding, filter design).
  • Learnable fusion transforms excel over fixed-weight strategies, allowing spatially varying, sample-specific combinations driven directly by the task loss.
  • Integration of classical "hard" or "brick-wall" methods (e.g., ideal filter banks or SVD) allows faithful energy and phase preservation, though hybrid designs that blend analytic and learned filtering often perform best when signal characteristics are too complex for closed-form rules.
  • Cross-modal or cross-domain FDDR methods highlight the pitfalls of “one-size-fits-all” spectral fusion and motivate continued research into adaptive, data-driven decompositions with explicit semantic or perceptual constraints.

In sum, frequency-domain decomposition and re-composition modules offer a principled, extensible, and empirically validated framework for extracting and recombining frequency-dependent structure, with broad applicability across fields. Continued research is oriented toward more efficient, robust, and interpretable decomposition flows, as well as more powerful learned fusion strategies that synergize analytic priors with deep neural representations.
