Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring

Published 22 Nov 2022 in cs.CV | (2211.12250v1)

Abstract: We present an effective and efficient method that explores the properties of Transformers in the frequency domain for high-quality image deblurring. Our method is motivated by the convolution theorem that the correlation or convolution of two signals in the spatial domain is equivalent to an element-wise product of them in the frequency domain. This inspires us to develop an efficient frequency domain-based self-attention solver (FSAS) to estimate the scaled dot-product attention by an element-wise product operation instead of the matrix multiplication in the spatial domain. In addition, we note that simply using the naive feed-forward network (FFN) in Transformers does not generate good deblurred results. To overcome this problem, we propose a simple yet effective discriminative frequency domain-based FFN (DFFN), where we introduce a gated mechanism in the FFN based on the Joint Photographic Experts Group (JPEG) compression algorithm to discriminatively determine which low- and high-frequency information of the features should be preserved for latent clear image restoration. We formulate the proposed FSAS and DFFN into an asymmetrical network based on an encoder and decoder architecture, where the FSAS is only used in the decoder module for better image deblurring. Experimental results show that the proposed method performs favorably against the state-of-the-art approaches. Code will be available at https://github.com/kkkls/FFTformer.




Summary

The paper introduces a novel frequency-domain transformer approach that replaces matrix multiplications with element-wise products, significantly reducing computational complexity.
It integrates a discriminative feed-forward network that uses a gating mechanism to preserve low- and high-frequency details crucial for restoring image clarity.
Experimental results on datasets like GoPro demonstrate notable improvements in PSNR and SSIM over state-of-the-art methods, even with fewer model parameters.


The paper "Efficient Frequency Domain-based Transformers for High-Quality Image Deblurring" proposes an image deblurring method that exploits the properties of Transformers in the frequency domain. Its main contribution is to lower the computational cost of self-attention for this task while improving the quality of the restored images.

The authors address key limitations in existing image deblurring methods, which predominantly rely on deep convolutional neural networks (CNNs). Because convolution is spatially invariant, these methods struggle to model the spatially variant properties of image content. Transformers can model global context through attention, but the cost of scaled dot-product attention, which grows quadratically with the number of pixels, has been a barrier to using them on high-resolution images.

Central to this work is the convolution theorem, which states that the correlation or convolution of two signals in the spatial domain corresponds to an element-wise product in the frequency domain. The authors exploit this property to build an efficient frequency domain-based self-attention solver (FSAS) that estimates scaled dot-product attention with an element-wise product in the frequency domain instead of a matrix multiplication in the spatial domain. This reduces the cost of the attention computation to $O(N)$ space and $O(N\log N)$ time per feature channel, where $N$ is the number of pixels.
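The identity behind FSAS can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the function names are hypothetical, `q` and `k` stand in for single-channel query and key feature maps, and circular (wrap-around) boundary handling is assumed because that is what the discrete Fourier transform provides. It compares a direct spatial-domain correlation with the same quantity computed as an element-wise product of spectra.

```python
import numpy as np

def circular_correlation_direct(q, k):
    """Circular cross-correlation by explicit summation.

    For N = H * W pixels this costs O(N^2) multiply-adds.
    """
    height, width = q.shape
    out = np.zeros((height, width))
    for dy in range(height):
        for dx in range(width):
            # Shift k by (dy, dx) with wrap-around, then take the inner product with q.
            shifted = np.roll(k, shift=(dy, dx), axis=(0, 1))
            out[dy, dx] = np.sum(q * shifted)
    return out

def circular_correlation_fft(q, k):
    """Same correlation via the convolution (correlation) theorem.

    Two forward FFTs, one element-wise product, one inverse FFT:
    O(N log N) time and O(N) extra memory.
    """
    spec_q = np.fft.fft2(q)
    spec_k = np.fft.fft2(k)
    # Correlation (rather than convolution) uses the complex conjugate of one spectrum.
    return np.real(np.fft.ifft2(spec_q * np.conj(spec_k)))
```

For real-valued inputs of the same shape, the two functions agree up to floating-point error, while the FFT version avoids building any $N \times N$ attention matrix. How the resulting correlation map is normalized and combined with the value features is a design decision of the paper that this sketch does not attempt to reproduce.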

Furthermore, the study presents a discriminative frequency domain-based feed-forward network (DFFN), motivated by the observation that the naive FFN used in Transformers does not produce good deblurred results. DFFN draws on the JPEG compression algorithm: it adds a gating mechanism to the FFN that discriminatively determines which low- and high-frequency components of the features should be preserved for restoring the latent clear image.
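To make the JPEG analogy concrete, the sketch below filters a feature map patch by patch in the frequency domain. It is an illustration under stated assumptions, not the DFFN itself: the function name is hypothetical, the 8x8 patch size is borrowed from JPEG's block size, and the array `w` is a stand-in for a learned, quantization-table-like weight that scales each frequency of every patch. The convolutions and nonlinear gate of a full FFN are omitted.

```python
import numpy as np

def patchwise_frequency_filter(x, w, patch=8):
    """Scale the frequency components of each non-overlapping patch of x by w.

    x : 2-D array whose height and width are multiples of `patch`.
    w : array of shape (patch, patch // 2 + 1), one weight per rFFT frequency
        (hypothetical stand-in for a learned weight).
    """
    height, width = x.shape
    assert height % patch == 0 and width % patch == 0
    # Split into patches with layout (rows of patches, cols of patches, patch, patch).
    blocks = x.reshape(height // patch, patch, width // patch, patch)
    blocks = blocks.transpose(0, 2, 1, 3)
    # Real-input 2-D FFT over the last two axes of every patch.
    spec = np.fft.rfft2(blocks)
    # Element-wise weighting decides how much of each frequency is kept.
    spec = spec * w
    filtered = np.fft.irfft2(spec, s=(patch, patch))
    # Undo the patch layout to recover an array with the original shape.
    return filtered.transpose(0, 2, 1, 3).reshape(height, width)
```

With `w` set to all ones the input is returned unchanged; with only the zero-frequency entry `w[0, 0]` non-zero, every patch collapses to its mean, which is the extreme low-pass case. Weights between these extremes select a mix of low and high frequencies per patch.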

FSAS and DFFN are combined in an asymmetric encoder-decoder network in which FSAS is used only in the decoder. The rationale is that the features in the early, encoder stages still carry the blur of the input, so attention computed from them would be inaccurate; the decoder operates on features from deeper layers, which are clearer and therefore better suited to attention-based restoration.

The experimental validation on public datasets, including GoPro, RealBlur, and HIDE, demonstrates the superiority of the proposed approach over state-of-the-art methods, both in terms of quantitative measures such as PSNR and SSIM and qualitative visual clarity. Particularly on the GoPro dataset, the authors report notable improvements over established methods like NAFNet, despite having fewer model parameters. This efficacy is further supported by visual comparisons that highlight the model's ability to recover clear details and structures from blurred images.
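For readers unfamiliar with the metric, PSNR is defined as $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, where MSE is the mean squared error between the restored and reference images and MAX is the largest possible pixel value. The helper below is a minimal sketch of that standard definition; the function name and the default `max_value=1.0` (images scaled to $[0, 1]$) are assumptions, and benchmark protocols may differ in details such as color space or cropping.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0:
        # Identical images: the ratio is unbounded.
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Because the scale is logarithmic, halving the mean squared error raises PSNR by about 3 dB, which is why gains of a few tenths of a dB are reported as meaningful in deblurring benchmarks.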

In terms of implications, the technique shows promise for applications requiring efficient and accurate image restoration, potentially benefiting areas such as photography and video processing where deblurring is essential. The study also opens up avenues for further exploration of transformer-based models leveraging frequency domain operations, not only in deblurring but in other image processing tasks. Future work could explore optimizing the asymmetric network architecture further or adapting the frequency domain-based attention mechanism for other transformer applications in computer vision.

In summary, this research provides a substantial contribution to the field of image restoration by combining frequency domain insights with transformer architectures, thereby achieving an efficient yet high-performance solution to the image deblurring problem.