- The paper introduces a multi-stage design for super-resolution that incrementally refines image details for enhanced performance.
- It replaces transposed convolution with an efficient PixelShuffle operation, lowering parameter count and computational cost while avoiding artifacts.
- Experimental results on BSD100 demonstrate state-of-the-art performance with reduced latency and higher throughput.
WaveMixSR-V2: Enhancing Super-Resolution with Higher Efficiency
Introduction
Single-image super-resolution (SISR) remains a central task in image reconstruction: it aims to recover a high-resolution (HR) image from a low-resolution (LR) input by predicting the detail lost at the lower resolution. Recent progress in this field has come largely from token mixers and transformer architectures. In particular, attention-based transformers such as SwinFIR and hybrid attention transformers have advanced the state of the art by better capturing long-range dependencies. However, transformers carry the inherent cost of quadratic complexity in self-attention, which makes them resource-hungry and dependent on extensive training datasets.
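To make the quadratic-complexity point concrete, a back-of-the-envelope calculation (illustrative only; the image sizes and patch size below are assumptions, not figures from the paper):

```python
# Self-attention computes one score per ordered token pair, so its cost grows
# quadratically in the token count -- and the token count grows with image area.

def attention_pairs(side_px, patch_px=1):
    """Number of pairwise attention scores for a square image."""
    tokens = (side_px // patch_px) ** 2
    return tokens * tokens

print(attention_pairs(64))   # 16777216
print(attention_pairs(128))  # 268435456 -> 4x the pixels, 16x the attention cost
```

Doubling the image side quadruples the token count and multiplies the pairwise-score count by sixteen, which is why attention-based SR models become expensive at higher resolutions.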
Background
WaveMixSR, based on the WaveMix architecture, addressed some of these challenges by employing a two-dimensional discrete wavelet transform for spatial token mixing. This allowed the model to achieve superior super-resolution while being resource efficient. The current paper presents an enhanced version of WaveMixSR, termed WaveMixSR-V2, which integrates two critical improvements:
- Replacing the traditional transposed convolution layer with a PixelShuffle operation.
- Implementing a multi-stage design for higher super-resolution tasks (specifically for 4× SR).
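For background on the token mixing that WaveMixSR-V2 inherits, a one-level 2D Haar wavelet transform of a single 2×2 block can be sketched in a few lines (a minimal illustration of the general idea, not the paper's implementation):

```python
def haar2d_2x2(block):
    """One-level orthonormal 2D Haar transform of a 2x2 block:
    one approximation coefficient plus three detail coefficients."""
    (a, b), (c, d) = block
    approx = (a + b + c + d) / 2  # smooth average (low-low band)
    det_h = (a - b + c - d) / 2   # detail across columns
    det_v = (a + b - c - d) / 2   # detail across rows
    det_d = (a - b - c + d) / 2   # diagonal detail
    return approx, det_h, det_v, det_d

# A flat block carries only an approximation component:
print(haar2d_2x2([[1, 1], [1, 1]]))  # (2.0, 0.0, 0.0, 0.0)
```

Applied over a full feature map, this transform halves each spatial dimension while producing four subbands, which is what gives WaveMix its parameter-free spatial token mixing.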
Architectural Enhancements
Multi-Stage Design
The original WaveMixSR model relied on a single-stage design that resized the LR image directly to HR using a non-parametric upsampling layer, such as bilinear or bicubic interpolation. This limited the model's ability to refine details at intermediate scales. WaveMixSR-V2 overcomes this by introducing a multi-stage design composed of sequential resolution-doubling 2× SR blocks: a 4× super-resolution task, for instance, now progresses through two 2× SR blocks in series. This staged approach lets the model refine details incrementally, improving performance while reducing resource usage.
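The staged decomposition can be sketched in shapes alone (`upscale_2x` below is a stand-in for a full 2× SR block, not the paper's code):

```python
def upscale_2x(shape):
    """One 2x SR block doubles both spatial dimensions."""
    h, w = shape
    return (2 * h, 2 * w)

def multi_stage_sr(shape, scale):
    """Reach the target scale by chaining 2x blocks (scale must be a power of 2)."""
    stages = 0
    while scale > 1:
        shape = upscale_2x(shape)
        scale //= 2
        stages += 1
    return shape, stages

# A 4x task decomposes into two sequential 2x stages:
print(multi_stage_sr((120, 80), 4))  # ((480, 320), 2)
```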
PixelShuffle
A significant modification in WaveMixSR-V2 is the substitution of the transposed convolution operation with a PixelShuffle operation followed by a convolution layer. Transposed convolutions carry many parameters and a high computational cost; PixelShuffle instead rearranges existing feature-map values into a higher-resolution grid with no learned parameters, while the subsequent convolution layer refines the result. This reduces parameter count and computational expense and avoids the checkerboard artifacts that transposed convolutions typically introduce, yielding smoother, more natural-looking images.
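The rearrangement itself can be stated precisely. A pure-Python sketch of the PixelShuffle (depth-to-space) operation on nested lists, following the standard `(C·r², H, W) → (C, H·r, W·r)` layout:

```python
def pixel_shuffle(x, r):
    """Rearrange feature maps of shape (C*r*r, H, W) into (C, H*r, W*r).
    x is a nested list indexed [channel][row][col]; r is the upscale factor."""
    H, W = len(x[0]), len(x[0][0])
    C = len(x) // (r * r)
    out = [[[0] * (W * r) for _ in range(H * r)] for _ in range(C)]
    for c in range(C):
        for h in range(H):
            for w in range(W):
                for i in range(r):      # sub-pixel row offset
                    for j in range(r):  # sub-pixel column offset
                        out[c][h * r + i][w * r + j] = x[c * r * r + i * r + j][h][w]
    return out

# Four 1x1 channels become one 2x2 channel -- no parameters, no arithmetic:
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Because the operation only permutes values, its cost is negligible next to a transposed convolution, and it cannot produce the uneven-overlap patterns behind checkerboard artifacts.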
Experimental Results
WaveMixSR-V2 was validated through extensive experimentation on multiple super-resolution tasks, exhibiting state-of-the-art (SOTA) performance, particularly on the BSD100 dataset. Key results demonstrate:
- For 2× SR, WaveMixSR-V2 achieved a Peak Signal-to-Noise Ratio (PSNR) of 33.12 dB and a Structural Similarity Index (SSIM) of 0.9326.
- For 4× SR, it demonstrated substantial efficiency with a reduced parameter count (0.7M vs 1.7M for WaveMixSR) and computational requirement (25.6G multi-adds vs 25.8G for WaveMixSR).
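For reference, PSNR (the first metric reported above) is a simple function of mean squared error; a minimal sketch for 8-bit pixel sequences (SSIM is more involved and omitted here):

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio, in dB, between equal-length pixel sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    return 10 * math.log10(max_val ** 2 / mse)  # higher is better

print(round(psnr([0, 0, 255, 255], [0, 0, 250, 250]), 2))  # 37.16
```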
The paper's tables provide detailed comparisons:
- WaveMixSR-V2 vs. various state-of-the-art methods, showing superior performance, especially considering its efficient use of resources.
- Latency and throughput analysis further highlighted the improvements: lower training latency (19.6 ms) and inference latency (12.1 ms), alongside higher training throughput (50.8 fps) and inference throughput (82.6 fps).
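As a sanity check, the reported numbers are consistent with the usual single-stream relation between latency and throughput (fps ≈ 1000 / latency-in-ms), assuming one frame per forward pass:

```python
def implied_fps(latency_ms):
    """Frames per second implied by a per-frame latency in milliseconds."""
    return 1000.0 / latency_ms

print(round(implied_fps(12.1), 1))  # 82.6 -- matches the reported inference throughput
print(round(implied_fps(19.6), 1))  # 51.0 -- close to the reported 50.8 fps for training
```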
WaveMixSR-V2’s architectural optimizations enabled it to outperform its predecessor and contemporary models in both efficacy and resource efficiency, setting new benchmarks in SISR tasks.
Implications and Future Work
The practical implications of WaveMixSR-V2 lie in its enhanced efficiency, which is critical for applications requiring real-time image reconstruction or operating under hardware constraints. The multi-stage design and PixelShuffle operation set a precedent for future architectures aiming to balance performance with resource economy.
Theoretically, the success of WaveMixSR-V2 reinforces the potential of spatial token mixing through wavelet transform combined with efficient upsampling strategies. Future research could extend this work by exploring alternative basis functions for wavelet transforms or integrating more advanced learning techniques within the multi-stage framework.
Given the empirical success, further experimentation with larger datasets, variations in architecture depths, and embedding dimensions could yield models pushing the bounds of SISR even further. Additionally, integrating generative adversarial networks (GANs) with the WaveMixSR-V2 architecture could enhance its capability to recover high-frequency details, as suggested by preliminary experiments.
In summary, WaveMixSR-V2 represents a significant step forward in super-resolution techniques, emphasizing an optimal blend of performance and efficiency. Its advancements promise to inform future developments in both academic research and practical implementations within computational imaging.