
WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency

Published 16 Sep 2024 in eess.IV, cs.AI, cs.CV, and cs.LG | (2409.10582v3)

Abstract: Recent advancements in single image super-resolution have been predominantly driven by token mixers and transformer architectures. WaveMixSR utilized the WaveMix architecture, employing a two-dimensional discrete wavelet transform for spatial token mixing, achieving superior performance in super-resolution tasks with remarkable resource efficiency. In this work, we present an enhanced version of the WaveMixSR architecture by (1) replacing the traditional transpose convolution layer with a pixel shuffle operation and (2) implementing a multistage design for higher resolution tasks ($4\times$). Our experiments demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other architectures in multiple super-resolution tasks, achieving state-of-the-art results on the BSD100 dataset, while also consuming fewer resources and exhibiting higher parameter efficiency, lower latency, and higher throughput. Our code is available at https://github.com/pranavphoenix/WaveMixSR.

Summary

  • The paper introduces a multi-stage design for super-resolution that incrementally refines image details for enhanced performance.
  • It replaces transposed convolution with an efficient PixelShuffle operation, lowering parameter count and computational cost while avoiding artifacts.
  • Experimental results on BSD100 demonstrate state-of-the-art performance with reduced latency and higher throughput.


Introduction

Single-image super-resolution (SISR) remains a significant task within image reconstruction, aiming to enhance image quality by converting low-resolution (LR) images into high-resolution (HR) counterparts. This involves predicting and restoring the detailed information lost at the lower resolution. Recent developments in this field have seen notable contributions from token mixers and transformer architectures. Specifically, attention-based transformers like SwinFIR and hybrid attention transformers have made substantial progress by better capturing long-range dependencies. However, the quadratic complexity of self-attention makes transformers resource-hungry and reliant on extensive training data.

Background

WaveMixSR, based on the WaveMix architecture, addressed some of these challenges by employing a two-dimensional discrete wavelet transform for spatial token mixing. This allowed the model to achieve superior super-resolution while being resource efficient. The current paper presents an enhanced version of WaveMixSR, termed WaveMixSR-V2, which integrates two critical improvements:

  1. Replacing the traditional transpose convolution layer with a PixelShuffle operation.
  2. Implementing a multi-stage design for higher super-resolution tasks (specifically for 4× SR).
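Since the 2D discrete wavelet transform is central to WaveMix-style spatial token mixing, a minimal single-level Haar DWT helps make the operation concrete. The sketch below is an illustration only, not the paper's implementation (WaveMix wraps the DWT in learned layers, and subband naming conventions vary between libraries):

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar DWT: splits an (H, W) array into four
    half-resolution subbands, as used by WaveMix for spatial token
    mixing. Assumes H and W are even."""
    a = x[0::2, 0::2]  # top-left of each 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # approximation (low-low)
    lh = (a - b + c - d) / 2.0  # detail across columns
    hl = (a + b - c - d) / 2.0  # detail across rows
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

x = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(x)
print(ll.shape)  # (2, 2): each subband halves spatial resolution
```

Note how a single level halves the spatial resolution while quadrupling the channel count when the four subbands are stacked, which is what lets WaveMix mix tokens over a larger effective receptive field at low cost.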

Architectural Enhancements

Multi-Stage Design

The original WaveMixSR model relied on a single-stage design, which resized the LR image directly to HR using a non-parametric upsampling layer, such as bilinear or bicubic interpolation. This approach constrained the model’s ability to fine-tune and optimize across different scales efficiently. WaveMixSR-V2 overcomes this by introducing a multi-stage design comprising sequential resolution-doubling 2× SR blocks. For instance, a task requiring 4× super-resolution now progresses through a series of two 2× SR blocks. This staged approach enhances the model's capacity to refine details incrementally, thereby leading to improved performance with reduced resource usage.
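The staged data flow can be sketched as a simple composition of resolution-doubling blocks. The stage body below is a hypothetical stand-in (the real blocks contain WaveMix layers and a learned upsampler; here nearest-neighbour repetition merely shows the pipeline shape):

```python
import numpy as np

def sr_stage_2x(img):
    """Stand-in for one WaveMixSR-V2 2x SR block. Hypothetical:
    the actual block contains WaveMix layers; this placeholder
    just doubles resolution by nearest-neighbour repetition."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def super_resolve_4x(img):
    """4x SR as a pipeline of two resolution-doubling stages,
    mirroring the multi-stage design described above."""
    return sr_stage_2x(sr_stage_2x(img))

lr = np.ones((48, 48))
hr = super_resolve_4x(lr)
print(hr.shape)  # (192, 192)
```

Composing two 2× stages rather than training one monolithic 4× upsampler is what lets each stage specialize in a single octave of detail.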

PixelShuffle

A significant modification in WaveMixSR-V2 is the substitution of the transposed convolution operation with a PixelShuffle operation followed by a convolution layer. Where transposed convolutions tend to involve numerous parameters and high computational cost, PixelShuffle rearranges pixels from feature maps without any learned weights; the subsequent convolution layer then refines the features. This approach diminishes parameter count and computational expense while avoiding the checkerboard artifacts typically introduced by transposed convolutions, resulting in smoother and more natural-looking images.
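PixelShuffle is a pure depth-to-space rearrangement: a (C·r², H, W) feature map becomes a (C, H·r, W·r) map with no parameters at all. A minimal NumPy sketch of the same semantics as torch.nn.PixelShuffle:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r*r, H, W) feature map into (C, H*r, W*r),
    matching the depth-to-space semantics of torch.nn.PixelShuffle.
    Purely a reshuffle: no learned parameters, unlike a transposed
    convolution."""
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)       # split channels into (c, r, r)
    x = x.transpose(0, 3, 1, 4, 2)     # reorder to (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)  # interleave into spatial dims

x = np.arange(4 * 2 * 2, dtype=float).reshape(4, 2, 2)  # C*r*r = 4, r = 2
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 4, 4)
```

Because every output pixel is copied from exactly one input channel, upsampling cannot produce the uneven-overlap patterns that cause checkerboard artifacts in strided transposed convolutions.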

Experimental Results

WaveMixSR-V2 was validated through extensive experimentation on multiple super-resolution tasks, exhibiting state-of-the-art (SOTA) performance, particularly on the BSD100 dataset. Key results demonstrate:

  • For 2× SR, WaveMixSR-V2 achieved a Peak Signal-to-Noise Ratio (PSNR) of 33.12 dB and a Structural Similarity Index (SSIM) of 0.9326.
  • For 4× SR, it demonstrated substantial efficiency with a reduced parameter count (0.7M vs 1.7M for WaveMixSR) and computational requirement (25.6G multi-adds vs 25.8G for WaveMixSR).

Tables provided detailed comparisons:

  • WaveMixSR-V2 vs. various state-of-the-art methods, showing superior performance, especially considering its efficient use of resources.
  • Latency and throughput analysis further highlighted the improvements, with lower training latency (19.6 ms) and inference latency (12.1 ms), and correspondingly higher training throughput (50.8 fps) and inference throughput (82.6 fps).

WaveMixSR-V2’s architectural optimizations enabled it to outperform its predecessor and contemporary models in both efficacy and resource efficiency, setting new benchmarks in SISR tasks.

Implications and Future Work

The practical implications of WaveMixSR-V2 lie in its enhanced efficiency, which is critical for applications requiring real-time image reconstruction or operating under hardware constraints. The multi-stage design and PixelShuffle operation set a precedent for future architectures aiming to balance performance with resource economy.

Theoretically, the success of WaveMixSR-V2 reinforces the potential of spatial token mixing through wavelet transform combined with efficient upsampling strategies. Future research could extend this work by exploring alternative basis functions for wavelet transforms or integrating more advanced learning techniques within the multi-stage framework.

Given the empirical success, further experimentation with larger datasets, variations in architecture depths, and embedding dimensions could yield models pushing the bounds of SISR even further. Additionally, integrating generative adversarial networks (GANs) with the WaveMixSR-V2 architecture could enhance its capability to recover high-frequency details, as suggested by preliminary experiments.

In summary, WaveMixSR-V2 represents a significant step forward in super-resolution techniques, emphasizing an optimal blend of performance and efficiency. Its advancements promise to inform future developments in both academic research and practical implementations within computational imaging.
