Real-World Blind Super-Resolution via Feature Matching with Implicit High-Resolution Priors

Published 26 Feb 2022 in cs.CV | (2202.13142v2)

Abstract: A key challenge of real-world image super-resolution (SR) is to recover the missing details in low-resolution (LR) images with complex unknown degradations (e.g., downsampling, noise and compression). Most previous works restore such missing details in the image space. To cope with the high diversity of natural images, they either rely on the unstable GANs that are difficult to train and prone to artifacts, or resort to explicit references from high-resolution (HR) images that are usually unavailable. In this work, we propose Feature Matching SR (FeMaSR), which restores realistic HR images in a much more compact feature space. Unlike image-space methods, our FeMaSR restores HR images by matching distorted LR image features to their distortion-free HR counterparts in our pretrained HR priors, and decoding the matched features to obtain realistic HR images. Specifically, our HR priors contain a discrete feature codebook and its associated decoder, which are pretrained on HR images with a Vector Quantized Generative Adversarial Network (VQGAN). Notably, we incorporate a novel semantic regularization in VQGAN to improve the quality of reconstructed images. For the feature matching, we first extract LR features with an LR encoder consisting of several Swin Transformer blocks and then follow a simple nearest neighbour strategy to match them with the pretrained codebook. In particular, we equip the LR encoder with residual shortcut connections to the decoder, which is critical to the optimization of feature matching loss and also helps to complement the possible feature matching errors. Experimental results show that our approach produces more realistic HR images than previous methods. Codes are released at https://github.com/chaofengc/FeMaSR.




Summary

The paper introduces FeMaSR, a framework that transforms super-resolution into a feature matching task using implicitly learned high-resolution priors via a VQGAN.
The method extracts LR features with a Swin Transformer encoder and matches them to a codebook pretrained with semantic regularization, which helps recover detailed textures.
Experimental results show that FeMaSR outperforms GAN-based approaches in perceptual quality, achieving lower LPIPS scores and realistic image restoration.

The paper under discussion addresses the complex challenge of single image super-resolution (SISR) in real-world scenarios, where low-resolution (LR) images are plagued with unknown degradations, such as downsampling, noise, and compression artifacts. The study proposes a novel solution through Feature Matching Super-Resolution (FeMaSR), which utilizes high-resolution (HR) priors encoded in a compact feature space, diverging from conventional image-space restoration approaches.

Methodological Contributions

The FeMaSR framework essentially transforms the super-resolution problem into a task of feature matching within a learned feature space, underpinned by a pretrained Vector Quantized Generative Adversarial Network (VQGAN) that provides the HR priors. The approach consists of two primary stages:

Pretraining of High-Resolution Priors (HRP): The HR priors are constructed via a VQGAN, which learns a discrete codebook representing high-quality feature vectors from HR images. This codebook, complemented by an associated decoder, encapsulates the essential information to reconstruct detailed textures.
Super-Resolution via Feature Matching: To convert LR images to their HR counterparts, the method extracts LR features with a Swin Transformer-based encoder and matches each one to its nearest neighbor among the HR feature vectors in the pretrained codebook; the pretrained decoder then maps the matched features to the HR output. The semantic regularization belongs to the first stage: it is applied during VQGAN pretraining, so the codebook being matched against already carries semantic information.
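The nearest neighbor matching step in the second stage can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function name match_to_codebook, the toy identity codebook, and the constant offset standing in for degradation are all assumptions made for the example.

```python
import numpy as np

def match_to_codebook(features, codebook):
    """Replace each feature vector with its nearest codebook entry.

    features: (N, D) array of LR encoder outputs (hypothetical shapes).
    codebook: (K, D) array of pretrained HR feature vectors.
    Returns the chosen indices (N,) and the matched vectors (N, D).
    """
    # Squared Euclidean distance, expanded as ||f||^2 - 2 f.c + ||c||^2.
    f_sq = np.sum(features ** 2, axis=1, keepdims=True)   # (N, 1)
    c_sq = np.sum(codebook ** 2, axis=1)                  # (K,)
    dists = f_sq - 2.0 * features @ codebook.T + c_sq     # (N, K)
    indices = np.argmin(dists, axis=1)
    return indices, codebook[indices]

# Toy codebook: 6 entries of dimension 6 (assumption, for illustration only).
codebook = np.eye(6)
true_ids = np.array([3, 5, 5, 0])
# "Distorted" features: clean codebook entries shifted by a small offset.
lr_features = codebook[true_ids] + 0.05

indices, matched = match_to_codebook(lr_features, codebook)
print(indices)
```

In this toy setting the lookup recovers the clean entries despite the offset, which is the intuition behind matching distorted LR features to distortion-free HR counterparts.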

Technical Innovations

Semantic Regularization in VQGAN: The introduction of semantic regularization aims to strengthen the relationship between semantic context and high-quality textures during the pretraining of HR priors. This innovation enhances the overall realism and fidelity of the reconstructed images.
Residual Shortcut Connections: These connections link the LR feature space directly to the output decoder, significantly easing the optimization process and improving the recovery of realistic textures by providing gradient shortcuts and compensating for potential feature matching discrepancies.
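Why a shortcut helps optimization can be seen in a toy example. A nearest neighbor lookup is piecewise constant, so a small change in the encoder features usually leaves the selected code, and hence the decoder output, unchanged. A residual path from the features to the decoder output restores that sensitivity. The sketch below is an illustration under strong simplifying assumptions: the linear "decoder" w_dec, the linear shortcut w_skip, and all shapes are hypothetical and do not reflect the paper's actual architecture.

```python
import numpy as np

def quantize(z, codebook):
    """Nearest neighbor lookup: map each row of z to its closest codebook row."""
    d = np.sum((z[:, None, :] - codebook[None, :, :]) ** 2, axis=2)
    return codebook[np.argmin(d, axis=1)]

def decode(z_q, w_dec):
    """Toy linear decoder acting on matched features only."""
    return z_q @ w_dec

def decode_with_shortcut(z, z_q, w_dec, w_skip):
    """Toy decoder plus a residual path from the unquantized features."""
    return z_q @ w_dec + z @ w_skip

codebook = np.eye(4)                                   # toy codebook (assumption)
w_dec = np.arange(12, dtype=float).reshape(4, 3)       # toy decoder weights
w_skip = 0.5 * np.ones((4, 3))                         # toy shortcut weights

z = np.array([[0.9, 0.1, 0.0, 0.0]])                   # encoder feature
eps = np.array([[1e-3, 0.0, 0.0, 0.0]])                # small perturbation

# Without the shortcut the output does not react to the perturbation.
plain_delta = decode(quantize(z + eps, codebook), w_dec) - decode(quantize(z, codebook), w_dec)

# With the shortcut the perturbation reaches the output.
skip_delta = (decode_with_shortcut(z + eps, quantize(z + eps, codebook), w_dec, w_skip)
              - decode_with_shortcut(z, quantize(z, codebook), w_dec, w_skip))

print(plain_delta, skip_delta)
```

The same residual path also lets the output draw on the original features when the lookup picks a poor match, which corresponds to the compensation role described above.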

Experimental Validation

The paper reports that FeMaSR, by drawing on the HR features encoded in the pretrained codebook, outperforms contemporary GAN-based methods on both synthetic and real-world datasets. Specifically, it achieves better perceptual quality as measured by LPIPS (lower is better), while reducing artifacts and producing images with realistic textures. The code is publicly released at https://github.com/chaofengc/FeMaSR, which makes the results easier to reproduce and build on in further blind super-resolution research.

Implications and Future Directions

This work has implications for both theory and practice. Because the method relies on implicit priors rather than explicit image-space references, it may generalize to other data domains where HR references are unavailable. Furthermore, by shifting the burden of texture restoration from direct GAN synthesis to feature matching against a pretrained codebook, the study points toward more stable and less artifact-prone image synthesis and enhancement techniques.

Future research might exploit this framework to explore larger and more diverse training datasets or integrate more sophisticated semantic regularizers, potentially enhancing performance further. Moreover, expanding the current method to cater to video super-resolution or other image restoration tasks might offer promising new directions.

In conclusion, FeMaSR recasts real-world image super-resolution as feature matching against pretrained high-resolution priors, which lets it restore realistic images even when the degradation is unknown. The approach is relevant both as a research direction and for practical restoration applications.
