CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping

Published 27 Jul 2018 in cs.CV | (1807.10547v1)

Abstract: Reference-based super-resolution (RefSR) super-resolves a low-resolution (LR) image given an external high-resolution (HR) reference image, where the reference image and the LR image share a similar viewpoint but differ by a significant resolution gap (8x). Existing RefSR methods work in a cascaded way, such as patch matching followed by a synthesis pipeline with two independently defined objective functions, leading to inter-patch misalignment, grid effects, and inefficient optimization. To resolve these issues, we present CrossNet, an end-to-end, fully convolutional deep neural network using cross-scale warping. Our network contains image encoders, cross-scale warping layers, and a fusion decoder: the encoders extract multi-scale features from both the LR and the reference images; the cross-scale warping layers spatially align the reference feature map with the LR feature map; the decoder finally aggregates feature maps from both domains to synthesize the HR output. Using cross-scale warping, our network is able to perform spatial alignment at pixel-level in an end-to-end fashion, which improves on existing schemes in both precision (around 2dB-4dB) and efficiency (more than 100 times faster).


Citations (179)


Summary

Paper Review: CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping

The paper introduces CrossNet, a new approach to Reference-based Super-Resolution (RefSR), the task of enhancing a low-resolution (LR) image with the help of a high-resolution (HR) reference image. Prior RefSR methods run patch matching followed by a separate synthesis stage, each with its own independently defined objective. That cascaded design can produce inter-patch misalignment, grid effects, and inefficient optimization, and CrossNet is designed to address these problems.

Methodology

CrossNet avoids these issues with an end-to-end, fully convolutional framework built on cross-scale warping. The network consists of image encoders, cross-scale warping layers, and a fusion decoder. The encoders extract multi-scale features from both the LR and the reference images, the warping layers spatially align the reference feature maps with the LR feature maps, and the decoder aggregates features from both domains to synthesize the HR output. Because alignment is performed at pixel level rather than patch level, the paper reports improvements over existing schemes in both precision (gains of around 2dB-4dB) and efficiency (more than 100 times faster).
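
The alignment step can be illustrated with a minimal sketch of backward warping, assuming the alignment is expressed as a per-pixel displacement field. This parameterization is an assumption, since the summary does not spell out how the warping layers are defined. The names `bilinear_sample` and `warp` are hypothetical, a feature map is a plain list of rows with a single channel, and border clamping is a choice made here for simplicity.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D feature map at a real-valued (y, x), clamping to the border."""
    h, w = len(feat), len(feat[0])
    y = min(max(float(y), 0.0), h - 1.0)
    x = min(max(float(x), 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - wx) * feat[y0][x0] + wx * feat[y0][x1]
    bottom = (1 - wx) * feat[y1][x0] + wx * feat[y1][x1]
    return (1 - wy) * top + wy * bottom


def warp(ref_feat, flow):
    """Backward warp: out[y][x] is ref_feat sampled at (y + dy, x + dx).

    flow[y][x] is a (dy, dx) pair, an assumed layout for this sketch.
    """
    h, w = len(ref_feat), len(ref_feat[0])
    return [
        [
            bilinear_sample(ref_feat, y + flow[y][x][0], x + flow[y][x][1])
            for x in range(w)
        ]
        for y in range(h)
    ]


# Toy example: shift the reference features one pixel to the left.
ref = [[0, 1, 2, 3], [4, 5, 6, 7]]
shift_one = [[(0.0, 1.0)] * 4 for _ in range(2)]
aligned = warp(ref, shift_one)
```

Because each output pixel carries its own displacement, the warp can represent non-rigid motion such as parallax, which a single global transform cannot. In a multi-scale setting the same operation would be applied to each encoder feature map with a displacement field of matching resolution.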

Experimental Setup

The authors validate their approach with experiments on several datasets, including Flower and LFVideo. They compare CrossNet against established single-image super-resolution (SISR) approaches such as SRCNN, VDSR, and MDSR, and against other RefSR methods such as SS-Net and PatchMatch. CrossNet achieves substantial gains in PSNR, SSIM, and IFC, and its advantage holds under variations in viewpoint disparity and upscaling factor.
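
For readers less familiar with the metrics, PSNR is a log-scale function of mean squared error (MSE). The sketch below is a minimal illustration and not the authors' evaluation code. The function names are hypothetical, images are lists of rows, and the default peak value of 255 assumes 8-bit pixel intensities.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    ref = [v for row in reference for v in row]
    est = [v for row in estimate for v in row]
    if len(ref) != len(est):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(ref, est)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_reduction_factor(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (gain_db / 10.0)


low = mse_reduction_factor(2.0)   # roughly 1.58
high = mse_reduction_factor(4.0)  # roughly 2.51
```

If the reported 2dB-4dB improvement is read as a PSNR difference (an assumption, since the summary does not name the metric for that figure), it corresponds to cutting the MSE by a factor of roughly 1.6 to 2.5.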

Implications and Future Developments

The work has both practical and theoretical implications. Practically, CrossNet's speed and higher precision make it a candidate for time-sensitive applications ranging from light-field reconstruction to high-definition video processing. Theoretically, it shows that multi-scale warping can be integrated directly in the feature domain, which may guide future research on image synthesis. The ability to model non-rigid image transformations efficiently is a notable step forward in handling inter-image parallax and resolution gaps.

In conclusion, CrossNet is an efficient and accurate model for reference-based super-resolution. Its end-to-end treatment of alignment and synthesis can serve as a foundation for later work on improving image fidelity in complex visual settings.