ITSRN++: Stronger and Better Implicit Transformer Network for Continuous Screen Content Image Super-Resolution

Published 17 Oct 2022 in cs.CV and cs.MM | (2210.08812v1)

Abstract: Nowadays, online screen sharing and remote cooperation are becoming ubiquitous. However, the screen content may be downsampled and compressed during transmission, while it may be displayed on large screens or the users would zoom in for detail observation at the receiver side. Therefore, developing a strong and effective screen content image (SCI) super-resolution (SR) method is demanded. We observe that the weight-sharing upsampler (such as deconvolution or pixel shuffle) could be harmful to sharp and thin edges in SCIs, and the fixed scale upsampler makes it inflexible to fit screens with various sizes. To solve this problem, we propose an implicit transformer network for continuous SCI SR (termed as ITSRN++). Specifically, we propose a modulation based transformer as the upsampler, which modulates the pixel features in discrete space via a periodic nonlinear function to generate features for continuous pixels. To enhance the extracted features, we further propose an enhanced transformer as the feature extraction backbone, where convolution and attention branches are utilized parallelly. Besides, we construct a large scale SCI2K dataset to facilitate the research on SCI SR. Experimental results on nine datasets demonstrate that the proposed method achieves state-of-the-art performance for SCI SR (outperforming SwinIR by 0.74 dB for x3 SR) and also works well for natural image SR. Our codes and dataset will be released upon the acceptance of this work.




Summary

The paper presents ITSRN++, which redefines upsampling using an implicit transformer to generate continuous pixel features from screen content images.
The model integrates a dual branch block that concurrently captures high-frequency details via convolution and low-frequency patterns via attention.
Evaluation on SCI2K and other SCI SR benchmarks shows ITSRN++ outperforms existing methods in PSNR and SSIM (including a 0.74 dB gain over SwinIR for x3 SR) while remaining computationally efficient.


Introduction

The paper introduces ITSRN++, a model for continuous screen content image super-resolution (SCI SR). Remote collaboration and online education increasingly require screen content to be magnified beyond its transmitted resolution, and ITSRN++ targets this case with arbitrary-scale upsampling that preserves sharp edges better than fixed-scale upsamplers. It combines an implicit transformer upsampler for continuous magnification with an enhanced transformer backbone for feature extraction.

Implicit Transformer for Upsampling

The implicit transformer based upsampler in ITSRN++ departs from fixed, integer-scale upsamplers such as deconvolution and pixel shuffle, whose shared weights the authors observe can harm the sharp, thin edges typical of SCIs. Instead, a modulation-based transformer modulates discrete pixel features with a periodic nonlinear function to generate features for pixels in continuous space. The paper recasts upsampling as three steps: coordinate projection, weight generation, and aggregation, each corresponding to a stage of the transformer computation.

Because the upsampler is defined over continuous coordinates, it supports arbitrary magnification ratios while preserving the high-frequency components that dominate screen content, which consists largely of text and graphics.
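
As a rough illustration of how a continuous upsampler can serve non-integer scales, the sketch below upsamples a 1-D row of scalar "features". Everything in it is an assumption made for illustration and not the paper's method: the function names, the nearest-pixel projection, and the hand-picked sine modulation (a stand-in for the learned periodic modulation) are hypothetical, and the last step modulates a single nearest feature instead of aggregating over a neighbourhood.

```python
import math

def project(hr_index, hr_len, lr_len):
    """Coordinate projection: map an HR pixel centre into LR index space.

    Returns (nearest LR index, signed offset in LR pixel units)."""
    x = (hr_index + 0.5) / hr_len            # pixel centre in a normalised [0, 1) frame
    lr_pos = x * lr_len - 0.5                # continuous position in LR index space
    nearest = math.floor(lr_pos + 0.5)       # round half up to the nearest LR pixel
    nearest = min(max(nearest, 0), lr_len - 1)
    return nearest, lr_pos - nearest

def modulation(offset, freq=1.0):
    """Weight generation: a periodic function of the relative offset.

    Hypothetical stand-in for a learned network; equals 1.0 at zero offset."""
    return 1.0 + 0.5 * math.sin(2.0 * math.pi * freq * offset)

def upsample_row(lr_row, scale):
    """Produce round(len(lr_row) * scale) samples for any positive scale."""
    hr_len = max(1, round(len(lr_row) * scale))
    out = []
    for i in range(hr_len):
        j, off = project(i, hr_len, len(lr_row))
        out.append(lr_row[j] * modulation(off))   # modulate the nearest LR feature
    return out

row = [1.0, 2.0, 3.0, 4.0]
print(len(upsample_row(row, 2.5)))   # non-integer scale: 4 pixels become 10
```

The point of the sketch is that the output length and the per-pixel offsets are computed from coordinates, so the same function serves x2, x2.5, or x3 without a separate upsampler per scale.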

Figure 1: Visual comparison of the proposed ITSRN++ with state-of-the-art continuous magnification methods. With continuous upsamplers, images can be magnified with arbitrary ratios.

Enhanced Transformer-Based Feature Extraction

To strengthen feature extraction, ITSRN++ introduces a dual branch block (DBB) that runs convolution and attention in parallel. Unlike sequential stacking, where each layer handles either local or global features, the parallel layout captures both at once. The convolution branch addresses high-frequency components, while the attention branch focuses on low-frequency patterns in screen content, which helps retain both sharp edges and repetitive patterns.
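
The parallel layout can be sketched on a toy 1-D signal. This is a minimal illustration under stated assumptions, not the paper's DBB: the fixed high-pass kernel, the scalar self-attention with query = key = value, and the residual sum used to fuse the branches are all hypothetical choices, whereas the actual block uses learned multi-channel layers.

```python
import math

def conv_branch(x, kernel=(-0.5, 1.0, -0.5)):
    """Local branch: 3-tap filter with edge replication.

    The kernel sums to zero, so flat regions map to 0 and edges stand out."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]
        right = x[min(i + 1, n - 1)]
        out.append(kernel[0] * left + kernel[1] * x[i] + kernel[2] * right)
    return out

def attention_branch(x):
    """Global branch: scalar self-attention where query = key = value = x."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)                               # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]   # unnormalised softmax weights
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out

def dual_branch_block(x):
    """Run both branches on the same input and add them to a residual copy of x."""
    local = conv_branch(x)
    glob = attention_branch(x)
    return [xi + li + gi for xi, li, gi in zip(x, local, glob)]

print(conv_branch([0.0, 0.0, 1.0, 0.0, 0.0]))   # responds only around the spike
```

Both branches see the same input, so every position receives a local (edge-sensitive) term and a global (context-weighted) term in one block, instead of alternating between them across layers.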

According to the paper, this hybrid design outperforms networks built on dense connections or channel attention by extracting richer features, which improves SCI SR quality.

Figure 2: Illustration of aggregation based explicit transformer and modulation based implicit transformer.

Dataset and Performance Evaluation

The paper also introduces SCI2K, a large-scale screen content dataset of 2K-resolution images, to support research and benchmarking in SCI SR. Across nine datasets, ITSRN++ achieves state-of-the-art PSNR and SSIM for SCI SR, outperforming strong baselines such as SwinIR (by 0.74 dB for x3 SR) and RCAN, with better sharpness and edge retention.

Moreover, the architecture is reported to remain computationally tractable, with efficiency considered in both the feature extractor and the upsampler.
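
To put the reported PSNR gap in context, the sketch below computes PSNR with its standard definition and converts a PSNR gain into the corresponding reduction in mean squared error. The function names are hypothetical, and the paper's exact evaluation protocol (colour space, border cropping) is not given in this summary, so the code assumes flat lists of 8-bit pixel values.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB between two equally sized flat lists of pixel values."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf                      # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks for a given PSNR gain in dB."""
    return 10.0 ** (-gain_db / 10.0)

print(round(mse_ratio_for_gain(0.74), 3))
```

Because PSNR is logarithmic in the error, a 0.74 dB gain corresponds to a mean squared error roughly 16% lower at the same peak value.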

Figure 3: The proposed implicit transformer based upsampler, which can generate pixel values in continuous space. The orange coordinates are in HR space and the blue coordinates are in LR space.

Conclusion

ITSRN++ advances SCI SR by combining an implicit transformer upsampler with an enhanced transformer feature extractor. The work shows how network architectures can be tailored to the characteristics of screen content, yielding a flexible approach to arbitrary-scale super-resolution. Future directions may include reducing computational cost for real-time applications without sacrificing upsampling quality or scale flexibility.