High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network

Published 19 May 2021 in cs.CV (arXiv:2105.09188v1)

Abstract: Existing image-to-image translation (I2IT) methods are either constrained to low-resolution images or long inference time due to their heavy computational burden on the convolution of high-resolution feature maps. In this paper, we focus on speeding-up the high-resolution photorealistic I2IT tasks based on closed-form Laplacian pyramid decomposition and reconstruction. Specifically, we reveal that the attribute transformations, such as illumination and color manipulation, relate more to the low-frequency component, while the content details can be adaptively refined on high-frequency components. We consequently propose a Laplacian Pyramid Translation Network (LPTN) to simultaneously perform these two tasks, where we design a lightweight network for translating the low-frequency component with reduced resolution and a progressive masking strategy to efficiently refine the high-frequency ones. Our model avoids most of the heavy computation consumed by processing high-resolution feature maps and faithfully preserves the image details. Extensive experimental results on various tasks demonstrate that the proposed method can translate 4K images in real-time using one normal GPU while achieving comparable transformation performance against existing methods. Datasets and codes are available: https://github.com/csjliang/LPTN.

Citations (93)

Summary

  • The paper introduces LPTN, which decomposes images into frequency bands to achieve real-time 4K translation with a PSNR over 22.
  • It employs a low-frequency translation and high-frequency refinement strategy to balance computational efficiency with detail preservation.
  • The approach uses unsupervised adversarial training, enabling realistic image transformations without the need for paired datasets.

This paper presents a novel approach for real-time photorealistic image-to-image translation (I2IT) focused on efficient processing of high-resolution images. The authors introduce the Laplacian Pyramid Translation Network (LPTN) to address challenges in existing I2IT methods that often struggle with high computational requirements and long inference times.

Methodology

The proposed LPTN leverages the Laplacian pyramid to decompose images into different frequency bands, balancing computational efficiency with effective translation of domain-specific attributes. By focusing on low-frequency components for transformations such as illumination and color changes, the LPTN maintains the resolution and detail fidelity in high-frequency components through an adaptive refinement process.
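
The closed-form decomposition and reconstruction underlying this idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses 2x2 average pooling and nearest-neighbour upsampling as stand-ins for the Gaussian filtering typically used in a Laplacian pyramid, and assumes image dimensions divisible by 2 at every level.

```python
import numpy as np

def downsample(img):
    """Halve resolution by 2x2 average pooling (stand-in for Gaussian pyrDown)."""
    h, w = img.shape[:2]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

def upsample(img):
    """Double resolution by nearest-neighbour repetition (stand-in for pyrUp)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_decompose(img, levels=3):
    """Split an image into high-frequency residuals plus a low-frequency base."""
    pyramid = []
    current = img
    for _ in range(levels):
        low = downsample(current)
        high = current - upsample(low)  # residual = detail lost at this scale
        pyramid.append(high)
        current = low
    pyramid.append(current)             # final low-frequency component
    return pyramid

def laplacian_reconstruct(pyramid):
    """Invert the decomposition exactly: upsample and add back each residual."""
    img = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        img = upsample(img) + high
    return img
```

Because each residual records exactly what the downsampling discarded, the reconstruction is lossless; this is what lets LPTN translate only the small low-frequency band while keeping full-resolution detail intact.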

Key innovations include:

  • Low-Frequency Translation: The system focuses computational resources on translating low-frequency components, which carry crucial information about global visual attributes. This translation is performed using a lightweight network with residual blocks.
  • High-Frequency Refinement: The paper describes a progressive masking strategy where a small network computes masks on lower-resolution high-frequency components. These masks are then progressively refined and upsampled to higher resolution components, maintaining texture details without intensive computation.
  • Unsupervised Training: The LPTN employs an end-to-end unsupervised training strategy using adversarial training frameworks to ensure realistic translation without paired datasets.
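
Putting the pieces together, the overall pipeline can be sketched as below. This is a schematic sketch, not the paper's architecture: `translate_low` and `predict_mask` are hypothetical placeholders for the small learned networks (residual blocks and the mask predictor), and the mask is applied by simple per-pixel multiplication.

```python
import numpy as np

def lptn_forward(pyramid, translate_low, predict_mask, upsample):
    """Sketch of the LPTN pipeline: translate the coarse band, then refine each
    high-frequency residual with a progressively upsampled mask.
    `translate_low` and `predict_mask` stand in for the paper's lightweight
    learned networks; they are placeholders here, not the real models."""
    highs, low = pyramid[:-1], pyramid[-1]
    low_out = translate_low(low)                  # cheap: lowest resolution only
    mask = predict_mask(highs[-1], low, low_out)  # computed at the coarse scale
    refined = []
    for high in reversed(highs):                  # coarse -> fine
        if mask.shape[:2] != high.shape[:2]:
            mask = upsample(mask)                 # progressive mask upsampling
        refined.append(high * mask)               # per-pixel detail modulation
    refined.reverse()
    return refined + [low_out]
```

The key efficiency property is visible in the structure: the only learned translation runs at the pyramid's smallest scale, while full-resolution bands are touched only by a multiply with an upsampled mask.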

Results

Experimental results demonstrate that the LPTN provides real-time performance on 4K images using standard GPUs while maintaining competitive photorealism in the translated images. Tasks such as day-to-night transition or summer-to-winter transformations were performed effectively without introducing distortions frequently observed in competing solutions.

  • Quantitative Performance: The technique achieves a PSNR of over 22 on photorealistic retouching tasks, notably higher than many contemporary methods.
  • Efficiency: Runtime scales roughly linearly with the number of pixels, keeping high-resolution inference feasible, in contrast to prior approaches whose computational cost grows far more steeply with resolution.

Implications and Future Work

The proposed LPTN architecture offers significant implications for various applications that require real-time image processing at high resolutions, such as video post-production, augmented reality, and autonomous driving systems. Future research could explore extensions of the framework to tackle more complex transformations or integrate with other AI-driven content generation pipelines.

Further investigation into optimizing the balance between frequency domain decomposition and detailed texture reconstruction could enhance performance, potentially addressing current limitations related to novel detail synthesis.

In summary, the LPTN offers a promising step towards efficient, high-quality photorealistic image translation. Its ability to handle 4K resolution tasks in real-time without sacrificing detail quality sets a foundation for advancements in real-world AI applications requiring instantaneous image transformations.
