- The paper introduces the Editing-for-Swapping (E4S) framework that uses regional GAN inversion to disentangle shape and texture, enhancing identity preservation.
- It employs mask-guided multi-scale encoding into a tailored latent space for precise regional control during face swapping.
- Extensive experiments demonstrate improved fidelity and pose/expression consistency over prior methods like FSGAN, MegaFS, and FaceShifter.
Fine-Grained Face Swapping via Regional GAN Inversion
The paper "Fine-Grained Face Swapping via Regional GAN Inversion" proposes the Editing-for-Swapping (E4S) framework, built on a Regional GAN Inversion (RGI) method. E4S disentangles the shape and texture of individual facial components to improve identity preservation and image fidelity in face swapping. Built on a pre-trained StyleGAN, the method uses mask-guided encoding to extract local features, recasting face swapping as a problem of swapping per-region styles and masks.
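The "swapping as editing" idea can be sketched as recombining per-region style codes: inner-face texture comes from the source, while the surrounding context stays with the target. The region names and the `INNER_FACE` partition below are illustrative assumptions, not the paper's exact label set.

```python
# Illustrative region partition (an assumption, not the paper's exact labels).
INNER_FACE = {"skin", "brows", "eyes", "nose", "mouth"}

def recompose_styles(source_codes, target_codes):
    """Pick each region's style code from the source (inner face) or the
    target (everything else).

    source_codes / target_codes: dicts mapping region name -> style code.
    """
    return {region: (source_codes[region] if region in INNER_FACE
                     else target_codes[region])
            for region in target_codes}

# Toy demo with string placeholders standing in for style vectors.
regions = ["skin", "eyes", "nose", "mouth", "hair", "background"]
src = {r: f"src_{r}" for r in regions}
tgt = {r: f"tgt_{r}" for r in regions}
swapped = recompose_styles(src, tgt)
print(swapped["skin"], swapped["hair"])  # src_skin tgt_hair
```

The same recomposition applies to the segmentation masks themselves, which is what lets the framework handle occlusions region by region.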
Methodology
The methodology has two stages: reenactment, then swapping and generation. First, face reenactment realigns the source image to match the pose and expression of the target; the authors use FaceVid2Vid for this normalization, ensuring source and target are consistently aligned before swapping.
In the swapping stage, the proposed RGI method extracts style codes, each representing the texture of one facial component, via a mask-guided multi-scale encoder. These codes live in a newly introduced latent space, Wr+, tailored for localized editing within StyleGAN's framework. A generator equipped with a mask-guided style-injection module then synthesizes high-resolution output by applying the style codes according to a recomposed facial mask.
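At its simplest, mask-guided extraction of per-region style codes can be sketched as masked average pooling over an encoder feature map. This is a deliberate simplification of the paper's multi-scale encoder; the region list and tensor shapes are illustrative.

```python
import numpy as np

# Illustrative region labels; the paper's segmentation has its own label set.
REGIONS = ["skin", "eyes", "nose", "mouth", "hair", "background"]

def regional_style_codes(features, seg):
    """Masked average pooling: one style vector per facial region.

    features: (C, H, W) feature map from a (hypothetical) encoder
    seg:      (H, W) integer segmentation map whose values index REGIONS
    returns:  (len(REGIONS), C) array of per-region style codes
    """
    channels = features.shape[0]
    codes = np.zeros((len(REGIONS), channels))
    for r in range(len(REGIONS)):
        mask = (seg == r)          # boolean mask selecting this region's pixels
        if mask.any():
            # Average the feature vectors of all pixels in the region.
            codes[r] = features[:, mask].mean(axis=1)
    return codes

# Toy demo: an 8-channel, 4x4 feature map with a random segmentation.
rng = np.random.default_rng(0)
feat = rng.random((8, 4, 4))
seg = rng.integers(0, len(REGIONS), size=(4, 4))
codes = regional_style_codes(feat, seg)
print(codes.shape)  # (6, 8)
```

Because each region's code is pooled independently, codes from different images can be mixed per region, which is exactly what the swapping stage exploits.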
Results and Implications
The authors conducted extensive experiments demonstrating that their approach surpasses current state-of-the-art methods in identity preservation and image fidelity. The framework handles varying occlusions and preserves source attributes, such as skin tone, more faithfully than prior models like FSGAN, MegaFS, and FaceShifter. This fine-grained control reflects the model's ability to disentangle individual facial components.
Quantitatively, evaluations on identity-preservation and pose/expression-consistency metrics show clear gains. Because RGI represents each region with its own style code, reconstruction fidelity improves and manipulations become more flexible and granular, extending to face-editing tasks beyond swapping alone.
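Identity preservation is commonly scored as the cosine similarity between face-recognition embeddings (for example, ArcFace features) of the swapped result and the source. The specific embedding network and dimensionality below are assumptions for illustration, not details from this summary.

```python
import numpy as np

def identity_similarity(emb_a, emb_b):
    """Cosine similarity between two face-recognition embeddings."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy demo: random vectors stand in for real 512-d embeddings.
rng = np.random.default_rng(0)
src_emb = rng.standard_normal(512)                    # source face embedding
swap_emb = src_emb + 0.1 * rng.standard_normal(512)   # a faithful swap result
print(round(identity_similarity(src_emb, swap_emb), 3))
```

Pose and expression consistency are measured analogously, by comparing pose angles and expression coefficients estimated from the swapped result and the target.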
Impact and Future Directions
This research offers both theoretical and practical advancements. Theoretically, it advances understanding in face editing by restructuring the problem around component-level editing using disentangled features. Practically, the implications for industries reliant on high-fidelity face swapping, such as entertainment and augmented reality, are substantial, reducing artifact prevalence and enhancing personalization.
Future research might explore refining the disentanglement of light and texture, potentially incorporating dynamic lighting conditions into the existing framework. Additionally, exploring adversarial robustness and ethical dimensions in synthetic face generation could align technical developments with broader societal needs.
This paper lays a strong foundation for subsequent innovations in high-resolution, high-fidelity face synthesis, steering the research community toward a granular approach that respects the complexities of human facial characteristics.