Live Face De-Identification in Video

Published 19 Nov 2019 in cs.LG, cs.CV, cs.GR, and stat.ML (arXiv:1911.08348v1)

Abstract: We propose a method for face de-identification that enables fully automatic video modification at high frame rates. The goal is to maximally decorrelate the identity, while having the perception (pose, illumination and expression) fixed. We achieve this by a novel feed-forward encoder-decoder network architecture that is conditioned on the high-level representation of a person's facial image. The network is global, in the sense that it does not need to be retrained for a given video or for a given identity, and it creates natural looking image sequences with little distortion in time.



Summary

An Overview of Live Face De-Identification in Video

The paper "Live Face De-Identification in Video" by Oran Gafni, Lior Wolf, and Yaniv Taigman presents a method for de-identifying faces in video streams. It responds to growing concern about facial recognition technologies and their implications for personal privacy. The method modifies video so that identifying facial features are obscured, while the perceptual aspects of each frame (pose, illumination, and expression) are retained.

Methodological Foundation

The authors introduce a feed-forward encoder-decoder network that decorrelates identity while preserving pose, illumination, and expression. Many existing methods swap faces with identities drawn from a predefined dataset. This approach instead generates new facial identities. The network is global: a single trained model is applied to new videos and new identities without retraining. It is conditioned on a high-level facial representation, which guides the generation of the de-identified face.
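To make the idea of conditioning concrete, the sketch below shows a toy encoder-decoder whose bottleneck is concatenated with an identity embedding before decoding. This is a minimal illustration under stated assumptions, not the authors' architecture. The class name ConditionedAutoencoder, the layer sizes, and the use of random linear maps in place of trained convolutional blocks are all hypothetical. The random vector passed as the embedding merely stands in for the output of a face recognition network.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (face embeddings are commonly compared this way)."""
    return v / np.linalg.norm(v)


class ConditionedAutoencoder:
    """Hypothetical toy encoder-decoder whose latent code is concatenated with an identity embedding.

    Random linear maps stand in for trained convolutional encoder/decoder blocks;
    all sizes are illustrative assumptions.
    """

    def __init__(self, img_dim: int, latent_dim: int, id_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w_enc = rng.standard_normal((latent_dim, img_dim)) / np.sqrt(img_dim)
        self.w_dec = rng.standard_normal((img_dim, latent_dim + id_dim)) / np.sqrt(latent_dim + id_dim)

    def __call__(self, image: np.ndarray, id_embedding: np.ndarray) -> np.ndarray:
        z = np.tanh(self.w_enc @ image.ravel())                 # encode the input face
        cond = np.concatenate([z, l2_normalize(id_embedding)])  # inject identity features into the latent
        out = np.tanh(self.w_dec @ cond)                        # decode back to pixel values in [-1, 1]
        return out.reshape(image.shape)


# Usage: a 16x16 RGB "face" and an 8-dimensional stand-in identity embedding.
model = ConditionedAutoencoder(img_dim=16 * 16 * 3, latent_dim=32, id_dim=8)
face = np.random.default_rng(1).uniform(-1, 1, size=(16, 16, 3))
embedding = np.random.default_rng(2).standard_normal(8)
print(model(face, embedding).shape)  # (16, 16, 3)
```

Because the identity vector enters only at the bottleneck, one set of weights can be applied to any face, which mirrors the "global" property described above. No per-video or per-identity retraining is involved.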

The paper outlines several technical innovations:

Encoder-Decoder Architecture: The network uses an encoder-decoder setup in which the activations of a face recognition network are incorporated into the latent space. This lets the network alter identity while keeping the other facial attributes stable.
Attractor-Repeller Losses: A new perceptual loss splits features into two groups. Low- and mid-level features are tied to the input (attractor). High-level features are pushed away from the input identity (repeller). Together, these terms keep the output visually consistent with the input while changing the identity.
Mask Processing for Output Synthesis: The network outputs both a face image and a mask. The mask determines which pixels are affected by the identity change and is used to blend the generated face into the original video frame.
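The two training and output mechanisms above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's exact formulation. The function names, the "low"/"high" layer keys, the L1 feature distance, and the single repel_weight are hypothetical choices made for clarity. The loss pulls selected feature layers toward the input and pushes the others away. The blend composites the generated face into the frame with a per-pixel mask.

```python
import numpy as np


def attractor_repeller_loss(feats_out, feats_in, attract_layers, repel_layers, repel_weight=1.0):
    """Hypothetical attractor-repeller perceptual loss.

    feats_out / feats_in map a (hypothetical) layer name to that layer's activations
    for the generated and the input face. Attractor layers should stay close to the
    input; repeller layers (identity-bearing) should move away from it.
    """
    attract = sum(np.mean(np.abs(feats_out[k] - feats_in[k])) for k in attract_layers)
    repel = sum(np.mean(np.abs(feats_out[k] - feats_in[k])) for k in repel_layers)
    # Minimizing this lowers the attractor distance and raises the repeller distance.
    return attract - repel_weight * repel


def blend_with_mask(raw_face, mask, original):
    """Composite generated pixels into the original frame using an HxW mask in [0, 1]."""
    mask = np.clip(mask, 0.0, 1.0)[..., None]  # broadcast the mask over the color channels
    return mask * raw_face + (1.0 - mask) * original


# Usage: toy feature maps and a 2x2 RGB frame.
feats_in = {"low": np.zeros(4), "high": np.zeros(4)}
feats_out = {"low": np.full(4, 0.25), "high": np.full(4, 0.5)}
loss = attractor_repeller_loss(feats_out, feats_in, ["low"], ["high"])
print(float(loss))  # -0.25

blended = blend_with_mask(np.ones((2, 2, 3)), np.array([[1.0, 0.0], [0.5, 0.0]]), np.zeros((2, 2, 3)))
print(blended[1, 0])  # [0.5 0.5 0.5]
```

Where the mask is 1, the generated face replaces the original pixel. Where it is 0, the original frame passes through unchanged, so regions outside the face are left untouched.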

Evaluation and Comparison

The authors evaluate their method against existing work, showing that it produces natural-looking video sequences from unconstrained inputs. They compare it with recent de-identification techniques, including GAN-based methods for identity protection and person obfuscation. They report that it outperforms these methods both in preserving video quality and in anonymizing identity.

Experimentally, the authors show that the de-identified faces resist identification by contemporary face recognition algorithms. User studies assess how natural the modified videos appear and indicate that the de-identification is perceptually unobtrusive to human observers. Metric evaluations using face recognition networks further support the effectiveness of the identity transformation.
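A common way to quantify resistance to face recognition is a rank-1 identification test. Each de-identified face's embedding is matched against a gallery of known identities, and the test counts how often the true identity is no longer the top match. The sketch below is hypothetical and does not reproduce the authors' protocol, networks, or data. The function names are illustrative, and the embeddings are hand-made stand-ins for face recognition outputs.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def rank1_match(probe: np.ndarray, gallery: np.ndarray) -> int:
    """Index of the gallery row (one row per identity) most similar to the probe."""
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    p = probe / np.linalg.norm(probe)
    return int(np.argmax(g @ p))


def deid_success_rate(deid_embeddings, true_ids, gallery) -> float:
    """Fraction of de-identified probes whose rank-1 match is NOT their true identity."""
    misses = [rank1_match(e, gallery) != t for e, t in zip(deid_embeddings, true_ids)]
    return float(np.mean(misses))


# Usage: three gallery identities; the first probe is de-identified, the second is not.
gallery = np.eye(3)
probes = np.array([[0.1, 0.9, 0.0], [0.0, 0.0, 1.0]])
print(deid_success_rate(probes, [0, 2], gallery))  # 0.5
```

A higher success rate means the recognizer is fooled more often. In practice, this number would be read alongside perceptual measures such as the user studies, since a trivially distorted face would also score well on it.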

Implications and Future Directions

This research sits at the intersection of privacy protection and computer vision and contributes to privacy-preserving video processing. Practically, the method could be adopted in public and consumer applications where facial recognition poses privacy risks. Theoretically, the architecture may inspire further work on loss functions and architectures that balance identity masking with perceptual consistency.

Looking forward, advances in neural network design could extend this work. Future research might explore fine-grained control over the degree of de-identification, adapting the technique to different privacy standards or ethical requirements. Further gains in processing speed and computational efficiency would also broaden its use in real-time and resource-constrained settings.

In conclusion, the paper presents a practical solution for live face de-identification in video that protects privacy while preserving the natural appearance of the footage. Its architectural and loss-function choices distinguish it from prior work and contribute to computer vision and privacy-preserving technology.
