Exploiting Human Social Cognition for the Detection of Fake and Fraudulent Faces via Memory Networks

Published 17 Nov 2019 in cs.CV, cs.LG, cs.MM, and stat.ML | (1911.07844v1)

Abstract: Advances in computer vision have brought us to the point where we have the ability to synthesise realistic fake content. Such approaches are seen as a source of disinformation and mistrust, and pose serious concerns to governments around the world. Convolutional Neural Networks (CNNs) demonstrate encouraging results when detecting fake images that arise from the specific type of manipulation they are trained on. However, this success has not transitioned to unseen manipulation types, resulting in a significant gap in the line-of-defense. We propose a Hierarchical Memory Network (HMN) architecture, which is able to successfully detect faked faces by utilising knowledge stored in neural memories as well as visual cues to reason about the perceived face and anticipate its future semantic embeddings. This renders a generalisable face tampering detection framework. Experimental results demonstrate the proposed approach achieves superior performance for fake and fraudulent face detection compared to the state-of-the-art.


Citations (15)


Summary

The paper introduces a Hierarchical Memory Network (HMN) that mimics human social cognition to accurately detect fake faces.
It employs multi-level attention and future semantic prediction to enhance detection accuracy, even against unseen manipulation techniques.
Experimental results on the FaceForensics datasets show that HMN outperforms CNN-based detectors and remains robust across compression levels and unseen manipulation types.

This paper explores hierarchical memory networks (HMN) for detecting fake and fraudulent faces by mimicking human social cognition. The detection framework aims to overcome a key limitation of existing convolutional neural networks (CNNs): they detect the manipulation types they were trained on well, but fail to generalize to unseen face manipulation techniques.

Hierarchical Memory Network Architecture

The core architecture presented in the paper is the Hierarchical Memory Network (HMN), which detects fake faces by combining knowledge stored in neural memory with predictions of future face semantic embeddings.

Input Module: Visual features are extracted with a pre-trained ResNet, reshaped into a sequence of patch features, and passed to a bidirectional GRU that models the relations between patches.
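A minimal NumPy sketch of the input module follows. It assumes a 7×7×64 feature map as a stand-in for the ResNet output (a real ResNet supplies many more channels) and randomly initialised weights; each spatial cell is treated as one patch, and a hand-written bidirectional GRU (the standard Cho et al. gate formulation) encodes the patch sequence. The function names and sizes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell using the standard update/reset-gate formulation."""
    def __init__(self, in_dim, hid_dim, rng):
        scale = 1.0 / np.sqrt(hid_dim)
        # Stacked weights for the update (z), reset (r) and candidate (n) gates.
        self.W = rng.uniform(-scale, scale, (3, hid_dim, in_dim))
        self.U = rng.uniform(-scale, scale, (3, hid_dim, hid_dim))
        self.b = np.zeros((3, hid_dim))
        self.hid_dim = hid_dim

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
        n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * n + z * h  # blend candidate state with previous state

def bidirectional_gru(seq, fwd, bwd):
    """Run one GRU forward and one backward over the sequence; concatenate per step."""
    K = seq.shape[0]
    h_f, h_b = np.zeros(fwd.hid_dim), np.zeros(bwd.hid_dim)
    out_f, out_b = [], [None] * K
    for k in range(K):
        h_f = fwd.step(seq[k], h_f)
        out_f.append(h_f)
    for k in reversed(range(K)):
        h_b = bwd.step(seq[k], h_b)
        out_b[k] = h_b
    return np.stack([np.concatenate([f, b]) for f, b in zip(out_f, out_b)])

def input_module(feature_map, fwd, bwd):
    """Reshape an (H, W, C) feature map into K = H*W patch vectors, then encode them."""
    H, W, C = feature_map.shape
    patches = feature_map.reshape(H * W, C)
    return bidirectional_gru(patches, fwd, bwd)

rng = np.random.default_rng(0)
feat = rng.standard_normal((7, 7, 64))  # stand-in for a ResNet feature map
fwd, bwd = GRUCell(64, 32, rng), GRUCell(64, 32, rng)
encoded = input_module(feat, fwd, bwd)
print(encoded.shape)  # (49, 64)
```

Concatenating the forward and backward states gives each patch a 64-dimensional encoding that depends on the patches on both sides of it in the sequence, which is one way to read "mapping inter-patch relations".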
Hierarchical Attention:
- Input Level Attention: Weights the most informative facial patches to produce a query vector that summarizes the input image for further processing.
- Patch Level Attention: Compares the patches of faces stored in memory with the patches of the current input face using a similarity-based attention mechanism.
- Memory Read Operation: Uses hierarchical encoding to compute the relevance of each stored face image to the query before propagating the most informative content onward.

Figure 1: Proposed Hierarchical Memory Network (HMN) framework for tampered face detection and future face semantic embedding prediction.
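The three attention steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact equations: input-level attention is written as additive (tanh) scoring with hypothetical parameters `W` and `v`, patch-level and image-level relevance both use plain dot-product similarity with the query, and the memory is a random `(L, K, d)` array matching the layout described for Figure 2.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def input_level_attention(E, W, v):
    """Score each of the K patch encodings and pool them into one query vector."""
    scores = np.tanh(E @ W.T) @ v            # (K,) additive attention scores
    alpha = softmax(scores)                  # patch importance weights
    return alpha @ E, alpha                  # query q: (d,)

def hierarchical_memory_read(q, M):
    """Two-level read over a memory M of shape (L, K, d).

    Patch level: within each stored image, attend over its K patches by similarity to q.
    Image level: attend over the L per-image summaries and return their weighted sum.
    """
    patch_scores = M @ q                     # (L, K) dot-product similarity
    beta = softmax(patch_scores, axis=1)     # separate softmax per stored image
    summaries = np.einsum("lk,lkd->ld", beta, M)  # (L, d) per-image summaries
    gamma = softmax(summaries @ q)           # relevance of each stored image
    return gamma @ summaries, beta, gamma    # memory output o: (d,)

rng = np.random.default_rng(1)
K, d, L = 49, 64, 10
E = rng.standard_normal((K, d))              # e.g. patch encodings from the input module
W = rng.standard_normal((d, d)) / np.sqrt(d)
v = rng.standard_normal(d)
M = rng.standard_normal((L, K, d))           # L stored face embeddings, K patches each

q, alpha = input_level_attention(E, W, v)
o, beta, gamma = hierarchical_memory_read(q, M)
print(q.shape, o.shape, gamma.shape)         # (64,) (64,) (10,)
```

Normalizing over each stored image's patches (`beta`) before weighting the images against each other (`gamma`) is what makes the read hierarchical: a stored face can contribute through a few well-matched patches even when most of its patches are irrelevant to the query.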

Detection and Prediction Mechanisms

Future Semantic Prediction: This component learns to predict future face embeddings, which forces the model to capture social context and to produce semantically coherent future face states.
Adversarial Loss: A discriminator learns to distinguish real future face semantics from those synthesized by the generator, and this adversarial signal pushes the generator toward higher-quality embeddings.
Figure 2: Hierarchical Memory Module: Memory at time $t-1$ holds $L$ image embeddings, each containing $K$ image patches, demonstrating dual-level attention mechanisms.
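The interplay between the future-semantic predictor and the discriminator can be sketched as a standard GAN objective. The single-layer `generator` and `discriminator`, the non-saturating generator loss, and the extra L2 term weighted by `lam` are all assumptions chosen for illustration; the paper's networks and loss weighting may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator(o, Wg):
    """Hypothetical predictor: maps the memory output o to a future face embedding."""
    return np.tanh(Wg @ o)

def discriminator(e, wd):
    """Hypothetical discriminator: probability that embedding e is a real future embedding."""
    return sigmoid(wd @ e)

def gan_losses(o, real_future, Wg, wd, lam=1.0, eps=1e-8):
    """Return (discriminator loss, generator loss, predicted future embedding)."""
    fake_future = generator(o, Wg)
    d_real = discriminator(real_future, wd)
    d_fake = discriminator(fake_future, wd)
    # Discriminator: binary cross-entropy pushing D(real) toward 1 and D(fake) toward 0.
    loss_d = -(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # Generator: non-saturating adversarial term plus an assumed L2 term toward the true embedding.
    loss_g = -np.log(d_fake + eps) + lam * np.sum((fake_future - real_future) ** 2)
    return loss_d, loss_g, fake_future

rng = np.random.default_rng(2)
d = 64
o = rng.standard_normal(d)                     # e.g. output of the memory read
real_future = np.tanh(rng.standard_normal(d))  # stand-in for the observed future embedding
Wg = rng.standard_normal((d, d)) / np.sqrt(d)
wd = rng.standard_normal(d) / np.sqrt(d)

loss_d, loss_g, pred = gan_losses(o, real_future, Wg, wd)
print(pred.shape, float(loss_d), float(loss_g))
```

During training the two losses would be minimized alternately, the first with respect to the discriminator parameters and the second with respect to the generator parameters; the sketch only evaluates them once.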

Experimental Validation

Experiments on the large-scale FaceForensics and FaceForensics++ datasets showed HMN's advantage. It detected fake faces with high accuracy, including manipulation types not seen during training such as DeepFake and FaceSwap, and clearly outperformed state-of-the-art CNN-based methods.

Robustness to Compression: HMN's performance remained consistent across video compression levels, suggesting that hierarchical attention is more resilient to compression than competing approaches.
Generalization to Unseen Attacks: HMN achieved significantly higher accuracy than prior models on novel face manipulation techniques, supporting its claimed generalization capability.

Figure 3: Hyper-parameter evaluation: effect of memory length, number of patches, and training dataset size on detection accuracy.
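The memory length varied in Figure 3 corresponds to the number $L$ of stored face embeddings. The sketch below makes that bound concrete with a hypothetical `FaceMemory` class that keeps only the $L$ most recent `(K, d)` embeddings; first-in-first-out replacement is an assumption made for illustration, not necessarily how the paper writes to memory.

```python
from collections import deque

import numpy as np

class FaceMemory:
    """Hypothetical bounded memory holding at most L face embeddings of shape (K, d).

    FIFO replacement is an illustrative assumption: once L entries are stored,
    writing a new embedding silently evicts the oldest one.
    """
    def __init__(self, L, K, d):
        self.K, self.d = K, d
        self.slots = deque(maxlen=L)  # deque drops the oldest item when full

    def write(self, patch_embeddings):
        if patch_embeddings.shape != (self.K, self.d):
            raise ValueError(f"expected shape {(self.K, self.d)}, got {patch_embeddings.shape}")
        self.slots.append(patch_embeddings)

    def as_array(self):
        """Return the memory as an (n, K, d) array with n <= L, oldest entry first."""
        if not self.slots:
            return np.zeros((0, self.K, self.d))
        return np.stack(list(self.slots))

rng = np.random.default_rng(3)
mem = FaceMemory(L=5, K=49, d=64)
for _ in range(8):
    mem.write(rng.standard_normal((49, 64)))
print(mem.as_array().shape)  # (5, 49, 64)
```

A larger $L$ gives the attention read more stored faces to compare against, at the cost of memory and compute that grow linearly with $L$.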

Implications and Future Work

This research makes an important contribution to multimedia forensics, particularly to the detection of deceptive digital content. Combining human-inspired cognition models with deep learning suggests a new direction, not only for biometric security but also for other AI applications that must capture long-term dependencies, such as speech recognition and complex trajectory analysis. Future work may focus on improving computational efficiency and extending the approach to real-time, scenario-based systems.

Conclusion

The Hierarchical Memory Network framework proposed in this paper is a significant step forward in fake face detection, combining concepts from cognitive science with a novel network architecture. By incorporating human-like reasoning and prediction, HMN offers a robust, transferable approach to the challenges posed by sophisticated face manipulation technologies.
