- The paper introduces a Hierarchical Memory Network (HMN) that mimics human social cognition to accurately detect fake faces.
- It employs multi-level attention and future semantic prediction to enhance detection accuracy, even against unseen manipulation techniques.
- Experimental results on FaceForensics datasets show HMN outperforms CNNs in robustness and efficiency under varied compression and attack scenarios.
Exploiting Human Social Cognition for Detection of Fake Faces
The paper proposes a Hierarchical Memory Network (HMN) that detects fake and fraudulent faces by modeling human-like social cognition processes. The framework targets a key limitation of existing convolutional neural networks (CNNs): poor generalization across face manipulation techniques, especially unseen attacks.
Hierarchical Memory Network Architecture
The core architecture is the Hierarchical Memory Network (HMN), which detects fake faces by combining a neural memory with predictions of future face semantic embeddings.
- Input Module: Visual features are extracted using a pre-trained ResNet model, reshaped, and passed to a bidirectional GRU to map inter-patch relations.
- Hierarchical Attention:
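As a rough illustration of the input module described above, the sketch below flattens a feature map into a sequence of patch vectors and encodes it with a bidirectional GRU. It is pure Python with untrained random weights. The random feature map stands in for the pre-trained ResNet output, biases are omitted, and all names and sizes (`to_patch_sequence`, `GRUCell`, hidden size 4) are assumptions made for illustration, not the paper's implementation.

```python
import math
import random


def to_patch_sequence(feature_map):
    """Flatten a C x H x W feature map into H*W patch vectors of length C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][i][j] for c in range(channels)]
        for i in range(height)
        for j in range(width)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class GRUCell:
    """Standard GRU cell with random, untrained weights (biases omitted)."""

    def __init__(self, input_size, hidden_size, rng):
        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # update (z), reset (r), and candidate state (n).
        self.W = {g: rand_matrix(hidden_size, input_size) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size) for g in "zrn"}

    def step(self, x, h):
        wx = {g: matvec(self.W[g], x) for g in "zrn"}
        uh = {g: matvec(self.U[g], h) for g in "zrn"}
        z = [sigmoid(a + b) for a, b in zip(wx["z"], uh["z"])]
        r = [sigmoid(a + b) for a, b in zip(wx["r"], uh["r"])]
        n = [math.tanh(a + ri * b) for a, ri, b in zip(wx["n"], r, uh["n"])]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def bidirectional_gru(sequence, forward_cell, backward_cell):
    """Encode each patch with context from both directions of the sequence."""

    def run(cell, items):
        h = [0.0] * cell.hidden_size
        states = []
        for x in items:
            h = cell.step(x, h)
            states.append(h)
        return states

    forward = run(forward_cell, sequence)
    backward = run(backward_cell, sequence[::-1])[::-1]
    # Concatenate the forward and backward states for each patch.
    return [f + b for f, b in zip(forward, backward)]


rng = random.Random(0)
channels, height, width = 8, 3, 3  # toy sizes; a real ResNet feature map is far larger
feature_map = [
    [[rng.random() for _ in range(width)] for _ in range(height)]
    for _ in range(channels)
]
patches = to_patch_sequence(feature_map)
encoded = bidirectional_gru(patches, GRUCell(channels, 4, rng), GRUCell(channels, 4, rng))
print(len(encoded), len(encoded[0]))  # 9 8
```

Each of the nine patches ends up with an eight-dimensional encoding (four forward plus four backward units), so every patch representation carries information from the patches on both sides of it in the sequence.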
Detection and Prediction Mechanisms
Experimental Validation
Experiments on large datasets, including FaceForensics and FaceForensics++, showed HMN outperforming prior methods. HMN detected fake faces with high accuracy, including synthesis types unseen during training such as DeepFake and FaceSwap, and surpassed state-of-the-art CNN-based methods by a wide margin.
- Effectiveness under Compression: HMN maintained consistent performance across video compression levels, indicating that hierarchical attention is more resilient than competing approaches.
- Efficacy in Unknown Attack Detection: HMN achieved significantly higher accuracy than prior models on face manipulation techniques not seen during training, supporting its robust generalization capability.
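The summary does not spell out how the unseen-attack experiments were set up. One common way to run such a test is a leave-one-manipulation-out split, sketched below under that assumption. The `(video_id, method)` sample format, the parity rule for dividing real videos, and the toy sample list are all hypothetical and are not taken from the paper.

```python
def leave_one_method_out(samples, held_out_method):
    """Split samples so that `held_out_method` never appears in training.

    Each sample is a (video_id, method) pair, with method "real" for
    unmanipulated videos. Real videos are divided by video_id parity so
    that no real video is shared between the two splits.
    """
    train, test = [], []
    for video_id, method in samples:
        if method == held_out_method:
            test.append((video_id, method))
        elif method == "real":
            (test if video_id % 2 else train).append((video_id, method))
        else:
            train.append((video_id, method))
    return train, test


# Hypothetical toy index: four videos per class, ids 0..3.
methods = ["real", "Deepfakes", "Face2Face", "FaceSwap"]
samples = [(video_id, m) for m in methods for video_id in range(4)]

train, test = leave_one_method_out(samples, "FaceSwap")
print(sorted({m for _, m in train}))  # ['Deepfakes', 'Face2Face', 'real']
print(sorted({m for _, m in test}))   # ['FaceSwap', 'real']
```

A detector trained on `train` and scored on `test` is then measured only on real faces and on a manipulation type it has never seen.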


Figure 3: Hyper-parameter evaluation, showing the effect of memory length, number of patches, and training dataset size on detection accuracy.
Implications and Future Work
This research advances multimedia forensics, particularly the detection of deceptive digital content. Integrating human-inspired cognition models with deep learning frameworks suggests a new direction, not only for biometric security but also for broader AI applications that require capturing long-term dependencies, such as speech recognition and complex trajectory analysis. Future work may explore improving computational efficiency and extending the approach to real-time systems.
Conclusion
The Hierarchical Memory Network framework proposed in this paper is a significant step forward in fake face detection, combining concepts from cognitive science with novel network architectures. By incorporating human-like reasoning and prediction capabilities, HMN emerges as a robust, transferable method for addressing the challenges posed by sophisticated face manipulation technologies.