From Facial Expression Recognition to Interpersonal Relation Prediction

Published 21 Sep 2016 in cs.CV | (1609.06426v3)

Abstract: Interpersonal relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. Motivated by psychological studies, we investigate if such fine-grained and high-level relation traits can be characterized and quantified from face images in the wild. We address this challenging problem by first studying a deep network architecture for robust recognition of facial expressions. Unlike existing models that typically learn from facial expression labels alone, we devise an effective multitask network that is capable of learning from rich auxiliary attributes such as gender, age, and head pose, beyond just facial expression data. While conventional supervised training requires datasets with complete labels (e.g., all samples must be labeled with gender, age, and expression), we show that this requirement can be relaxed via a novel attribute propagation method. The approach further allows us to leverage the inherent correspondences between heterogeneous attribute sources despite the disparate distributions of different datasets. With the network we demonstrate state-of-the-art results on existing facial expression recognition benchmarks. To predict interpersonal relation, we use the expression recognition network as branches for a Siamese model. Extensive experiments show that our model is capable of mining mutual context of faces for accurate fine-grained interpersonal prediction.

Authors (4)

Citations (254)

Summary

The paper introduces a multitask deep convolutional network with an attribute propagation technique to handle missing labels, enhancing relational prediction.
It demonstrates state-of-the-art performance on datasets like ExpW, SFEW, and CK+, validating its robustness in natural, diverse settings.
The approach utilizes a Siamese architecture and spatial context to infer nuanced relational attributes such as dominance, warmth, and trust from facial cues.

Analyzing Interpersonal Relations via Facial Expressions in the Wild

This paper presents a framework for predicting interpersonal relations between people in natural face images, building on facial expression recognition. Rather than classifying a single face into one of the basic expressions, as conventional facial expression classifiers do, the authors infer relational attributes from pairs of faces.

Technical Approach and Methodologies

The researchers propose a multitask deep convolutional network (DCN) that recognizes facial expressions together with auxiliary attributes such as gender, age, and head pose, trained on images drawn from several datasets. A key element of the framework is an attribute propagation technique that lets the network cope with missing attribute labels in multi-source data, a common problem when no single dataset is labeled with every attribute. During training, a Markov Random Field (MRF) infers the missing labels by exploiting the inherent correlations between attributes.
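To make the missing-label problem concrete, here is a minimal sketch of filling in absent attribute labels from nearby samples in feature space. It is not the paper's MRF-based propagation: it swaps in a much simpler k-nearest-neighbor majority vote, and the function name `propagate_missing`, the toy 2-D features, and the `gender` labels are all hypothetical illustrations.

```python
import math
from collections import Counter


def propagate_missing(features, labels, k=3):
    """Fill None labels with the majority label of the k nearest labeled samples.

    Simplified stand-in for attribute propagation (assumption: plain k-NN
    voting in feature space, not the MRF inference described in the paper).
    """
    labeled = [(f, y) for f, y in zip(features, labels) if y is not None]
    filled = []
    for f, y in zip(features, labels):
        if y is not None or not labeled:
            # Keep known labels; leave None if nothing is labeled at all.
            filled.append(y)
            continue
        # Rank labeled samples by Euclidean distance to the unlabeled one.
        nearest = sorted(labeled, key=lambda item: math.dist(f, item[0]))[:k]
        votes = Counter(label for _, label in nearest)
        filled.append(votes.most_common(1)[0][0])
    return filled


# Toy data: two clusters of hypothetical 2-D face features. The last two
# samples come from a dataset that has no gender annotation.
features = [(0.0, 0.1), (0.1, 0.0), (0.9, 1.0), (1.0, 0.9), (0.05, 0.05), (0.95, 0.95)]
gender = ["female", "female", "male", "male", None, None]

filled_gender = propagate_missing(features, gender, k=2)
print(filled_gender)
```

In a multitask setting, labels filled in this way could then contribute to the training loss for that attribute; a common alternative is simply to mask the loss for samples whose label is missing.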

For interpersonal relation prediction, the paper introduces a Siamese architecture whose branches are the trained expression recognition DCNs. The model takes a pair of face images and predicts relational attributes such as dominance, warmth, and trust from the two faces' expressions together with auxiliary spatial cues, such as the relative positions of the faces.
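The Siamese idea can be sketched in a few lines: one set of branch parameters embeds both faces, the two embeddings are concatenated with spatial cues, and a multi-label head scores each trait. This is a toy under stated assumptions, not the authors' network: a single linear layer with ReLU stands in for the DCN branch, the weights are random and untrained, and the specific spatial features (normalized offset and log area ratio) and all names (`spatial_cues`, `embed`, `predict_relations`, `params`) are hypothetical.

```python
import math
import random


def spatial_cues(box_a, box_b):
    """Relative position and scale of face B with respect to face A.

    Boxes are (x, y, width, height). The choice of features is an assumption.
    """
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    return [(xb - xa) / wa, (yb - ya) / ha, math.log((wb * hb) / (wa * ha))]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def embed(face, weights, bias):
    """Shared branch: linear layer + ReLU standing in for the expression DCN."""
    return [max(0.0, z) for z in linear(weights, bias, face)]


def predict_relations(face_a, face_b, box_a, box_b, params):
    # Both faces go through the SAME branch parameters (the Siamese property).
    emb_a = embed(face_a, params["branch_w"], params["branch_b"])
    emb_b = embed(face_b, params["branch_w"], params["branch_b"])
    joint = emb_a + emb_b + spatial_cues(box_a, box_b)
    logits = linear(params["head_w"], params["head_b"], joint)
    # Independent sigmoid per trait: relation traits are not mutually exclusive.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


rng = random.Random(0)


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


TRAITS = ["dominance", "warmth", "trust"]
FACE_DIM, EMB_DIM, SPATIAL_DIM = 6, 4, 3
params = {
    "branch_w": rand_matrix(EMB_DIM, FACE_DIM),
    "branch_b": [0.0] * EMB_DIM,
    "head_w": rand_matrix(len(TRAITS), 2 * EMB_DIM + SPATIAL_DIM),
    "head_b": [0.0] * len(TRAITS),
}

# Hypothetical flattened face features and bounding boxes for one image pair.
face_a = [rng.random() for _ in range(FACE_DIM)]
face_b = [rng.random() for _ in range(FACE_DIM)]
scores = predict_relations(face_a, face_b, (10, 20, 40, 40), (80, 25, 50, 50), params)
print(dict(zip(TRAITS, scores)))
```

Sharing the branch parameters means both faces are mapped into the same feature space, so the head can compare them directly; the scores printed here are meaningless until the parameters are trained.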

Experimental Results and Implications

The framework demonstrates its effectiveness on a newly introduced Expression-in-the-Wild (ExpW) dataset, composed of over 91,000 manually labeled images, and an interpersonal relation dataset. The authors report state-of-the-art performance on established benchmarks like the Static Facial Expressions in the Wild (SFEW) dataset and the CK+ dataset. Importantly, the use of attribute propagation significantly boosts accuracy compared to models trained without it.

The approach has potential applications ranging from social media analytics to affective computing, where understanding nuanced interpersonal dynamics is essential. Because it captures relational cues in unconstrained settings, such as political or social interactions depicted in visual media, it offers a tool for mining social context and behavior from images.

Future Directions

The research opens avenues for integrating additional modalities, such as body posture and gesture, which, if available, could be combined with facial cues for richer interpersonal modeling. Future efforts could also focus on incorporating temporal information from video sequences to dynamically infer relational characteristics over time, potentially utilizing novel recurrent architectures.

Overall, this paper contributes a sophisticated, multi-faceted framework for interpreting interpersonal dynamics through the lens of advanced computer vision, marking a significant step forward in the field of social signal processing.
