- The paper introduces a convnet system that integrates head, upper body, and scene context cues to enhance recognition accuracy.
- It evaluates the informativeness of different image regions using the PIPA dataset under rigorous and realistic protocols.
- The study releases its method as open-source code, improving reproducibility, and achieves state-of-the-art person recognition results on diverse photo collections.
Person Recognition in Personal Photo Collections
The paper "Person Recognition in Personal Photo Collections" addresses the challenge of identifying individuals in the varied, informal settings found in personal photo collections, as opposed to the controlled conditions typical of face recognition benchmarks. The work contributes a convolutional network (convnet) based recognition system, an analysis of its performance across multiple cues, a critique of existing benchmark limitations, and a set of more robust evaluation protocols.
Overview of Contributions
The authors dissect the problem of recognizing individuals whose faces are occluded or who appear across diverse backgrounds, clothing, and poses. They identify three complementary sources of information that improve recognition accuracy: the body region for shape and appearance, human attributes (e.g., age and gender), and scene context to disambiguate identities.
- Cues Analysis: The paper meticulously examines several image regions — face (f), head (h), upper body (u), body (b), and scene (s) — to determine their relative informativeness. This analysis reveals that while the head region is most informative, different regions complement each other in terms of recognition accuracy.
- Experimental Protocols: It proposes new experimental protocols using the PIPA dataset that are more challenging and realistic than existing benchmarks, fostering a robust evaluation of cue effectiveness. PIPA, significantly larger than previous datasets, is leveraged to provide comprehensive identity annotations, even for non-visible faces.
- Open Source Techniques: The method is constructed using open-source code, enhancing reproducibility and accessibility. The models trained on this platform display superior performance on public datasets without requiring specialized hardware or data.
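As a concrete illustration of the region cues listed above, the crops could be derived from a head bounding-box annotation roughly as follows. The scaling factors and the helper name `derive_regions` are illustrative assumptions, not the paper's published geometry; the face region (f) is omitted here since it would require a face detector rather than a fixed crop.

```python
# Sketch: deriving cue regions from a head bounding box (x, y, w, h).
# The scale factors below are illustrative assumptions only.

def derive_regions(head, img_w, img_h):
    """Return approximate crop boxes (x, y, w, h) for each cue region."""
    x, y, w, h = head

    def clip(box):
        # Keep every box inside the image bounds.
        bx, by, bw, bh = box
        bx, by = max(0, bx), max(0, by)
        bw = min(bw, img_w - bx)
        bh = min(bh, img_h - by)
        return (bx, by, bw, bh)

    regions = {
        "h": (x, y, w, h),                   # head: the annotation itself
        "u": (x - w // 2, y, 2 * w, 3 * h),  # upper body: wider, extends below head
        "b": (x - w, y, 3 * w, 6 * h),       # body: wider and taller still
        "s": (0, 0, img_w, img_h),           # scene: the whole photo
    }
    return {k: clip(v) for k, v in regions.items()}
```

Each cropped region would then be fed to a separately trained convnet to produce a per-cue feature vector.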
Technical Insights
Through rigorous experimentation, the paper shows that convnets applied to a broad set of cues, such as body appearance and scene context, substantially outperform traditional face-only recognition systems, especially when faces are obscured or non-frontal. These findings challenge the prevailing reliance on facial features alone and suggest that incorporating body and contextual information yields more reliable recognition.
In particular, the authors attain the best results on the PIPA dataset without resorting to specialized face recognition or pose estimation techniques. Their results indicate substantial complementarity between different cues, disclosing that head and upper body regions provide the strongest individual cues, yet a combination of all cues yields the best results.
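The cue complementarity described above can be pictured as simple feature-level fusion: L2-normalise each region's convnet features so no single cue dominates, concatenate them, and train a linear classifier on the result. The following is a minimal NumPy sketch under assumed feature dimensions, not the authors' exact pipeline:

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    # Scale a feature vector to (approximately) unit L2 norm.
    return v / (np.linalg.norm(v) + eps)

def fuse_cues(features):
    """Concatenate L2-normalised per-cue features in a fixed cue order.

    features: dict mapping cue name -> 1-D feature vector
    (e.g. the convnet activations for that region).
    """
    order = sorted(features)  # fixed order so fused vectors are comparable
    return np.concatenate([l2_normalize(features[c]) for c in order])

# Toy usage with made-up 4096-dimensional features for three cues.
rng = np.random.default_rng(0)
feats = {
    "h": rng.normal(size=4096),  # head
    "u": rng.normal(size=4096),  # upper body
    "s": rng.normal(size=4096),  # scene
}
fused = fuse_cues(feats)
```

A linear classifier (e.g. a per-identity SVM) trained on `fused` would then realise the "combination of all cues" result the authors report; per-cue normalisation is what lets weaker cues like scene contribute without being swamped by the head features.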
Future Directions
The insights from this paper open several directions for future research in person recognition. Recognition in personal photo collections could be further improved by integrating additional data sources, learning more robust features, and exploiting the socio-temporal context of images, such as the social network or events captured within albums. Moreover, since the dataset and models are publicly available, the work invites broad application of, and improvement upon, the proposed methodology.
Conclusion
This research advances the understanding of person recognition when faces are not fully visible, a frequent real-world scenario in personal photo collections. Moving forward, researchers might focus on expanding datasets to cover a wider spectrum of identities, refining attribute-based recognition models, and extending the methodology to open-world conditions where head annotations are not provided. The findings underscore the importance of holistic, multi-cue approaches when deploying recognition systems in the dynamic context of photo albums.