MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition

Published 27 Jul 2016 in cs.CV | (1607.08221v1)

Abstract: In this paper, we design a benchmark task and provide the associated datasets for recognizing face images and link them to corresponding entity keys in a knowledge base. More specifically, we propose a benchmark task to recognize one million celebrities from their face images, by using all the possibly collected face images of this individual on the web as training data. The rich information provided by the knowledge base helps to conduct disambiguation and improve the recognition accuracy, and contributes to various real-world applications, such as image captioning and news video analysis. Associated with this task, we design and provide concrete measurement set, evaluation protocol, as well as training data. We also present in details our experiment setup and report promising baseline results. Our benchmark task could lead to one of the largest classification problems in computer vision. To the best of our knowledge, our training dataset, which contains 10M images in version 1, is the largest publicly available one in the world.

Citations (1,879)

Summary

The paper introduces the MS-Celeb-1M dataset, linking millions of celebrity images to unique identity keys for robust face recognition testing.
A deep convolutional network baseline recognizes 44.2% of the images in the hard set at a precision of 95%.
The dataset addresses identity disambiguation and scale challenges, paving the way for improved real-world face recognition applications.

The paper "MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition" by Yandong Guo, Lei Zhang, Yuxiao Hu, Xiaodong He, and Jianfeng Gao presents a benchmark for evaluating large-scale face recognition systems. It introduces the MS-Celeb-1M dataset together with a benchmark task of recognizing one million celebrities from their face images, a scale well beyond that of typical publicly available datasets.

Overview of the Dataset and Benchmark

The MS-Celeb-1M dataset is presented as the largest publicly available face recognition training dataset, with approximately 10 million images of 100,000 top celebrities in version 1. The benchmark task is defined as recognizing one million celebrities from their face images and linking each face to a unique entity key in a knowledge base such as Freebase. The knowledge base is pivotal for identity disambiguation: different individuals may share the same name, but each has a distinct entity key.

The construction of this benchmark addresses two significant gaps in current face recognition research:

Identity Determination: Existing tasks often focus on finding similar images rather than identifying the person in an image.
Scale: Publicly available datasets are typically much smaller than those used internally by industry giants like Facebook and Google.

Methodology and Properties

The benchmark's properties include:

Face Recognition with Disambiguation: Integrates knowledge bases to link faces with unique entity keys, enhancing accuracy in recognizing the correct individual.
Celebrity Focus: Targets celebrity recognition to leverage extensive publicly available data, making the dataset applicable across various real-world scenarios.
Scale and Diversity: Covering one million celebrities introduces substantial inter- and intra-class variation. A model must separate visually similar individuals (e.g., twins) while still matching diverse appearances of the same person across different images.
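
The disambiguation property can be illustrated with a small Python sketch. It assumes a toy knowledge base in which two different people share a name; the `Entity` type, the entity keys (`kb/0001` and so on), and the names are all hypothetical and are not taken from Freebase or from the paper. The only point is that class labels are indexed by entity key, so a shared name does not collapse two people into one class.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    key: str          # hypothetical knowledge-base entity key
    name: str         # display name, not necessarily unique
    description: str  # extra knowledge-base information


# Toy knowledge base (all entries invented for illustration).
ENTITIES = [
    Entity("kb/0001", "Alex Kim", "film actor"),
    Entity("kb/0002", "Alex Kim", "football player"),
    Entity("kb/0003", "Dana Ruiz", "singer"),
]

# One class label per entity key, never per name.
key_to_label = {e.key: i for i, e in enumerate(ENTITIES)}

# A name may map to several keys; this is the ambiguity being resolved.
name_to_keys = defaultdict(list)
for e in ENTITIES:
    name_to_keys[e.name].append(e.key)

# Names that would be ambiguous if used directly as labels.
ambiguous = {n: ks for n, ks in name_to_keys.items() if len(ks) > 1}
print(ambiguous)
```

Here the two "Alex Kim" entries keep separate labels, which is what allows a recognizer to be scored on identifying the person and not merely the name.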

Measurement and Evaluation Protocol

The evaluation protocol defines concrete measurement sets: a randomly selected set of images and a “hard set” of more difficult images, used to evaluate how well recognition models generalize. The protocol measures both precision and coverage, and reports coverage at a fixed high precision (for example, 95%), because high-precision recognition is critical for practical applications.
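
As a rough illustration of a coverage-at-precision metric, the Python sketch below assumes that the recognizer may decline to answer, that coverage is the fraction of test images it answers, and that precision is the fraction of those answers that are correct. These definitions, the function name `coverage_at_precision`, and the sample scores are assumptions made for illustration, not the paper's reference implementation. The sketch also assumes distinct confidence scores; tied scores would need to be accepted or rejected together.

```python
def coverage_at_precision(scored, target_precision):
    """Largest fraction of images that can be answered while keeping
    precision at or above target_precision.

    scored: list of (confidence, is_correct) pairs, one per test image.
    """
    # Answer the most confident predictions first.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    best_coverage = 0.0
    correct = 0
    for answered, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        # Precision over the top `answered` predictions.
        if correct / answered >= target_precision:
            best_coverage = answered / total
    return best_coverage


# Invented scores for six test images.
scored = [
    (0.99, True), (0.95, True), (0.90, True),
    (0.80, False), (0.70, True), (0.60, False),
]
print(coverage_at_precision(scored, 1.0))   # 0.5
print(coverage_at_precision(scored, 0.75))
```

Lowering the confidence threshold answers more images but admits more errors, so coverage at a fixed precision summarizes how much of the test set a model can label reliably.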

Experimental Setup and Results

The authors trained a deep convolutional neural network on the provided training data and report the following baseline results:

Hard Set: 44.2% of images recognized at a precision of 95%.
Random Set: Higher coverage than on the hard set, as expected for more typical images.

Practical and Theoretical Implications

From a practical perspective, the large-scale dataset and benchmark support the development and evaluation of face recognition systems intended for real-world settings such as news video analysis, image search, and automated image captioning. Linking recognized faces to knowledge-base entities also gives such applications access to richer context about the person identified.

Theoretically, the introduction of this benchmark invites further exploration into improving recognition models' performance on large datasets with high intra-class variance and the incorporation of sophisticated disambiguation mechanisms. Researchers are encouraged to bring additional data into the model-building process, fostering innovation and improved methodologies in dataset construction, label disambiguation, and model training on noisy data.

Future Directions

Future work could expand the dataset to include more celebrities and images, and could draw on contributions from the research community to refine algorithms for automated data cleaning and unsupervised clustering. The dataset could also be used to develop property estimators, such as gender classifiers, from facial images.

Overall, MS-Celeb-1M is a significant resource for both academic research and practical development in face recognition.