PersonNet: Person Re-identification with Deep Convolutional Neural Networks

Published 27 Jan 2016 in cs.CV | (1601.07255v2)

Abstract: In this paper, we propose a deep end-to-end neural network to simultaneously learn high-level features and a corresponding similarity metric for person re-identification. The network takes a pair of raw RGB images as input, and outputs a similarity value indicating whether the two input images depict the same person. A layer of computing neighborhood range differences across two input images is employed to capture local relationship between patches. This operation is to seek a robust feature from input images. By increasing the depth to 10 weight layers and using very small (3×3) convolution filters, our architecture achieves a remarkable improvement on the prior-art configurations. Meanwhile, an adaptive Root-Mean-Square (RMSProp) gradient descent algorithm is integrated into our architecture, which is beneficial to deep nets. Our method consistently outperforms state-of-the-art on two large datasets (CUHK03 and Market-1501), and a medium-sized data set (CUHK01).

Citations (218)

Summary

The paper introduces PersonNet, a deep convolutional neural network architecture that jointly learns feature representations and a similarity metric for end-to-end person re-identification.
PersonNet employs a deep architecture with 10 weight layers and small 3×3 convolution filters, and is trained with the RMSProp algorithm for efficient and stable optimization.
Extensive experiments show PersonNet achieves superior performance on benchmark datasets like CUHK03 and Market-1501, surpassing state-of-the-art results in person re-identification.

Overview of PersonNet: Person Re-identification with Deep Convolutional Neural Networks

The paper "PersonNet: Person Re-identification with Deep Convolutional Neural Networks" by Lin Wu, Chunhua Shen, and Anton van den Hengel presents a deep learning approach to person re-identification (re-id). The authors propose a deep end-to-end convolutional neural network (CNN) architecture named PersonNet that simultaneously learns high-level features and a similarity metric to match pedestrian images from multiple non-overlapping camera views.

Motivation and Contributions

Person re-identification is difficult because the same person can look very different across camera views: appearance changes, pose varies, parts of the body are occluded, and illumination differs. Traditional methods for person re-id construct robust feature representations and design appropriate distance measures, either separately or jointly. The novelty of this work lies in using a CNN to learn the feature representation and the similarity metric together, which allows the re-id task to be optimized end to end.

The authors highlight three main contributions:

A deep neural network architecture with 10 weight layers and small (3×3) convolution filters, which the authors report improves markedly on earlier, shallower re-id configurations.
Integration of the RMSProp adaptive gradient descent algorithm within the network, enhancing the convergence speed and stability compared to standard stochastic gradient descent.
Extensive evaluation on benchmark datasets (CUHK03, Market-1501, and CUHK01) where PersonNet demonstrates superior performance, surpassing state-of-the-art results.
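
The first contribution relies on a general property of small filters that can be checked with simple arithmetic: a stack of stride-1 3×3 convolutions covers the same receptive field as one larger filter while using fewer weights and adding a non-linearity after each layer. The sketch below illustrates this. The function names and the channel width of 64 are assumptions chosen for illustration and are not taken from the paper.

```python
def stacked_3x3_receptive_field(num_layers):
    # Each stride-1 3x3 convolution widens the receptive field by 2 pixels.
    return 2 * num_layers + 1


def conv_weights(kernel, channels, num_layers=1):
    # Weight count (biases ignored) for layers mapping `channels` -> `channels`.
    return num_layers * kernel * kernel * channels * channels


channels = 64  # hypothetical layer width, not a value from the paper
rf = stacked_3x3_receptive_field(2)                 # two stacked 3x3 layers
stacked = conv_weights(3, channels, num_layers=2)   # weights in the stack
single = conv_weights(rf, channels)                 # one filter of equal extent
print(rf, stacked, single)  # prints: 5 73728 102400
```

Under these assumptions, two 3×3 layers see a 5×5 region with roughly 28% fewer weights than a single 5×5 layer, and they apply two non-linearities instead of one.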

Network Architecture and Methodology

PersonNet passes each input image through a series of convolutional layers with small receptive fields. A patch matching layer then computes local differences between the two resulting feature maps to capture neighborhood relations. Several fully connected layers follow, ending in a softmax layer that outputs a similarity score indicating whether the two input images show the same person. Because small filters are stacked, the network applies more non-linear activations over a given receptive field, which contributes to the discriminative power of the learnt features.
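
The summary does not give the exact form of the patch matching layer. The following is a minimal sketch of one plausible reading: each value of one feature map is compared against a small neighborhood around the same location in the other map. The function name, the `radius` parameter, and the zero padding at the borders are all assumptions for illustration, not details confirmed by the paper.

```python
def neighborhood_difference(f, g, radius=2):
    """For every position (y, x), return a (2r+1) x (2r+1) block holding
    f[y][x] minus each value of g in the neighborhood centered at (y, x).
    Positions outside g are treated as 0 (an assumed padding choice)."""
    height, width = len(f), len(f[0])
    out = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            block = []
            for dy in range(-radius, radius + 1):
                row = []
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < height and 0 <= xx < width
                    neighbor = g[yy][xx] if inside else 0.0
                    row.append(f[y][x] - neighbor)
                block.append(row)
            out[y][x] = block
    return out


# Toy single-channel feature maps for the two input images.
f = [[1.0, 2.0], [3.0, 4.0]]
g = [[1.0, 2.0], [3.0, 4.0]]
diff = neighborhood_difference(f, g, radius=1)
print(diff[0][0])  # 3x3 block of differences around the top-left position
```

When the two maps agree at a location, the center of the block is zero, while the surrounding entries show how well that location matches nearby positions in the other image. This tolerance to small spatial shifts is what makes such a layer useful when pedestrians are not perfectly aligned.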

Instead of plain stochastic gradient descent (SGD), PersonNet is trained with RMSProp, which scales the step for each parameter by a running average of its recent squared gradients. According to the authors, this suits deep networks and eases the hyperparameter tuning that deeper models typically require.
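
The RMSProp update can be sketched as below, using the form in which the algorithm is commonly stated. The function name and the values of `lr`, `decay`, and `eps` are common defaults chosen here for illustration; the summary does not state which settings the authors used.

```python
import math


def rmsprop_step(params, grads, cache, lr=0.001, decay=0.9, eps=1e-8):
    """One RMSProp update. `cache` holds the running average of squared
    gradients for each parameter and is returned in updated form."""
    new_params, new_cache = [], []
    for p, g, c in zip(params, grads, cache):
        c = decay * c + (1.0 - decay) * g * g           # running mean of g^2
        new_params.append(p - lr * g / (math.sqrt(c) + eps))
        new_cache.append(c)
    return new_params, new_cache


# Two parameters whose gradients differ by a factor of 100.
params, cache = [1.0, 1.0], [0.0, 0.0]
grads = [2.0, 200.0]
params, cache = rmsprop_step(params, grads, cache)
print(params)
```

After this first step both parameters move by about the same amount even though their gradients differ by two orders of magnitude, because each step is divided by the root of that parameter's own squared-gradient average. Plain SGD would move the second parameter 100 times further.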

Experimental Results

The authors evaluate the network on three datasets: CUHK03, CUHK01, and Market-1501. In these experiments, PersonNet outperforms the re-id methods it is compared against. For instance, on the CUHK03 dataset, PersonNet achieves a Rank-1 recognition rate of 64.8%, against 62.1% for the next best method, a gap of 2.7 percentage points.
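
For readers unfamiliar with the metric, the Rank-1 rate is the fraction of query images whose highest-scoring gallery image shows the same identity. A minimal sketch of the more general Rank-k computation follows. The function name, the toy scores, and the identity labels are made up for illustration and are not results from the paper.

```python
def rank_k_accuracy(scores, query_ids, gallery_ids, k=1):
    """scores[q][j] is the similarity between query q and gallery image j.
    A query counts as a hit if a correct identity is among its top k."""
    hits = 0
    for q, row in enumerate(scores):
        # Gallery indices sorted from most to least similar.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        top_ids = [gallery_ids[j] for j in order[:k]]
        hits += query_ids[q] in top_ids
    return hits / len(scores)


# Toy example: three queries scored against three gallery images.
scores = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.6, 0.5, 0.1],
]
query_ids = ["a", "b", "c"]
gallery_ids = ["a", "b", "c"]
rank1 = rank_k_accuracy(scores, query_ids, gallery_ids, k=1)
rank2 = rank_k_accuracy(scores, query_ids, gallery_ids, k=2)
print(rank1, rank2)
```

In this toy case only the first query retrieves its own identity at the top, so the Rank-1 rate is one third. Allowing the top two candidates also recovers the second query.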

Implications and Future Directions

The proposed PersonNet method highlights the potential for deep neural networks in solving complex recognition tasks such as person re-identification. The integration of RMSProp in the deep learning pipeline offers insights into more efficient learning in deep architectures, suggesting pathways for future improvements in other domains requiring deep learning solutions.

Future developments may focus on further increasing the network depth, integrating additional mechanisms for handling spatial misalignments, or exploring ensemble-based techniques combining several deep models. Moreover, developments in unsupervised or semi-supervised re-id frameworks could leverage the strategies outlined by the authors to reduce the dependency on labeled data, expanding the utility of these models.

Overall, this paper contributes significantly to the field of person re-identification, providing valuable insights and a foundation for future research endeavors in leveraging deep learning for surveillance and security applications.
