
RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment

Published 13 Oct 2019 in cs.CV (arXiv:1910.05839v2)

Abstract: RGB-Infrared (IR) person re-identification is an important and challenging task due to large cross-modality variations between RGB and IR images. Most conventional approaches aim to bridge the cross-modality gap with feature alignment by feature representation learning. Different from existing methods, in this paper, we propose a novel and end-to-end Alignment Generative Adversarial Network (AlignGAN) for the RGB-IR Re-ID task. The proposed model enjoys several merits. First, it can exploit pixel alignment and feature alignment jointly. To the best of our knowledge, this is the first work to model the two alignment strategies jointly for the RGB-IR Re-ID problem. Second, the proposed model consists of a pixel generator, a feature generator, and a joint discriminator. By playing a min-max game among the three components, our model is able to not only alleviate the cross-modality and intra-modality variations but also learn identity-consistent features. Extensive experimental results on two standard benchmarks demonstrate that the proposed model performs favorably against state-of-the-art methods. Especially, on the SYSU-MM01 dataset, our model can achieve an absolute gain of 15.4% and 12.9% in terms of Rank-1 and mAP.

Citations (326)

Summary

  • The paper presents a novel AlignGAN model that integrates joint pixel and feature alignment to effectively bridge the modality gap between RGB and infrared images.
  • It employs a pixel generator, feature generator, and joint discriminator to harmonize data distributions across modalities for improved re-identification.
  • Extensive experiments on SYSU-MM01 show the method delivers a 15.4% Rank-1 accuracy boost and a 12.9% mAP increase, highlighting its practical surveillance potential.

Overview of RGB-Infrared Cross-Modality Person Re-Identification via Joint Pixel and Feature Alignment

The paper explores advanced methodologies for RGB-Infrared (RGB-IR) person re-identification (Re-ID), a crucial component of surveillance systems. The primary challenge in RGB-IR Re-ID lies in the significant modality gap between RGB and infrared images, which complicates the direct application of single-modality techniques. The authors introduce an Alignment Generative Adversarial Network (AlignGAN), designed to address both cross-modality and intra-modality variations by jointly performing pixel and feature alignment.

Core Contributions

  1. Joint Pixel and Feature Alignment: The paper pioneers the combined use of pixel and feature alignment strategies to bridge the modality gap in Re-ID tasks. Traditional approaches typically rely on either feature alignment or pixel alignment alone; the proposed model is the first to model the two alignment strategies jointly for the RGB-IR Re-ID problem, thereby strengthening the overall alignment.
  2. Robust Model Components: The AlignGAN framework comprises three central components: a pixel generator, a feature generator, and a joint discriminator. The pixel generator transforms RGB images into a domain similar to IR images (fake IR), which facilitates learning across both modalities. The feature generator aligns features from fake IR and real IR images in a shared feature space. Concurrently, the joint discriminator plays a dual role: it distinguishes between real and synthetic image-feature pairs and, through adversarial training, guides the learning process toward identity-consistent features.
  3. Performance Results: Extensive experimentation on established benchmarks such as the SYSU-MM01 dataset demonstrates the model’s superior performance. The proposed AlignGAN achieved a 15.4% improvement in Rank-1 accuracy and a 12.9% increase in mean average precision (mAP) over previous state-of-the-art methods, showcasing its potential efficacy in real-world applications.
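The interaction among the three components can be sketched as follows. This is a minimal NumPy illustration of the data flow only: random linear maps stand in for the paper's CNN-based generators and discriminator, and all names, shapes, and functions here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the paper's CNN components: each is a small random
# linear map, used only to illustrate the data flow among the three parts.
H = W = 8   # toy image size (illustrative, not the paper's input resolution)
D = 16      # toy feature dimension

W_pix = rng.normal(size=(H * W, H * W)) * 0.1   # "pixel generator" weights
W_feat = rng.normal(size=(H * W, D)) * 0.1      # "feature generator" weights
W_disc = rng.normal(size=(H * W + D, 1)) * 0.1  # "joint discriminator" weights

def pixel_generator(rgb):
    """Map a (flattened) RGB image into the IR pixel domain (fake IR)."""
    return np.tanh(rgb @ W_pix)

def feature_generator(img):
    """Embed a fake-IR or real-IR image into the shared feature space."""
    return np.tanh(img @ W_feat)

def joint_discriminator(img, feat):
    """Score an (image, feature) pair in (0, 1): trained to separate real
    IR pairs from fake IR pairs, while the generators try to fool it."""
    pair = np.concatenate([img, feat], axis=-1)
    return 1.0 / (1.0 + np.exp(-(pair @ W_disc)))

rgb = rng.normal(size=(H * W,))       # a flattened RGB probe image
real_ir = rng.normal(size=(H * W,))   # a flattened real IR gallery image

fake_ir = pixel_generator(rgb)          # pixel alignment: RGB -> fake IR
feat_fake = feature_generator(fake_ir)  # feature alignment: shared space
feat_real = feature_generator(real_ir)

score_fake = joint_discriminator(fake_ir, feat_fake)
score_real = joint_discriminator(real_ir, feat_real)
# In training, the discriminator maximizes log(score_real) + log(1 - score_fake),
# while the two generators minimize it -- the min-max game described above.
```

The key design choice this sketch highlights is that the discriminator judges joint (image, feature) pairs rather than images alone, which is what couples the pixel-level and feature-level alignment objectives.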

Implications and Future Directions

The introduction and validation of the AlignGAN framework contribute substantially to the theoretical and practical aspects of cross-modality Re-ID. From a theoretical perspective, the joint modeling of pixel and feature alignment offers a comprehensive approach that could be extended to other cross-modality or multi-modality recognition tasks. Practically, the ability to maintain identity-consistent features across different modalities showcases its applicability in real-world surveillance where conditions fluctuate between daylight and nighttime, necessitating a seamless transition between RGB and IR imaging systems.

Future work might refine the adversarial training to further reduce identity inconsistency and the modality gap. Additionally, examining the scalability of AlignGAN to larger datasets or more complex surveillance scenarios could enhance its utility in diverse AI applications. The code availability (https://github.com/wangguanan/AlignGAN) also invites the broader research community to experiment and build upon this work, potentially leading to refined or new methods that leverage this joint alignment strategy for other complex visual recognition tasks.
