- The paper introduces a novel domain adaptation framework using CycleGAN-based synthesis to generate target-style images while preserving identity cues.
- It trains a deep re-ID model on synthetic images, outperforming previous unsupervised methods in rank-1 accuracy and mean Average Precision.
- The framework demonstrates robustness to translation noise and provides a scalable, label-free solution for real-world surveillance applications.
Domain Adaptation through Synthesis for Unsupervised Person Re-identification
Introduction
Person re-identification (re-ID) presents significant challenges due to cross-domain variations such as changes in illumination, viewpoint, camera calibration, and background. Labeled datasets are expensive to obtain for every surveillance domain, so effective unsupervised domain adaptation methods are crucial for scalable deployment. "Domain Adaptation through Synthesis for Unsupervised Person Re-identification" (arXiv:1804.10094) introduces a novel approach to domain adaptation that leverages synthetic image generation, bypassing the need for labeled data in the target domain while maximizing cross-domain transferability.
Proposed Framework
The core methodology introduces an unsupervised domain adaptation pipeline utilizing Generative Adversarial Networks (GANs) for image translation coupled with a re-ID deep embedding network. The framework transfers labeled data from a source domain to the target domain by:
- Image-to-Image Translation: Employing CycleGAN, the method generates synthetic images that retain the semantic identity information from the source domain while reflecting the style of the target domain. This ensures that the underlying identity information essential for re-ID tasks is not lost in translation, enabling more effective adaptation.
- Deep Re-ID Model Training: The re-ID network is first trained on the labeled synthetic data. As the synthetic images encapsulate target-style appearance while preserving source identities, the model is forced to generalize beyond source-specific biases.
- Fine-tuning and Robustness: Iterative fine-tuning on increasingly "stylized" synthetic datasets reduces the domain gap. No identity annotations from the target domain are required, and the framework is compatible with a variety of re-ID network architectures.
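The translation step above hinges on cycle consistency: translating a source image to the target style and back should recover the original, which is what keeps identity content intact. The following is a minimal sketch of that loss in plain Python, where "images" are flat lists of floats and `G`/`F` are hypothetical toy stand-ins for the two learned generators, not the paper's actual networks:

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized 'images'."""
    assert len(a) == len(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, G, F):
    """L_cyc for one direction: x -> G(x) -> F(G(x)) should recover x."""
    return l1_distance(F(G(x)), x)

# Toy generators: G shifts pixel intensities (a stand-in for "adding
# target style"); F shifts them back (an approximate inverse).
G = lambda img: [p + 0.5 for p in img]
F = lambda img: [p - 0.5 for p in img]

source_image = [0.1, 0.4, 0.9]
loss = cycle_consistency_loss(source_image, G, F)  # near zero for a good inverse
```

In the real pipeline, minimizing this loss (alongside adversarial losses) pressures the generators to change only style, so the re-ID labels carried over from the source domain remain valid for the translated images.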
Experimental Evaluation
Comprehensive experiments are conducted on large-scale benchmark datasets, including Market-1501, DukeMTMC-reID, and VIPeR, under both single- and cross-domain transfer settings. The paper reports substantial improvements over prior unsupervised domain adaptation baselines: the model trained on synthetic target-style images achieved the highest unsupervised re-ID rank-1 accuracy and mean Average Precision (mAP) reported at the time of publication, outperforming domain-confusion and adversarial adaptation approaches.
Results highlight that synthetic data generated via style transfer not only boosts downstream re-ID accuracy but is also robust to noisy translation artifacts, presenting a strong practical alternative to pseudo-labeling and adversarial feature learning methods.
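For readers unfamiliar with the two metrics reported above, this sketch computes rank-1 accuracy and mAP from a query-gallery distance matrix. The data is a tiny hypothetical example, and the simplified protocol omits the camera-ID filtering used in the standard Market-1501 evaluation:

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Rank-1 accuracy and mean Average Precision over all queries."""
    hits, aps = 0, []
    for qi, row in enumerate(dist):
        # Sort gallery indices by ascending distance to this query.
        order = sorted(range(len(row)), key=lambda gi: row[gi])
        matches = [gallery_ids[gi] == query_ids[qi] for gi in order]
        if matches[0]:                      # nearest neighbor is correct
            hits += 1
        # Average precision: mean of precision at each relevant rank.
        precisions, found = [], 0
        for rank, m in enumerate(matches, start=1):
            if m:
                found += 1
                precisions.append(found / rank)
        aps.append(sum(precisions) / max(found, 1))
    return hits / len(dist), sum(aps) / len(aps)

# Two queries against three gallery images (toy distances).
dist = [[0.2, 0.9, 0.5],   # query 0: nearest gallery image is correct
        [0.3, 0.4, 0.6]]   # query 1: nearest gallery image is wrong
rank1, mean_ap = rank1_and_map(dist, query_ids=[0, 1], gallery_ids=[0, 1, 0])
# rank1 = 0.5, mean_ap = 0.75
```

Query 1 illustrates why mAP complements rank-1: its true match sits at rank 2, contributing nothing to rank-1 accuracy but still earning partial credit (AP = 0.5).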
Implications and Theoretical Perspective
This work demonstrates the practical value of image-level adaptation via synthesis for unsupervised domain adaptation, particularly when preserving identity structure is critical. The approach challenges the prevailing assumption that feature- or adversarial-level adaptation alone suffices, presenting empirical evidence that input-level synthesis can yield superior cross-domain generalization in re-ID. The method’s modularity also allows straightforward extension to multi-source adaptation and ensemble learning schemes.
From a theoretical standpoint, the success of style-consistent synthetic data emphasizes the importance of disentangling identity and domain cues—a longstanding problem in representation learning. By enforcing cycle-consistency and adversarial objectives, the method strengthens identity-specific signal retention during translation.
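Concretely, the cycle-consistency and adversarial objectives mentioned above combine, in the standard CycleGAN formulation that the described translation step builds on, roughly as follows (the exact weighting used in the paper may differ):

```latex
\mathcal{L}(G, F, D_X, D_Y) =
  \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
+ \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
+ \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F),
```

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F) =
  \mathbb{E}_{x \sim X}\!\left[\lVert F(G(x)) - x \rVert_1\right]
+ \mathbb{E}_{y \sim Y}\!\left[\lVert G(F(y)) - y \rVert_1\right],
```

where $G: X \to Y$ and $F: Y \to X$ are the two generators, $D_X, D_Y$ their discriminators, and $\lambda$ weights cycle consistency against realism. The adversarial terms push translated images toward the target style (domain cues), while the cycle term penalizes any change that cannot be undone, which is precisely what protects identity cues.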
Future Directions
The framework opens several promising research avenues:
- End-to-End Joint Training: Integrating GAN-based translation and re-ID embedding learning in a single pipeline with shared gradients.
- Domain Generalization: Extending image-level adaptation techniques to more complex, unseen target domains or continuous domain spaces.
- Semi-supervised Extensions: Investigating the incorporation of minimal target-domain annotations for further gains in task performance.
- Contrastive and Self-Supervised Objectives: Pairing synthesis with hybrid loss functions to further reinforce cross-domain invariance.
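The last direction above can be made concrete: a contrastive (InfoNCE-style) term could pull the embedding of a source image toward that of its target-styled translation. The embeddings and pairing scheme below are hypothetical illustrations, not part of the paper:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log softmax score of the positive against the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    denom = sum(math.exp(l) for l in logits)
    return -math.log(math.exp(logits[0]) / denom)

# Anchor: embedding of a source image; positive: its target-styled
# translation; negatives: embeddings of other identities.
anchor    = [1.0, 0.0]
positive  = [0.9, 0.1]                    # close to the anchor -> low loss
negatives = [[-1.0, 0.2], [0.0, 1.0]]
loss = info_nce(anchor, positive, negatives)
```

Minimizing this loss would make the representation invariant to the style change introduced by translation while keeping different identities apart, complementing the identity-classification losses used for re-ID.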
Conclusion
"Domain Adaptation through Synthesis for Unsupervised Person Re-identification" (arXiv:1804.10094) advances the state of the art in unsupervised re-ID by formalizing and validating synthetic data as an effective bridge for domain adaptation. The demonstrated performance gains and the modularity of the approach underscore its relevance for large-scale real-world surveillance applications and motivate further research into input-level domain adaptation strategies.