Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Abstract: Large vision-language models (VLMs) such as CLIP have demonstrated strong zero-shot performance on the unsupervised domain adaptation task. Yet most transfer approaches for VLMs focus on either the language or the visual branch, overlooking the nuanced interplay between the two modalities. In this work, we introduce a Unified Modality Separation (UniMoS) framework for unsupervised domain adaptation. Leveraging insights from modality gap studies, we craft a nimble modality separation network that disentangles CLIP's features into language-associated and vision-associated components. Our proposed Modality-Ensemble Training (MET) method fosters the exchange of modality-agnostic information while preserving modality-specific nuances. We further align features across domains using a modality discriminator. Comprehensive evaluations on three benchmarks show that our approach sets a new state of the art at minimal computational cost. Code: https://github.com/TL-UESTC/UniMoS
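The core idea of the separation step can be sketched as two lightweight projection heads that split a single CLIP feature into a language-associated and a vision-associated component. This is a minimal illustrative sketch, not the paper's implementation: the head architecture, feature dimension, and random projections here are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_separation_heads(dim):
    # Hypothetical lightweight heads: one linear projection per branch.
    # Scaled Gaussian init keeps output magnitudes comparable to inputs.
    W_lang = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    W_vis = rng.standard_normal((dim, dim)) / np.sqrt(dim)
    return W_lang, W_vis

def separate(clip_feat, W_lang, W_vis):
    # Split CLIP features into language- and vision-associated parts.
    # In the full framework, each part would feed its own classifier
    # and a modality discriminator for cross-domain alignment.
    return clip_feat @ W_lang, clip_feat @ W_vis

dim = 512                                  # assumed CLIP feature width
W_lang, W_vis = make_separation_heads(dim)
feats = rng.standard_normal((4, dim))      # stand-in batch of CLIP features
f_lang, f_vis = separate(feats, W_lang, W_vis)
print(f_lang.shape, f_vis.shape)           # (4, 512) (4, 512)
```

In the paper's full pipeline these two branches would be trained jointly (via MET) so that modality-agnostic information is shared while each branch retains its modality-specific cues; the sketch above only fixes the structural split itself.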