DGMamba: Domain Generalization via Generalized State Space Model
Abstract: Domain generalization (DG) aims to solve distribution shift problems in various scenes. Existing approaches are based on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs), which suffer from limited receptive fields or quadratic complexity, respectively. Mamba, an emerging state space model (SSM), offers linear complexity and a global receptive field. Despite this, it is difficult to apply to DG to address distribution shifts, owing to issues with its hidden states and inappropriate scanning mechanisms. In this paper, we propose a novel framework for DG, named DGMamba, that generalizes strongly to unseen domains while retaining the advantages of a global receptive field and efficient linear complexity. DGMamba comprises two core components: Hidden State Suppressing (HSS) and Semantic-aware Patch Refining (SPR). In particular, HSS is introduced to mitigate the influence of hidden states associated with domain-specific features during output prediction. SPR encourages the model to concentrate on objects rather than context and consists of two designs: Prior-Free Scanning (PFS) and Domain Context Interchange (DCI). Concretely, PFS shuffles the non-semantic patches within images, creating more flexible and effective sequences from images, and DCI regularizes Mamba with combinations of mismatched non-semantic and semantic information by fusing patches across domains. Extensive experiments on five commonly used DG benchmarks demonstrate that the proposed DGMamba achieves results remarkably superior to those of state-of-the-art models. The code will be made publicly available at https://github.com/longshaocong/DGMamba.
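To make the hidden-state terminology concrete, the following is a minimal single-channel sketch of the discretized SSM recurrence that Mamba scans over a token sequence: h_t = A_bar_t * h_(t-1) + B_bar_t * x_t and y_t = C_t * h_t, with A_bar_t = exp(delta_t * A) and the common simplification B_bar_t = delta_t * B_t. A single pass over the sequence is what gives the linear complexity. The optional `keep` mask is a hypothetical stand-in for HSS: it zeroes the output at flagged steps while the state itself keeps propagating. The abstract does not say how domain-specific hidden states are identified, so the mask, the function name, and the scalar (N = 1) state are assumptions for illustration only.

```python
import math
from typing import List, Optional, Sequence


def ssm_scan(
    x: Sequence[float],
    delta: Sequence[float],
    a: float,
    b: Sequence[float],
    c: Sequence[float],
    keep: Optional[Sequence[bool]] = None,
) -> List[float]:
    """Single-channel, scalar-state SSM scan in one O(L) pass.

    delta, b and c are per-step values, mimicking Mamba's input-dependent
    (selective) parameters; `keep` is a hypothetical HSS-style output mask.
    """
    n = len(x)
    if not (len(delta) == len(b) == len(c) == n):
        raise ValueError("x, delta, b and c must have the same length")
    if keep is not None and len(keep) != n:
        raise ValueError("keep must match the sequence length")

    h = 0.0
    ys: List[float] = []
    for t in range(n):
        a_bar = math.exp(delta[t] * a)   # zero-order-hold discretization of A
        b_bar = delta[t] * b[t]          # simplified discretization of B
        h = a_bar * h + b_bar * x[t]     # h_t = A_bar * h_(t-1) + B_bar * x_t
        if keep is None or keep[t]:
            ys.append(c[t] * h)          # y_t = C_t * h_t
        else:
            ys.append(0.0)               # suppressed read-out; h still propagates
    return ys


# With a = 0 (so A_bar = 1) the state simply accumulates the inputs.
print(ssm_scan([1, 1, 1], [1, 1, 1], 0.0, [1, 1, 1], [1, 1, 1]))
# [1.0, 2.0, 3.0]
print(ssm_scan([1, 1, 1], [1, 1, 1], 0.0, [1, 1, 1], [1, 1, 1], keep=[True, False, True]))
# [1.0, 0.0, 3.0]
```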
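As a rough illustration of the PFS idea, the sketch below permutes only the non-semantic (context) patches of a flattened patch sequence and leaves the semantic patches at their original positions, so each call yields a different scan sequence while the object patches stay in place. It assumes a per-patch boolean `is_semantic` mask is already available (for example, derived from a saliency map); the abstract does not say how semantic patches are selected, and the function name is hypothetical.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_context_patches(
    patches: Sequence[T],
    is_semantic: Sequence[bool],
    rng: random.Random,
) -> List[T]:
    """Randomly permute non-semantic patches among their own positions."""
    if len(patches) != len(is_semantic):
        raise ValueError("patches and is_semantic must have the same length")
    context_pos = [i for i, sem in enumerate(is_semantic) if not sem]
    context = [patches[i] for i in context_pos]
    rng.shuffle(context)                  # new order for context patches only
    out = list(patches)                   # semantic patches stay where they were
    for pos, patch in zip(context_pos, context):
        out[pos] = patch
    return out


# A 3x3 grid flattened in raster order; only the centre patch is semantic.
grid = list(range(9))
mask = [i == 4 for i in range(9)]
seq = shuffle_context_patches(grid, mask, random.Random(0))
print(seq[4], sorted(seq) == grid)  # 4 True
```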
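DCI can be pictured as a context swap between images from different source domains. The sketch below keeps the semantic patches of an image from one domain and replaces (or, for `lam < 1`, linearly blends) its non-semantic patches with the patches at the same positions in an image from another domain, producing the mismatched semantic/context combinations the abstract describes. The position-wise pairing, the blending weight `lam`, and the function name are assumptions for illustration, not details given in the abstract.

```python
from typing import List, Sequence

Patch = List[float]


def interchange_context(
    patches_a: Sequence[Patch],
    semantic_a: Sequence[bool],
    patches_b: Sequence[Patch],
    lam: float = 1.0,
) -> List[Patch]:
    """Keep A's semantic patches; take A's context from B (blended by lam)."""
    if not (len(patches_a) == len(patches_b) == len(semantic_a)):
        raise ValueError("inputs must have the same number of patches")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    fused: List[Patch] = []
    for pa, pb, sem in zip(patches_a, patches_b, semantic_a):
        if sem:
            fused.append(list(pa))        # object content from domain A
        else:
            # context mixed in from domain B at the same position
            fused.append([(1.0 - lam) * u + lam * v for u, v in zip(pa, pb)])
    return fused


a = [[1.0, 1.0], [2.0, 2.0]]   # image from domain A: patch 0 is semantic
b = [[9.0, 9.0], [8.0, 8.0]]   # image from domain B
print(interchange_context(a, [True, False], b))           # [[1.0, 1.0], [8.0, 8.0]]
print(interchange_context(a, [True, False], b, lam=0.5))  # [[1.0, 1.0], [5.0, 5.0]]
```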