BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Abstract: Image-based virtual try-on is an increasingly popular and important task that aims to generate realistic images of a specific person wearing a given garment. Recent methods model virtual try-on as a mask-based image inpainting task, which requires masking the person image and causes a significant loss of spatial information. In in-the-wild try-on scenarios with complex poses and occlusions in particular, mask-based methods often introduce noticeable artifacts. Our research found that a mask-free approach can fully leverage the spatial and lighting information in the original person image, enabling high-quality virtual try-on. Consequently, we propose a novel training paradigm for a mask-free try-on diffusion model. We ensure the model's mask-free try-on capability by creating high-quality pseudo-data, and we further enhance its handling of complex spatial information through effective in-the-wild data augmentation. In addition, we design a try-on localization loss that concentrates on the try-on area while suppressing garment features in non-try-on areas, ensuring precise rendering of garments and preservation of the foreground and background. Finally, we introduce BooW-VTON, a mask-free virtual try-on diffusion model that delivers SOTA try-on quality without parsing cost. Extensive qualitative and quantitative experiments demonstrate superior performance in in-the-wild scenarios with such low-demand inputs.
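The abstract does not give the form of the try-on localization loss, so the following is only an illustrative sketch of the general idea: penalizing garment attention that falls outside the try-on area. It assumes a garment attention map and a binary try-on region mask are available during training. The function name `localization_penalty`, the inputs, and the normalization are all hypothetical and are not taken from the paper.

```python
def localization_penalty(attention, tryon_mask):
    """Fraction of garment attention mass that lands outside the try-on area.

    attention:  2D list of non-negative floats (hypothetical garment attention map).
    tryon_mask: 2D list of 0/1 values of the same shape, 1 marking the try-on area.
    Both inputs and this normalization are assumptions made for illustration.
    """
    if len(attention) != len(tryon_mask):
        raise ValueError("attention and tryon_mask must have the same shape")
    total = 0.0
    outside = 0.0
    for att_row, mask_row in zip(attention, tryon_mask):
        if len(att_row) != len(mask_row):
            raise ValueError("attention and tryon_mask must have the same shape")
        for a, m in zip(att_row, mask_row):
            total += a
            outside += a * (1 - m)  # only attention outside the try-on area counts
    # An all-zero attention map incurs no penalty.
    return outside / total if total > 0 else 0.0


# Toy 3x3 example: most attention sits on the centre pixel, which is the try-on area.
attention = [
    [0.0, 0.1, 0.0],
    [0.1, 0.6, 0.1],
    [0.0, 0.1, 0.0],
]
mask = [
    [0, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
penalty = localization_penalty(attention, mask)
print(round(penalty, 3))
```

Under these assumptions the penalty is 0 when all garment attention stays inside the try-on area and approaches 1 as it leaks into the foreground and background, which is the behavior the abstract describes qualitatively.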