Papers
Topics
Authors
Recent
Search
2000 character limit reached

Cross-Modal Distillation For Widely Differing Modalities

Published 22 Jul 2025 in cs.AI | (2507.16296v1)

Abstract: Deep learning achieved great progress recently, however, it is not easy or efficient to further improve its performance by increasing the size of the model. Multi-modal learning can mitigate this challenge by introducing richer and more discriminative information as input. To solve the problem of limited access to multi-modal data at the time of use, we conduct multi-modal learning by introducing a teacher model to transfer discriminative knowledge to a student model during training. However, this knowledge transfer via distillation is not trivial because the big domain gap between the widely differing modalities can easily lead to overfitting. In this work, we introduce a cross-modal distillation framework. Specifically, we find hard constrained loss, e.g. l2 loss forcing the student being exact the same as the teacher, can easily lead to overfitting in cross-modality distillation. To address this, we propose two soft constrained knowledge distillation strategies at the feature level and classifier level respectively. In addition, we propose a quality-based adaptive weights module to weigh input samples via quantified data quality, leading to robust model training. We conducted experiments on speaker recognition and image classification tasks, and the results show that our approach is able to effectively achieve knowledge transfer between the commonly used and widely differing modalities of image, text, and speech.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.