Papers
Topics
Authors
Recent
Search
2000 character limit reached

Variance-Aware Loss Scheduling for Multimodal Alignment in Low-Data Settings

Published 5 Mar 2025 in cs.CV | (2503.03202v1)

Abstract: Training vision-LLMs for image-text alignment typically requires large datasets to achieve robust performance. In low-data scenarios, standard contrastive learning can struggle to align modalities effectively due to overfitting and unstable training dynamics. In this paper, we propose a variance-aware loss scheduling approach that dynamically adjusts the weighting of the contrastive loss based on the statistical variability (uncertainty) in the model's alignment predictions. Using a subset of the Flickr8k image-caption dataset to simulate limited data conditions, we demonstrate that our approach improves image-text retrieval accuracy compared to a fixed-weight baseline. We also compare against other adaptive weighting strategies (using output entropy and cosine similarity spread) and find that variance-aware scheduling provides the best overall trade-off. Qualitatively, our method yields more distinct multimodal embeddings as shown by t-SNE visualizations. Moreover, in a stress test with noise-injected captions and images, the variance-guided loss proves more robust, maintaining higher recall when random perturbations are introduced. These results highlight the benefit of adaptive loss weighting for multimodal alignment in low-data regimes.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (1)

Collections

Sign up for free to add this paper to one or more collections.