Papers
Topics
Authors
Recent
Search
2000 character limit reached

M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization

Published 17 Jul 2023 in cs.CV, cs.AI, and cs.LG | (2307.08347v2)

Abstract: Medical vision-LLMs enable co-learning and integrating features from medical imaging and clinical text. However, these models are not easy to train and the latent representation space can be complex. Here we propose a novel way for pre-training and regularising medical vision-LLMs. The proposed method, named Medical vision-language pre-training with Frozen LLMs and Latent spAce Geometry optimization (M-FLAG), leverages a frozen LLM for training stability and efficiency and introduces a novel orthogonality loss to harmonize the latent space geometry. We demonstrate the potential of the pre-trained model on three downstream tasks: medical image classification, segmentation, and object detection. Extensive experiments across five public datasets demonstrate that M-FLAG significantly outperforms existing medical vision-language pre-training approaches and reduces the number of parameters by 78\%. Notably, M-FLAG achieves outstanding performance on the segmentation task while using only 1\% of the RSNA dataset, even outperforming ImageNet pre-trained models that have been fine-tuned using 100\% of the data.

Citations (51)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.