- The paper introduces PRIMO, a supervised latent-variable model that quantifies the predictive impact of missing modalities via latent sampling.
- It employs evidence lower bounds and variance-based metrics to analyze prediction changes under incomplete modality data.
- Evaluations on synthetic, AV-MNIST, and MIMIC-III datasets demonstrate PRIMO's effectiveness in guiding robust multimodal inference.
Characterizing the Predictive Impact of Modalities with Supervised Latent-Variable Modeling
Introduction
The paper “Characterizing the Predictive Impact of Modalities with Supervised Latent-Variable Modeling” addresses multimodal learning when some modalities are unavailable during training, inference, or both. The authors introduce PRIMO, a supervised latent-variable imputation model designed to assess how much a missing modality matters for a prediction. Unlike existing work, PRIMO trains on partially available data and models missing modalities as latent variables, which supports both prediction and impact analysis.
Methodology
PRIMO Architecture
PRIMO quantifies how missing modalities affect predictions by representing each missing modality with a latent variable. Rather than treating the absence of data solely as an imputation task, PRIMO focuses on the variance in predictions caused by the missing input. It does this by sampling from the learned latent distribution: each sample acts as a plausible completion of the missing modality and yields its own prediction, so the model can still predict when modality information is only partially available.
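The sampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the diagonal Gaussian prior, the names `prior_fn` and `predict_fn`, and the toy linear weights are all hypothetical stand-ins for learned networks.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def sample_predictions(x_obs, prior_fn, predict_fn, num_samples=64, seed=0):
    """Predict with one modality missing by sampling latent completions.

    prior_fn(x_obs) -> (mean, std) of a diagonal Gaussian over z (assumed form).
    predict_fn(x_obs, z) -> logits of shape (num_samples, num_classes).
    Returns the per-sample class probabilities and their average.
    """
    rng = np.random.default_rng(seed)
    mean, std = prior_fn(x_obs)
    z = mean + std * rng.standard_normal((num_samples, mean.shape[0]))
    probs = softmax(predict_fn(x_obs, z))
    return probs, probs.mean(axis=0)

# Hypothetical stand-ins for the learned conditional prior and classifier head.
def toy_prior(x_obs):
    return np.zeros(2), np.ones(2)

W_X = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])  # observed features -> logits
W_Z = np.array([[2.0, -2.0], [-1.0, 1.0]])              # latent completion -> logits

def toy_predict(x_obs, z):
    return x_obs @ W_X + z @ W_Z

probs, mean_probs = sample_predictions(
    np.array([0.2, -0.1, 0.4]), toy_prior, toy_predict
)
```

Here `mean_probs` plays the role of the prediction under a missing modality, while the spread of `probs` across rows is the raw material for an impact analysis.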
Figure 1: An overview of PRIMO illustrating modality impact analysis via latent sampling.
Learning Objective
PRIMO is trained on both complete and incomplete modality data. It models a conditional prior over the latent variables, which is what allows it to quantify the predictive value of different modalities. Training optimizes evidence lower bounds (ELBOs) that cover examples both with and without a given modality. PRIMO derives predictive distributions from partial data without relying entirely on generative reconstruction, which distinguishes it from prior multimodal approaches.
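For orientation, a generic conditional latent-variable bound of this kind can be written as below. The summary does not give the paper's exact objective, so this is an assumed textbook form rather than PRIMO's actual loss. The notation is introduced here for illustration: x_o is the observed modality, x_m the modality that may be missing, y the label, z the latent variable, q_phi a variational posterior, and p_theta(z | x_o) the conditional prior.

```latex
\begin{align}
\log p_\theta(y \mid x_o)
  &\ge \mathbb{E}_{q_\phi(z \mid x_o, x_m)}\big[\log p_\theta(y \mid x_o, z)\big]
     - \mathrm{KL}\big(q_\phi(z \mid x_o, x_m)\,\|\,p_\theta(z \mid x_o)\big), \\
p_\theta(y \mid x_o)
  &\approx \frac{1}{S}\sum_{s=1}^{S} p_\theta\big(y \mid x_o, z^{(s)}\big),
  \qquad z^{(s)} \sim p_\theta(z \mid x_o).
\end{align}
```

The first line is the bound used when x_m is observed during training. The second is the Monte Carlo predictive distribution used when x_m is missing, where samples are drawn from the conditional prior instead of the posterior.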
Evaluation Metrics
PRIMO uses a variance-based metric to measure how much predictions change across latent completions of a missing modality. Because the metric is computed per example, it gives an instance-level assessment of modality impact and a fine-grained view of the structure of a multimodal dataset.
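One simple way to turn sampled predictions into an instance-level score is sketched below. The exact definition of the paper's metric is not given in this summary, so the function `predictive_impact` (variance across samples, summed over classes) is an assumed illustrative choice.

```python
import numpy as np

def predictive_impact(probs):
    """Variance of class probabilities across latent completions.

    probs: array of shape (num_samples, num_classes), one row per completion.
    Returns the per-class variance across samples, summed over classes.
    This particular aggregation is an assumption made for illustration.
    """
    probs = np.asarray(probs, dtype=float)
    return float(probs.var(axis=0).sum())

# Predictions that barely depend on the missing modality.
stable = np.array([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])
# Predictions that flip depending on how the missing modality is completed.
unstable = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

low = predictive_impact(stable)
high = predictive_impact(unstable)
```

A score near zero means the missing modality would not change the prediction for this instance. A large score means the prediction hinges on it.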
Experiments
PRIMO is evaluated on both synthetic and real datasets: a synthetic XOR problem, Audio-Vision MNIST (AV-MNIST), and the clinical dataset MIMIC-III.
Synthetic XOR and AV-MNIST
On the synthetic XOR dataset, PRIMO matches unimodal baseline performance when a modality is missing, which validates it as a discriminative model. On AV-MNIST, PRIMO retains prediction accuracy across missing-modality scenarios, supporting its robustness beyond the synthetic setting.
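To see why XOR is a useful stress test, consider a two-modality toy version in which each modality is a single uniform bit and the label is their XOR. The summary does not describe the paper's exact XOR construction, so this setup is an assumption. The sketch enumerates all cases to show that no rule using one modality alone beats chance, and that the label varies maximally across completions of the missing bit.

```python
from itertools import product

def xor_label(x1, x2):
    return x1 ^ x2

def best_unimodal_accuracy():
    """Best accuracy of any deterministic rule that sees only x1."""
    samples = list(product([0, 1], repeat=2))  # all equally likely (x1, x2) pairs
    best = 0.0
    for rule in product([0, 1], repeat=2):     # rule[x1] is the predicted label
        correct = sum(rule[x1] == xor_label(x1, x2) for x1, x2 in samples)
        best = max(best, correct / len(samples))
    return best

def completion_variance(x1):
    """Variance of the label over both equally likely completions of x2."""
    labels = [xor_label(x1, x2) for x2 in (0, 1)]
    mean = sum(labels) / len(labels)
    return sum((v - mean) ** 2 for v in labels) / len(labels)

unimodal_acc = best_unimodal_accuracy()
variances = [completion_variance(x1) for x1 in (0, 1)]
```

In this toy setting the unimodal ceiling is chance level, so a model that matches the unimodal baseline with a modality missing is behaving as well as the available information allows.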

Figure 2: Evaluation of PRIMO on the XOR dataset, highlighting its resilience in modality-missing scenarios.
MIMIC-III Dataset
PRIMO performs well on MIMIC-III, a critical care dataset with modalities such as patient demographics and clinical time-series data. The model assigns different importance to modalities depending on the task, for example mortality prediction versus ICD-9 code classification, which demonstrates its potential for targeted clinical decision support.

Figure 3: Predictive impact visualizations on MIMIC-III indicating the criticality of different modalities.
Implications and Future Work
PRIMO has implications for interpreting and deploying AI in fields where data completeness cannot be guaranteed. Because the model indicates how much acquiring a missing modality would affect a prediction, that impact can be weighed against the costs and risks of acquisition, which suggests practical applications in healthcare for optimizing diagnostic processes.
Future research could explore extending PRIMO to accommodate more complex, larger-scale multimodal tasks or incorporate active learning techniques to further streamline feature acquisition strategies.
Conclusion
The paper presents a supervised latent-variable framework for multimodal learning with missing data. PRIMO offers a mechanism to quantify the predictive impact of missing modalities, giving insight into how predictions depend on each modality and supporting both model interpretability and practical application across diverse domains.