MDS-ICU: Multimodal Deep Learning in ICU
- MDS-ICU is a unified multimodal deep learning framework that fuses ECG waveforms and clinical data to predict 33 ICU outcomes.
- It employs structured state space (S4) encoders and RealMLP for data fusion, achieves strong discrimination and calibration, and has been benchmarked against clinicians and LLMs.
- The system supports real-time risk monitoring with automated alerts and seamless integration into EHRs for enhanced decision support.
MDS-ICU is a unified multimodal deep learning framework designed for comprehensive predictive support in the intensive care unit (ICU) setting. It integrates diverse routinely collected clinical data—including raw ECG waveforms, tabular physiological measures, laboratory results, procedural histories, and medical device usage—to provide continuous risk assessments across a spectrum of 33 clinically relevant outcomes, encompassing mortality, organ dysfunction, medication administration, and acute deterioration. The architecture employs structured state space (S4) encoders and a multilayer perceptron (RealMLP) for heterogeneous data fusion, achieving strong discrimination and calibration. MDS-ICU’s predictions have been benchmarked against ICU physicians and LLMs, demonstrating both superior standalone performance and measurable improvements in clinician/LLM accuracy when its outputs are provided as decision support (Alcaraz et al., 10 Jan 2026).
1. Multimodal Data Integration and Preprocessing
MDS-ICU combines disparate data modalities to closely reflect the complexity of ICU decision making:
- Demographics and Biometrics: Includes age, sex, ethnicity, height, weight, and body-mass index.
- Physiological Monitoring: Captures real-time vital signs—systolic/diastolic/mean arterial pressures, heart rate, respiratory rate, SpO₂, ventilator parameters (PEEP, FiO₂, tidal and minute volumes), temperature, central venous pressure, and neurological scales (GCS, RASS).
- Laboratory Data: Encompasses hematology, electrolytes, renal and hepatic function, inflammatory markers (e.g., CRP, troponin T), and blood gas analyses.
- Procedural and Device Data: Surgical interventions (cardiac, general, neurosurgical, etc.), mechanical ventilation (invasive/noninvasive), and ECMO usage.
- ECG Waveforms: Raw 10-second, 12-lead clinical ECGs sampled at high frequency (e.g., 500 Hz).
Preprocessing actions include rigorous outlier removal and plausibility filtering, statistical summarization for irregular time series (e.g., min/max/first/last values, temporal deltas), categorical encoding, and normalization. ECG waveforms are baseline- and noise-filtered, per-lead normalized, and then input into the S4 encoder. Tabular missing values receive median imputation with binary missingness indicators. Feature scaling leverages robust quantile-clipping and learned per-feature rescaling within RealMLP (Alcaraz et al., 10 Jan 2026).
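The tabular steps above (median imputation, binary missingness indicators, quantile clipping) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `quantile`, `fit_column`, and `transform_column` are hypothetical, and the clipping quantiles are placeholders because the source does not state which levels are used.

```python
from statistics import median

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted, non-empty list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def fit_column(values, q_low=0.01, q_high=0.99):
    """Learn the median and clipping bounds from observed (non-None) values.

    The default quantile levels are assumptions, not values from the paper.
    """
    observed = sorted(v for v in values if v is not None)
    return {
        "median": median(observed),
        "low": quantile(observed, q_low),
        "high": quantile(observed, q_high),
    }

def transform_column(values, stats):
    """Return (imputed_and_clipped_values, missingness_flags) for one feature."""
    flags = [1 if v is None else 0 for v in values]
    filled = [stats["median"] if v is None else v for v in values]
    clipped = [min(max(v, stats["low"]), stats["high"]) for v in filled]
    return clipped, flags

# Toy heart-rate column with one missing entry and one implausible spike.
heart_rate = [72.0, None, 88.0, 300.0, 64.0]
stats = fit_column(heart_rate, q_low=0.0, q_high=0.75)
values, missing = transform_column(heart_rate, stats)
```

In practice the statistics would be fitted on the training split only and then reused for validation and test data, so that no information leaks across splits.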
2. Model Architecture and Mathematical Framework
The MDS-ICU framework is composed of two parallel modality-specific encoders with late fusion:
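- S4 ECG Encoder: Implements a discretized linear state-space model with hidden state $x_k$ and input $u_k$:
$$x_k = \bar{A}\,x_{k-1} + \bar{B}\,u_k, \qquad y_k = \bar{C}\,x_k$$
with $\bar{A}$, $\bar{B}$, $\bar{C}$ learned; the state matrix $A$ underlying $\bar{A}$ is parameterized via a low-rank plus diagonal decomposition. Four such S4 layers are stacked, interleaved with dropout and GeLU activations, followed by global pooling to obtain a fixed-length vector $h_{\mathrm{ECG}}$.
- RealMLP Tabular Encoder: Receives a preprocessed vector $x_{\mathrm{tab}}$, applies per-feature scaling ($\tilde{x} = s \odot x_{\mathrm{tab}}$), then passes through three NTPLinear layers with SELU activations:
$$h^{(0)} = \tilde{x}, \qquad h^{(l)} = \mathrm{SELU}\big(\mathrm{NTPLinear}_l(h^{(l-1)})\big), \quad l = 1, 2, 3$$
producing $h_{\mathrm{tab}} = h^{(3)}$ (typically 128-dimensional).
- Fusion and Prediction: $h_{\mathrm{ECG}}$ and $h_{\mathrm{tab}}$ are concatenated to $h = [h_{\mathrm{ECG}}; h_{\mathrm{tab}}]$ and processed by a feed-forward head (linear + GeLU), yielding logits $z \in \mathbb{R}^{33}$ for 33 binary tasks, with probabilities $p_t = \sigma(z_t)$.
- Loss Function: The training objective is summed binary cross-entropy over all tasks:
$$\mathcal{L} = -\sum_{t=1}^{33} \big[\, y_t \log p_t + (1 - y_t) \log(1 - p_t) \,\big]$$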
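The encoder, fusion, and loss steps listed above can be traced end to end with a toy forward pass. The sketch below is illustrative only and makes several simplifying assumptions that are not taken from the paper: the state matrix is diagonal, the recurrence is written as `x_k = a_bar * x_{k-1} + b_bar * u_k`, mean pooling stands in for global pooling, the tabular embedding is supplied directly, and the head is a single linear layer without the GeLU. All function names are hypothetical.

```python
import math

def ssm_scan(u, a_bar, b_bar, c):
    """Discrete state-space recurrence with a diagonal state matrix (assumption).

    x_k = a_bar * x_{k-1} + b_bar * u_k   (elementwise per state dimension)
    y_k = sum_i c_i * x_k[i]
    """
    x = [0.0] * len(a_bar)
    ys = []
    for u_k in u:
        x = [a * x_i + b * u_k for a, x_i, b in zip(a_bar, x, b_bar)]
        ys.append(sum(c_i * x_i for c_i, x_i in zip(c, x)))
    return ys

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fuse_and_predict(h_ecg, h_tab, weights, biases):
    """Late fusion: concatenate both embeddings, then one linear row per task."""
    h = h_ecg + h_tab  # list concatenation
    logits = [sum(w_i * h_i for w_i, h_i in zip(w, h)) + b
              for w, b in zip(weights, biases)]
    return [sigmoid(z) for z in logits]

def summed_bce(probs, labels, eps=1e-7):
    """Binary cross-entropy summed over tasks for a single sample."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Toy example: one short "ECG" channel, a 2-d tabular embedding, 2 tasks.
ecg_lead = [0.0, 1.0, 0.5, -0.2]
ys = ssm_scan(ecg_lead, a_bar=[0.9, 0.5], b_bar=[1.0, 1.0], c=[0.5, 0.5])
h_ecg = [sum(ys) / len(ys)]          # mean pooling over time
h_tab = [0.2, -0.1]                  # stand-in for the RealMLP output
probs = fuse_and_predict(h_ecg, h_tab,
                         weights=[[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]],
                         biases=[0.0, 0.0])
loss = summed_bce(probs, labels=[1, 0])
```

A real S4 layer evaluates this recurrence as a long convolution for efficiency; the explicit loop is used here only because it makes the state update easy to read.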
3. Training Protocol and Hyperparameter Configuration
The model is trained using a stratified 20-fold patient-wise split (18:1:1 for training:validation:test):
| Set | Samples | Percentage |
|---|---|---|
| Training | 56,702 | ~90% |
| Validation | 3,150 | ~5% |
| Test | 3,149 | ~5% |
The model is optimized with AdamW using a constant learning rate and weight decay. Training uses a batch size of 64 for 20 epochs, with early stopping on validation macro-AUROC. Regularization includes dropout in the S4 blocks, self-normalizing SELU activations, weight decay, and explicit missing-value indicators (Alcaraz et al., 10 Jan 2026).
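A patient-wise 18:1:1 split of the kind described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: stratification by outcome is omitted, the choice of folds 18 and 19 for validation and test is arbitrary, and the function names and the toy patient IDs are hypothetical.

```python
import random

def patient_wise_folds(patient_ids, n_folds=20, seed=0):
    """Assign each unique patient to exactly one of n_folds folds."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    return {pid: i % n_folds for i, pid in enumerate(unique)}

def split_samples(sample_patient_ids, fold_of, val_fold=18, test_fold=19):
    """Map sample indices to train/val/test according to their patient's fold."""
    split = {"train": [], "val": [], "test": []}
    for idx, pid in enumerate(sample_patient_ids):
        fold = fold_of[pid]
        if fold == val_fold:
            split["val"].append(idx)
        elif fold == test_fold:
            split["test"].append(idx)
        else:
            split["train"].append(idx)
    return split

# Toy cohort: 200 patients with 5 samples each.
sample_patients = [f"p{i % 200}" for i in range(1000)]
fold_of = patient_wise_folds(sample_patients)
split = split_samples(sample_patients, fold_of)
```

Because folds are assigned per patient rather than per sample, all samples from one patient land in the same set, which is what prevents leakage between training and evaluation data.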
4. Discriminative and Calibrative Performance
Discrimination
| Outcome | AUROC |
|---|---|
| 1-day mortality | 0.9009 |
| Invasive mechanical ventilation | 0.9722 |
| Sedative administration | 0.9182 |
| Coagulation dysfunction (SOFA ≥2) | 0.9325 |
| Macro-average (33 tasks) | 0.8650 |
MDS-ICU exhibits high discrimination across domains spanning acute deterioration, organ dysfunction, and therapy needs.
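For readers unfamiliar with the metric, AUROC and its macro-average over tasks can be computed as in the sketch below. It uses the pairwise-ranking definition, which is quadratic in the number of samples and meant for illustration only; the function names and toy data are hypothetical and unrelated to the reported results.

```python
def auroc(scores, labels):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def macro_auroc(scores_by_task, labels_by_task):
    """Unweighted mean of per-task AUROC values."""
    values = [auroc(scores_by_task[t], labels_by_task[t]) for t in scores_by_task]
    return sum(values) / len(values)

# Toy data for two tasks.
scores = {"task_a": [0.9, 0.8, 0.3, 0.1], "task_b": [0.9, 0.7, 0.2, 0.1]}
labels = {"task_a": [1, 0, 1, 0], "task_b": [1, 1, 0, 0]}
macro = macro_auroc(scores, labels)
```

The macro-average weights every task equally, so a rare outcome influences the summary figure as much as a common one.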
Calibration
Brier score and expected calibration error (ECE) quantify agreement between predicted and empirical risks; lower values are better for both. Adding ECG waveforms lowers the Brier score for some outcomes (e.g., stay mortality: 0.084→0.078), and ECE values indicate low miscalibration. Reliability plots for S4+RealMLP closely follow the ideal diagonal, particularly in high-risk subpopulations.
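Both calibration metrics are simple to compute. The sketch below assumes ten equal-width probability bins for ECE, which is a common convention but is not confirmed by the source; function names and toy data are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between average confidence and observed frequency.

    Uses equal-width bins over [0, 1]; the binning scheme is an assumption.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(mean_p - freq)
    return ece

# Toy predictions: confident and correct, but slightly under-confident.
probs = [0.1, 0.1, 0.9, 0.9]
labels = [0, 0, 1, 1]
bs = brier_score(probs, labels)
ece = expected_calibration_error(probs, labels)
```

The Brier score mixes discrimination and calibration into one number, while ECE isolates calibration, which is why the two are usually reported together.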
5. Clinician and LLM Benchmarking
Benchmarks were conducted against human (n=4 ICU physicians) and LLM (GPT 5.2, Claude 4.5) predictors. Two experimental settings were used:
- Benchmark A: Predictions based solely on tabular+ECG plot input.
- Benchmark B: Predictions made after revealing MDS-ICU probabilities.
Performance was evaluated using ROC curves and the Youden index (sensitivity + specificity – 1). In Benchmark A, MDS-ICU outperformed clinicians in 56.25% and LLMs in 62.5% of cases. In Benchmark B, the average Youden index increased by 12% for clinicians and by 16% for LLMs once the model output was shown. The effect was not uniform: depending on the case, the clinician or LLM assisted by the model exceeded, matched, or underperformed the standalone model.
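The Youden index can be computed directly from binary decisions, as in this sketch. How the paper derives a single operating point for the model's continuous probabilities is not stated here, so `best_youden`, which takes the maximum over all observed thresholds, is an assumption. Function names and toy data are hypothetical.

```python
def youden_index(predictions, labels):
    """Sensitivity + specificity - 1 for binary predictions."""
    tp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
    fn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 1)
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative label")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

def best_youden(probs, labels):
    """Maximum Youden index over all thresholds seen in the data (assumption)."""
    thresholds = sorted(set(probs))
    return max(
        youden_index([1 if p >= t else 0 for p in probs], labels)
        for t in thresholds
    )

# Toy comparison: a rater's yes/no calls versus model probabilities.
labels = [1, 1, 0, 0]
rater_calls = [1, 0, 0, 0]
model_probs = [0.8, 0.6, 0.4, 0.1]
rater_j = youden_index(rater_calls, labels)
model_j = best_youden(model_probs, labels)
```

A value of 0 corresponds to chance-level decisions and 1 to perfect separation, so the index puts binary human judgments and thresholded model outputs on the same scale.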
6. Clinical Utility and Implementation Challenges
MDS-ICU provides continuously updated, multimodal risk scores facilitating early warning and ICU resource management. Integration options include:
- Real-time dashboards for risk trajectory monitoring
- Automated event alerts (impending respiratory failure, etc.)
- Embedding outputs in clinician notes and rounding tools
Key challenges for deployment include EHR/waveform interoperability (HL7/FHIR, DICOM-ECG), regulatory validation, explainability (model saliency, example-based explanations), data privacy (on-premises vs. cloud inference), and ongoing monitoring for model drift and recalibration.
Taken together, these results suggest that multimodal architectures such as MDS-ICU, which combine structured time-series modeling with rich tabular representation and late fusion, can deliver well-calibrated, high-discrimination risk stratification and augment clinician judgment, providing a foundation for precision ICU decision support (Alcaraz et al., 10 Jan 2026).