MIMIC-IV-Ext-PE: PE Labels for CTPA Reports

Updated 29 January 2026
  • MIMIC-IV-Ext-PE is an extension to the MIMIC-IV database that provides validated, high-resolution PE labels for CTPA reports through expert manual adjudication and automated VTE-BERT classification.
  • It combines dual-physician review with a Bio_ClinicalBERT-based transformer model to achieve high sensitivity (92.4%) and specificity (98.9%) in distinguishing acute, subsegmental, chronic, and equivocal cases.
  • The resource facilitates reproducible cohort assembly, automated phenotyping, and benchmarking of NLP and AI systems in hematologic and critical-care research.

MIMIC-IV-Ext-PE is an extension to the MIMIC-IV (Medical Information Mart for Intensive Care IV) database that provides validated, high-resolution label annotations for pulmonary embolism (PE) in computed tomography pulmonary angiography (CTPA) radiology reports. The resource is structured to enable hematologic, critical-care, and machine learning research, combining expert manual adjudication with automated transformer-based classification (VTE-BERT). It addresses a significant unmet need for large, publicly available, PE-labeled clinical datasets, supporting reproducible cohort assembly, automated phenotyping, and benchmarking of NLP and AI systems on real-world clinical narratives (Lam et al., 2024).

1. Dataset Construction and Cohort Definition

MIMIC-IV-Ext-PE is derived from the complete set of radiology reports in MIMIC-IV (v3.0; BIDMC, 2008–2019), which encompasses both emergency-room and ICU admissions. Initial filtering via regular expressions yielded 21,948 candidate CTPA studies from 2,321,355 total reports. After manual verification, 19,942 reports corresponding to 15,875 unique patients were confirmed as CTPAs. Of these, 62% (12,355) were inpatient-associated and 38% (7,587) originated from the emergency room only. Patient demographics: median age 58, 51.3% female, 59.8% White, 16.1% Black, 5.2% Hispanic/Latino, remainder other categories (Lam et al., 2024).

The construction process begins by programmatically screening for CTPA report candidates and removing false positives through manual review. This curated dataset provides a robust sampling of both acute and chronic PE cases, as well as negative and equivocal studies.
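The programmatic screening step can be sketched as a simple regular-expression filter over report text. The patterns below are illustrative assumptions; the authors' actual RegEx code is referenced in the manuscript.

```python
import re

# Illustrative patterns for flagging candidate CTPA reports; these are
# assumptions, not the patterns from the released codebase.
CTPA_PATTERNS = re.compile(
    r"(ct\s+pulmonary\s+angio|ctpa|pe\s+protocol)",
    re.IGNORECASE,
)

def is_ctpa_candidate(report_text: str) -> bool:
    """Return True if a radiology report looks like a CTPA study."""
    return bool(CTPA_PATTERNS.search(report_text))

reports = [
    "CTPA: No evidence of acute pulmonary embolism.",
    "Chest x-ray: Clear lungs.",
]
candidates = [r for r in reports if is_ctpa_candidate(r)]
```

Candidates surviving this screen would then go to manual review to remove false positives, as described above.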

2. Labeling Methodology: Manual and Model-Assisted Approaches

Manual Adjudication

A dual-physician protocol establishes the gold standard labels. Each CTPA report is preprocessed to extract PE-relevant sentences. Physician 1 reviews the extracted content with access to a preliminary model prediction, while Physician 2 reviews the text in a blinded fashion. Labels are assigned as:

  • Positive (acute PE): any explicit mention of "Acute PE" or "Acute+Chronic PE," including isolated subsegmental emboli.
  • Negative: "Chronic PE only," "Equivocal findings," or absence of PE description.

Discrepancies are resolved by consensus. This process resulted in 1,591 acute PE-positive reports (7.98% positivity), with subcategorization for 233 subsegmental-only cases, as well as 345 "chronic only" and 104 "equivocal" among the 18,351 PE-negative reports (Lam et al., 2024).
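The binarization rules above reduce to a small mapping from adjudicated report category to label. The category strings and the function are an illustrative sketch, not the authors' code.

```python
def assign_binary_label(category: str) -> int:
    """Map an adjudicated report category to the binary PE label.

    Categories follow the manual-review scheme described above
    (illustrative names, not the released schema).
    """
    positive = {"acute_pe", "acute_plus_chronic_pe", "subsegmental_only"}
    negative = {"chronic_only", "equivocal", "no_pe"}
    if category in positive:
        return 1  # acute PE, including isolated subsegmental emboli
    if category in negative:
        return 0  # chronic-only, equivocal, or no PE description
    raise ValueError(f"Unknown category: {category}")
```

Note that chronic-only and equivocal studies fall on the negative side of the binary label, which is why they are subcategorized among the 18,351 PE-negative reports.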

VTE-BERT Automated Labeling

Automated PE labeling uses VTE-BERT, a Bio_ClinicalBERT-based transformer pre-trained on PubMed and MIMIC-III clinical narratives and fine-tuned on ~3,000 oncology-patient notes for venous thromboembolism detection. Because it was never retrained on the MIMIC-IV CTPA corpus, its evaluation here constitutes a true external validation. Reports are segmented into relevant sections; all sentences containing PE terminology are aggregated via RegEx and classified as PE-positive or PE-negative by VTE-BERT. If no PE-relevant sentence is found, a negative label is assigned. No text augmentation or further fine-tuning is performed in this release (Lam et al., 2024).
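The labeling flow described above (sentence extraction, RegEx aggregation, classification, negative default) can be sketched as follows. The `classify` callable stands in for the VTE-BERT sentence classifier and the PE-term pattern is an assumption for illustration.

```python
import re
from typing import Callable

# Illustrative PE-terminology pattern; the released RegEx may differ.
PE_TERMS = re.compile(r"pulmonary\s+embol|\bPE\b|\bembol(us|i|ism)\b", re.IGNORECASE)

def label_report(report: str, classify: Callable[[str], int]) -> int:
    """Sketch of the sentence-level labeling flow described above.

    `classify` stands in for VTE-BERT (1 = PE-positive sentence,
    0 = PE-negative). Reports with no PE-relevant sentence default
    to a negative label, per the rule in the text.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.?!])\s+", report) if s.strip()]
    relevant = [s for s in sentences if PE_TERMS.search(s)]
    if not relevant:
        return 0  # no PE terminology found -> negative by rule
    # Report is positive if any PE-relevant sentence is classified positive
    return int(any(classify(s) for s in relevant))
```

In practice `classify` would wrap the fine-tuned transformer; here it is left abstract so the aggregation logic stands alone.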

3. Label Quality, Validation Metrics, and Benchmarking

Label quality is benchmarked using standard diagnostic metrics with manual labels as ground truth. The following formulas are applied:

  • Sensitivity = TP / (TP + FN)
  • Positive Predictive Value (PPV) = TP / (TP + FP)
  • Specificity = TN / (TN + FP)
  • Negative Predictive Value (NPV) = TN / (TN + FN) (Lam et al., 2024).
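The four formulas translate directly into a small helper that computes all metrics from a 2×2 confusion table (a generic sketch, not code from the released pipeline):

```python
def diagnostic_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute the four diagnostic metrics defined above from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # TP / (TP + FN)
        "ppv":         tp / (tp + fp),   # TP / (TP + FP)
        "specificity": tn / (tn + fp),   # TN / (TN + FP)
        "npv":         tn / (tn + fn),   # TN / (TN + FN)
    }
```

Plugging in confusion counts computed against the manual gold-standard labels yields the figures reported below.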

For the entire cohort (n=19,942), VTE-BERT achieves:

  • Sensitivity: 92.4%
  • PPV: 87.8%
  • Specificity: 98.9%
  • NPV: 99.3%

By comparison, ICD-9/10 discharge-code benchmarks could be computed only for hospitalized cases with available billing codes (n=11,990).

Subgroup analysis shows uniform VTE-BERT performance across ER-only and inpatient subsets. The model's main source of false positives is chronic PE misclassification. For subsegmental PE (n=233), VTE-BERT identified all manually confirmed cases, while ICD codes flagged only 4.

4. Resource Contents, Schema, and Access

The MIMIC-IV-Ext-PE extension adds three key fields to the MIMIC-IV radiology reports table:

  • pe_label_binary (0=negative, 1=acute PE)
  • pe_label_subtype (acute, subsegmental, chronic, equivocal)
  • pe_label_source (manual, VTE-BERT)
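Given these three fields, cohort selection reduces to simple column filters. The toy DataFrame below stands in for the annotated radiology table (real use would load the credentialed PhysioNet release):

```python
import pandas as pd

# Toy stand-in for the annotated radiology reports table; the three
# pe_label* columns follow the schema listed above.
reports = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "pe_label_binary": [1, 0, 1],
    "pe_label_subtype": ["acute", "chronic", "subsegmental"],
    "pe_label_source": ["manual", "VTE-BERT", "manual"],
})

# Select acute-PE studies, then the subsegmental-only subset
acute_pe = reports[reports["pe_label_binary"] == 1]
subsegmental = acute_pe[acute_pe["pe_label_subtype"] == "subsegmental"]
```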

All 19,942 CTPA studies are annotated and publicly released under the same terms as MIMIC-IV. Access requires completion of CITI “Data or Specimens Only Research” training and signing the PhysioNet credentialed data use agreement. The extension will be available through the main MIMIC-IV repository under “extensions”; code for CTPA identification and RegEx preprocessing is referenced in the original manuscript (Lam et al., 2024).

5. Validation, Limitations, and Best Practices

Error Analysis and Subgroup Performance

The most frequent error mode for VTE-BERT is false positives triggered by chronic-PE language; equivocal or highly circumstantial phrasing can also escape the binary schema. Performance does not degrade between ER-only and inpatient cohorts.

Among limitations:

  • The annotation reflects a single-center population (BIDMC), potentially restricting generalizability.
  • Manual gold standard labeling, though dual-reviewed with one reviewer blinded, still may introduce bias.
  • Coverage based on billable discharges omits ER-only cases in ICD code benchmarks.
  • VTE-BERT was not fine-tuned on these specific CTPA reports; additional domain-specific adaptation could further improve precision or specificity for chronic and equivocal findings (Lam et al., 2024).

Practical Implementation Steps

  • Secure standard MIMIC-IV access credentials and complete required research training.
  • Query the radiology reports table, extracting the pe_label* columns.
  • Link using subject_id, hadm_id, and report_id to build multimodal cohorts with vitals, labs, procedures, or other notes.
  • RegEx scripts included in the codebase can be adapted for sentence isolation in other corpora.
  • Researchers may fine-tune VTE-BERT or larger LLMs on their own institution’s labeled subset to customize for institutional language or to support additional label classes (e.g., chronic PE).
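The linking step in the list above can be sketched with a standard pandas merge. The toy tables stand in for the MIMIC-IV admissions table and the PE label extension; column names follow MIMIC-IV conventions, but treat this as a sketch rather than a verified schema.

```python
import pandas as pd

# Toy stand-ins; real use would read the credentialed PhysioNet files.
admissions = pd.DataFrame({
    "subject_id": [10, 11],
    "hadm_id": [100, 101],
    "admittime": ["2015-01-01", "2016-06-01"],
})
pe_labels = pd.DataFrame({
    "subject_id": [10, 11],
    "hadm_id": [100, 101],
    "pe_label_binary": [1, 0],
})

# Link labels to admissions on subject_id/hadm_id to build a PE cohort
cohort = admissions.merge(pe_labels, on=["subject_id", "hadm_id"], how="inner")
pe_positive = cohort[cohort["pe_label_binary"] == 1]
```

The same join pattern extends to vitals, labs, procedures, or other notes tables for multimodal cohort assembly.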

6. Comparative Resources and Benchmarking with Other Pipelines

Distinct from MIMIC-IV-Ext-PE, other efforts in the field use LLM-based abstraction and alternative curation standards. For example, a parallel study extracted and annotated 9,132 CTPE reports in MIMIC-IV using dual physician review and LLaMA-family LLMs, assigning four-class labels (present, absent, missing, uncertain) for PE presence, anatomical location, right heart strain, and image quality artifacts (Alwakeel et al., 26 Mar 2025). Performance, reported as Cohen’s κ, indicated that for binary PE detection, agreement reached 0.98 with 70B LLaMA models. However, no public code or annotated dataset for that resource is currently released.

Generic MIMIC-IV extraction tools such as METRE supply model-ready tabular datasets and facilitate cohort selection, feature extraction, and preprocessing, but do not specifically target or label CTPA or PE phenotypes (Liao et al., 2023). MIMIC-IV-Ext-PE thus uniquely supplies validated, high-specificity PE phenotype annotations at scale and granularity, supporting both cohort identification and benchmarking of further NLP/AI pipelines.

7. Significance and Research Applications

By supplementing MIMIC-IV with nearly 20,000 validated CTPA PE labels, MIMIC-IV-Ext-PE enables rapid assembly of PE-positive/negative cohorts for downstream tasks in thrombosis research, longitudinal prognosis modeling, outcome prediction, pharmacovigilance, and critical-care analytics. The released code and RegEx resources provide reusable templates for cross-dataset phenotyping and can serve as foundational benchmarks for future AI model validation, external comparability, and cross-institutional generalizability analyses (Lam et al., 2024).
