MIMIC-IV-to-FHIR Reference Mappings

Updated 16 January 2026

MIMIC-IV-on-FHIR is a framework that systematically transforms structured clinical data into HL7 FHIR resources, ensuring semantic interoperability.
It employs advanced retrieval-augmented generation and context-aware prompt engineering to achieve high accuracy in resource mapping.
Evaluation metrics demonstrate high precision and recall under baseline conditions, underscoring the method's reliability and potential for further fine-tuning.

MIMIC-IV-on-FHIR reference mappings define the systematic transformation of structured clinical data from the MIMIC-IV database into HL7 FHIR-compliant resources. This mapping framework enables semantic interoperability and supports automation via LLMs. It incorporates rigorous attribute-level and terminology normalization protocols, leverages context-aware prompt engineering, and is validated against formal evaluation metrics (Riquelme et al., 3 Jul 2025, Brens et al., 9 Jan 2026).

1. Mapping Principles and Pipeline Architecture

The MIMIC-IV-on-FHIR mapping pipeline is a semi-automated process executed in sequential stages: data processing, context building, and targeted LLM prompting. Both baseline (schema-aware) and real-world (minimal context) scenarios are supported.

Data Processing: In the baseline, 17 MIMIC-IV tables (183 attributes) are filtered to 119 candidate attributes. In the real-world configuration, a single table contains 68 unconstrained attributes with only basic metadata.
Context Building:
- Retrieval-Augmented Generation (RAG) combines embeddings from TF-IDF, BM25, Universal Sentence Encoder, and Word2Vec for semantic similarity between MIMIC-IV and 45 official FHIR resources.
- Cosine similarity and Reciprocal Rank Fusion prioritize resource selection: top-1 FHIR resource assignment achieves 100% accuracy in the baseline.
- Unsupervised clustering (KMeans, Silhouette/Calinski-Harabasz/Davies-Bouldin) with biomedical embeddings (PubMedBERT, MedEmbed-v0.1, ClinicalBERT, BioBERT) is applied for real-world attribute grouping; top-5 resource recall is 94%.
LLM Interaction:
- Self-reflexive, mixture-of-prompts, and 5-step serial prompting strategies are engineered for resource-element mapping.
- GPT-4o and Llama 3.2 models are configured for deterministic output (temperature=0, top_p=0), utilizing "functions" and "structured_output" interfaces to enforce schema-compliance and to invoke resource-specific tools (Riquelme et al., 3 Jul 2025).

2. Reference Mapping Tables: Source-to-FHIR Alignment

Explicit tabular mappings provide direct references for implementers. Attribute-level transformations specify the target FHIR resource, element path, data type, and normalization mechanism:

MIMIC-IV Table.Field	FHIR Resource	FHIR Element Path
PATIENTS.subject_id	Patient	Patient.identifier[0].value
PATIENTS.gender	Patient	Patient.gender
PATIENTS.dob	Patient	Patient.birthDate
ADMISSIONS.hadm_id	Encounter	Encounter.identifier[0].value
ADMISSIONS.subject_id	Encounter	Encounter.subject.reference
ADMISSIONS.admittime	Encounter	Encounter.period.start
ADMISSIONS.dischtime	Encounter	Encounter.period.end
ADMISSIONS.admission_type	Encounter	Encounter.class.code
DIAGNOSES_ICD.icd_code	Condition	Condition.code.coding.code
DIAGNOSES_ICD.icd_code	Condition	Condition.code.coding.system
DIAGNOSES_ICD.long_title	Condition	Condition.code.coding.display
DIAGNOSES_ICD.hadm_id	Condition	Condition.encounter.reference
DIAGNOSES_ICD.subject_id	Condition	Condition.subject.reference
LABEVENTS.itemid	Observation	Observation.code.coding.code
LABEVENTS.itemid	Observation	Observation.code.coding.system
LABEVENTS.valuenum	Observation	Observation.valueQuantity.value
LABEVENTS.valueuom	Observation	Observation.valueQuantity.unit
LABEVENTS.charttime	Observation	Observation.effectiveDateTime
PRESCRIPTIONS.drug_code_rxnorm	MedicationRequest	MedicationRequest.medicationCodeableConcept.code
PRESCRIPTIONS.drug_code_rxnorm	MedicationRequest	MedicationRequest.medicationCodeableConcept.system
PRESCRIPTIONS.drug	MedicationRequest	MedicationRequest.dosageInstruction.text

Each mapping encodes normalization rules: for example, DIAGNOSES_ICD.icd_code is mapped to SNOMED-CT via UMLS CUI lookup and contextual embedding similarity (SapBERT); LABEVENTS.itemid utilizes an official LOINC crosswalk; PRESCRIPTIONS.drug_code_rxnorm is injected directly (Brens et al., 9 Jan 2026).

3. Terminology Normalization and Transformation Functions

Terminology harmonization is critical for semantic interoperability across standards:

Diagnosis:
- ICD-9/10 codes from DIAGNOSES_ICD are mapped to SNOMED-CT using UMLS CUI lookup supplemented with SapBERT embedding similarity:
$\mathrm{f_{cond}: D.\text{icd\_code} \mapsto \mathrm{Condition.code.coding.code}}, \quad \mathrm{Condition.code.coding.system} = \mathrm{"http://snomed.info/sct"}$

$\mathrm{f_{snomed}(d) = \arg\max_{s \in \mathcal{S}(d)} \cos(\mathbf{e}_{\mathrm{text}(d)}, \mathbf{e}_{\mathrm{desc}(s)})}$

where $\mathcal{S}(d)$ is the set of SNOMED codes sharing a UMLS CUI with ICD code $d$ .
Lab Observations: ITEMID is mapped to LOINC code via the LOINC-MIMIC crosswalk:

$\mathrm{f_{obs}: L.\text{itemid} \mapsto \mathrm{Observation.code.coding.code}}, \quad \mathrm{Observation.code.coding.system} = \mathrm{"http://loinc.org"}$

$\mathrm{Observation.valueQuantity.value} = L.\text{valuenum}, \quad \mathrm{Observation.valueQuantity.unit} = L.\text{valueuom}$
Medications: RxNorm is used directly with all required URIs.

4. Prompt Engineering and Automated Mapping Strategies

LLM-based mapping employs several prompt paradigms, each contributing distinct accuracy-effect profiles.

Self-Reflexive Prompt: The model performs initial mapping using FHIR JSON schemas, then internally revises output for consistency.
Mixture-of-Prompts (MoP): Alternates between direct column-to-element mapping, value-driven resource alignment, and FHIR-URL-based definitions.
5-Step Serial Prompt: Guides the model through staged identification, table intent summarization, schema provisioning, mapping, and output validation.
OpenAI "functions" and "structured_output" modes inject FHIR JSON schemas, enabling strict conformance. The parameter function_call="auto" invokes appropriate mapping tools.
Determinism is enforced using temperature=0; real-world tests vary temperature between 0, 0.5, and 1 to stress mapping resilience (Riquelme et al., 3 Jul 2025).

5. Evaluation Metrics and Empirical Validation

Performance assessment utilizes both resource-level and attribute-level metrics:

Definitions:
- Precision: $TP/(TP+FP)$
- Recall: $TP/(TP+FN)$
- F1-score: $2 \cdot Precision \cdot Recall /(Precision + Recall)$
- Accuracy: $(TP+TN)/(TP+TN+FP+FN)$
Results:
- Baseline resource identification: Perfect F1=1.00.
- Attribute-level mapping:
- GPT-4o, Self-Reflexive: 67.02%, 73.88%
- GPT-4o, MoP: [64.50%, 70.89%]
- Llama 3.2, MoP: [43.79%, 52.98%]
- Llama 3.2, Serial Schema: [28.49%, 35.62%]
- Real-world attribute-level mapping (N=10 runs, temperature=0,0.5,1):
- GPT-4o: 68.2–68.8%, Llama 3.2: 51.6–56.1% (Riquelme et al., 3 Jul 2025).

This suggests substantial gains for schema-aware instruction and prompt diversification. A plausible implication is that further fine-tuning on FHIR-specific sources could increase LLM mapping fidelity.

6. Error Analysis, Mitigation, and Recommendations

Error analysis reveals recurring LLM failure modes:

Hallucinated Attributes: Models invent plausible but non-existent fields.
Granularity Mismatch: Source columns are mapped to overly general or overly specific FHIR elements.
Insufficient Context: Abbreviated or ambiguous column names yield incorrect mappings.

Mitigation strategies demonstrate efficacy:

Structured JSON schemas (in-prompt) and "functions" reduce hallucinations.
Self-reflexive prompts support automatic internal correction.
Mixed and sample-value-enhanced prompts clarify ambiguous cases.
Out-of-the-box open-source models underperform; fine-tuning and interactive expert interfaces are recommended for validation and iterative improvement (Riquelme et al., 3 Jul 2025).

7. Implementation Guidelines and Future Directions

Adherence to reference mappings enables reproducibility and extension to additional standards:

Expand to HL7 CDA, OMOP CDM, and openEHR support via the same semi-automated workflow.
Fine-tune open-source LLMs on FHIR implementation guides and US-Core profiles.
Integrate centralized terminology servers (SNOMED CT, LOINC) for dynamic code alignment.
Develop interactive GUIs for manual mapping validation and feedback-driven prompt refinement.
Benchmark lightweight, privacy-preserving models for on-premise deployments.
Implement RAG-based prompt generation for real-time FHIR fragment retrieval.

With the canonical mapping tables, prompt templates, normalization functions, and evaluation strategies, clinical informaticists and standards engineers can reliably effect MIMIC-IV-to-FHIR transformation, ensuring high-confidence semantic interoperability (Riquelme et al., 3 Jul 2025, Brens et al., 9 Jan 2026).

Markdown Report Issue Upgrade to Chat

References (2)

Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case (2025)

Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs (2026)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to MIMIC-IV-on-FHIR Reference Mappings.