Papers
Topics
Authors
Recent
Search
2000 character limit reached

MIMIC-IV-to-FHIR Reference Mappings

Updated 16 January 2026
  • MIMIC-IV-on-FHIR is a framework that systematically transforms structured clinical data into HL7 FHIR resources, ensuring semantic interoperability.
  • It employs advanced retrieval-augmented generation and context-aware prompt engineering to achieve high accuracy in resource mapping.
  • Evaluation metrics demonstrate high precision and recall under baseline conditions, underscoring the method's reliability and potential for further fine-tuning.

MIMIC-IV-on-FHIR reference mappings define the systematic transformation of structured clinical data from the MIMIC-IV database into HL7 FHIR-compliant resources. This mapping framework enables semantic interoperability and supports automation via LLMs. It incorporates rigorous attribute-level and terminology normalization protocols, leverages context-aware prompt engineering, and is validated against formal evaluation metrics (Riquelme et al., 3 Jul 2025, Brens et al., 9 Jan 2026).

1. Mapping Principles and Pipeline Architecture

The MIMIC-IV-on-FHIR mapping pipeline is a semi-automated process executed in sequential stages: data processing, context building, and targeted LLM prompting. Both baseline (schema-aware) and real-world (minimal context) scenarios are supported.

  • Data Processing: In the baseline, 17 MIMIC-IV tables (183 attributes) are filtered to 119 candidate attributes. In the real-world configuration, a single table contains 68 unconstrained attributes with only basic metadata.
  • Context Building:
    • Retrieval-Augmented Generation (RAG) combines embeddings from TF-IDF, BM25, Universal Sentence Encoder, and Word2Vec for semantic similarity between MIMIC-IV and 45 official FHIR resources.
    • Cosine similarity and Reciprocal Rank Fusion prioritize resource selection: top-1 FHIR resource assignment achieves 100% accuracy in the baseline.
    • Unsupervised clustering (KMeans, Silhouette/Calinski-Harabasz/Davies-Bouldin) with biomedical embeddings (PubMedBERT, MedEmbed-v0.1, ClinicalBERT, BioBERT) is applied for real-world attribute grouping; top-5 resource recall is 94%.
  • LLM Interaction:
    • Self-reflexive, mixture-of-prompts, and 5-step serial prompting strategies are engineered for resource-element mapping.
    • GPT-4o and Llama 3.2 models are configured for deterministic output (temperature=0, top_p=0), utilizing "functions" and "structured_output" interfaces to enforce schema-compliance and to invoke resource-specific tools (Riquelme et al., 3 Jul 2025).

2. Reference Mapping Tables: Source-to-FHIR Alignment

Explicit tabular mappings provide direct references for implementers. Attribute-level transformations specify the target FHIR resource, element path, data type, and normalization mechanism:

MIMIC-IV Table.Field FHIR Resource FHIR Element Path
PATIENTS.subject_id Patient Patient.identifier[0].value
PATIENTS.gender Patient Patient.gender
PATIENTS.dob Patient Patient.birthDate
ADMISSIONS.hadm_id Encounter Encounter.identifier[0].value
ADMISSIONS.subject_id Encounter Encounter.subject.reference
ADMISSIONS.admittime Encounter Encounter.period.start
ADMISSIONS.dischtime Encounter Encounter.period.end
ADMISSIONS.admission_type Encounter Encounter.class.code
DIAGNOSES_ICD.icd_code Condition Condition.code.coding.code
DIAGNOSES_ICD.icd_code Condition Condition.code.coding.system
DIAGNOSES_ICD.long_title Condition Condition.code.coding.display
DIAGNOSES_ICD.hadm_id Condition Condition.encounter.reference
DIAGNOSES_ICD.subject_id Condition Condition.subject.reference
LABEVENTS.itemid Observation Observation.code.coding.code
LABEVENTS.itemid Observation Observation.code.coding.system
LABEVENTS.valuenum Observation Observation.valueQuantity.value
LABEVENTS.valueuom Observation Observation.valueQuantity.unit
LABEVENTS.charttime Observation Observation.effectiveDateTime
PRESCRIPTIONS.drug_code_rxnorm MedicationRequest MedicationRequest.medicationCodeableConcept.code
PRESCRIPTIONS.drug_code_rxnorm MedicationRequest MedicationRequest.medicationCodeableConcept.system
PRESCRIPTIONS.drug MedicationRequest MedicationRequest.dosageInstruction.text

Each mapping encodes normalization rules: for example, DIAGNOSES_ICD.icd_code is mapped to SNOMED-CT via UMLS CUI lookup and contextual embedding similarity (SapBERT); LABEVENTS.itemid utilizes an official LOINC crosswalk; PRESCRIPTIONS.drug_code_rxnorm is injected directly (Brens et al., 9 Jan 2026).

3. Terminology Normalization and Transformation Functions

Terminology harmonization is critical for semantic interoperability across standards:

  • Diagnosis:
    • ICD-9/10 codes from DIAGNOSES_ICD are mapped to SNOMED-CT using UMLS CUI lookup supplemented with SapBERT embedding similarity:

    fcond:D.icd_codeCondition.code.coding.code,Condition.code.coding.system="http://snomed.info/sct"\mathrm{f_{cond}: D.\text{icd\_code} \mapsto \mathrm{Condition.code.coding.code}}, \quad \mathrm{Condition.code.coding.system} = \mathrm{"http://snomed.info/sct"}

    fsnomed(d)=argmaxsS(d)cos(etext(d),edesc(s))\mathrm{f_{snomed}(d) = \arg\max_{s \in \mathcal{S}(d)} \cos(\mathbf{e}_{\mathrm{text}(d)}, \mathbf{e}_{\mathrm{desc}(s)})}

    where S(d)\mathcal{S}(d) is the set of SNOMED codes sharing a UMLS CUI with ICD code dd.

  • Lab Observations: ITEMID is mapped to LOINC code via the LOINC-MIMIC crosswalk:

    fobs:L.itemidObservation.code.coding.code,Observation.code.coding.system="http://loinc.org"\mathrm{f_{obs}: L.\text{itemid} \mapsto \mathrm{Observation.code.coding.code}}, \quad \mathrm{Observation.code.coding.system} = \mathrm{"http://loinc.org"}

    Observation.valueQuantity.value=L.valuenum,Observation.valueQuantity.unit=L.valueuom\mathrm{Observation.valueQuantity.value} = L.\text{valuenum}, \quad \mathrm{Observation.valueQuantity.unit} = L.\text{valueuom}

  • Medications: RxNorm is used directly with all required URIs.

4. Prompt Engineering and Automated Mapping Strategies

LLM-based mapping employs several prompt paradigms, each contributing distinct accuracy-effect profiles.

  • Self-Reflexive Prompt: The model performs initial mapping using FHIR JSON schemas, then internally revises output for consistency.

  • Mixture-of-Prompts (MoP): Alternates between direct column-to-element mapping, value-driven resource alignment, and FHIR-URL-based definitions.

  • 5-Step Serial Prompt: Guides the model through staged identification, table intent summarization, schema provisioning, mapping, and output validation.

  • OpenAI "functions" and "structured_output" modes inject FHIR JSON schemas, enabling strict conformance. The parameter function_call="auto" invokes appropriate mapping tools.

  • Determinism is enforced using temperature=0; real-world tests vary temperature between 0, 0.5, and 1 to stress mapping resilience (Riquelme et al., 3 Jul 2025).

5. Evaluation Metrics and Empirical Validation

Performance assessment utilizes both resource-level and attribute-level metrics:

  • Definitions:

    • Precision: TP/(TP+FP)TP/(TP+FP)
    • Recall: TP/(TP+FN)TP/(TP+FN)
    • F1-score: 2PrecisionRecall/(Precision+Recall)2 \cdot Precision \cdot Recall /(Precision + Recall)
    • Accuracy: (TP+TN)/(TP+TN+FP+FN)(TP+TN)/(TP+TN+FP+FN)
  • Results:
    • Baseline resource identification: Perfect F1=1.00.
    • Attribute-level mapping:
    • GPT-4o, Self-Reflexive: 67.02%, 73.88%
    • GPT-4o, MoP: [64.50%, 70.89%]
    • Llama 3.2, MoP: [43.79%, 52.98%]
    • Llama 3.2, Serial Schema: [28.49%, 35.62%]
    • Real-world attribute-level mapping (N=10 runs, temperature=0,0.5,1):
    • GPT-4o: 68.2–68.8%, Llama 3.2: 51.6–56.1% (Riquelme et al., 3 Jul 2025).

This suggests substantial gains for schema-aware instruction and prompt diversification. A plausible implication is that further fine-tuning on FHIR-specific sources could increase LLM mapping fidelity.

6. Error Analysis, Mitigation, and Recommendations

Error analysis reveals recurring LLM failure modes:

  • Hallucinated Attributes: Models invent plausible but non-existent fields.
  • Granularity Mismatch: Source columns are mapped to overly general or overly specific FHIR elements.
  • Insufficient Context: Abbreviated or ambiguous column names yield incorrect mappings.

Mitigation strategies demonstrate efficacy:

  • Structured JSON schemas (in-prompt) and "functions" reduce hallucinations.
  • Self-reflexive prompts support automatic internal correction.
  • Mixed and sample-value-enhanced prompts clarify ambiguous cases.
  • Out-of-the-box open-source models underperform; fine-tuning and interactive expert interfaces are recommended for validation and iterative improvement (Riquelme et al., 3 Jul 2025).

7. Implementation Guidelines and Future Directions

Adherence to reference mappings enables reproducibility and extension to additional standards:

  • Expand to HL7 CDA, OMOP CDM, and openEHR support via the same semi-automated workflow.
  • Fine-tune open-source LLMs on FHIR implementation guides and US-Core profiles.
  • Integrate centralized terminology servers (SNOMED CT, LOINC) for dynamic code alignment.
  • Develop interactive GUIs for manual mapping validation and feedback-driven prompt refinement.
  • Benchmark lightweight, privacy-preserving models for on-premise deployments.
  • Implement RAG-based prompt generation for real-time FHIR fragment retrieval.

With the canonical mapping tables, prompt templates, normalization functions, and evaluation strategies, clinical informaticists and standards engineers can reliably effect MIMIC-IV-to-FHIR transformation, ensuring high-confidence semantic interoperability (Riquelme et al., 3 Jul 2025, Brens et al., 9 Jan 2026).

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to MIMIC-IV-on-FHIR Reference Mappings.