Reliability of TICL and retrieval‑augmented TICL in high‑dimensional, imbalanced EHRs

Determine the reliability and behavioral characteristics of tabular in‑context learning (TICL) and retrieval‑augmented TICL methods, including PFN‑based approaches such as TabPFN and TabDPT, when applied to high‑dimensional, sparse, and imbalanced structured electronic health record (EHR) representations in which retrieval‑based context construction may be noisy or unstable.

Background

Tabular in‑context learning (TICL) methods such as TabPFN and their retrieval‑augmented variants promise retraining‑free adaptation for clinical prediction on structured EHRs, but clinical data exhibit high feature heterogeneity, sparsity, and severe class imbalance.
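As a rough illustration of why severe imbalance matters when a prompt is built from a limited number of rows, assume (hypothetically) that the k context rows are drawn independently from a cohort with positive‑class prevalence π. The number of positives in the context is then binomial, which gives the following baseline. It describes random sampling only, not retrieval, and the values of π and k are illustrative.

```latex
\begin{aligned}
P(\text{no positives in context}) &= (1 - \pi)^{k}, \qquad \mathbb{E}[\#\text{positives}] = k\pi \\
\pi = 0.01,\; k = 100: \quad (0.99)^{100} &\approx 0.37, \qquad k\pi = 1
\end{aligned}
```

Under these assumptions, roughly a third of such contexts would contain no minority‑class example at all.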

Because retrieval constructs the in‑context prompt at inference time, noisy or misaligned neighborhoods in high‑dimensional EHR spaces may undermine stability and performance. The paper notes that the reliability of these methods under such conditions has not yet been established, which motivates systematic evaluation and methodological advances such as AWARE.
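To make the instability concern concrete, the following is a minimal, hypothetical diagnostic. It is not the paper's evaluation protocol, not AWARE, and not the retrieval code used by TabPFN or TabDPT. It assumes Euclidean k‑nearest‑neighbor retrieval over a synthetic binary matrix standing in for sparse EHR features, and treats independent bit flips as a proxy for coding noise. It reports how much the retrieved context changes under that perturbation (mean Jaccard overlap of neighbor sets) and how often a retrieved context contains no minority‑class examples. All function names, the data generator, and the parameter values are illustrative assumptions.

```python
import math
import random


def make_sparse_ehr(n_rows, n_features, density, pos_rate, rng):
    """Hypothetical synthetic stand-in for a sparse, imbalanced binary EHR matrix."""
    X = [[1.0 if rng.random() < density else 0.0 for _ in range(n_features)]
         for _ in range(n_rows)]
    y = [1 if rng.random() < pos_rate else 0 for _ in range(n_rows)]
    return X, y


def knn_indices(query, X, k):
    """Indices of the k training rows nearest to `query` (Euclidean; ties broken by index)."""
    ranked = sorted((math.dist(query, row), i) for i, row in enumerate(X))
    return [i for _, i in ranked[:k]]


def jaccard(a, b):
    """Jaccard overlap between two neighbor-index lists (1.0 = identical sets)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def retrieval_diagnostics(X, y, queries, k, flip_prob, rng):
    """Mean neighbor-set overlap under feature noise, and share of contexts with no positives."""
    overlaps, no_positive = [], 0
    for q in queries:
        context = knn_indices(q, X, k)
        # Assumed noise model: flip each binary feature independently with prob. flip_prob.
        q_noisy = [1.0 - v if rng.random() < flip_prob else v for v in q]
        overlaps.append(jaccard(context, knn_indices(q_noisy, X, k)))
        # Count contexts whose retrieved rows contain no positive (minority) label.
        if sum(y[i] for i in context) == 0:
            no_positive += 1
    return sum(overlaps) / len(overlaps), no_positive / len(queries)


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_sparse_ehr(400, 200, density=0.05, pos_rate=0.03, rng=rng)
    queries, _ = make_sparse_ehr(25, 200, density=0.05, pos_rate=0.03, rng=rng)
    overlap, frac_no_pos = retrieval_diagnostics(X, y, queries, k=16, flip_prob=0.02, rng=rng)
    print(f"mean neighbor overlap: {overlap:.2f}")
    print(f"contexts with no positive examples: {frac_no_pos:.2f}")
```

With `flip_prob=0.0` the overlap is 1.0 by construction; values well below 1.0 at a small `flip_prob` would suggest that the retrieved prompt is sensitive to minor feature noise. Practical pipelines may use learned or feature‑weighted distances, which this sketch does not model.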

References

In particular, it remains unclear how reliably TICL and retrieval-augmented TICL behave in high-dimensional, sparse, and imbalanced EHR representations where retrieval-based context construction may be noisy or unstable.