CLIFT: Analysing Natural Distribution Shift on Question Answering Models in Clinical Domain
Abstract: This paper introduces CLIFT (Clinical Shift), a new testbed for the question-answering task in the clinical domain. The testbed includes 7.5k high-quality question-answering samples, providing a diverse and reliable benchmark. We performed a comprehensive experimental study and evaluated several deep-learning QA models on the proposed testbed. Despite impressive results on the original test set, their performance degrades on the new test sets, indicating a distribution shift. Our findings emphasize both the need and the potential to increase the robustness of clinical-domain models under distribution shift, and the testbed offers one way to track progress in that direction. They also highlight the necessity of adopting evaluation metrics that account for robustness to natural distribution shifts. We plan to expand the corpus with additional samples and model results. The full paper and the updated benchmark are available at github.com/openlifescience-ai/clift
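The robustness gap the abstract describes can be made concrete by scoring the same model's predictions on the original test set and on a shifted one with the standard extractive-QA metrics. The sketch below (a minimal illustration, not the paper's evaluation code; the example answer pairs are invented) implements SQuAD-style exact match and token-level F1 and reports the degradation between the two sets.

```python
import re
from collections import Counter

def normalize(s):
    """Lowercase, drop articles and punctuation, collapse whitespace (SQuAD convention)."""
    s = re.sub(r"\b(a|an|the)\b", " ", s.lower())
    s = re.sub(r"[^\w\s]", " ", s)
    return " ".join(s.split())

def exact_match(pred, gold):
    return float(normalize(pred) == normalize(gold))

def f1(pred, gold):
    p, g = normalize(pred).split(), normalize(gold).split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)

def evaluate(pairs):
    """Average EM and F1 over (prediction, gold answer) pairs."""
    n = len(pairs)
    return (sum(exact_match(p, g) for p, g in pairs) / n,
            sum(f1(p, g) for p, g in pairs) / n)

# Hypothetical predictions: same model, original vs. shifted test set.
original = [("atrial fibrillation", "atrial fibrillation"),
            ("metformin 500 mg", "metformin 500 mg daily")]
shifted  = [("heart failure", "congestive heart failure"),
            ("aspirin", "warfarin")]

em_o, f1_o = evaluate(original)
em_s, f1_s = evaluate(shifted)
print(f"original: EM={em_o:.2f} F1={f1_o:.2f}")
print(f"shifted:  EM={em_s:.2f} F1={f1_s:.2f}")
print(f"robustness gap (F1): {f1_o - f1_s:.2f}")
```

A drop in EM/F1 from the original to the shifted set, with no change to the model, is the signature of natural distribution shift that the testbed is designed to expose.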