CohortGPT: An Enhanced GPT for Participant Recruitment in Clinical Study

Published 21 Jul 2023 in cs.CL and cs.AI | (2307.11346v1)

Abstract: Participant recruitment based on unstructured medical texts such as clinical notes and radiology reports has been a challenging yet important task for the cohort establishment in clinical research. Recently, LLMs such as ChatGPT have achieved tremendous success in various downstream tasks thanks to their promising performance in language understanding, inference, and generation. It is then natural to test their feasibility in solving the cohort recruitment task, which involves the classification of a given paragraph of medical text into disease label(s). However, when applied to knowledge-intensive problem settings such as medical text classification, where the LLMs are expected to understand the decision made by human experts and accurately identify the implied disease labels, the LLMs show a mediocre performance. A possible explanation is that, by only using the medical text, the LLMs neglect to use the rich context of additional information that languages afford. To this end, we propose to use a knowledge graph as auxiliary information to guide the LLMs in making predictions. Moreover, to further help the LLMs adapt to the problem setting, we apply a chain-of-thought (CoT) sample selection strategy enhanced by reinforcement learning, which selects a set of CoT samples given each individual medical report. Experimental results and various ablation studies show that our few-shot learning method achieves satisfactory performance compared with fine-tuning strategies and gains superb advantages when the available data is limited. The code and sample dataset of the proposed CohortGPT model are available at: https://anonymous.4open.science/r/CohortGPT-4872/

Summary

The paper introduces CohortGPT, which integrates LLMs with domain-specific knowledge graphs and reinforcement learning-based chain-of-thought sample selection to improve clinical trial recruitment.
The method selects CoT samples dynamically for each report and is evaluated with extensive ablation studies; in few-shot settings with limited labeled data, it achieves a higher F1-score than traditional fine-tuning approaches.
The research highlights practical implications for healthcare, suggesting broader applications of LLMs in areas such as diagnosis and treatment optimization.

The paper "CohortGPT: An Enhanced GPT for Participant Recruitment in Clinical Study" explores the use of LLMs, such as ChatGPT and GPT-4, for recruiting participants in clinical trials by classifying medical texts. The authors propose a framework that combines the language understanding capabilities of LLMs with domain-specific knowledge graphs and a chain-of-thought (CoT) sample selection strategy. This overview examines the key components of CohortGPT and its performance relative to traditional methods, and considers both the practical implications and the theoretical contributions of the research.

Introduction to CohortGPT

Randomized clinical trials (RCTs) are fundamental for assessing medical interventions, but participant recruitment remains a significant bottleneck. Medical records often contain unstructured text, such as clinical notes and radiology reports, making it difficult to identify potential candidates who match the study criteria. Traditional methods, including rule-based approaches and machine-learning techniques, have been limited by the complexity of medical language and the need for substantial labeled data for training. CohortGPT addresses these challenges by leveraging the robust language understanding of LLMs augmented with a knowledge graph to guide predictions and a novel reinforcement learning strategy for CoT sample selection.

Components and Methodology

Knowledge Graph Integration

CohortGPT embeds a medical domain knowledge graph into the LLM's prompt design. Knowledge graphs represent relationships between entities in a structured form, which can strengthen the model's reasoning within specialized domains. Several strategies for incorporating the knowledge graph into prompts are proposed, including KG-as-Tree, KG-as-Relation, and KG-as-Rule. KG-as-Rule proved most effective in the experiments, which suggests that rule-style text is the easiest of these formats for LLMs to process.

Figure 1: A knowledge graph created by Zhang et al. (2020) to represent relationships between diseases, organs, or tissues.
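
As a rough illustration of how a knowledge graph might be serialized into prompt text, the sketch below renders a toy graph in a relation style and a rule style. The graph entries, the function names (`kg_as_relation`, `kg_as_rule`, `build_prompt`), and the wording of the rendered text are all assumptions made for illustration; they are not the paper's actual prompt templates.

```python
# Toy knowledge graph: organ or tissue -> related findings.
# The entries are illustrative only, not taken from the paper's graph.
TOY_KG = {
    "lung": ["atelectasis", "pneumonia"],
    "pleura": ["pleural effusion"],
}


def kg_as_relation(kg):
    """Render every edge as one flat line (hypothetical format)."""
    lines = []
    for organ, findings in kg.items():
        for finding in findings:
            lines.append(f"{finding} -- located in -- {organ}")
    return "\n".join(lines)


def kg_as_rule(kg):
    """Render each organ's edges as one natural-language rule (hypothetical format)."""
    rules = []
    for organ, findings in kg.items():
        rules.append(
            f"If the report describes an abnormality of the {organ}, "
            f"consider: {', '.join(findings)}."
        )
    return "\n".join(rules)


def build_prompt(report, kg_text, labels):
    """Assemble a classification prompt with the serialized graph placed first."""
    return (
        "Knowledge:\n" + kg_text + "\n\n"
        "Candidate labels: " + ", ".join(labels) + "\n\n"
        "Report: " + report + "\n"
        "Answer with the labels that apply."
    )


prompt = build_prompt(
    report="Blunting of the left costophrenic angle.",
    kg_text=kg_as_rule(TOY_KG),
    labels=["atelectasis", "pneumonia", "pleural effusion"],
)
```

Only the serialization function changes between strategies, so different formats can be compared on the same reports while the rest of the prompt stays fixed.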

Reinforcement Learning Enhanced CoT Sample Selection

The selection of CoT samples, which is critical for guiding the model's reasoning, is optimized using a policy-gradient approach. Random or similarity-based CoT sample selection yields unstable performance, so CohortGPT instead trains a policy neural network to make the selection. The policy is trained to maximize a crafted reward function, which ties its selection decisions to the classification outcomes they produce on medical reports.

Figure 2: A policy model will be trained on a small number of training samples to dynamically select CoT samples from a CoT candidate pool.
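
The idea of policy-gradient sample selection can be sketched with a small REINFORCE loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the policy here is a linear softmax scorer rather than a neural network, it picks a single candidate instead of a set, and `toy_reward` is a hypothetical stand-in for a reward derived from comparing the LLM's prediction with the ground-truth labels.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearPolicy:
    """Scores each CoT candidate j for a report x: logits[j] = sum_i x[i] * W[i][j]."""

    def __init__(self, n_features, n_candidates):
        self.W = [[0.0] * n_candidates for _ in range(n_features)]

    def probs(self, x):
        n_candidates = len(self.W[0])
        logits = [
            sum(x[i] * self.W[i][j] for i in range(len(x)))
            for j in range(n_candidates)
        ]
        return softmax(logits)

    def reinforce_step(self, x, reward_fn, lr=0.5, rng=random):
        """Sample one candidate, observe its reward, and take a REINFORCE step."""
        p = self.probs(x)
        action = rng.choices(range(len(p)), weights=p, k=1)[0]
        reward = reward_fn(x, action)
        # Gradient of log pi(action | x) w.r.t. W[i][j] is x[i] * (1[j == action] - p[j]).
        for i in range(len(x)):
            for j in range(len(p)):
                indicator = 1.0 if j == action else 0.0
                self.W[i][j] += lr * reward * x[i] * (indicator - p[j])
        return action, reward


def toy_reward(x, action):
    # Hypothetical reward: +1 if the chosen candidate is the "helpful" one
    # for this report, -1 otherwise. A real reward would query the LLM.
    best = 0 if x[0] > x[1] else 1
    return 1.0 if action == best else -1.0


rng = random.Random(0)
policy = LinearPolicy(n_features=2, n_candidates=2)
reports = [[1.0, 0.0], [0.0, 1.0]]  # toy report feature vectors
for _ in range(500):
    policy.reinforce_step(rng.choice(reports), toy_reward, rng=rng)
```

After training on this toy problem, the policy assigns most of its probability to the candidate that earns the positive reward for each report type. Each reward evaluation in a real system would require an LLM call, which is one reason to train the policy on only a small number of samples.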

Experimental Evaluation

Performance Metrics

CohortGPT was tested on two prominent medical datasets, IU-RR and MIMIC-CXR, where it outperformed traditional fine-tuning methods in few-shot settings with limited labeled data. Evaluation metrics included exact match ratio, precision, recall, F1-score, and Hamming loss. The results showed that CohortGPT achieved a higher F1-score than fine-tuned BioBERT and BioGPT models under constrained data scenarios.

Figure 3: Effectiveness of the proposed method against the baseline methods.
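
For readers unfamiliar with multi-label metrics, the following sketch computes the five metrics named above on a toy example. It assumes micro-averaging for precision, recall, and F1-score, which is one common convention; the averaging scheme used in the paper is not stated here, and the label names and predictions are made up.

```python
def multilabel_metrics(y_true, y_pred, all_labels):
    """Compute multi-label metrics from lists of label sets (micro-averaged)."""
    n = len(y_true)
    pairs = [(set(t), set(p)) for t, p in zip(y_true, y_pred)]

    # Exact match ratio: fraction of samples whose predicted set equals the true set.
    exact_match = sum(1 for t, p in pairs if t == p) / n

    tp = sum(len(t & p) for t, p in pairs)  # labels correctly predicted
    fp = sum(len(p - t) for t, p in pairs)  # labels predicted but not true
    fn = sum(len(t - p) for t, p in pairs)  # true labels that were missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Hamming loss: fraction of (sample, label) slots that are wrong.
    hamming = (fp + fn) / (n * len(all_labels))

    return {
        "exact_match": exact_match,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "hamming_loss": hamming,
    }


labels = ["atelectasis", "effusion", "pneumonia", "normal"]
y_true = [{"effusion"}, {"atelectasis", "pneumonia"}, {"normal"}]
y_pred = [{"effusion"}, {"atelectasis"}, {"pneumonia"}]
scores = multilabel_metrics(y_true, y_pred, labels)
# exact_match = 1/3, precision = 2/3, recall = 1/2, f1 = 4/7, hamming_loss = 1/4
```

Exact match ratio is the strictest of these, since a single missed label makes the whole sample count as wrong, while Hamming loss gives partial credit per label (lower is better).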

Impact of Hyperparameters

Extensive ablation studies were conducted to ascertain the impact of various hyperparameters, such as the number of training samples, the number of CoT candidate samples, and the number of $k$-shot samples. The findings highlighted the sensitivity of CohortGPT's performance to these parameters. Performance improved notably as the number of training samples increased, since more training samples improve the policy model's ability to generalize.

Figure 4: Impact of the Number of Training Samples.

Comparative Strategies

Among the CoT selection strategies compared, dynamic selection showed substantial advantages over random, manual, and most-similar sample selection. This result supports the use of reinforcement learning to choose which samples are shown to the model.
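
Two of the baseline strategies are simple enough to sketch directly. The code below shows random selection and most-similar selection by cosine similarity over toy feature vectors. The function names and the vectors are hypothetical, and the choice of cosine similarity is an assumption; the embedding and similarity measure used in the paper are not described here.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_random(n_candidates, k, rng):
    """Baseline: pick k distinct candidate indices uniformly at random."""
    return rng.sample(range(n_candidates), k)


def select_most_similar(report_vec, candidate_vecs, k):
    """Baseline: pick the k candidates whose vectors are closest to the report."""
    ranked = sorted(
        range(len(candidate_vecs)),
        key=lambda j: cosine(report_vec, candidate_vecs[j]),
        reverse=True,
    )
    return ranked[:k]


candidate_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy CoT candidate vectors
report_vec = [1.0, 0.05]                               # toy report vector

random_pick = select_random(len(candidate_vecs), k=2, rng=random.Random(0))
similar_pick = select_most_similar(report_vec, candidate_vecs, k=2)  # [0, 2]
```

Neither baseline uses feedback from the classification result: random selection ignores the report entirely, and most-similar selection assumes that the nearest examples are also the most helpful ones.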

Figure 5: Impact of the Number of Candidate Samples.

Figure 6: Impact of the Number of $k$-shot Samples.

Implications and Future Directions

CohortGPT represents a significant advancement in the integration of LLMs within healthcare applications, demonstrating potential applications beyond participant recruitment, such as diagnosis and treatment optimization. While the framework utilized proprietary models like ChatGPT, its design also supports deployment with open-source LLMs, broadening its accessibility. Future research could explore extending CohortGPT's methodologies to other areas of healthcare NLP and further refining its reinforcement learning strategies to optimize performance without compromising computational efficiency.

Conclusion

CohortGPT leverages the strengths of LLMs with novel mechanisms for enhancing reasoning and classification tasks in clinical studies. By effectively embedding domain-specific knowledge and dynamically selecting CoT samples, the framework achieves notable performance with minimal data, offering transformative potential for clinical participant recruitment processes and broader applications in medical NLP tasks.
