- The paper presents a multi-agent system with three LLM agents that optimize doctor-patient communication via prompt feedback along 17 quality axes.
- It employs structured evaluation metrics and DSPy prompt optimization techniques with open-weight multilingual models to enhance response presentation.
- Live deployment in a Romanian telemedicine platform resulted in a 70.22% increase in positive patient reviews, indicating significant improvements in communication quality.
Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian
The paper addresses text-based telemedicine in Romanian-speaking settings. It targets the challenge of keeping doctor-patient communication clear and well presented without compromising medical accuracy.
Introduction to Dr.Copilot
Dr.Copilot is a multi-agent system of three LLM agents that helps Romanian-speaking doctors refine their written communication with patients. The tool targets the presentation quality of medical advice and does not assess its clinical content. It gives doctors real-time, constructive feedback along 17 interpretable axes, and the prompts behind the agents are optimized with DSPy. The stated aim is to turn better doctor-patient interaction into higher patient satisfaction and improved business outcomes for telemedicine platforms.
Methodological Framework
Metrics for Communication Quality
Dr.Copilot's methodology begins with structured metrics for evaluating the quality of doctor responses. These metrics draw on the dialogue evaluation literature and on industry-specific needs, covering aspects such as empathy, how well patient concerns are addressed, and grammatical correctness. A dataset of 100 annotated question-response pairs provides the foundation for prompt optimization.
Multi-Agent System Design
The system is composed of three agents:
- Scoring Agent: Evaluates doctor responses against the predefined quality metrics. It is optimized with DSPy prompt optimization techniques such as Labeled Few-Shot and SIMBA to improve the accuracy of its assessments.
- Recommender Agent: Uses the scores to generate tailored suggestions for improving the presentation quality of a response.
- Reconciliation Agent: Revises the original doctor response by integrating the recommendations, then rescores the revised response to verify that the quality measures improved.
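The hand-off between the three agents can be sketched as a simple pipeline. The snippet below is a minimal illustration, not the authors' implementation: the names `run_pipeline`, `stub_score`, `stub_recommend`, and `stub_revise`, the two axis labels, and the rule-based stubs standing in for LLM calls are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Scores = Dict[str, int]            # axis name -> score (hypothetical 0/1 scale)
Recommendations = Dict[str, str]   # axis name -> suggestion text


@dataclass
class PipelineResult:
    original_scores: Scores
    recommendations: Recommendations
    revised_response: str
    revised_scores: Scores


def run_pipeline(
    question: str,
    response: str,
    score: Callable[[str, str], Scores],
    recommend: Callable[[str, str, Scores], Recommendations],
    revise: Callable[[str, str, Recommendations], str],
) -> PipelineResult:
    """Score, recommend, revise, then rescore to check the revision helped."""
    original_scores = score(question, response)
    recommendations = recommend(question, response, original_scores)
    revised_response = revise(question, response, recommendations)
    revised_scores = score(question, revised_response)
    return PipelineResult(
        original_scores, recommendations, revised_response, revised_scores
    )


# Rule-based stand-ins for the LLM agents (assumptions for illustration only).
def stub_score(question: str, response: str) -> Scores:
    return {
        "empathy": int("sorry" in response.lower()),
        "grammar": int(response.strip().endswith(".")),
    }


def stub_recommend(question: str, response: str, scores: Scores) -> Recommendations:
    hints = {
        "empathy": "Acknowledge the patient's discomfort.",
        "grammar": "End the reply with proper punctuation.",
    }
    # Only axes that scored 0 receive a suggestion.
    return {axis: hints[axis] for axis, value in scores.items() if value == 0}


def stub_revise(question: str, response: str, recommendations: Recommendations) -> str:
    revised = response.strip()
    if "grammar" in recommendations:
        revised += "."
    if "empathy" in recommendations:
        revised = "I am sorry to hear that. " + revised
    return revised
```

In a real system each stub would be replaced by a call to a locally hosted model. The rescoring step is what lets the pipeline check whether a recommendation actually raised the score it was meant to address.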
Practical Implementation
Given the sensitive nature of the medical domain and data privacy concerns in Romania, the system utilizes open-weight multilingual models from the Google Gemma model family, including Gemma 12B, Gemma 27B, and MedGemma-27B, which are deployed locally.
Evaluation and Results
The Scoring Agent was compared across models and prompt optimizers using Pearson correlations and F1 scores against the annotated data; the SIMBA-optimized MedGemma-27B gave the best results. A self-evaluation procedure, in which the Reconciliation Agent is used to assess whether recommendations are valid, showed significantly higher response scores after revision.
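For readers unfamiliar with the two agreement measures, the following is a small, dependency-free sketch of how a model's scores could be compared with human annotations. It assumes numeric scores for Pearson correlation and binary labels for F1; the summary does not say how the paper averages F1 across axes, so `binary_f1` is only one plausible variant, and both function names are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long score sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A typical use would call `pearson(human_scores, model_scores)` once per quality axis and `binary_f1(human_labels, model_labels)` on thresholded scores, then compare the resulting numbers across model and optimizer combinations.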
Live Deployment Outcomes
Dr.Copilot was deployed in a real-world Romanian telemedicine platform with 41 doctors participating in a live trial. The system processed 212 evaluation requests and generated 449 improvement recommendations. During the evaluation period, Dr.Copilot contributed to a 70.22% increase in positive patient reviews when its suggestions were incorporated into doctors' responses.
Conclusion
Dr.Copilot demonstrates a practical use of LLMs in healthcare for a low-resource language such as Romanian. By focusing exclusively on helping medical professionals refine the presentation of their messages to patients, it produced measurable improvements in patient satisfaction and sets a precedent for LLM applications in underrepresented languages. The approach has implications for multilingual LLMs in clinical settings, showing how a system can support patient-centered communication without making medical recommendations itself.
Overall, this paper provides a detailed architecture for developing LLM-based assistance tools in telemedicine, highlighting their potential in enhancing healthcare delivery and patient satisfaction, particularly within low-resource language communities. Future research could extend the models and methodologies to other underrepresented languages, optimizing healthcare communication on a global scale.