Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian

Published 15 Jul 2025 in cs.CL | (2507.11299v2)

Abstract: Text-based telemedicine has become increasingly common, yet the quality of medical advice in doctor-patient interactions is often judged more on how the advice is communicated than on its clinical accuracy. To address this, we introduce Dr. Copilot, a multi-agent LLM system that supports Romanian-speaking doctors by evaluating and enhancing the presentation quality of their written responses. Rather than assessing medical correctness, Dr. Copilot provides feedback along 17 interpretable axes. The system comprises three LLM agents with prompts automatically optimized via DSPy. Designed with low-resource Romanian data and deployed using open-weight models, it delivers real-time, specific feedback to doctors within a telemedicine platform. Empirical evaluations and live deployment with 41 doctors show measurable improvements in user reviews and response quality, marking one of the first real-world deployments of LLMs in Romanian medical settings.


Authors (4)

Summary

The paper presents a multi-agent system of three LLM agents that gives doctors feedback on the presentation of their written responses along 17 quality axes.
It combines structured evaluation metrics, DSPy prompt optimization, and open-weight multilingual models to improve how responses are presented.
Live deployment on a Romanian telemedicine platform was associated with a 70.22% increase in positive patient reviews, suggesting a substantial improvement in communication quality.


The paper "Dr.Copilot: A Multi-Agent Prompt Optimized Assistant for Improving Patient-Doctor Communication in Romanian" tackles the quality of text-based telemedicine in Romanian-speaking settings. It focuses on how effectively doctors present information to patients, rather than on the clinical correctness of that information.

Introduction to Dr.Copilot

Dr.Copilot is a multi-agent system of three LLM agents that helps Romanian-speaking doctors refine their written communication with patients. It evaluates the presentation quality of medical advice rather than its clinical content, giving real-time, constructive feedback along 17 interpretable axes, and the agents' prompts are automatically optimized with DSPy. The motivation is that, as telemedicine grows, better doctor-patient communication should translate into higher patient satisfaction and better business outcomes for telemedicine platforms.

Methodological Framework

Metrics for Communication Quality

Dr.Copilot's methodology begins by defining structured metrics for evaluating the quality of doctor responses. These metrics draw on the dialogue evaluation literature and on industry-specific needs, covering aspects such as empathy, how well the patient's concerns are addressed, and grammatical correctness. A dataset of 100 annotated question-response pairs serves as the basis for optimizing the agents' prompts.
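Annotated pairs like these are what a labeled few-shot optimizer draws its demonstrations from. The sketch below shows the basic idea in plain Python, without DSPy: sample k labeled pairs and place them ahead of the unlabeled query so the model can imitate the scoring format. The `AnnotatedPair` schema, the three axis names, the 1 to 5 scale, and the prompt wording are all hypothetical; the paper's actual 17 axes and prompt templates are not reproduced here. DSPy's own `LabeledFewShot` optimizer does the equivalent by attaching sampled training examples as demonstrations to a program's predictors.

```python
import random
from dataclasses import dataclass


@dataclass
class AnnotatedPair:
    """One annotated question-response pair (hypothetical schema)."""
    question: str
    response: str
    scores: dict[str, int]  # axis name -> human score on an assumed 1-5 scale


def build_few_shot_prompt(
    train: list[AnnotatedPair],
    question: str,
    response: str,
    axes: list[str],
    k: int = 3,
    seed: int = 0,
) -> str:
    """Sample up to k labeled demos and prepend them to the unlabeled query."""
    rng = random.Random(seed)  # fixed seed so the demo selection is reproducible
    demos = rng.sample(train, min(k, len(train)))
    parts = [f"Score the doctor's response on: {', '.join(axes)}."]
    for d in demos:
        labels = ", ".join(f"{a}={d.scores[a]}" for a in axes)
        parts.append(f"Question: {d.question}\nResponse: {d.response}\nScores: {labels}")
    # The query ends with an open "Scores:" line for the model to complete.
    parts.append(f"Question: {question}\nResponse: {response}\nScores:")
    return "\n\n".join(parts)


# Usage with three toy pairs standing in for the 100-pair dataset.
AXES = ["empathy", "addresses_concerns", "grammar"]
train = [
    AnnotatedPair("Q1", "R1", {"empathy": 4, "addresses_concerns": 5, "grammar": 5}),
    AnnotatedPair("Q2", "R2", {"empathy": 2, "addresses_concerns": 3, "grammar": 4}),
    AnnotatedPair("Q3", "R3", {"empathy": 5, "addresses_concerns": 4, "grammar": 3}),
]
prompt = build_few_shot_prompt(train, "New question", "New response", AXES, k=2)
```

A prompt optimizer then searches over choices such as which demos to include and how the instructions are worded, keeping whichever variant agrees best with the human annotations.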

Multi-Agent System Design

The system is composed of three agents:

Scoring Agent: Evaluates doctor responses against the predefined quality metrics. Its prompt is optimized with DSPy techniques such as Labeled Few-Shot and SIMBA, with the aim of producing accurate assessments.
Recommender Agent: Based on the scores, this agent generates tailored suggestions for enhancing the presentation quality of responses.
Reconciliation Agent: Revises the original doctor response by integrating the recommendations, then rescores the revised response to verify that its quality scores improved.
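The three agents form a simple score, recommend, revise, rescore loop. The sketch below captures only that control flow: each agent is passed in as a plain function, and the toy heuristics standing in for the LLM calls (`toy_score`, `toy_recommend`, `toy_reconcile`), the single `empathy` axis, and the 0.5 threshold are hypothetical placeholders, not the paper's prompts or models.

```python
from dataclasses import dataclass
from typing import Callable

Scores = dict[str, float]


@dataclass
class CopilotResult:
    original_scores: Scores
    recommendations: list[str]
    revised_response: str
    revised_scores: Scores


def run_pipeline(
    question: str,
    response: str,
    score: Callable[[str, str], Scores],
    recommend: Callable[[str, str, Scores], list[str]],
    reconcile: Callable[[str, str, list[str]], str],
) -> CopilotResult:
    original = score(question, response)            # Scoring Agent
    recs = recommend(question, response, original)  # Recommender Agent
    revised = reconcile(question, response, recs)   # Reconciliation Agent
    # Rescore the revised text to check whether the recommendations helped.
    return CopilotResult(original, recs, revised, score(question, revised))


# Toy stand-ins for the LLM agents (hypothetical heuristics for illustration).
def toy_score(question: str, response: str) -> Scores:
    return {"empathy": 1.0 if "sorry" in response.lower() else 0.0}


def toy_recommend(question: str, response: str, scores: Scores) -> list[str]:
    return [f"Improve {axis}" for axis, s in scores.items() if s < 0.5]


def toy_reconcile(question: str, response: str, recs: list[str]) -> str:
    if "Improve empathy" in recs:
        return "I'm sorry you are dealing with this. " + response
    return response


result = run_pipeline(
    "I have had a headache for three days.",
    "Drink water and rest.",
    toy_score, toy_recommend, toy_reconcile,
)
print(result.original_scores, result.revised_scores)
```

Keeping the agents as separate callables mirrors the paper's design of three distinct agents, each with its own prompt; in a real deployment each function would presumably wrap a call to a locally hosted model.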

Practical Implementation

Given the sensitivity of medical data and data privacy concerns in Romania, the system uses open-weight multilingual models from Google's Gemma family (Gemma 12B, Gemma 27B, and MedGemma-27B), all deployed locally.

Evaluation and Results

Scoring Agent configurations were compared across models and prompt optimizers using Pearson correlation and F1 scores; the SIMBA-optimized MedGemma-27B performed best. In a self-evaluation procedure, the Reconciliation Agent was used to assess whether the recommendations were valid, and response scores improved significantly after revision.
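Both metrics used to compare Scoring Agent configurations can be computed directly. The sketch below computes the Pearson correlation between model-predicted and human-annotated scores for one axis, and a positive-class F1 for binary labels. How the paper turns its scores into labels for F1 is not described here, so the `binarize` threshold rule is an assumption for illustration, as are the toy score lists.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def binarize(scores: list[float], threshold: float) -> list[int]:
    """Map scores to 1 if at or above threshold, else 0 (hypothetical rule)."""
    return [1 if s >= threshold else 0 for s in scores]


def f1_binary(y_true: list[int], y_pred: list[int]) -> float:
    """F1 score for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy per-axis scores: human annotations vs. Scoring Agent predictions.
human = [1, 2, 3, 4, 5]
model = [2, 2, 3, 5, 5]
r = pearson(human, model)
f1 = f1_binary(binarize(human, 4), binarize(model, 4))
```

Pearson correlation rewards getting the relative ordering and spread of scores right, while F1 on thresholded labels measures whether the agent flags the same responses as the annotators, so the two capture complementary aspects of agreement.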

Live Deployment Outcomes

Dr.Copilot was deployed in a real-world Romanian telemedicine platform with 41 doctors participating in a live trial. The system processed 212 evaluation requests and generated 449 improvement recommendations. During the evaluation period, Dr.Copilot contributed to a 70.22% increase in positive patient reviews when its suggestions were incorporated into doctors' responses.

Conclusion

Dr.Copilot is a notable step in applying LLMs in healthcare, particularly for low-resource languages such as Romanian. By focusing exclusively on helping medical professionals refine how they present information to patients, it produced measurable improvements in user satisfaction and sets a precedent for LLM applications in underrepresented languages. The approach has implications for deploying multilingual LLMs in clinical settings, showing how to support patient-centered communication without the model itself making medical recommendations.

Overall, this paper provides a detailed architecture for developing LLM-based assistance tools in telemedicine, highlighting their potential in enhancing healthcare delivery and patient satisfaction, particularly within low-resource language communities. Future research could extend the models and methodologies to other underrepresented languages, optimizing healthcare communication on a global scale.
