- The paper introduces a modular architecture that integrates specialized tool agents for transparent, evidence-based medical diagnosis.
- Evaluated on the ADNI, MIMIC-CXR, and EHRXQA datasets, it reaches 93.26% accuracy in Alzheimer's disease (AD) diagnosis and competitive scores in chest X-ray interpretation and multimodal question answering.
- Its flexible, easily configurable framework enables adaptation to institutional protocols and varied diagnostic scenarios.
MedOrch is a framework that integrates diverse knowledge sources and reasoning agents to provide comprehensive medical decision support. Its modular architecture lets domain-specific tools be added without altering the core system, and it keeps the reasoning process transparent and traceable. This essay covers the core components, the evaluation across three medical tasks, and the implications for future developments in AI-driven healthcare.
System Architecture and Core Mechanisms
MedOrch uses a modular, agent-based architecture for goal-driven multimodal integration. Specialized agents can be plugged in for different modalities, such as electronic medical records (EMR), imaging, biomarkers, and literature. The system autonomously decides what information it needs, decomposes complex problems into subtasks, and invokes the appropriate tools to derive conclusions.
Figure 1: The overall architecture of MedOrch. The LLM maintains a chain-of-thought and emits tool-call tokens whenever external evidence is required.
The core reasoning mechanism is a sequence of inference steps that dynamically incorporates tool invocation and external information integration. This process involves generating specialized tool-calling tokens, which signal moments when external information or specialized analysis would strengthen the reasoning process. These tokens are generated through in-context learning based on detailed instructions and examples.
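The loop described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `<call tool="...">` token format, the `run_reasoning` function, the scripted stand-in for the LLM, and the `lookup_biomarker` tool are hypothetical and are not taken from MedOrch itself.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical tool-call token format; the real token syntax is not given in the text.
TOOL_CALL = re.compile(r'<call tool="(\w+)">(.*?)</call>', re.S)


def run_reasoning(
    generate: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,
) -> Tuple[str, List[str]]:
    """Alternate between model steps and tool calls until no tool is requested."""
    trace: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        step = generate("\n".join(trace))
        trace.append(step)
        match = TOOL_CALL.search(step)
        if match is None:
            # No tool-call token: treat this step as the final answer.
            return step, trace
        name, argument = match.group(1), match.group(2).strip()
        tool = tools.get(name)
        result = tool(argument) if tool else f"unknown tool: {name}"
        # Feed the tool output back into the context for the next step.
        trace.append(f'<result tool="{name}">{result}</result>')
    raise RuntimeError("no final answer within max_steps")


def scripted_model(context: str) -> str:
    """Stand-in for an LLM: asks for one tool result, then answers."""
    if "<result" not in context:
        return 'I need a biomarker value. <call tool="lookup_biomarker">patient_42</call>'
    return "Final answer: placeholder diagnosis"
```

The returned trace keeps every reasoning step and tool result in order, which is one simple way to obtain the traceability the text describes.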
Furthermore, MedOrch's modular architecture enables easy customization through configuration changes, allowing the framework to adapt to institutional protocols, clinical guidelines, or specialized diagnostic workflows without requiring model retraining.
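One way to picture configuration-driven customization is a tool registry filtered by a config mapping. This is a sketch under stated assumptions: the registry, the `enabled_tools` key, and the two tool names are hypothetical and do not describe MedOrch's actual configuration format.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping tool names to callables.
REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Decorator that adds a tool to the registry under the given name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("ehr_query")
def ehr_query(argument: str) -> str:
    return f"ehr rows for {argument}"  # placeholder output


@register("cxr_classifier")
def cxr_classifier(argument: str) -> str:
    return f"image findings for {argument}"  # placeholder output


def build_toolset(config: Dict[str, List[str]]) -> Dict[str, Callable[[str], str]]:
    """Select the tools named in the config; the model itself is untouched."""
    enabled = config.get("enabled_tools", [])
    missing = [name for name in enabled if name not in REGISTRY]
    if missing:
        raise KeyError(f"unregistered tools: {missing}")
    return {name: REGISTRY[name] for name in enabled}
```

Under this design, an institution that does not use imaging tools would change only the config, for example `build_toolset({"enabled_tools": ["ehr_query"]})`, with no retraining involved.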
Evaluation in Medical Domains
Alzheimer's Disease Assessment
MedOrch was evaluated on the ADNI dataset, which poses a nuanced challenge because its inputs are heterogeneous: demographic variables, cognitive assessments, fluid biomarkers, and brain MRIs. MedOrch surpassed the state-of-the-art baseline by over four percentage points, reaching 93.26% accuracy on the three-way classification of Alzheimer's Disease (AD), Mild Cognitive Impairment (MCI), and Normal Control (NC).
Figure 2: AD Diagnosis
The framework's capability to generate multiple reasoning trajectories was crucial in obtaining high diagnostic accuracy. This multi-answer strategy provides clinicians with several evidence-based perspectives.
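The text does not say how the trajectories are combined or presented. As an illustrative assumption only, the sketch below groups trajectories by their final label and ranks the labels by how often they occur, keeping the rationales so each candidate answer stays tied to its evidence. The function name and the input shape are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

Summary = Dict[str, Union[str, float, List[str]]]


def summarize_trajectories(trajectories: List[Tuple[str, str]]) -> List[Summary]:
    """Group (label, rationale) pairs by label, most frequent label first.

    Frequency ranking is an assumption for illustration, not the paper's method.
    """
    counts = Counter(label for label, _ in trajectories)
    total = len(trajectories)
    summary: List[Summary] = []
    for label, n in counts.most_common():
        rationales = [r for lab, r in trajectories if lab == label]
        summary.append({"label": label, "share": n / total, "rationales": rationales})
    return summary
```

A clinician-facing view built this way would show each candidate diagnosis with the share of trajectories supporting it and the reasoning behind each one.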
Chest X-ray Interpretation
On the MIMIC-CXR dataset, MedOrch achieved competitive results, with a Macro AUC of 61.2% and a Macro F1-score of 25.5%. This task highlights MedOrch's ability to integrate new image encoders without retraining the core reasoning engine.
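For readers unfamiliar with the two metrics, macro averaging computes the score separately for each finding label and then takes the unweighted mean, so rare labels count as much as common ones. The sketch below uses the standard definitions in plain Python; the function names and the toy inputs in the usage note are illustrative and are not MIMIC-CXR data.

```python
from typing import List, Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for one label: 2*TP / (2*TP + FP + FN), defined as 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def binary_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC for one label as the fraction of (positive, negative) pairs
    ranked correctly, with ties counted as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(per_label_scores: List[float]) -> float:
    """Unweighted mean over labels."""
    return sum(per_label_scores) / len(per_label_scores)
```

With two toy labels, one predicted perfectly and one missed entirely, `macro_average([binary_f1([1, 1, 0, 0], [1, 1, 0, 0]), binary_f1([1, 0, 0, 0], [0, 0, 0, 1])])` averages an F1 of 1.0 and an F1 of 0.0. This shows how a few poorly predicted labels can pull a macro score down.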
Medical Visual Question Answering
In the EHRXQA benchmark, which combines structured EHRs with chest X-ray images, MedOrch achieved an accuracy of 54.47% for Image+Table questions. This demonstrates its ability to coordinate multiple domain-specific agents in a mixed-modality workflow, without requiring any handcrafted training corpus or program annotations.
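An Image+Table question typically needs a table lookup to decide which studies are relevant, followed by an image analysis of those studies. The sketch below shows that two-stage pattern with Python's built-in `sqlite3` module. The `studies` schema, the function names, and the image agent interface are hypothetical, chosen only to illustrate the coordination; they are not the EHRXQA schema or MedOrch's agents.

```python
import sqlite3
from typing import Callable, List


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with made-up rows, standing in for a structured EHR."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE studies (patient_id INTEGER, study_id INTEGER)")
    conn.executemany(
        "INSERT INTO studies VALUES (?, ?)",
        [(1, 101), (1, 102), (2, 201)],
    )
    return conn


def find_study_ids(conn: sqlite3.Connection, patient_id: int) -> List[int]:
    """Table step: which imaging studies belong to this patient?"""
    rows = conn.execute(
        "SELECT study_id FROM studies WHERE patient_id = ? ORDER BY study_id",
        (patient_id,),
    ).fetchall()
    return [row[0] for row in rows]


def studies_with_finding(
    conn: sqlite3.Connection,
    image_agent: Callable[[int, str], bool],
    patient_id: int,
    finding: str,
) -> List[int]:
    """Image step: ask the image agent about each study found in the table."""
    return [sid for sid in find_study_ids(conn, patient_id) if image_agent(sid, finding)]
```

Here `image_agent` is any callable that takes a study ID and a finding name and returns whether the finding is present. In a real system it would wrap an image model, while in a test it can be a simple lookup.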
Implications and Future Developments
MedOrch's architecture and capabilities present a significant step forward for medical AI systems. By enabling reasoning-driven tool utilization for multimodal medical data processing, MedOrch supports intricate cognitive tasks in clinical decision-making. The framework's transparency and adaptability enhance trust in AI-driven medical diagnosis, providing a valuable tool for healthcare professionals.
As the system is further developed, it could be extended to broader clinical scenarios and incorporate additional medical modalities. Beyond healthcare, MedOrch's architecture holds potential for adaptation to other complex domains, aligning with the vision of general-purpose AI assistant systems.
Conclusion
MedOrch introduces a robust framework for medical diagnosis, integrating tool-augmented reasoning agents and ensuring transparent decision-making processes. Its modular design supports flexible adaptation to clinical protocols, offering multiple diagnostic trajectories. These capabilities underline MedOrch's promise in advancing healthcare AI and its potential for future applications across various domains.