MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents for Flexible Extensibility

Published 30 May 2025 in cs.CL | (2506.00235v1)

Abstract: Healthcare decision-making represents one of the most challenging domains for AI, requiring the integration of diverse knowledge sources, complex reasoning, and various external analytical tools. Current AI systems often rely on either task-specific models, which offer limited adaptability, or general LLMs without grounding with specialized external knowledge and tools. We introduce MedOrch, a novel framework that orchestrates multiple specialized tools and reasoning agents to provide comprehensive medical decision support. MedOrch employs a modular, agent-based architecture that facilitates the flexible integration of domain-specific tools without altering the core system. Furthermore, it ensures transparent and traceable reasoning processes, enabling clinicians to meticulously verify each intermediate step underlying the system's recommendations. We evaluate MedOrch across three distinct medical applications: Alzheimer's disease diagnosis, chest X-ray interpretation, and medical visual question answering, using authentic clinical datasets. The results demonstrate MedOrch's competitive performance across these diverse medical tasks. Notably, in Alzheimer's disease diagnosis, MedOrch achieves an accuracy of 93.26%, surpassing the state-of-the-art baseline by over four percentage points. For predicting Alzheimer's disease progression, it attains a 50.35% accuracy, marking a significant improvement. In chest X-ray analysis, MedOrch exhibits superior performance with a Macro AUC of 61.2% and a Macro F1-score of 25.5%. Moreover, in complex multimodal visual question answering (Image+Table), MedOrch achieves an accuracy of 54.47%. These findings underscore MedOrch's potential to advance healthcare AI by enabling reasoning-driven tool utilization for multimodal medical data processing and supporting intricate cognitive tasks in clinical decision-making.


Summary

The paper introduces a modular architecture that integrates specialized tool agents for transparent, evidence-based medical diagnosis.
It leverages diverse datasets, achieving 93.26% accuracy in AD diagnosis and competitive scores in chest X-ray and multimodal evaluations.
Its flexible, easily configurable framework enables adaptation to institutional protocols and varied diagnostic scenarios.

MedOrch: Medical Diagnosis with Tool-Augmented Reasoning Agents

MedOrch is a framework that orchestrates multiple specialized tools and reasoning agents to provide comprehensive medical decision support. Its modular architecture allows domain-specific tools to be integrated without altering the core system, while keeping the reasoning process transparent and traceable. This essay explores the framework's core components, its performance across several medical domains, and the implications for future developments in AI-driven healthcare.

System Architecture and Core Mechanisms

MedOrch utilizes a modular, agent-based architecture that enables goal-driven multimodal integration. This architecture allows for integrating specialized agents across diverse modalities, such as electronic medical records (EMR), imaging, biomarkers, and literature. The system autonomously decides on the necessary information, decomposes complex problems into subtasks, and invokes appropriate tools to derive conclusions.

Figure 1: The overall architecture of MedOrch. The LLM maintains a chain-of-thought and emits tool-call tokens whenever external evidence is required.

The core reasoning mechanism is a sequence of inference steps that dynamically incorporates tool invocation and external information integration. This process involves generating specialized tool-calling tokens, which signal moments when external information or specialized analysis would strengthen the reasoning process. These tokens are generated through in-context learning based on detailed instructions and examples.
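The interleaving of reasoning steps and tool calls described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the `<tool:name>...</tool>` token syntax, the `run_reasoning` function, and the scripted stand-in for the LLM are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# Hypothetical token syntax; the tool-call tokens used by MedOrch may differ.
TOOL_CALL = re.compile(r"<tool:(\w+)>(.*?)</tool>", re.DOTALL)


def run_reasoning(
    model_step: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 8,
) -> List[str]:
    """Alternate between model steps and tool calls; return the full trace."""
    trace = [f"Question: {question}"]
    for _ in range(max_steps):
        # The model sees everything produced so far, including tool results.
        step = model_step("\n".join(trace))
        trace.append(step)
        match = TOOL_CALL.search(step)
        if match is None:
            break  # no tool requested: treat this step as the final answer
        name, argument = match.group(1), match.group(2).strip()
        tool = tools.get(name)
        result = tool(argument) if tool else f"unknown tool '{name}'"
        trace.append(f"[{name} result] {result}")
    return trace


def make_scripted_model(steps: List[str]) -> Callable[[str], str]:
    """Stand-in for an LLM: replays fixed steps and ignores the context."""
    remaining = iter(steps)
    return lambda context: next(remaining)


# Placeholder tool and scripted steps, for illustration only.
demo_tools = {
    "lookup_biomarker": lambda arg: f"{arg}: value unavailable in this demo",
}
demo_model = make_scripted_model([
    "I need the CSF amyloid level. <tool:lookup_biomarker>CSF amyloid</tool>",
    "Final answer: evidence gathered; further review needed.",
])
demo_trace = run_reasoning(demo_model, demo_tools, "Assess patient 001")
for line in demo_trace:
    print(line)
```

Because every model step and tool result is appended to the same trace, the record a clinician would inspect is the same text the model conditioned on, which is one plausible way to obtain the traceability the paper emphasizes.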

Furthermore, MedOrch's modular architecture enables easy customization through configuration changes, allowing the framework to adapt to institutional protocols, clinical guidelines, or specialized diagnostic workflows without requiring model retraining.
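One way to picture customization through configuration is a tool registry whose active subset is chosen by a config entry rather than by code or model changes. The sketch below is an assumption-laden illustration: the registry, the tool names, and the `CONFIG` layout are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict

# Hypothetical registry mapping tool names to callables.
REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(name: str):
    """Decorator that adds a tool to the registry under the given name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("mri_volumetry")
def mri_volumetry(case_id: str) -> str:
    return f"volumetry report for {case_id} (placeholder)"


@register("guideline_search")
def guideline_search(query: str) -> str:
    return f"guideline passages for '{query}' (placeholder)"


@register("cxr_classifier")
def cxr_classifier(image_id: str) -> str:
    return f"label scores for {image_id} (placeholder)"


# Hypothetical configuration: each workflow lists the tools it may call.
CONFIG = {
    "workflows": {
        "ad_assessment": ["mri_volumetry", "guideline_search"],
        "chest_xray": ["cxr_classifier"],
    }
}


def build_toolset(config: dict, workflow: str) -> Dict[str, Callable[[str], str]]:
    """Select the tools enabled for a workflow; fail loudly on unknown names."""
    names = config["workflows"][workflow]
    missing = [n for n in names if n not in REGISTRY]
    if missing:
        raise KeyError(f"unregistered tools: {missing}")
    return {n: REGISTRY[n] for n in names}


ad_tools = build_toolset(CONFIG, "ad_assessment")
print(sorted(ad_tools))
```

Under this design, adapting to a different institutional protocol amounts to editing the workflow list or registering one more function, and the reasoning model itself is left untouched.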

Evaluation in Medical Domains

Alzheimer's Disease Assessment

MedOrch was evaluated on the ADNI dataset, which poses a nuanced challenge because it combines heterogeneous inputs such as demographic variables, cognitive assessments, fluid biomarkers, and brain MRIs. On the three-way classification of Alzheimer's Disease (AD), Mild Cognitive Impairment (MCI), and Normal Control (NC), MedOrch achieved an accuracy of 93.26%, surpassing the state-of-the-art baseline by over four percentage points. For predicting Alzheimer's disease progression, it attained an accuracy of 50.35%.

Figure 2: AD Diagnosis

The framework's capability to generate multiple reasoning trajectories was crucial in obtaining high diagnostic accuracy. This multi-answer strategy provides clinicians with several evidence-based perspectives.
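A multi-answer strategy of this kind can be illustrated by sampling several trajectories and ranking their final answers by how often each occurs. The following is a rough sketch, assuming simple frequency ranking; the paper may select or present answers differently, and `rank_answers` and the sampled labels are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def rank_answers(trajectory_answers: List[str], top_k: int = 2) -> List[Tuple[str, float]]:
    """Return the top_k answers with the share of trajectories supporting each."""
    counts = Counter(trajectory_answers)
    total = len(trajectory_answers)
    # most_common orders by count; an empty input yields an empty list.
    return [(label, count / total) for label, count in counts.most_common(top_k)]


# Made-up final answers from five sampled trajectories, for illustration only.
sampled = ["MCI", "AD", "MCI", "MCI", "NC"]
ranked = rank_answers(sampled)
for label, share in ranked:
    print(f"{label}: supported by {share:.0%} of trajectories")
```

Returning more than one candidate, each with its level of support, lets a clinician see the runner-up diagnoses rather than only a single verdict.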

Chest X-ray Interpretation

On the MIMIC-CXR dataset, MedOrch reported a Macro AUC of 61.2% and a Macro F1-score of 25.5%, which the paper characterizes as superior performance. The task also shows that new image encoders can be integrated without retraining the core reasoning engine.
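For readers unfamiliar with macro-averaged metrics, a small worked example may help: Macro F1 is the unweighted mean of the per-label F1 scores, so a label that is never predicted correctly pulls the average down regardless of how rare it is. The toy labels below are invented for illustration and are unrelated to MIMIC-CXR; scoring an undefined F1 as 0.0 is a convention assumed here.

```python
from typing import List, Sequence


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for one label; returns 0.0 when the score is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(true_rows: List[List[int]], pred_rows: List[List[int]]) -> float:
    """Unweighted mean of per-label F1 over a multi-label indicator matrix."""
    n_labels = len(true_rows[0])
    scores = [
        f1([row[j] for row in true_rows], [row[j] for row in pred_rows])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy data: two labels, four studies. Label 0 is predicted perfectly,
# while the single positive case of label 1 is missed.
toy_true = [[1, 0], [1, 0], [0, 1], [1, 0]]
toy_pred = [[1, 0], [1, 0], [0, 0], [1, 0]]
print(macro_f1(toy_true, toy_pred))  # 0.5: mean of F1 = 1.0 and F1 = 0.0
```

This is why Macro F1 values on multi-label chest X-ray tasks can look low next to accuracy figures: every finding counts equally, including the rare ones.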

Medical Visual Question Answering

In the EHRXQA benchmark, which combines structured EHRs with chest X-ray images, MedOrch achieved an accuracy of 54.47% for Image+Table questions. This demonstrates its ability to coordinate multiple domain-specific agents in a mixed-modality workflow, without requiring any handcrafted training corpus or program annotations.
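An Image+Table question typically needs a structured lookup to decide which studies are relevant, followed by image analysis on those studies. The sketch below shows that coordination pattern in its simplest form; the record fields, the `answer_image_table` function, and the stubbed image agent are hypothetical and do not reflect the EHRXQA schema or MedOrch's actual agents.

```python
from typing import Callable, Dict, List


def answer_image_table(
    records: List[Dict],
    row_filter: Callable[[Dict], bool],
    image_agent: Callable[[str], bool],
) -> Dict:
    """Filter table rows first, then run the image agent on the matching studies."""
    selected = [r["study_id"] for r in records if row_filter(r)]
    findings = {study_id: image_agent(study_id) for study_id in selected}
    return {"studies": selected, "any_positive": any(findings.values())}


# Made-up table rows and image results, for illustration only.
toy_records = [
    {"study_id": "s1", "age": 71},
    {"study_id": "s2", "age": 45},
    {"study_id": "s3", "age": 80},
]
stub_findings = {"s1": False, "s2": True, "s3": True}

result = answer_image_table(
    toy_records,
    row_filter=lambda r: r["age"] >= 65,          # table-side condition
    image_agent=lambda sid: stub_findings[sid],   # stand-in for an image tool
)
print(result)
```

In an agent-based system the filter and the image call would be chosen by the reasoning model at run time rather than hard-coded, which is what removes the need for annotated programs.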

Implications and Future Developments

MedOrch's architecture and capabilities present a significant step forward for medical AI systems. By enabling reasoning-driven tool utilization for multimodal medical data processing, MedOrch supports intricate cognitive tasks in clinical decision-making. The framework's transparency and adaptability enhance trust in AI-driven medical diagnosis, providing a valuable tool for healthcare professionals.

As the system is further developed, it could be extended to broader clinical scenarios and incorporate additional medical modalities. Beyond healthcare, MedOrch's architecture holds potential for adaptation to other complex domains, aligning with the vision of general-purpose AI assistant systems.

Conclusion

MedOrch introduces a robust framework for medical diagnosis, integrating tool-augmented reasoning agents and ensuring transparent decision-making processes. Its modular design supports flexible adaptation to clinical protocols, offering multiple diagnostic trajectories. These capabilities underline MedOrch's promise in advancing healthcare AI and its potential for future applications across various domains.
