- The paper introduces DeepRare, which generates a ranked list of rare disease hypotheses with evidence-based reasoning.
- It uses a tiered architecture combining a central host with a long-term memory module, specialized agent servers, and external data sources to process clinical inputs through over 40 tools.
- It reports 70.60% Recall@1 on multi-modal cases and 95.40% agreement from clinical experts on its reasoning chains.
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
The paper introduces DeepRare, an agentic system designed for rare disease diagnosis that uses an LLM to process complex clinical inputs, including free-text descriptions, Human Phenotype Ontology (HPO) terms, and genomic variants. DeepRare generates a ranked list of diagnostic hypotheses for rare diseases together with a transparent reasoning chain that links each intermediate analytic step to verifiable medical evidence. This traceability is essential for clinical adoption, because it lets clinicians check the system's reasoning and so supports human-AI collaboration in diagnostic workflows.
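The output described above, a ranked hypothesis list in which every reasoning step points at evidence, can be pictured as a small data structure. The following is a minimal sketch only: the class names, fields, and the `is_traceable` check are hypothetical and are not taken from the DeepRare implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReasoningStep:
    """One intermediate analytic step and the evidence it cites (hypothetical)."""
    claim: str
    evidence: list[str] = field(default_factory=list)  # e.g. reference identifiers


@dataclass
class Hypothesis:
    """A candidate diagnosis with a score and its reasoning chain (hypothetical)."""
    disease: str
    score: float
    chain: list[ReasoningStep] = field(default_factory=list)

    def is_traceable(self) -> bool:
        # Assumed definition: at least one step, and every step cites evidence.
        return bool(self.chain) and all(step.evidence for step in self.chain)


def rank(hypotheses: list[Hypothesis], k: int) -> list[Hypothesis]:
    """Return the top-k hypotheses by descending score."""
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)[:k]


# Placeholder data, not real diagnoses or references.
candidates = [
    Hypothesis("Disease A", 0.2),
    Hypothesis("Disease B", 0.9, [ReasoningStep("phenotype overlap", ["placeholder-ref-1"])]),
]
top = rank(candidates, k=1)
```

Keeping the evidence list on each step, rather than on the hypothesis as a whole, is what allows a reviewer to check one step at a time.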
System Architecture and Workflow
DeepRare employs a tiered architecture comprising three key components: a central host with a long-term memory module, specialized agent servers, and extensive external data sources. The central host orchestrates the diagnostic process and integrates the collected evidence into a coherent context. The specialized agent servers handle domain-specific analytical tasks such as phenotype extraction and variant prioritization, drawing on over 40 tools and up-to-date medical knowledge sources. This modular design supports complex diagnostic reasoning while keeping the system traceable and adaptable.
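The division of labor between the host and the agent servers can be illustrated with a toy dispatcher. This is a sketch under stated assumptions, not the system's actual design: the `Host` class, its methods, and the keyword-based `phenotype_extractor` are all hypothetical, and a real agent server would call external tools and knowledge sources instead of a hand-written table.

```python
from __future__ import annotations

from typing import Callable

# Assumption: an agent server is modeled as a function from text to findings.
AgentServer = Callable[[str], list[str]]


class Host:
    """Toy central host: routes tasks to agent servers and accumulates evidence."""

    def __init__(self) -> None:
        self.servers: dict[str, AgentServer] = {}
        self.memory: list[tuple[str, str]] = []  # (source server, finding)

    def register(self, name: str, server: AgentServer) -> None:
        self.servers[name] = server

    def dispatch(self, name: str, payload: str) -> list[str]:
        findings = self.servers[name](payload)
        # Record which server produced each finding so the context stays traceable.
        self.memory.extend((name, finding) for finding in findings)
        return findings

    def context(self) -> str:
        """Merge all collected evidence into a single text context."""
        return "\n".join(f"[{src}] {finding}" for src, finding in self.memory)


def phenotype_extractor(text: str) -> list[str]:
    # Toy stand-in for phenotype extraction: keyword lookup in a tiny table.
    table = {"seizure": "HP:0001250", "short stature": "HP:0004322"}
    return [hpo for key, hpo in table.items() if key in text.lower()]


host = Host()
host.register("phenotype", phenotype_extractor)
host.dispatch("phenotype", "Recurrent seizure episodes since infancy")
print(host.context())
```

Tagging each finding with its source server is one simple way to keep the merged context auditable; adding a new analytical capability then amounts to registering another server.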
Figure 1: DeepRare: An agentic framework for rare disease prioritization. (a) System workflow: Multi-modal patient data (HPO terms, genomic variants) are processed through a tiered MCP-inspired architecture, generating a ranked Top-K diagnosis list with evidence-supported reasoning chains. (b) Knowledge architecture: Sunburst visualization depicting hierarchical integration of diagnostic tools and biomedical knowledge sources within DeepRare. (c) Multi-center benchmark characteristics: Case distributions, phenotypic complexity (HPO metrics), disease spectrum, provenance, and genetic annotation status (solid: confirmed pathogenic variants; half-solid: candidate variants extracted; hollow: no genetic data). (d) Performance benchmarking: Comparative evaluation across diagnostic APIs, general-purpose LLMs, reasoning-enhanced LLMs, medically-tuned LLMs, and agentic systems.
DeepRare was evaluated on eight datasets from Asia, North America, and Europe, covering 2,919 diseases across specialties such as neurology and genetics. The system reached 100% accuracy on 1,013 of these diseases and outperformed 15 comparison methods, including traditional bioinformatics tools and other LLM-based systems. In HPO-based evaluations, DeepRare's average Recall@1 of 57.18% exceeded the second-best method by 23.79 percentage points. With multi-modal input, DeepRare achieved a Recall@1 of 70.60%, compared with 53.20% for Exomiser, across 109 cases.
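Recall@K, the metric behind these figures, is the fraction of cases whose true diagnosis appears among the top K ranked predictions. The sketch below assumes exact string matching between predicted and true disease names, which is a simplification; the paper's own matching procedure may differ, and the function name and toy labels are illustrative.

```python
from __future__ import annotations


def recall_at_k(ranked_predictions: list[list[str]], truths: list[str], k: int) -> float:
    """Fraction of cases whose true diagnosis is in the top-k predictions."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("one prediction list is required per case")
    if not truths:
        raise ValueError("at least one case is required")
    # A case counts as a hit when its true label is among the first k predictions.
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


# Toy labels only; these are not results from the paper.
predictions = [["a", "b"], ["c", "d"], ["e", "f"], ["g", "h"]]
truths = ["a", "d", "x", "g"]
print(recall_at_k(predictions, truths, k=1))  # 2 of 4 cases hit at rank 1 -> 0.5
print(recall_at_k(predictions, truths, k=2))  # 3 of 4 cases hit in the top 2 -> 0.75
```

Recall@1 is the strictest setting, since only the top-ranked hypothesis counts.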
Manual verification by clinical experts yielded 95.40% agreement on the reasoning chains, indicating that the system's intermediate steps are medically valid and traceable. This level of agreement supports DeepRare's potential as a trustworthy decision support tool in rare disease diagnostics.
Diagnostic Accuracy Across Specialties
The system performed well across diverse medical specialties, suggesting broad coverage of medical knowledge. In the Endocrine System category, it achieved a top-1 diagnostic accuracy of 60%, notably higher than competing methods. In the Kidneys and Urinary System category, it reached a top-1 accuracy of 66%.
Web Application Deployment
The DeepRare system has been deployed as a user-friendly web application to facilitate clinical adoption. It allows users to input patient demographics, clinical presentations, and family histories to obtain diagnostic predictions. The platform supports the upload of supplementary materials such as case reports and diagnostic imaging, promoting comprehensive patient assessments.
Implications for Future Developments
DeepRare addresses critical challenges such as the dynamic nature of rare disease knowledge, the scarcity of data, and the necessity for transparency and traceability in clinical diagnostics. By offering evidence-based reasoning chains, the system reduces the time for literature review, thereby enhancing diagnostic efficiency in healthcare settings.
Future avenues may include refining retrieval mechanisms for more precise knowledge curation and expanding the agentic system to encompass rare disease treatment and prognosis prediction. This could transform DeepRare into an even more versatile ecosystem for rare disease management.
Conclusion
DeepRare provides an integrated framework for rare disease diagnosis that improves substantially on existing methods in both diagnostic accuracy and reasoning transparency. Its deployment as a web application and its validation across multiple datasets point to practical applicability in rare disease diagnostics.