- The paper introduces HiRMed, a hierarchical RAG-enhanced system that mimics clinical decision-making to refine medical test recommendations.
- It uses a dual-layer knowledge base and embedding-based retrieval to refine recommendations at each level and reduce missed tests.
- Experiments report 92.3% coverage and 88.7% accuracy, exceeding the Flat-RAG and vector-similarity baselines.
A Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data
Introduction
The paper introduces HiRMed (Hierarchical RAG-enhanced Medical Test Recommendation), a novel approach to medical test recommendation designed to address shortcomings in traditional systems such as rule-based approaches or similarity-based retrieval methods. These conventional methods often fail to capture the complexity and nuanced decision-making required for accurate medical diagnostics. By incorporating a hierarchical tree structure with Retrieval-Augmented Generation (RAG), HiRMed aims to simulate the logical steps taken during clinical decision-making processes. Each node within this framework dynamically refines recommendations based on domain-specific medical knowledge, urgency levels, and diagnostic uncertainty, overcoming the limitations associated with static models.
Figure 1: HiRMed System Architecture illustrating the hierarchical model design.
Methodology
Dataset and Knowledge Base Construction
HiRMed draws on a dataset of outpatient visits and a dual-layer knowledge base. The knowledge base underpins the hierarchical reasoning and consists of:
- Department Level: Broad medical knowledge that encompasses general clinical guidelines.
- Test-Specific Level: Fine-tuned insights concerning specific diagnostic tests and their parameters.
Both layers are embedded with an embedding model such as OpenAI's text-embedding-ada-002 and stored in a FAISS index for vector-based retrieval. This lets the system retrieve general clinical guidance first and test-specific detail second, so recommendations draw on both levels of knowledge.
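The two-layer lookup described above can be sketched as a two-stage nearest-neighbour search. This is a minimal illustration, not the paper's implementation: the three-dimensional vectors, the knowledge base entries, and the `recommend` function are all hypothetical, and plain cosine similarity in pure Python stands in for the embedding model and the FAISS index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, entries, k):
    """Return the k entries most similar to query_vec, best first."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return ranked[:k]

# Hypothetical toy "embeddings"; a real system would call an embedding model.
DEPARTMENT_KB = [
    {"name": "cardiology", "vec": [0.9, 0.1, 0.0]},
    {"name": "endocrinology", "vec": [0.1, 0.9, 0.0]},
    {"name": "gastroenterology", "vec": [0.0, 0.1, 0.9]},
]
TEST_KB = [
    {"name": "ECG", "department": "cardiology", "vec": [0.95, 0.05, 0.0]},
    {"name": "troponin", "department": "cardiology", "vec": [0.8, 0.2, 0.1]},
    {"name": "HbA1c", "department": "endocrinology", "vec": [0.1, 0.95, 0.0]},
    {"name": "abdominal ultrasound", "department": "gastroenterology", "vec": [0.0, 0.2, 0.9]},
]

def recommend(query_vec, n_departments=1, n_tests=2):
    """Stage 1: pick departments. Stage 2: rank only the tests under them."""
    departments = {d["name"] for d in top_k(query_vec, DEPARTMENT_KB, n_departments)}
    candidates = [t for t in TEST_KB if t["department"] in departments]
    return [t["name"] for t in top_k(query_vec, candidates, n_tests)]

if __name__ == "__main__":
    print(recommend([0.85, 0.15, 0.05]))
```

Restricting the second search to tests under the selected departments is what makes the retrieval hierarchical: a test from an unrelated department cannot be returned even if its vector happens to lie close to the query.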
Model Architecture
HiRMed's multi-layer architecture refines diagnostic recommendations progressively. It consists of three main components:
- Embedding Model: Converts patient queries and knowledge base texts into semantic vectors, facilitating content retrieval.
- LLM API (GPT-O1): Engages in complex reasoning and hypothesis generation based on retrieved data.
- Weight Model (Fine-tuned LLaMA3.2-3B): Prioritizes recommendations by incorporating historical relevance scores, considering patient demographics and symptom severity.
The architecture's hierarchical nature permits transition from department-wide considerations to specific test recommendations, thereby aligning closely with real-world medical logic.
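To make the role of the weight model concrete, the sketch below re-ranks retrieved candidates by blending a retrieval similarity score with a historical relevance score. This is an assumption-laden stand-in: the paper uses a fine-tuned LLaMA3.2-3B model for this step, whereas the `Candidate` fields, the `rerank` function, and the linear blend controlled by `alpha` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    test: str
    similarity: float            # score from the embedding/retrieval stage, in [0, 1]
    historical_relevance: float  # hypothetical prior from past cases, in [0, 1]

def rerank(candidates, alpha=0.5):
    """Order candidates by a convex blend of the two scores.

    alpha = 1.0 trusts retrieval similarity only; alpha = 0.0 trusts
    historical relevance only. The linear blend is an illustrative
    assumption, not the paper's learned weighting.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def score(c):
        return alpha * c.similarity + (1.0 - alpha) * c.historical_relevance

    return sorted(candidates, key=score, reverse=True)

if __name__ == "__main__":
    retrieved = [
        Candidate("ECG", similarity=0.9, historical_relevance=0.2),
        Candidate("troponin", similarity=0.7, historical_relevance=0.9),
    ]
    for c in rerank(retrieved, alpha=0.5):
        print(c.test)
```

A learned model can capture interactions (for example, between age and symptom severity) that a fixed linear blend cannot, which is presumably why the authors fine-tune a model for this step rather than hand-setting weights.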
Experimental Results
The effectiveness of HiRMed as a medical recommendation system is evident from its empirical evaluation, showcasing substantial gains over baseline methods such as Flat-RAG and Traditional Vector Similarity (TVS). Key metrics include:
- Coverage Rate: HiRMed achieves a coverage rate of 92.3%, outperforming Flat-RAG (84.7%) and TVS (72.8%).
- Accuracy: HiRMed reaches 88.7% accuracy in its recommended tests, versus 82.4% for Flat-RAG and 71.5% for TVS.
- Miss Rate: HiRMed has a miss rate of 2.1%, compared with 5.8% for Flat-RAG and 10.6% for TVS.
These metrics underscore the system's superior retrieval capabilities and its ability to make nuanced recommendations, thus reducing erroneous or overlooked diagnostic tests.
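The size of each gap follows directly from the reported figures. The short script below only restates those numbers and computes percentage-point differences between HiRMed and each baseline; the `REPORTED` dictionary and the `gaps` helper are illustrative names, and no metric definitions beyond the reported values are assumed.

```python
# Values as reported in the summary above (percent).
REPORTED = {
    "coverage": {"HiRMed": 92.3, "Flat-RAG": 84.7, "TVS": 72.8},
    "accuracy": {"HiRMed": 88.7, "Flat-RAG": 82.4, "TVS": 71.5},
    "miss_rate": {"HiRMed": 2.1, "Flat-RAG": 5.8, "TVS": 10.6},
}

def gaps(metric):
    """Percentage-point difference (HiRMed minus baseline) for one metric."""
    values = REPORTED[metric]
    return {
        name: round(values["HiRMed"] - value, 1)
        for name, value in values.items()
        if name != "HiRMed"
    }

if __name__ == "__main__":
    for metric in REPORTED:
        print(metric, gaps(metric))
```

Positive gaps are improvements for coverage and accuracy; for miss rate, a negative gap is the improvement, since lower is better.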
HiRMed displays strong performance across various medical departments, notably in cardiology, endocrinology, and gastroenterology. The detailed statistical analysis indicates consistent high-level accuracy and coverage rates, affirming the adaptability of the hierarchical model to diverse medical specialties.
Component Analysis
Ablation studies examine how much each part of the architecture contributes, in particular the hierarchical structure and memory augmentation. Removing layers or key components degrades accuracy and coverage, which indicates that structured reasoning and layered knowledge integration are both needed to reach the reported performance.
Conclusion
HiRMed presents a significant advancement in medical test recommendation systems by adeptly combining tree-structured hierarchical reasoning with RAG-enhanced logic. This architecture effectively bridges the gap between static models and dynamic, context-aware systems, achieving impressive performance metrics in accuracy and diagnostic utility. The results favorably position HiRMed as a blueprint for future systems requiring advanced reasoning capabilities in healthcare and beyond. Future research is encouraged to explore real-time feedback integration and adaptation to complex patient scenarios, both of which could significantly enhance the utility of hierarchical reasoning in clinical practice.