Formal Modular RAG Architecture
- Formal Modular RAG Architecture is a systematic and composable framework that decomposes retrieval-augmented generation into modular components with defined input/output specifications.
- The architecture enables rigorous benchmarking and ablation through precise module interfaces and parameterizable components, supporting extensive analysis and design-space exploration.
- It promotes extensibility and scalability by allowing the retrieval modules (SE, PF, PR) to be instantiated independently, enabling tailored configurations for both accuracy and computational efficiency.
A formal modular Retrieval-Augmented Generation (RAG) architecture refers to a systematic, composable, and interface-driven decomposition of a RAG system, such that its core functionalities—retrieval, context aggregation, and generative reasoning—are realized as interoperable modules with well-defined input/output types and documented interface specifications, enabling rigorous benchmarking, ablation, extensibility, and tailored instantiation for diverse application scenarios and knowledge sources. In the context of Graph-based Retrieval-Augmented Generation (GraphRAG), a formal modular architecture encompasses fine-grained module boundaries, precise pipeline composition, and parameterizable components, supporting both analysis and design-space exploration in large-scale reasoning tasks (Cao et al., 2024).
1. Formal Problem Statement and Pipeline Structure
A modular GraphRAG framework assumes as input a text-attributed knowledge graph G = (V, E), where V is a set of entities with textual descriptions and E a set of labeled, directed edges over a relation vocabulary R. Given a natural-language query q, a preprocessing frontend extracts the set V_q of query entities/relations present in G.
The pipeline supports the following formal sequence:
- Entity/Relation Extraction: V_q = Extract(q, G)
- Reasoning Chain Retrieval: R̂ = Retrieve(G, V_q), where each P_i ∈ R̂ is a multi-hop path.
- Augmented Prompt Construction: c = Prompt(q, R̂)
- Answer Generation: a = LLM(c)
Compactly, a = LLM(Prompt(q, Retrieve(G, Extract(q, G)))).
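The composition above can be sketched end to end as plain function chaining. All four callables below are hypothetical toy stand-ins for real module implementations, shown only to make the interface contracts concrete:

```python
# Toy sketch of a = LLM(Prompt(q, Retrieve(G, Extract(q, G)))).
# Every function here is an illustrative placeholder, not a real module.

def extract(q, G):
    # Entity/relation extraction: keep graph nodes mentioned in the query.
    return {v for v in G["nodes"] if v.lower() in q.lower()}

def retrieve(G, seeds):
    # Reasoning-chain retrieval: trivially return edges touching a seed entity.
    return [(u, r, v) for (u, r, v) in G["edges"] if u in seeds or v in seeds]

def build_prompt(q, paths):
    # Augmented prompt construction: linearize paths into the context.
    lines = [f"{u} --{r}--> {v}" for (u, r, v) in paths]
    return "Context:\n" + "\n".join(lines) + f"\nQuestion: {q}"

def llm_generate(prompt):
    # Stand-in for an actual LLM call.
    return prompt.splitlines()[-1]

G = {"nodes": {"Paris", "France"},
     "edges": [("Paris", "capital_of", "France")]}
q = "What country is Paris the capital of?"
answer = llm_generate(build_prompt(q, retrieve(G, extract(q, G))))
```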
2. Modular Decomposition of Retrieval
The retrieval phase is decomposed into three sequential modules, each with strictly defined interface contracts:
2.1 Subgraph-Extraction (SE)
- Inputs: Full graph G, query entity set V_q, parameters λ (restart probability), max_ent (entity budget).
- Outputs: Query-specific subgraph G_q ⊆ G with |V(G_q)| ≤ max_ent.
- Algorithm: Personalized PageRank (PPR) from the seed nodes V_q, with optional semantic reranking of candidate entities when coupled with a neural/LLM scorer.
Interface:

```python
def SE_PPR(G, seeds, λ, max_ent): ...
```
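A minimal power-iteration sketch of this interface, assuming an adjacency-list graph; `lam` is the restart probability, `iters` an added illustrative iteration bound, and the internals are a sketch rather than LEGO-GraphRAG's actual implementation:

```python
# Illustrative SE: personalized PageRank by power iteration, keeping the
# top-`max_ent` nodes as the query-specific subgraph G_q.

def SE_PPR(G, seeds, lam=0.15, max_ent=4, iters=50):
    nodes = list(G)
    # Uniform restart mass over the seed entities.
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(iters):
        nxt = {v: lam * restart[v] for v in nodes}
        for u in nodes:
            out = G[u]
            if not out:
                continue
            share = (1.0 - lam) * rank[u] / len(out)
            for v in out:
                nxt[v] += share
        rank = nxt
    kept = set(sorted(rank, key=rank.get, reverse=True)[:max_ent])
    # Return the induced subgraph on the retained entities.
    return {u: [v for v in G[u] if v in kept] for u in kept}

G = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
g_q = SE_PPR(G, seeds={"a"}, lam=0.15, max_ent=3)
```

Because node `d` receives neither restart mass nor incoming rank, it is pruned from the subgraph, illustrating how the entity budget localizes retrieval around the seeds.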
2.2 Path-Filtering (PF)
- Inputs: Subgraph G_q, seeds V_q, method ∈ {SPF, CPF, IPF}, beam width b, scoring function S_path.
- Outputs: Candidate paths R = {P_1, …, P_m}.
- Algorithms: Shortest-Path Filtering (SPF, Dijkstra), Complete Path Filtering (CPF, BFS enumeration), Iterative/Beam-Search Filtering (IPF).
Interface:

```python
def IPF(g_q, seeds, beam_width, S_path): ...
```
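The IPF interface can be sketched as beam search over partial paths, assuming the same adjacency-list representation; `S_path` scores partial paths, and `max_hops` is an added illustrative hop bound:

```python
# Illustrative IPF: extend each partial path by one edge per round and keep
# only the `beam_width` highest-scoring partial paths under S_path.

def IPF(g_q, seeds, beam_width, S_path, max_hops=3):
    beam = [(v,) for v in seeds if v in g_q]
    results = []
    for _ in range(max_hops):
        candidates = []
        for path in beam:
            for nxt in g_q.get(path[-1], []):
                if nxt not in path:  # avoid revisiting entities (no cycles)
                    candidates.append(path + (nxt,))
        if not candidates:
            break
        candidates.sort(key=S_path, reverse=True)
        beam = candidates[:beam_width]
        results.extend(beam)
    return results

g_q = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
# Toy scorer: prefer longer partial paths (illustrative only).
paths = IPF(g_q, seeds={"a"}, beam_width=2, S_path=len)
```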
2.3 Path-Refinement (PR)
- Inputs: Candidate paths R, query q, scoring function S_ref, budget k.
- Outputs: Refined paths R̂ (top-k).
- Algorithm: Score and select the top-k candidates.
Interface:

```python
score_list = [(P_i, S_ref(P_i, q)) for P_i in R]
hat_R = top_k(score_list, k)  # keep the k highest-scoring paths
return hat_R
```
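A runnable version of this refinement step, using `heapq.nlargest` as the top-k selector and a toy `S_ref` based on query-token overlap; both the scorer and the data are illustrative assumptions:

```python
import heapq

def PR(R, q, S_ref, k):
    # Score every candidate path, then keep the k highest-scoring ones.
    score_list = [(P_i, S_ref(P_i, q)) for P_i in R]
    return [p for p, _ in heapq.nlargest(k, score_list, key=lambda t: t[1])]

def S_ref(path, q):
    # Toy scorer: count path entities that appear in the query text.
    return sum(1 for ent in path if ent.lower() in q.lower())

R = [("paris", "france"), ("berlin", "germany"), ("paris", "louvre")]
hat_R = PR(R, "museums in Paris", S_ref, k=2)
```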
Module Interfaces Summary
| Module | Input Types | Output Types | Core Algorithm |
|---|---|---|---|
| SE | G, V_q, λ, max_ent | G_q | PPR, semantic rerank |
| PF | G_q, V_q, b, S_path | R (candidate paths) | SPF, CPF, IPF (beam) |
| PR | R, q, S_ref, k | R̂ (top-k paths) | Scoring + selection |
3. Systematic Taxonomy of Existing Techniques
Existing GraphRAG techniques can be mapped as valid module choices:
- SE: Purely structural (PPR, RWR), lexical (BM25), neural (Sentence-Transformer, DPR), LLM rerank (Llama/GPT), fine-tuned KG-coupled models.
- PF: Standard SPF/CPF (as in classical KBQA), beam-search + BM25/NN-based scorer, LLM-based scoring, fine-tuned in-domain LLMs.
- PR: Random, BM25 of path text, Sentence-Transformer rerankers, LLM re-ranking, LoRA-fine-tuned discriminators.
This mapping reveals a design space in which each module is independently instantiable, provided interface consistency is maintained.
4. Assembly and Instantiation of New GraphRAG Pipelines
A concrete GraphRAG instance is specified by selecting one method per module, with budgetary/compatibility constraints:
- SE Module: {PPR, RWR, BM25, ST/DPR rerank, LLM rerank}
- PF Module: {SPF, CPF, BS+BM25, BS+ST, BS+LLM}
- PR Module: {Random, BM25, ST rerank, LLM rerank}
Parameters must be tuned to keep subgraph and candidate set sizes within hardware constraints.
Guidelines:
- Avoid double fine-tuning across SE and PF to prevent overspecialization.
- Non-NN pipelines (structural SE + basic PF/PR) are computationally frugal; LLM-powered pipelines offer accuracy at higher cost.
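The instantiation protocol above can be sketched as a configuration constructor that validates one method choice per module. The method-name strings mirror the taxonomy in Section 3, while the registry and budget defaults are illustrative assumptions:

```python
# Hypothetical registry of module choices, following the design space above.
SE_METHODS = {"PPR", "RWR", "BM25", "ST", "LLM"}
PF_METHODS = {"SPF", "CPF", "BS+BM25", "BS+ST", "BS+LLM"}
PR_METHODS = {"Random", "BM25", "ST", "LLM"}

def make_pipeline(se, pf, pr, max_ent=2000, beam_width=8, k=16):
    # A concrete GraphRAG instance: one method per module plus budgets.
    assert se in SE_METHODS and pf in PF_METHODS and pr in PR_METHODS
    return {"SE": se, "PF": pf, "PR": pr,
            "params": {"max_ent": max_ent, "beam_width": beam_width, "k": k}}

# A frugal non-neural instantiation vs. an accuracy-oriented one.
fast = make_pipeline("PPR", "SPF", "BM25")
accurate = make_pipeline("PPR", "BS+ST", "LLM", max_ent=500)
```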
5. Evaluation Metrics and Multi-Objective Tradeoffs
Key evaluation metrics for modular GraphRAG architectures:
- Reasoning Quality Q_r: F1 or exact-match (HR@1) on the generated answer.
- Retrieval Quality Q_ret: F1 of retrieved paths against ground-truth reasoning chains.
- End-to-End Quality Q: joint measure combining Q_r and Q_ret.
- Runtime T: end-to-end wall-clock latency (seconds).
- Token Cost C_tok: aggregate LLM tokens processed.
- GPU Cost C_gpu: LLM latency × GPU power.
Comprehensive optimization selects pipelines on the Pareto frontier of these objectives, e.g. maximizing Q subject to budgets T ≤ T_max and C_tok ≤ C_max.
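The budget-constrained selection can be sketched as follows, assuming each pipeline has already been evaluated for end-to-end quality Q and runtime T; the pipeline names and numbers below are made up for illustration:

```python
def select(pipelines, T_max):
    # Keep only pipelines within the runtime budget, then maximize quality Q.
    feasible = [p for p in pipelines if p["T"] <= T_max]
    return max(feasible, key=lambda p: p["Q"]) if feasible else None

evaluated = [
    {"name": "PPR->SPF->BM25",   "Q": 0.41, "T": 1.2},
    {"name": "PPR->BS+ST->ST",   "Q": 0.52, "T": 4.8},
    {"name": "PPR->BS+LLM->LLM", "Q": 0.57, "T": 24.0},
]
best = select(evaluated, T_max=10.0)  # -> the BS+ST pipeline
```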
6. Empirical Design Principles for Modular GraphRAG
Empirical analysis in LEGO-GraphRAG provides the following guidance:
- SE: PPR maximizes recall; adding Sentence-Transformer reranking improves precision at low cost. Vanilla LLM reranking is more effective but incurs ~5× runtime overhead.
- PF: SPF and CPF are efficient; CPF provides richer context but is noisier. Beam search with ST reranking is optimal for F1/runtime; fine-tuning offers marginal gain. LLM beam search only helps with large, well-prompted models.
- PR: BM25 is fast but low quality; ST rerankers add 3–5 F1 points; LLM re-ranking is best but doubles runtime.
- Prompt Engineering: Increasing path count up to ~16 boosts F1, after which returns diminish. Few-shot prompts show inconsistent effects; zero-shot is robust.
- Overall Pipelines: PPR→SPF→ST for throughput; PPR+LLM_ft→BS+ST→LLM for accuracy (Cao et al., 2024).
The modular decomposition, taxonomy, and instantiation protocol in LEGO-GraphRAG enable systematic design, reproducibility, and principled experimentation in building advanced RAG systems grounded in structured knowledge graphs.