
Formal Modular RAG Architecture

Updated 9 February 2026
  • Formal Modular RAG Architecture is a systematic and composable framework that decomposes retrieval-augmented generation into modular components with defined input/output specifications.
  • The architecture enables rigorous benchmarking and ablation through precise module interfaces and parameterizable components, supporting extensive analysis and design-space exploration.
  • It promotes extensibility and scalability by allowing the retrieval modules (SE, PF, PR) to be instantiated independently, so pipelines can be tailored for accuracy or for computational efficiency.

A formal modular Retrieval-Augmented Generation (RAG) architecture refers to a systematic, composable, and interface-driven decomposition of a RAG system, such that its core functionalities—retrieval, context aggregation, and generative reasoning—are realized as interoperable modules with well-defined input/output types and documented interface specifications, enabling rigorous benchmarking, ablation, extensibility, and tailored instantiation for diverse application scenarios and knowledge sources. In the context of Graph-based Retrieval-Augmented Generation (GraphRAG), a formal modular architecture encompasses fine-grained module boundaries, precise pipeline composition, and parameterizable components, supporting both analysis and design-space exploration in large-scale reasoning tasks (Cao et al., 2024).

1. Formal Problem Statement and Pipeline Structure

A modular GraphRAG framework assumes as input a text-attributed knowledge graph G = (V, E), where V is a set of entities with textual descriptions and E \subseteq V \times R \times V is a set of labeled, directed edges over a relation set R. Given a natural-language query q \in \Sigma^*, a preprocessing frontend extracts the set of query entities/relations \varepsilon_q = \{(v_i^{(q)}, e_j^{(q)})\} present in G.

The pipeline supports the following formal sequence:

  1. Entity/Relation Extraction: \varepsilon_q = \mathrm{Extract}(q) \subset V \times R
  2. Reasoning-Chain Retrieval: R = \{P_i\} = \mathrm{Retrieve}(G, \varepsilon_q), where each P_i is a multi-hop path.
  3. Augmented Prompt Construction: q' = q \cup \mathrm{Format}(R)
  4. Answer Generation: a = \mathrm{LLM}(q')

Compactly, a = \mathrm{Generate}(\mathrm{AugPrompt}(q, \mathrm{Retrieve}(G, \mathrm{Extract}(q)))).
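This composition can be sketched as a higher-order function. The type aliases and the toy extract/retrieve/format/LLM callables below are illustrative placeholders, not the paper's actual implementations:

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]            # (head entity, relation, tail entity)
Path = List[Triple]                      # a multi-hop reasoning chain

def run_graphrag(q: str,
                 G: List[Triple],
                 extract: Callable[[str], Set[str]],
                 retrieve: Callable[[List[Triple], Set[str]], List[Path]],
                 fmt: Callable[[List[Path]], str],
                 llm: Callable[[str], str]) -> str:
    """a = Generate(AugPrompt(q, Retrieve(G, Extract(q))))."""
    eps_q = extract(q)                   # 1. entity/relation extraction
    R = retrieve(G, eps_q)               # 2. reasoning-chain retrieval
    q_aug = q + "\n" + fmt(R)            # 3. augmented prompt construction
    return llm(q_aug)                    # 4. answer generation
```

Any concrete GraphRAG instance then amounts to a choice of the four callables, which is exactly what the modular decomposition below formalizes.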

2. Modular Decomposition of Retrieval

The retrieval phase is decomposed into three sequential modules, each with strictly defined interface contracts:

2.1 Subgraph-Extraction (SE)

  • Inputs: Full graph G, query entity set \{v_i^{(q)}\} \subset V, parameters \mathrm{max\_ent} \in \mathbb{N}, \mathrm{coupling\_flag} \in \{\mathrm{true}, \mathrm{false}\}.
  • Outputs: Query-specific subgraph g_q = (V_q, E_q) with |V_q| \leq \mathrm{max\_ent}.
  • Algorithm: Personalized PageRank (PPR) from the seed nodes, optionally followed by semantic reranking via S(v; \varepsilon_q) when coupled with a neural/LLM scorer.

Interface:

def SE_PPR(G, seeds, λ, max_ent): ...
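A minimal runnable sketch of this SE interface, using power-iteration PPR over a plain adjacency dict. The iteration count and the handling of dangling nodes (their mass is simply dropped) are simplifying assumptions:

```python
def personalized_pagerank(adj, seeds, lam=0.85, iters=50):
    """Power-iteration PPR: p <- (1 - lam) * restart + lam * p P.
    adj maps each node to its list of out-neighbours; mass at
    dangling nodes is dropped for simplicity."""
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(restart)
    for _ in range(iters):
        nxt = {v: (1.0 - lam) * restart[v] for v in adj}
        for u, out in adj.items():
            if out:
                share = lam * p[u] / len(out)
                for v in out:
                    nxt[v] += share
        p = nxt
    return p

def SE_PPR(G, seeds, lam, max_ent):
    """Return the query-specific subgraph g_q induced by the
    max_ent highest-PPR nodes, so that |V_q| <= max_ent."""
    scores = personalized_pagerank(G, seeds, lam)
    V_q = set(sorted(G, key=scores.get, reverse=True)[:max_ent])
    return {u: [v for v in G[u] if v in V_q] for u in V_q}
```

Semantic reranking via S(v; \varepsilon_q) would replace the plain score sort with a combined structural/semantic score before truncation.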

2.2 Path-Filtering (PF)

  • Inputs: Subgraph g_q, seeds \varepsilon_q, method \in {SPF, CPF, IPF}, beam_width, scoring function S_{\mathrm{path}}(P).
  • Outputs: Candidate paths R = \{P_i\}.
  • Algorithms: Shortest-Path Filtering (SPF, Dijkstra), Complete Path Filtering (CPF, BFS enumeration), Iterative/Beam-Search Filtering (IPF).

Interface:

def IPF(g_q, seeds, beam_width, S_path): ...

2.3 Path-Refinement (PR)

  • Inputs: Candidates R = \{P_i\}, query q, scoring function S_{\mathrm{ref}}(P_i, q), top_k.
  • Outputs: Refined paths \hat{R} (top-k).
  • Algorithm: Score every candidate and select the top-k.

Interface:

def PR(R, q, S_ref, top_k):
    scored = sorted(R, key=lambda P_i: S_ref(P_i, q), reverse=True)
    return scored[:top_k]   # refined paths hat_R

Module Interfaces Summary

Module | Input Types | Output Types | Core Algorithm
------ | ----------- | ------------ | --------------
SE | G, \varepsilon_q, ... | g_q = (V_q, E_q) | PPR, semantic rerank
PF | g_q, \varepsilon_q, ... | R = \{P_i\} | SPF, CPF, IPF (beam)
PR | R, q, ... | \hat{R} (top-k paths) | Scoring + selection
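These interface contracts can be pinned down with structural typing; the type aliases and parameter lists here are one illustrative reading of the table, not the paper's exact signatures:

```python
from typing import Dict, List, Protocol, Set, runtime_checkable

Graph = Dict[str, List[str]]          # adjacency: entity -> neighbours
Path = List[str]

@runtime_checkable
class SEModule(Protocol):
    """G, seeds, params  ->  query-specific subgraph g_q."""
    def __call__(self, G: Graph, seeds: Set[str], max_ent: int) -> Graph: ...

@runtime_checkable
class PFModule(Protocol):
    """g_q, seeds  ->  candidate paths R."""
    def __call__(self, g_q: Graph, seeds: Set[str]) -> List[Path]: ...

@runtime_checkable
class PRModule(Protocol):
    """R, q, top_k  ->  refined paths hat_R."""
    def __call__(self, R: List[Path], q: str, top_k: int) -> List[Path]: ...
```

Any function with a conforming signature can be slotted into its stage, which is what makes the taxonomy in the next section a plug-and-play design space.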

3. Systematic Taxonomy of Existing Techniques

Existing GraphRAG techniques can be mapped as valid module choices:

  • SE: Purely structural (PPR, RWR), lexical (BM25), neural (Sentence-Transformer, DPR), LLM rerank (Llama/GPT), fine-tuned KG-coupled models.
  • PF: Standard SPF/CPF (as in classical KBQA), beam-search + BM25/NN-based scorer, LLM-based scoring, fine-tuned in-domain LLMs.
  • PR: Random, BM25 of path text, Sentence-Transformer rerankers, LLM re-ranking, LoRA-fine-tuned discriminators.

This mapping reveals a design space in which each module can be instantiated independently, provided interface consistency is maintained.

4. Assembly and Instantiation of New GraphRAG Pipelines

A concrete GraphRAG instance is specified by selecting one method per module, with budgetary/compatibility constraints:

  1. SE Module: {\mathrm{PPR}, \mathrm{RWR}, \mathrm{PPR+BM25}, \mathrm{PPR+ST}, \mathrm{PPR+LLM_{ft}}}
  2. PF Module: {\mathrm{SPF}, \mathrm{CPF}, \mathrm{BS+BM25}, \mathrm{BS+ST}, \mathrm{BS+LLM}}
  3. PR Module: {\mathrm{Random}, \mathrm{BM25}, \mathrm{ST}, \mathrm{LLM}}

Parameters must be tuned to keep subgraph and candidate set sizes within hardware constraints.
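Under these menus the design space is simply a Cartesian product. The sketch below enumerates it with the double-fine-tuning guideline encoded as a filter; which instantiations count as "fine-tuned" is our illustrative assumption:

```python
from itertools import product

SE_CHOICES = ["PPR", "RWR", "PPR+BM25", "PPR+ST", "PPR+LLM_ft"]
PF_CHOICES = ["SPF", "CPF", "BS+BM25", "BS+ST", "BS+LLM"]
PR_CHOICES = ["Random", "BM25", "ST", "LLM"]

def double_finetuned(se, pf):
    # illustrative reading of the guideline: do not pair a
    # fine-tuned SE scorer with an LLM-based PF scorer
    return "LLM_ft" in se and "LLM" in pf

pipelines = [(se, pf, pr)
             for se, pf, pr in product(SE_CHOICES, PF_CHOICES, PR_CHOICES)
             if not double_finetuned(se, pf)]
```

Of the 5 × 5 × 4 = 100 raw combinations, this particular filter removes the four that pair PPR+LLM_ft with BS+LLM, leaving 96 admissible instantiations.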

Guidelines:

  • Avoid double fine-tuning across SE and PF to prevent overspecialization.
  • Non-NN pipelines (structural SE + basic PF/PR) are computationally frugal; LLM-powered pipelines offer accuracy at higher cost.

5. Evaluation Metrics and Multi-Objective Tradeoffs

Key evaluation metrics for modular GraphRAG architectures:

  • Reasoning Quality Q(I): F1 or exact-match (HR@1) on the generated output.
  • Retrieval Quality Q_R(I): F1 against ground-truth reasoning chains.
  • End-to-End Quality Q_{E2E}(I) = \alpha Q_R + (1 - \alpha) Q_{\mathrm{Gen}}.
  • Runtime T(I) = T_{SE} + T_{PF} + T_{PR} + T_{gen} (seconds).
  • Token Cost C_{\mathrm{tok}}(I): aggregate LLM tokens processed.
  • GPU Cost C_{\mathrm{GPU}}(I): LLM latency × GPU power.

Comprehensive optimization:

\max F(I) = Q_{E2E}(I) - \lambda_1 T(I) - \lambda_2 C_{\mathrm{tok}}(I) - \lambda_3 C_{\mathrm{GPU}}(I) \quad \text{subject to } T(I) \leq T_{\max},\; C_{\mathrm{tok}}(I) \leq C_{\max}.
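The objective is straightforward to evaluate per instantiation; the λ weights and budget defaults below are arbitrary placeholders, not values from the paper:

```python
def pipeline_objective(Q_E2E, T, C_tok, C_GPU,
                       lambdas=(0.01, 1e-4, 0.05),
                       T_max=60.0, C_max=8000):
    """F(I) = Q_E2E - l1*T - l2*C_tok - l3*C_GPU, returning -inf
    for instantiations that violate the hard budget constraints."""
    if T > T_max or C_tok > C_max:
        return float("-inf")
    l1, l2, l3 = lambdas
    return Q_E2E - l1 * T - l2 * C_tok - l3 * C_GPU
```

Ranking all admissible module combinations by this score turns design-space exploration into a simple argmax over enumerated pipelines.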

6. Empirical Design Principles for Modular GraphRAG

Empirical analysis in LEGO-GraphRAG provides the following guidance:

  • SE: PPR maximizes recall; adding Sentence-Transformer reranking improves precision at low cost. Vanilla LLM reranking is more effective but incurs ~5× runtime overhead.
  • PF: SPF and CPF are efficient; CPF provides richer context but is noisier. Beam search with ST reranking is optimal for F1/runtime; fine-tuning offers marginal gain. LLM beam search only helps with large, well-prompted models.
  • PR: BM25 is fast but low quality; ST rerankers add 3–5 F1 points; LLM re-ranking is best but doubles runtime.
  • Prompt Engineering: Increasing path count up to ~16 boosts F1, after which returns diminish. Few-shot prompts show inconsistent effects; zero-shot is robust.
  • Overall Pipelines: PPR→SPF→ST for throughput; PPR+LLM_ft→BS+ST→LLM for accuracy (Cao et al., 2024).

The modular decomposition, taxonomy, and instantiation protocol in LEGO-GraphRAG enable systematic design, reproducibility, and principled experimentation in building advanced RAG systems grounded in structured knowledge graphs.

References (1)
