
Formal Modular RAG Architecture

Updated 9 February 2026
  • Formal Modular RAG Architecture is a systematic and composable framework that decomposes retrieval-augmented generation into modular components with defined input/output specifications.
  • The architecture enables rigorous benchmarking and ablation through precise module interfaces and parameterizable components, supporting extensive analysis and design-space exploration.
  • It promotes extensibility and scalability by allowing the retrieval modules (SE, PF, PR) to be instantiated independently, so pipelines can be tailored for accuracy or for computational efficiency.

A formal modular Retrieval-Augmented Generation (RAG) architecture refers to a systematic, composable, and interface-driven decomposition of a RAG system, such that its core functionalities—retrieval, context aggregation, and generative reasoning—are realized as interoperable modules with well-defined input/output types and documented interface specifications, enabling rigorous benchmarking, ablation, extensibility, and tailored instantiation for diverse application scenarios and knowledge sources. In the context of Graph-based Retrieval-Augmented Generation (GraphRAG), a formal modular architecture encompasses fine-grained module boundaries, precise pipeline composition, and parameterizable components, supporting both analysis and design-space exploration in large-scale reasoning tasks (Cao et al., 2024).

1. Formal Problem Statement and Pipeline Structure

A modular GraphRAG framework assumes as input a text-attributed knowledge graph G = (V, E), where V is a set of entities with textual descriptions and E \subseteq V \times R \times V is a set of labeled, directed edges over a relation set R. Given a natural-language query q \in \Sigma^*, a preprocessing frontend extracts the set of query entities/relations \varepsilon_q = \{(v_i^{(q)}, e_j^{(q)})\} present in G.

The pipeline supports the following formal sequence:

  1. Entity/Relation Extraction: \varepsilon_q = \mathrm{Extract}(q) \subset V \times R
  2. Reasoning-Chain Retrieval: R = \{P_i\} = \mathrm{Retrieve}(G, \varepsilon_q), where each P_i is a multi-hop path.
  3. Augmented Prompt Construction: q' = q \cup \mathrm{Format}(R)
  4. Answer Generation: a = \mathrm{LLM}(q')

Compactly, a = \mathrm{Generate}(\mathrm{AugPrompt}(q, \mathrm{Retrieve}(G, \mathrm{Extract}(q)))).
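This composition can be sketched as a higher-order function. The type aliases and the toy extract/retrieve/format/LLM callables below are illustrative placeholders, not the paper's actual implementations:

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]            # (head entity, relation, tail entity)
Path = List[Triple]                      # a multi-hop reasoning chain

def run_graphrag(q: str,
                 G: List[Triple],
                 extract: Callable[[str], Set[str]],
                 retrieve: Callable[[List[Triple], Set[str]], List[Path]],
                 fmt: Callable[[List[Path]], str],
                 llm: Callable[[str], str]) -> str:
    """a = Generate(AugPrompt(q, Retrieve(G, Extract(q))))."""
    eps_q = extract(q)                   # 1. entity/relation extraction
    R = retrieve(G, eps_q)               # 2. reasoning-chain retrieval
    q_aug = q + "\n" + fmt(R)            # 3. augmented prompt construction
    return llm(q_aug)                    # 4. answer generation
```

Any concrete GraphRAG instance then amounts to a choice of the four callables, which is exactly what the modular decomposition below formalizes.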

2. Modular Decomposition of Retrieval

The retrieval phase is decomposed into three sequential modules, each with strictly defined interface contracts:

2.1 Subgraph-Extraction (SE)

  • Inputs: Full graph G, query entity set \{v_i^{(q)}\} \subset V, parameters \mathrm{max\_ent} \in \mathbb{N}, \mathrm{coupling\_flag} \in \{\mathrm{true}, \mathrm{false}\}.
  • Outputs: Query-specific subgraph g_q = (V_q, E_q) with |V_q| \leq \mathrm{max\_ent}.
  • Algorithm: Personalized PageRank (PPR) from the seed nodes, optionally followed by semantic reranking via S(v; \varepsilon_q) when coupled with a neural/LLM scorer.

Interface:

def SE_PPR(G, seeds, λ, max_ent): ...
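A minimal runnable sketch of this SE interface, using power-iteration PPR over a plain adjacency dict. The iteration count and the handling of dangling nodes (their mass is simply dropped) are simplifying assumptions:

```python
def personalized_pagerank(adj, seeds, lam=0.85, iters=50):
    """Power-iteration PPR: p <- (1 - lam) * restart + lam * p P.
    adj maps each node to its list of out-neighbours; mass at
    dangling nodes is dropped for simplicity."""
    restart = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in adj}
    p = dict(restart)
    for _ in range(iters):
        nxt = {v: (1.0 - lam) * restart[v] for v in adj}
        for u, out in adj.items():
            if out:
                share = lam * p[u] / len(out)
                for v in out:
                    nxt[v] += share
        p = nxt
    return p

def SE_PPR(G, seeds, lam, max_ent):
    """Return the query-specific subgraph g_q induced by the
    max_ent highest-PPR nodes, so that |V_q| <= max_ent."""
    scores = personalized_pagerank(G, seeds, lam)
    V_q = set(sorted(G, key=scores.get, reverse=True)[:max_ent])
    return {u: [v for v in G[u] if v in V_q] for u in V_q}
```

Semantic reranking via S(v; \varepsilon_q) would replace the plain score sort with a combined structural/semantic score before truncation.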

2.2 Path-Filtering (PF)

  • Inputs: Subgraph g_q, seeds \varepsilon_q, method \in {SPF, CPF, IPF}, beam_width, scoring function S_{\mathrm{path}}(P).
  • Outputs: Candidate paths R = \{P_i\}.
  • Algorithms: Shortest-Path Filtering (SPF, Dijkstra), Complete Path Filtering (CPF, BFS enumeration), Iterative/Beam-Search Filtering (IPF).

Interface:

def IPF(g_q, seeds, beam_width, S_path): ...

2.3 Path-Refinement (PR)

  • Inputs: Candidates R = \{P_i\}, query q, scoring function S_{\mathrm{ref}}(P_i, q), top_k.
  • Outputs: Refined paths \hat{R} (top-k).
  • Algorithm: Score every candidate and select the top-k.

Interface:

def PR(R, q, S_ref, top_k):
    scored = sorted(R, key=lambda P_i: S_ref(P_i, q), reverse=True)
    return scored[:top_k]   # refined paths hat_R

Module Interfaces Summary

Module | Input Types | Output Types | Core Algorithm
------ | ----------- | ------------ | --------------
SE | G, \varepsilon_q, ... | g_q = (V_q, E_q) | PPR, semantic rerank
PF | g_q, \varepsilon_q, ... | R = \{P_i\} | SPF, CPF, IPF (beam)
PR | R, q, ... | \hat{R} (top-k paths) | Scoring + selection
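These interface contracts can be pinned down with structural typing; the type aliases and parameter lists here are one illustrative reading of the table, not the paper's exact signatures:

```python
from typing import Dict, List, Protocol, Set, runtime_checkable

Graph = Dict[str, List[str]]          # adjacency: entity -> neighbours
Path = List[str]

@runtime_checkable
class SEModule(Protocol):
    """G, seeds, params  ->  query-specific subgraph g_q."""
    def __call__(self, G: Graph, seeds: Set[str], max_ent: int) -> Graph: ...

@runtime_checkable
class PFModule(Protocol):
    """g_q, seeds  ->  candidate paths R."""
    def __call__(self, g_q: Graph, seeds: Set[str]) -> List[Path]: ...

@runtime_checkable
class PRModule(Protocol):
    """R, q, top_k  ->  refined paths hat_R."""
    def __call__(self, R: List[Path], q: str, top_k: int) -> List[Path]: ...
```

Any function with a conforming signature can be slotted into its stage, which is what makes the taxonomy in the next section a plug-and-play design space.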

3. Systematic Taxonomy of Existing Techniques

Existing GraphRAG techniques can be mapped as valid module choices:

  • SE: Purely structural (PPR, RWR), lexical (BM25), neural (Sentence-Transformer, DPR), LLM rerank (Llama/GPT), fine-tuned KG-coupled models.
  • PF: Standard SPF/CPF (as in classical KBQA), beam-search + BM25/NN-based scorer, LLM-based scoring, fine-tuned in-domain LLMs.
  • PR: Random, BM25 of path text, Sentence-Transformer rerankers, LLM re-ranking, LoRA-fine-tuned discriminators.

This mapping reveals a design space in which each module can be instantiated independently, provided interface consistency is maintained.

4. Assembly and Instantiation of New GraphRAG Pipelines

A concrete GraphRAG instance is specified by selecting one method per module, with budgetary/compatibility constraints:

  1. SE Module: {\mathrm{PPR}, \mathrm{RWR}, \mathrm{PPR+BM25}, \mathrm{PPR+ST}, \mathrm{PPR+LLM_{ft}}}
  2. PF Module: {\mathrm{SPF}, \mathrm{CPF}, \mathrm{BS+BM25}, \mathrm{BS+ST}, \mathrm{BS+LLM}}
  3. PR Module: {\mathrm{Random}, \mathrm{BM25}, \mathrm{ST}, \mathrm{LLM}}

Parameters must be tuned to keep subgraph and candidate set sizes within hardware constraints.
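Under these menus the design space is simply a Cartesian product. The sketch below enumerates it with the double-fine-tuning guideline encoded as a filter; which instantiations count as "fine-tuned" is our illustrative assumption:

```python
from itertools import product

SE_CHOICES = ["PPR", "RWR", "PPR+BM25", "PPR+ST", "PPR+LLM_ft"]
PF_CHOICES = ["SPF", "CPF", "BS+BM25", "BS+ST", "BS+LLM"]
PR_CHOICES = ["Random", "BM25", "ST", "LLM"]

def double_finetuned(se, pf):
    # illustrative reading of the guideline: do not pair a
    # fine-tuned SE scorer with an LLM-based PF scorer
    return "LLM_ft" in se and "LLM" in pf

pipelines = [(se, pf, pr)
             for se, pf, pr in product(SE_CHOICES, PF_CHOICES, PR_CHOICES)
             if not double_finetuned(se, pf)]
```

Of the 5 × 5 × 4 = 100 raw combinations, this particular filter removes the four that pair PPR+LLM_ft with BS+LLM, leaving 96 admissible instantiations.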

Guidelines:

  • Avoid double fine-tuning across SE and PF to prevent overspecialization.
  • Non-NN pipelines (structural SE + basic PF/PR) are computationally frugal; LLM-powered pipelines offer accuracy at higher cost.

5. Evaluation Metrics and Multi-Objective Tradeoffs

Key evaluation metrics for modular GraphRAG architectures:

  • Reasoning Quality Q(I): F1 or exact-match (HR@1) on the generated output.
  • Retrieval Quality Q_R(I): F1 against ground-truth reasoning chains.
  • End-to-End Quality Q_{E2E}(I) = \alpha Q_R + (1 - \alpha) Q_{\mathrm{Gen}}.
  • Runtime T(I) = T_{SE} + T_{PF} + T_{PR} + T_{gen} (seconds).
  • Token Cost C_{\mathrm{tok}}(I): aggregate LLM tokens processed.
  • GPU Cost C_{\mathrm{GPU}}(I): LLM latency × GPU power.

Comprehensive optimization:

\max F(I) = Q_{E2E}(I) - \lambda_1 T(I) - \lambda_2 C_{\mathrm{tok}}(I) - \lambda_3 C_{\mathrm{GPU}}(I) \quad \text{subject to } T(I) \leq T_{\max},\; C_{\mathrm{tok}}(I) \leq C_{\max}.
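The objective is straightforward to evaluate per instantiation; the λ weights and budget defaults below are arbitrary placeholders, not values from the paper:

```python
def pipeline_objective(Q_E2E, T, C_tok, C_GPU,
                       lambdas=(0.01, 1e-4, 0.05),
                       T_max=60.0, C_max=8000):
    """F(I) = Q_E2E - l1*T - l2*C_tok - l3*C_GPU, returning -inf
    for instantiations that violate the hard budget constraints."""
    if T > T_max or C_tok > C_max:
        return float("-inf")
    l1, l2, l3 = lambdas
    return Q_E2E - l1 * T - l2 * C_tok - l3 * C_GPU
```

Ranking all admissible module combinations by this score turns design-space exploration into a simple argmax over enumerated pipelines.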

6. Empirical Design Principles for Modular GraphRAG

Empirical analysis in LEGO-GraphRAG provides the following guidance:

  • SE: PPR maximizes recall; adding Sentence-Transformer reranking improves precision at low cost. Vanilla LLM reranking is more effective but incurs ~5× runtime overhead.
  • PF: SPF and CPF are efficient; CPF provides richer context but is noisier. Beam search with ST reranking is optimal for F1/runtime; fine-tuning offers marginal gain. LLM beam search only helps with large, well-prompted models.
  • PR: BM25 is fast but low quality; ST rerankers add 3–5 F1 points; LLM re-ranking is best but doubles runtime.
  • Prompt Engineering: Increasing path count up to ~16 boosts F1, after which returns diminish. Few-shot prompts show inconsistent effects; zero-shot is robust.
  • Overall Pipelines: PPR→SPF→ST for throughput; PPR+LLM_ft→BS+ST→LLM for accuracy (Cao et al., 2024).

The modular decomposition, taxonomy, and instantiation protocol in LEGO-GraphRAG enable systematic design, reproducibility, and principled experimentation in building advanced RAG systems grounded in structured knowledge graphs.

References (1)
