Entity-Guided Graph Traversal for KB-QA

Updated 10 November 2025
  • Entity-Guided Graph Traversal is a KB-QA method that uses detected entities to extract local subgraphs and map natural language queries into graph traversal operations.
  • It integrates entity linking, subgraph extraction, and joint semantic mapping, using predicate similarity and type constraints to resolve ambiguity in predicate and answer selection.
  • The approach demonstrates improved recall and F1 on non-aggregation questions, streamlining query translation in large graph-structured knowledge bases like DBpedia.

Entity-Guided Graph Traversal is a knowledge base question answering (KB-QA) methodology that leverages local graph substructures, anchored on detected entities, to enable semantic parsing and answer retrieval from large graph-structured resources such as DBpedia. This approach is specifically suited for non-aggregation questions, focusing on mapping natural language questions into subgraph traversal operations that yield accurate answers while jointly resolving semantic mapping and disambiguation.

1. Knowledge Base and Formal Problem Definition

Let the underlying knowledge base (KB), such as DBpedia, be modeled as a directed, labeled graph $G = (V, E)$, where $V$ denotes the set of RDF nodes (resources, classes, literals), $P$ is the set of predicate labels, and $E \subseteq V \times P \times V$ is the set of labeled edges (each $(u, p, v) \in E$ encodes a relation $u \xrightarrow{p} v$). Given a user query $q$ in natural language, the system must return the accurate set of answers by traversing appropriate paths in $G$.
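To make the graph model concrete, the following minimal sketch stores a KB as a list of $(u, p, v)$ triples and indexes outgoing edges by source node. The `ex:` identifiers and the `build_graph` helper are illustrative assumptions, not DBpedia data or part of the described system.

```python
from collections import defaultdict

# Toy triples (u, p, v). The "ex:" names are hypothetical, not real DBpedia URIs.
TRIPLES = [
    ("ex:Berlin", "ex:mayor", "ex:Person_A"),
    ("ex:Berlin", "ex:country", "ex:Germany"),
    ("ex:Person_A", "rdf:type", "ex:Person"),
]

def build_graph(triples):
    """Index edges by source node: node -> list of (predicate, target)."""
    out_edges = defaultdict(list)
    for u, p, v in triples:
        out_edges[u].append((p, v))
    return out_edges

graph = build_graph(TRIPLES)
nodes = {n for u, _, v in TRIPLES for n in (u, v)}  # the node set V
predicates = {p for _, p, _ in TRIPLES}             # the label set P
```

An adjacency index of this kind keeps the per-node edge lookup cheap, which is the operation traversal repeats most often.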

The entity-guided method proceeds by:

  • Detecting a subset of KB entities $E_q = \{e_1, e_2, \ldots, e_k\} \subseteq V$ referenced in $q$, each corresponding to a DBpedia URI.
  • Extracting a “topological structure” or pattern $T_q$ from $q$, represented as a tree with edges labeled by surface phrases (e.g., “mayor of”).
  • Defining $K$ as the maximal branch length (in edges) in $T_q$, restricting traversal depth.
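As a rough illustration of the pattern $T_q$ and the depth bound $K$, the sketch below represents the pattern as a small tree and computes its longest branch. The `PatternNode` class and `max_branch_length` function are hypothetical names introduced here, and the tree is built by hand rather than by a parser.

```python
from dataclasses import dataclass, field

@dataclass
class PatternNode:
    label: str  # an entity mention or the ANSNODE variable
    # Each child is a (surface_phrase, PatternNode) pair.
    children: list = field(default_factory=list)

def max_branch_length(node):
    """Number of edges on the longest root-to-leaf branch, i.e. K."""
    if not node.children:
        return 0
    return 1 + max(max_branch_length(child) for _, child in node.children)

# "Who is the mayor of Berlin?" as ANSNODE -- "mayor of" -- Berlin
pattern = PatternNode("ANSNODE", [("mayor of", PatternNode("Berlin"))])
K = max_branch_length(pattern)  # a single edge, so K is 1
```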

2. Entity Detection and Local Subgraph Construction

2.1 Entity Linking

The process employs an external entity linker (specifically, Wikipedia Miner), which:

  • Identifies mention–resource pairs $(m, e)$ in $q$.
  • Retains only those with linker score $\geq \theta$ ($\theta = 0.15$).
  • Excludes schema-level (e.g. dbo:Actor) or category entities, retaining only instance-level entities for $E_q$.
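The filtering rules above can be sketched as a post-processing step over linker output. The tuple layout `(mention, uri, score, kind)` and the `kind` labels are assumptions made for this example; they do not reflect the actual Wikipedia Miner output format.

```python
THETA = 0.15  # linker score threshold stated in the text

def select_entities(candidates, theta=THETA):
    """Keep instance-level resources whose linker score is >= theta.

    `candidates` is an assumed list of (mention, uri, score, kind) tuples,
    where kind is one of "instance", "class", or "category".
    """
    kept = []
    for _mention, uri, score, kind in candidates:
        if score >= theta and kind == "instance":
            kept.append(uri)
    return kept

candidates = [
    ("Berlin", "ex:Berlin", 0.90, "instance"),
    ("actor", "ex:Actor", 0.80, "class"),      # schema-level: dropped
    ("city", "ex:City_X", 0.10, "instance"),   # below threshold: dropped
]
E_q = select_entities(candidates)
```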

2.2 Subgraph Extraction

With $E_q$ established, a local subgraph $G_s = (V_s, E_s)$ is constructed as follows:

  1. Initialize $V_s \leftarrow E_q$, $E_s \leftarrow \emptyset$.
  2. For every $e \in E_q$ and for $d = 1 \ldots K$:
    • Perform breadth-first expansion to depth $d$.
    • Collect all edges $(e', p, e'')$ with $e' \in V_s$ or $e'' \in V_s$.
    • Augment $V_s$ and $E_s$ with endpoints and edges.
  3. Expand until reaching depth $K$ (the longest path indicated by $T_q$). This design guarantees that any answer path conforming to $T_q$ lies entirely in $G_s$.
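The expansion steps can be sketched as a level-by-level breadth-first search over an in-memory triple list. This is a simplified illustration under that assumption: it rescans all triples at each level, whereas a real system would query an indexed store. The function name `extract_subgraph` is hypothetical.

```python
def extract_subgraph(triples, seeds, max_depth):
    """Collect nodes and edges within max_depth hops of any seed entity.

    Edges are followed in both directions, matching the rule that an edge
    is collected when either endpoint is already in the subgraph.
    """
    nodes = set(seeds)      # V_s, initialised to E_q
    edges = set()           # E_s, initially empty
    frontier = set(seeds)
    for _ in range(max_depth):
        next_frontier = set()
        for u, p, v in triples:
            if u in frontier or v in frontier:
                edges.add((u, p, v))
                for n in (u, v):
                    if n not in nodes:
                        next_frontier.add(n)
        nodes |= next_frontier
        frontier = next_frontier
        if not frontier:    # nothing new was reached, stop early
            break
    return nodes, edges

chain = [("A", "p", "B"), ("B", "q", "C"), ("C", "r", "D"), ("X", "s", "A")]
V_s, E_s = extract_subgraph(chain, {"A"}, max_depth=2)
```

With `max_depth=2` the node `D` stays outside the subgraph, since it is three hops from the seed.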

3. Joint Semantic Item Mapping and Disambiguation

Entity-guided graph traversal unifies two typically disjoint subtasks: (a) Semantic-Item-Mapping: extract the structured template $T_q$, e.g., discerning the pattern of “Who is the mayor of Berlin?” as ANSNODE -- “mayor of” -- Berlin, where ANSNODE is a variable for the answer; (b) Semantic-Item-Disambiguation: select, within $G_s$, the precise sequence of predicates and nodes realizing the intended query semantics.

The methodology:

  • Adopts light-weight constituency-based patterns to extract $T_q$ from $q$ (eschewing template induction).
  • Searches $G_s$ for candidate paths matching the topology of $T_q$.
  • Scores each candidate path based on the semantic match between KB predicate labels and question phrases, and enforces answer type constraints.

For a pattern $T_q$ with $m$ edges, each labeled $t_i$ (the $i$-th surface phrase), candidate paths $p = (e_0 \xrightarrow{p_1} e_1 \xrightarrow{p_2} \ldots \xrightarrow{p_m} e_m)$ in $G_s$ are scored:

$$\mathrm{Score}(p) = \frac{1}{m} \sum_{i=1}^m \mathrm{sim}(p_i, t_i) + \mathrm{TypeSim}(e_m, \mathrm{focus})$$

where

  • $\mathrm{sim}(p_i, t_i)$ is the semantic similarity between predicate label $p_i$ and surface phrase $t_i$;
  • $\mathrm{TypeSim}(e_m, \mathrm{focus})$ assesses the compatibility of terminal node $e_m$ with the expected answer type inferred from question focus (e.g. "person", "place").

This joint objective disambiguates both predicate path selection and answer candidate typing.

4. Path Traversal Algorithm and Pruning

The core traversal algorithm, denoted "FindCandidatePaths," operates as follows for pattern length $m$ and phrase set $(t_1, \ldots, t_m)$:

  1. Initialize $P_0 = \{(e, [], 0) : e \in E_q\}$
  2. For each step $i = 1 \ldots m$:
    • For each $(e_{i-1}, \text{path}, \text{score\_so\_far}) \in P_{i-1}$:
      • Enumerate top-$k$ outgoing edges from $e_{i-1}$ by $\mathrm{sim}(p_e, t_i)$.
      • Prune edges with similarity $< \tau$.
      • For each permissible edge, create new partial path in $P_i$.
  3. After $m$ steps, $P_m$ contains valid, length-$m$ candidate paths.
  4. Each $e_m$ is scored with $\mathrm{TypeSim}(e_m, \mathrm{focus})$, yielding the final path score.
  5. Return candidates ranked by total score.

The state per traversal step is a triple: (current node $e_i$, accumulated predicates, similarity score). Branching is curtailed by top-$k$ predicate selection and threshold $\tau$. In the worst case, the number of paths is $O(|E_q| \cdot b^m)$ (for average branching factor $b$). With top-$k$ selection, each partial path spawns at most $k$ successors per step, so at most $|E_q| \cdot k^m$ candidates survive, and the threshold $\tau$ typically removes more.
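A minimal sketch of the traversal loop follows, assuming the graph is a dict from node to `(predicate, target)` pairs and that a similarity function is supplied by the caller. The `toy_sim` stub, the default values of `top_k` and `tau`, and all `ex:` names are assumptions for illustration. The final type-scoring step is omitted here.

```python
def find_candidate_paths(out_edges, seeds, phrases, sim, top_k=3, tau=0.2):
    """Expand partial paths one pattern edge at a time, pruning weak matches."""
    partial = [(e, [], 0.0) for e in seeds]  # P_0: (node, predicates, score)
    for phrase in phrases:
        expanded = []
        for node, path, score in partial:
            scored = [(sim(p, phrase), p, v) for p, v in out_edges.get(node, [])]
            # Keep the k best edges that also clear the threshold tau.
            scored = [s for s in scored if s[0] >= tau]
            scored.sort(key=lambda s: s[0], reverse=True)
            for s, p, v in scored[:top_k]:
                expanded.append((v, path + [p], score + s))
        partial = expanded
    return sorted(partial, key=lambda t: t[2], reverse=True)

def toy_sim(predicate, phrase):
    """Hypothetical stand-in for a semantic similarity service."""
    local_name = predicate.split(":")[-1]
    return 1.0 if local_name in phrase.lower().split() else 0.0

graph = {"ex:Berlin": [("ex:mayor", "ex:Person_A"), ("ex:country", "ex:Germany")]}
paths = find_candidate_paths(graph, ["ex:Berlin"], ["mayor of"], toy_sim)
```

Filtering by $\tau$ before taking the top $k$ gives the same surviving edges as taking the top $k$ first and then pruning, so the order of the two steps does not change the result.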

5. Path Scoring and Answer Selection

5.1 Predicate-Similarity Features

For each path edge $p_i$ and corresponding $t_i$, the system computes:

$$\text{PredicateScore}_i = \max_{\ell \text{ label of } p_i} \left( \frac{1}{|\ell|} \sum_{w \in \ell} \max_{tw \in t_i} \mathrm{UMBCsim}(w, tw) \right)$$

where $\mathrm{UMBCsim}$ is a word similarity service.
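The predicate score can be sketched as below: for each label, every label word is matched to its best phrase word, the matches are averaged, and the best label wins. The `word_sim` argument stands in for the similarity service; the lookup-table stub and its values are invented for this example and are not UMBC outputs.

```python
def predicate_score(labels, phrase_words, word_sim):
    """Best label score; each label averages its words' best phrase matches."""
    if not phrase_words:
        return 0.0
    best = 0.0
    for label in labels:
        words = label.split()
        if not words:
            continue
        avg = sum(max(word_sim(w, tw) for tw in phrase_words) for w in words) / len(words)
        best = max(best, avg)
    return best

# Hypothetical similarity table; the 0.6 value is made up for the example.
TABLE = {frozenset(("leader", "mayor")): 0.6}

def toy_word_sim(a, b):
    return 1.0 if a == b else TABLE.get(frozenset((a, b)), 0.0)

score = predicate_score(["leader name", "leader"], ["mayor", "of"], toy_word_sim)
```

Here the label "leader name" averages 0.6 and 0.0 to 0.3, while "leader" alone scores 0.6, so the maximum over labels is 0.6.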

5.2 Type-Constraint Feature

A focus phrase $F$ is extracted from $q$, typically the head noun following interrogatives. For each answer candidate $e_m$, the best semantic match is computed:

$$\text{TypeScore} = \max_{\mathrm{typ} \in \text{types}(e_m)} \mathrm{UMBCsim}(\text{typ label}, \mathrm{head}(F))$$

5.3 Combined Scoring

$$\text{PathScore} = \frac{1}{m} \sum_{i=1}^m s_i + t_s$$

with $\{s_1, \ldots, s_m\}$ the predicate scores and $t_s$ the type score. Candidates are returned ranked by this measure.
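The type score and the combined ranking can be sketched together. The candidate dict layout (`"s"` for predicate scores, `"t_s"` for the type score) and all numeric values are assumptions chosen to make the arithmetic easy to follow.

```python
def type_score(type_labels, focus_head, word_sim):
    """Best similarity between any type label of e_m and the focus head noun."""
    return max((word_sim(t, focus_head) for t in type_labels), default=0.0)

def path_score(predicate_scores, t_s):
    """Mean predicate score plus the type score."""
    return sum(predicate_scores) / len(predicate_scores) + t_s

def rank(candidates):
    """Sort candidates by PathScore, best first."""
    return sorted(candidates, key=lambda c: path_score(c["s"], c["t_s"]), reverse=True)

# Made-up scores: the two-hop path has a mean predicate score of 0.75.
candidates = [
    {"answer": "ex:Germany", "s": [1.0], "t_s": 0.25},
    {"answer": "ex:Person_A", "s": [1.0, 0.5], "t_s": 1.0},
]
ranked = rank(candidates)
```

Averaging over $m$ keeps longer paths from winning simply by accumulating more terms, so the type score carries comparable weight regardless of path length.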

6. Query Translation and System Output

While the implementation yields URI or literal answers directly, any discovered path $p = (e_0, p_1, e_1, \ldots, p_m, e_m)$ can be rendered as a SPARQL query:

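```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Template: <e_0>, <p_i>, and <head(F)> are placeholders filled from the selected path.
SELECT ?ans WHERE {
  <e_0> <p_1> ?x_1 .
  ?x_1 <p_2> ?x_2 .
  ...
  ?x_{m-1} <p_m> ?ans .
  OPTIONAL { ?ans rdf:type ?t .
             FILTER(regex(str(?t), "<head(F)>", "i")) }
}
```

The OPTIONAL-FILTER clause encodes the type constraint derived from the focus phrase.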
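One way to instantiate this template programmatically is sketched below. The function `path_to_sparql` and the `http://example.org/` URIs are hypothetical, and the sketch assumes the focus head is a plain word that needs no regex escaping.

```python
def path_to_sparql(start_uri, predicates, focus_head=None):
    """Render a path (start entity plus predicate URIs) as a SPARQL query string."""
    if not predicates:
        raise ValueError("a path needs at least one predicate")
    lines = [
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "SELECT ?ans WHERE {",
    ]
    subject = f"<{start_uri}>"
    for i, pred in enumerate(predicates, start=1):
        # Intermediate hops bind ?x1, ?x2, ...; the last hop binds ?ans.
        obj = "?ans" if i == len(predicates) else f"?x{i}"
        lines.append(f"  {subject} <{pred}> {obj} .")
        subject = obj
    if focus_head:
        lines.append("  OPTIONAL { ?ans rdf:type ?t .")
        lines.append(f'             FILTER(regex(str(?t), "{focus_head}", "i")) }}')
    lines.append("}")
    return "\n".join(lines)

query = path_to_sparql(
    "http://example.org/Berlin",
    ["http://example.org/mayor"],
    focus_head="person",
)
```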

7. Experimental Evaluation and Comparative Performance

Experiments were conducted on QALD-3 benchmarks:

  • The full test set contains 99 natural-language questions; QALD-3-NA is a non-aggregation subset (61 questions, excluding COUNT/ORDER BY/FILTER types).
  • Metrics: Precision $= |\mathcal{A} \cap \mathcal{G}| / |\mathcal{A}|$, Recall $= |\mathcal{A} \cap \mathcal{G}| / |\mathcal{G}|$, and F1 $= 2 \cdot \text{Prec} \cdot \text{Rec} / (\text{Prec} + \text{Rec})$, where $\mathcal{A}$ is the set of returned answers and $\mathcal{G}$ the gold-standard answer set for a question; all three are reported as averages.
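The per-question metrics and their averages can be computed as in the sketch below. Returning 0.0 for empty answer or gold sets is a convention assumed here, not one stated in the source, and the function names are hypothetical.

```python
def prf(answers, gold):
    """Precision, recall, and F1 for one question, from answer and gold sets."""
    answers, gold = set(answers), set(gold)
    overlap = len(answers & gold)
    precision = overlap / len(answers) if answers else 0.0
    recall = overlap / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def macro_average(per_question):
    """Average each of the three metrics over all questions."""
    n = len(per_question)
    return tuple(sum(m[i] for m in per_question) / n for i in range(3))

results = [prf({"a", "b"}, {"a", "c"}), prf({"x"}, {"x"})]
avg_p, avg_r, avg_f1 = macro_average(results)
```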

Summary of results:

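| Dataset | System | Processed | Correct | Partial | Avg-Recall | Avg-Prec | Avg-F1 |
|---|---|---|---|---|---|---|---|
| QALD-3-NA | Ours | 53 | 30 | 13 | 0.67 | 0.61 | 0.61 |
| QALD-3-NA | gAnswer demo | 38 | 21 | 7 | 0.41 | 0.45 | 0.42 |
| QALD-3-full | Ours | 60 | 31 | 17 | 0.46 | 0.40 | 0.40 |
| QALD-3-full | gAnswer | 76 | 32 | 11 | 0.40 | 0.40 | 0.40 |
| QALD-3-full | DEANNA | 27 | 21 | 0 | 0.21 | 0.21 | 0.21 |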

On the non-aggregation subset, entity-guided graph traversal achieves higher average recall (0.67 vs. 0.41) and F1 (0.61 vs. 0.42) than the gAnswer demo. On the full QALD-3 test set it leads on recall (0.46 vs. 0.40 for gAnswer and 0.21 for DEANNA) while matching gAnswer on precision and F1 (0.40 each). This suggests its effectiveness in answer retrieval where explicit aggregation is not required and where local subgraph patterns, seeded by confidently linked entities, are salient.

8. Significance, Limitations, and Applicability

Entity-guided graph traversal simplifies the process of mapping questions to queries by:

  • Avoiding heavy template induction (and thus manual engineering).
  • Focusing on answer path ranking rather than exhaustive global search.
  • Enabling joint semantic matching and type-based answer disambiguation within manageable subgraphs.

A plausible implication is that the method is less suited to questions which require aggregation, counting, or global reasoning over the KB. Its computational efficiency depends critically on the effectiveness of entity linking, the restrictiveness of local subgraph expansion, and the suitability of the underlying similarity metrics. The approach is well-aligned with KBs that exhibit rich instance-level connectivity and clearly typed relations, such as DBpedia.
