Entity-Guided Graph Traversal for KB-QA
- Entity-Guided Graph Traversal is a KB-QA method that uses detected entities to extract local subgraphs and map natural language queries into graph traversal operations.
- It integrates entity linking, subgraph extraction, and joint semantic mapping, using predicate similarity and type constraints to resolve ambiguity.
- The approach demonstrates improved recall and F1 on non-aggregation questions, streamlining query translation in large graph-structured knowledge bases like DBpedia.
Entity-Guided Graph Traversal is a knowledge base question answering (KB-QA) methodology that leverages local graph substructures, anchored on detected entities, to enable semantic parsing and answer retrieval from large graph-structured resources such as DBpedia. This approach is specifically suited for non-aggregation questions, focusing on mapping natural language questions into subgraph traversal operations that yield accurate answers while jointly resolving semantic mapping and disambiguation.
1. Knowledge Base and Formal Problem Definition
Let the underlying knowledge base (KB), such as DBpedia, be modeled as a directed, labeled graph $G = (V, E)$, where $V$ denotes the set of RDF nodes (resources, classes, literals) and $E$ is the set of labeled edges (each edge $(u, p, v) \in E$ encodes a relation $p$ between $u$ and $v$). Given a user query $Q$ in natural language, the system must return the set of answers $A$ by traversing appropriate paths in $G$.
The entity-guided method proceeds by:
- Detecting a subset $S_Q \subseteq V$ of KB entities referenced in $Q$, each corresponding to a DBpedia URI.
- Extracting a “topological structure” or pattern $T$ from $Q$, represented as a tree with edges labeled by surface phrases (e.g., “mayor of”).
- Defining $m$ as the maximal branch length (in edges) in $T$, which restricts the traversal depth.
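As a concrete reading of this graph model, the following minimal sketch stores a KB as subject-predicate-object triples and indexes them in both directions. The triples, the `dbr:Example_Mayor` resource, and the `build_graph` helper are illustrative assumptions, not real DBpedia data or part of the described system.

```python
from collections import defaultdict

# Toy triples standing in for DBpedia statements (made up for illustration).
TRIPLES = [
    ("dbr:Berlin", "dbo:mayor", "dbr:Example_Mayor"),
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Example_Mayor", "rdf:type", "dbo:Person"),
]


def build_graph(triples):
    """Index triples by subject and by object so edges can be followed both ways."""
    out_edges = defaultdict(list)  # subject -> [(predicate, object)]
    in_edges = defaultdict(list)   # object  -> [(predicate, subject)]
    for s, p, o in triples:
        out_edges[s].append((p, o))
        in_edges[o].append((p, s))
    return out_edges, in_edges


out_edges, in_edges = build_graph(TRIPLES)
# The node set is every subject or object that occurs in some triple.
nodes = {s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES}
```

Keeping both an outgoing and an incoming index is a design choice of this sketch; it makes it cheap to follow a relation from either endpoint.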
2. Entity Detection and Local Subgraph Construction
2.1 Entity Linking
The process employs an external entity linker (specifically, Wikipedia Miner), which:
- Identifies mention–resource pairs in the question $Q$.
- Retains only those pairs whose linker score passes a confidence threshold.
- Excludes schema-level entities (e.g., dbo:Actor) and category entities, retaining only instance-level entities as seeds.
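The filtering steps above can be sketched as follows. This is a minimal illustration, not the Wikipedia Miner API: the `Link` record, the `select_seed_entities` helper, the prefix-based test for schema-level and category URIs, and the threshold value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One linker output: a question mention, its KB resource, and a confidence."""
    mention: str
    uri: str
    score: float


# Assumed convention: ontology classes use "dbo:", categories use "dbc:".
NON_INSTANCE_PREFIXES = ("dbo:", "dbc:")


def select_seed_entities(links, min_score):
    """Keep confidently linked, instance-level resources only."""
    seeds = []
    for link in links:
        if link.score < min_score:
            continue  # linker is not confident enough
        if link.uri.startswith(NON_INSTANCE_PREFIXES):
            continue  # schema-level or category entity
        seeds.append(link.uri)
    return seeds


links = [
    Link("Berlin", "dbr:Berlin", 0.92),
    Link("actor", "dbo:Actor", 0.95),
    Link("mayor", "dbr:Mayor", 0.10),
]
seeds = select_seed_entities(links, min_score=0.5)  # threshold chosen arbitrarily
```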
2.2 Subgraph Extraction
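With the seed entity set $S_Q$ established, a local subgraph $G_Q = (V_Q, E_Q)$ is constructed as follows:
- Initialize $V_Q = S_Q$, $E_Q = \emptyset$.
- For every $e \in S_Q$ and for $d = 1, \dots, m$:
  - Perform breadth-first expansion to depth $d$.
  - Collect all edges $(u, p, v) \in E$ with $u$ or $v$ in the current frontier.
  - Augment $V_Q$ and $E_Q$ with the endpoints and edges.
- Expand until reaching depth $m$ (the longest path indicated by the pattern $T$).

This design guarantees that any answer path conforming to $T$ lies entirely in $G_Q$.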
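The expansion can be sketched as a bounded breadth-first search over a triple list. This is a minimal illustration under assumptions: the `extract_subgraph` name and the toy triples are hypothetical, and the linear scan per level stands in for whatever indexed lookup a real triple store would provide.

```python
def extract_subgraph(triples, seeds, max_depth):
    """Collect every edge within max_depth hops of a seed, in either direction."""
    nodes = set(seeds)
    edges = set()
    frontier = set(seeds)
    for _ in range(max_depth):
        next_frontier = set()
        for s, p, o in triples:
            # An edge qualifies if either endpoint is on the current frontier.
            if s in frontier or o in frontier:
                edges.add((s, p, o))
                for endpoint in (s, o):
                    if endpoint not in nodes:
                        nodes.add(endpoint)
                        next_frontier.add(endpoint)
        frontier = next_frontier
    return nodes, edges


# Toy chain A -> B -> C -> D; with depth 2 from A, node D stays outside.
chain = [("A", "p", "B"), ("B", "q", "C"), ("C", "r", "D")]
sub_nodes, sub_edges = extract_subgraph(chain, seeds={"A"}, max_depth=2)
```

Because the depth bound equals the longest branch of the question pattern, any path of that length starting at a seed is contained in the returned edge set.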
3. Joint Semantic Item Mapping and Disambiguation
Entity-guided graph traversal unifies two typically disjoint subtasks:
- Semantic-Item-Mapping: extract the structured template $T$ from the question. For example, "Who is the mayor of Berlin?" yields the pattern ANSNODE -- "mayor of" -- Berlin, where ANSNODE is a variable for the answer.
- Semantic-Item-Disambiguation: select, within the local subgraph $G_Q$, the precise sequence of predicates and nodes that realizes the intended query semantics.
The methodology:
- Adopts lightweight constituency-based patterns to extract the template $T$ from the question $Q$ (eschewing template induction).
- Searches the local subgraph $G_Q$ for candidate paths matching the topology of $T$.
- Scores each candidate path based on the semantic match between KB predicate labels and question phrases, and enforces answer type constraints.
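For a pattern $T$ with $m$ edges, the $i$-th labeled with surface phrase $r_i$, each candidate path $\pi = (e_0, p_1, v_1, \dots, p_m, v_m)$ in the local subgraph $G_Q$ is scored by combining two kinds of terms:
- $\mathrm{sim}(p_i, r_i)$, the semantic similarity between the label of predicate $p_i$ and the surface phrase $r_i$;
- $\mathrm{type}(v_m, F)$, which assesses the compatibility of the terminal node $v_m$ with the expected answer type inferred from the question focus $F$ (e.g., "person", "place").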
This joint objective disambiguates both predicate path selection and answer candidate typing.
4. Path Traversal Algorithm and Pruning
The core traversal algorithm, denoted "FindCandidatePaths," operates as follows for pattern length $m$ and phrase set $\{r_1, \dots, r_m\}$:
- Initialize the frontier $P_0$ with the trivial path at each seed entity and an empty accumulated score.
- For each step $i = 1, \dots, m$:
  - For each (path, score_so_far) in $P_{i-1}$:
    - Enumerate the top-$k$ outgoing edges from the path's current node $v$, ranked by $\mathrm{sim}(p, r_i)$.
    - Prune edges with similarity below the threshold $\tau$.
    - For each permissible edge, create a new partial path in $P_i$.
- After $m$ steps, $P_m$ contains the valid, length-$m$ candidate paths.
- Each candidate is additionally scored with the type-compatibility term, yielding the final path score.
- Return candidates ranked by total score.

The state per traversal step is a triple: (current node $v$, accumulated predicates, similarity score). Branching is curtailed by top-$k$ predicate selection and the threshold $\tau$. In the worst case the number of paths is $O(b^m)$ for average branching factor $b$; with top-$k$ selection at most $k^m$ paths per seed survive, which keeps the computational cost tractable for short patterns.
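The traversal can be sketched as a step-wise pruned expansion. This is a minimal sketch under stated assumptions, not the original implementation: `token_similarity` is a toy Jaccard overlap standing in for the external similarity service, per-step similarities are accumulated by addition, the default values of `k` and `tau` are arbitrary, and the graph uses predicate labels directly as edge names.

```python
def token_similarity(label, phrase):
    """Toy stand-in for the similarity service: Jaccard overlap of lowercase tokens."""
    a, b = set(label.lower().split()), set(phrase.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def find_candidate_paths(out_edges, seed, phrases, k=3, tau=0.2, sim=token_similarity):
    """Expand one edge per surface phrase, keeping the top-k edges above tau."""
    # Each state: (nodes visited, predicates used, accumulated similarity).
    frontier = [((seed,), (), 0.0)]
    for phrase in phrases:
        next_frontier = []
        for nodes, preds, score in frontier:
            scored = [(sim(label, phrase), label, target)
                      for label, target in out_edges.get(nodes[-1], [])]
            scored.sort(key=lambda item: item[0], reverse=True)
            for s, label, target in scored[:k]:
                if s < tau:
                    continue  # prune weak predicate matches
                next_frontier.append((nodes + (target,), preds + (label,), score + s))
        frontier = next_frontier
    return sorted(frontier, key=lambda state: state[2], reverse=True)


# Toy graph with made-up resources.
graph = {
    "Berlin": [("mayor", "Example_Mayor"), ("country", "Germany")],
    "Example_Mayor": [("birth place", "Example_Town")],
}
one_hop = find_candidate_paths(graph, "Berlin", ["mayor of"])
two_hop = find_candidate_paths(graph, "Berlin", ["mayor of", "birth place of"])
```

In this sketch the type-compatibility term is left out; it would be applied to the terminal node of each returned path before the final ranking.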
5. Path Scoring and Answer Selection
5.1 Predicate-Similarity Features
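For each path edge with predicate $p_i$ and corresponding surface phrase $r_i$, the system computes a similarity $\mathrm{sim}(p_i, r_i)$ between the predicate label and the phrase, obtained from an external word similarity service.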
5.2 Type-Constraint Feature
A focus phrase $F$ is extracted from the question $Q$, typically the head noun following the interrogative. For each answer candidate $a$, the type score is the best semantic match between $F$ and the candidate's type information.
5.3 Combined Scoring
The final score of a candidate path combines the per-edge predicate scores $\mathrm{sim}(p_i, r_i)$ with the type score of its terminal node. Candidates are returned ranked by this measure.
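One way to combine the two feature families is sketched here. The exact combination is not recoverable from the text, so the additive form, the `type_weight` parameter, and the `exact_match` similarity are assumptions chosen only to make the ranking step concrete.

```python
def exact_match(a, b):
    """Toy similarity: 1.0 for a case-insensitive exact match, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


def type_score(candidate_types, focus, sim=exact_match):
    """Best match between the focus phrase and any type of the candidate."""
    return max((sim(t, focus) for t in candidate_types), default=0.0)


def combined_score(predicate_sims, candidate_types, focus, type_weight=1.0):
    """Assumed additive combination of predicate and type evidence."""
    return sum(predicate_sims) + type_weight * type_score(candidate_types, focus)


def rank_candidates(candidates, focus):
    """candidates maps an answer to (per-edge similarities, type labels)."""
    scored = [(combined_score(sims, types, focus), answer)
              for answer, (sims, types) in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Made-up candidates with equal predicate evidence; the type term breaks the tie.
ranking = rank_candidates(
    {"Example_Mayor": ([0.5], ["Person"]), "Germany": ([0.5], ["Country"])},
    focus="person",
)
```

With equal predicate evidence, the candidate whose type matches the focus phrase ranks first, which is the disambiguating role the type constraint plays.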
6. Query Translation and System Output
While the implementation yields URI or literal answers directly, any discovered path can be rendered as a SPARQL query:
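```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?ans WHERE {
  <e_0> <p_1> ?x_1 .
  ?x_1 <p_2> ?x_2 .
  ...
  ?x_{m-1} <p_m> ?ans .
  # Optional type check against the head word of the focus phrase F
  OPTIONAL { ?ans rdf:type ?t .
             FILTER(regex(str(?t), "<head(F)>", "i")) }
}
```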
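Rendering a discovered path into this template is mechanical. The sketch below assumes a hypothetical `path_to_sparql` helper that takes the seed URI, the ordered predicate URIs, and an optional focus head word; the example URIs are illustrative and have not been checked against a DBpedia endpoint.

```python
def path_to_sparql(seed_uri, predicate_uris, focus_head=None):
    """Render a seed entity and a predicate chain as a SPARQL SELECT query."""
    if not predicate_uris:
        raise ValueError("a path needs at least one predicate")
    lines = [
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>",
        "SELECT ?ans WHERE {",
    ]
    subject = f"<{seed_uri}>"
    last = len(predicate_uris) - 1
    for i, pred in enumerate(predicate_uris):
        # Intermediate nodes become ?x_1, ?x_2, ...; the final node is ?ans.
        obj = "?ans" if i == last else f"?x_{i + 1}"
        lines.append(f"  {subject} <{pred}> {obj} .")
        subject = obj
    if focus_head:
        # focus_head is inserted verbatim; escaping is out of scope for this sketch.
        lines.append("  OPTIONAL { ?ans rdf:type ?t .")
        lines.append(f'    FILTER(regex(str(?t), "{focus_head}", "i")) }}')
    lines.append("}")
    return "\n".join(lines)


query = path_to_sparql(
    "http://dbpedia.org/resource/Berlin",
    ["http://dbpedia.org/ontology/mayor"],
    focus_head="person",
)
```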
7. Experimental Evaluation and Comparative Performance
Experiments were conducted on QALD-3 benchmarks:
- The full test set contains 99 natural-language questions; QALD-3-NA is a non-aggregation subset (61 questions, excluding COUNT/ORDER BY/FILTER types).
- Metrics: Precision $P$, Recall $R$, and F1 ($2PR/(P+R)$), reported as averages over questions.
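For reference, per-question precision, recall, and F1 over answer sets, and their average across questions, can be computed as in this minimal sketch. The `prf` and `macro_average` names are hypothetical, and whether unanswered questions enter the average as zeros is an evaluation choice the sketch leaves to the caller.

```python
def prf(gold, predicted):
    """Precision, recall, and F1 for one question, comparing answer sets."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def macro_average(pairs):
    """Average the per-question (P, R, F1) triples over (gold, predicted) pairs."""
    scores = [prf(gold, predicted) for gold, predicted in pairs]
    return tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))


# Two toy questions: one fully correct, one half correct.
averages = macro_average([
    ({"a"}, {"a"}),
    ({"a", "b"}, {"a", "c"}),
])
```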
Summary of results:
| Dataset | System | Processed | Correct | Partial | Avg-Recall | Avg-Prec | Avg-F1 |
|---|---|---|---|---|---|---|---|
| QALD-3-NA | Ours | 53 | 30 | 13 | 0.67 | 0.61 | 0.61 |
| QALD-3-NA | gAnswer demo | 38 | 21 | 7 | 0.41 | 0.45 | 0.42 |
| QALD-3-full | Ours | 60 | 31 | 17 | 0.46 | 0.40 | 0.40 |
| QALD-3-full | gAnswer | 76 | 32 | 11 | 0.40 | 0.40 | 0.40 |
| QALD-3-full | DEANNA | 27 | 21 | 0 | 0.21 | 0.21 | 0.21 |
On the non-aggregation subset, entity-guided graph traversal achieves higher average recall (0.67 vs. 0.41), precision (0.61 vs. 0.45), and F1 (0.61 vs. 0.42) than the gAnswer demo. On the full QALD-3 test set it leads on recall (0.46 vs. 0.40 for gAnswer and 0.21 for DEANNA) while matching gAnswer on precision and F1 (0.40). This suggests its effectiveness in answer retrieval where explicit aggregation is not required and where local subgraph patterns, seeded by confidently linked entities, are salient.
8. Significance, Limitations, and Applicability
Entity-guided graph traversal simplifies the process of mapping questions to queries by:
- Avoiding heavy template induction (and thus manual engineering).
- Focusing on answer path ranking rather than exhaustive global search.
- Enabling joint semantic matching and type-based answer disambiguation within manageable subgraphs.
A plausible implication is that the method is less suited to questions which require aggregation, counting, or global reasoning over the KB. Its computational efficiency depends critically on the effectiveness of entity linking, the restrictiveness of local subgraph expansion, and the suitability of the underlying similarity metrics. The approach is well-aligned with KBs that exhibit rich instance-level connectivity and clearly typed relations, such as DBpedia.