LKD-KGC: Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing

Published 30 May 2025 in cs.CL and cs.AI | (2505.24163v1)

Abstract: Knowledge Graphs (KGs) structure real-world entities and their relationships into triples, enhancing machine reasoning for various tasks. While domain-specific KGs offer substantial benefits, their manual construction is often inefficient and requires specialized knowledge. Recent approaches for knowledge graph construction (KGC) based on LLMs, such as schema-guided KGC and reference knowledge integration, have proven efficient. However, these methods are constrained by their reliance on manually defined schema, single-document processing, and public-domain references, making them less effective for domain-specific corpora that exhibit complex knowledge dependencies and specificity, as well as limited reference knowledge. To address these challenges, we propose LKD-KGC, a novel framework for unsupervised domain-specific KG construction. LKD-KGC autonomously analyzes document repositories to infer knowledge dependencies, determines optimal processing sequences via LLM driven prioritization, and autoregressively generates entity schema by integrating hierarchical inter-document contexts. This schema guides the unsupervised extraction of entities and relationships, eliminating reliance on predefined structures or external knowledge. Extensive experiments show that compared with state-of-the-art baselines, LKD-KGC generally achieves improvements of 10% to 20% in both precision and recall rate, demonstrating its potential in constructing high-quality domain-specific KGs.


Summary

The paper introduces LKD-KGC, a framework that autonomously constructs domain-specific knowledge graphs using hierarchical dependency parsing.
It achieves improved entity and relation extraction precision and recall through context-aware schema generation and triple extraction without predefined schemas.
The framework demonstrates robustness across diverse datasets, outperforming state-of-the-art methods in both public and private corpora.

Domain-Specific KG Construction via LLM-driven Knowledge Dependency Parsing

Introduction

This essay describes the LKD-KGC framework, which addresses challenges in constructing domain-specific knowledge graphs (KGs) using LLMs. The framework introduces a method for unsupervised KG construction, systematically incorporating hierarchical knowledge dependency parsing to enhance precision and recall in entity and relationship extraction. Domain-specific corpora demand specialized techniques, given their complex terminology, hierarchical knowledge, and limited external references. LKD-KGC leverages LLM-driven prioritization and autoregressive schema generation to meet these needs without reliance on predefined schemas or external knowledge.

Framework Overview

LKD-KGC consists of three main components: Dependency Evaluation, Schema Definition, and Triple Extraction, each contributing to an integrated approach for processing domain-specific corpora. This methodology prioritizes global knowledge dependencies through hierarchical document processing (Figure 1).

Figure 1: The directory structure of the Prometheus documents, illustrating the latent ideal order in which documents should be accessed for knowledge graph construction.

Dependency Evaluation

This module determines the order in which documents are processed and generates context-aware summaries that integrate cross-document knowledge. Because the full set of earlier summaries may not fit in an LLM's context window, it uses vector retrieval to select only the most relevant prior context for each document, which keeps hierarchical knowledge synthesis tractable.
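The retrieval step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the bag-of-words `embed` function stands in for a real embedding model, `fake_llm` stands in for an actual LLM call, and the processing `order` is assumed to have been produced already by the LLM-driven prioritization step. All function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, store, k=2):
    """Return the k stored summaries most similar to the query text."""
    q = embed(query)
    ranked = sorted(store, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]


def summarize_in_order(docs, order, llm, k=2):
    """Summarize documents in the given order.

    Each call receives at most k earlier summaries, so the prompt size
    stays bounded no matter how many documents came before.
    """
    store, summaries = [], {}
    for name in order:
        context = retrieve(docs[name], store, k)
        summaries[name] = llm(docs[name], context)
        store.append(summaries[name])
    return summaries


def fake_llm(text, context):
    """Placeholder for an LLM call: first sentence plus the context count."""
    return text.split(".")[0] + f" [ctx={len(context)}]"


docs = {
    "intro": "Prometheus scrapes metrics from targets. It stores time series.",
    "alerting": "Alertmanager handles alerts sent by Prometheus. It groups them.",
    "rules": "Alerting rules define alert conditions. Prometheus evaluates rules.",
    "exporters": "Exporters expose metrics for Prometheus. Many exist.",
}
order = ["intro", "alerting", "rules", "exporters"]  # assumed LLM-chosen order
summaries = summarize_in_order(docs, order, fake_llm, k=2)
```

The fourth document sees only two of the three earlier summaries, which is the point of the retrieval step: context grows with `k`, not with the size of the repository.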

Schema Definition

Entity types are extracted from contextually enriched summaries, after which clustering and semantic deduplication refine the entity schema. Prompts guide LLMs to define comprehensive and coherent entity schemas using retrieved contextual knowledge.
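The deduplication step can be illustrated with a small sketch. It is an assumption-laden simplification: the summary does not say which clustering or similarity method the framework uses, so character-level similarity from Python's standard `difflib` stands in here for semantic similarity, and the threshold of 0.8 is arbitrary. The helper names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and unify separators so surface variants compare equal."""
    return name.lower().replace("_", " ").replace("-", " ").strip()


def similar(a, b, threshold=0.8):
    """String similarity as a stand-in for semantic similarity (assumption)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def deduplicate_types(candidates, threshold=0.8):
    """Greedily cluster candidate entity types; keep one name per cluster."""
    clusters = []
    for name in candidates:
        for cluster in clusters:
            # Compare against the cluster's first member, its representative.
            if similar(name, cluster[0], threshold):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return [cluster[0] for cluster in clusters]


candidates = ["Alert Rule", "alert_rule", "Alert Rules", "Exporter", "Metric"]
schema = deduplicate_types(candidates)
```

In a real pipeline the `similar` function would more plausibly compare embeddings or ask an LLM whether two type names denote the same concept; the greedy clustering loop would stay the same.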

Triple Extraction

Guided by the established entity schema, triples are extracted from the original documents according to entity types, without predefined relation types. Constraining the entities while leaving relations open is intended to admit only meaningful relationships into the final KG.
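One way to picture schema-guided extraction with open relations is the filter below. It is a sketch under stated assumptions: the line format `head [Type] | relation | tail [Type]` is an invented output convention, not one taken from the paper, and `parse_triples` and `Triple` are hypothetical names. Relations are kept as free text, while triples whose entity types fall outside the schema are dropped.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str


# Assumed LLM output format: "head [Type] | relation | tail [Type]"
LINE = re.compile(
    r"^\s*(.+?)\s*\[(.+?)\]\s*\|\s*(.+?)\s*\|\s*(.+?)\s*\[(.+?)\]\s*$"
)


def parse_triples(llm_output, schema):
    """Keep well-formed triples whose head and tail types are both in the schema."""
    allowed = {t.lower() for t in schema}
    triples = []
    for line in llm_output.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        head, head_type, relation, tail, tail_type = m.groups()
        if head_type.lower() in allowed and tail_type.lower() in allowed:
            triples.append(Triple(head, head_type, relation, tail, tail_type))
    return triples


schema = ["Component", "Exporter", "Metric"]
llm_output = """\
Alertmanager [Component] | receives alerts from | Prometheus server [Component]
node_exporter [Exporter] | exposes | CPU usage [Metric]
Bob [Person] | wrote | docs [Document]
not a triple line
"""
triples = parse_triples(llm_output, schema)
```

The third line is rejected because `Person` and `Document` are not schema types, and the fourth because it does not parse; the relation strings themselves are never checked against a fixed list.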

Figure 2: The framework of LKD-KGC, illustrating comprehensive domain-specific KG construction using hierarchical dependency parsing.

Experimental Evaluation

Public Corpora

LKD-KGC was evaluated on public datasets, where it showed higher precision and recall than existing LLM-based methods; the abstract reports gains of generally 10% to 20% on both metrics over state-of-the-art baselines. On the Prometheus dataset, it extracted more valid entities and relations and did so at higher precision.

Private Corpora

On private corpora such as the IMS documentation, where external references are unavailable, LKD-KGC performed well by relying entirely on internal document contexts. Dependency parsing enabled accurate extraction of valid entities and relation triples from complex domain-specific texts.

The experimental results underscored LKD-KGC's capability to outperform state-of-the-art baseline methods across diverse datasets and LLM backbones. These improvements in precision and recall metrics highlight the robustness of the framework in handling intricate knowledge dependencies.
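For readers unfamiliar with the two metrics, the sketch below shows how precision and recall are computed over sets of triples. It assumes a gold set of correct triples is available for comparison, which this summary does not confirm; the evaluation protocol in the paper may differ (for example, judging the validity of each extracted item). The triples are toy data invented for illustration, not results from the paper.

```python
def precision_recall(predicted, gold):
    """Set-based precision and recall over (head, relation, tail) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data (hypothetical): 4 gold triples, 5 predictions, 3 of them correct.
gold = [
    ("Alertmanager", "receives alerts from", "Prometheus"),
    ("node_exporter", "exposes", "CPU usage"),
    ("Prometheus", "scrapes", "node_exporter"),
    ("Prometheus", "evaluates", "alerting rules"),
]
predicted = [
    ("Alertmanager", "receives alerts from", "Prometheus"),
    ("node_exporter", "exposes", "CPU usage"),
    ("Prometheus", "scrapes", "node_exporter"),
    ("Prometheus", "is", "software"),
    ("Grafana", "stores", "alerts"),
]
precision, recall = precision_recall(predicted, gold)
```

Precision penalizes spurious triples (the last two predictions), while recall penalizes missed ones (the alerting-rules triple), which is why the two are reported together.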

Conclusion

The LKD-KGC framework provides an effective solution for constructing high-quality domain-specific knowledge graphs by exploiting LLM-driven knowledge dependency parsing. Its autonomous capability to infer processing sequences and integrate hierarchical contexts results in improved precision and recall rates. These advancements suggest promising directions for future developments in AI-driven KG construction methodologies, particularly in domains with complex and specialized knowledge requirements.