KG-DF: Defense Framework for Knowledge Graphs
- KG-DF is a framework that combines structured knowledge representations with privacy and security mechanisms to defend against vulnerabilities in federated KG embeddings and LLM applications.
- It employs adaptive differential privacy, selective gradient perturbation, and semantic parsing to balance protection and model utility in diverse scenarios.
- Empirical evaluations show KG-DF effectively reduces attack success rates while maintaining high generality and performance in both federated and language model contexts.
A Knowledge Graph Defense Framework (KG-DF) integrates structured knowledge representations and privacy-preserving mechanisms to address vulnerabilities in both federated knowledge graph embedding (FKGE) and LLM applications. In federated scenarios, KG-DF targets privacy threats arising during collaborative embedding learning, while in LLM jailbreak defense, KG-DF leverages knowledge graphs and semantic parsing for black-box mitigation of adversarial prompts. Key instantiations of KG-DF are DP-Flames (differentially private FKGE) (Hu et al., 2023) and the semantic KG-based defense for LLMs (Liu et al., 9 Nov 2025), each illustrating distinct technical foundations but sharing common defense principles—exploitation of KG structure, private selection, and adaptive mechanisms.
1. Principles of KG-DF: Privacy and Security in Knowledge-Driven Systems
KG-DF is predicated on two main pillars: privacy defense in federated KG embedding models, and security enhancement in LLMs exposed to adversarial jailbreak prompts. In FKGE, the principal privacy threat is membership inference via exchanged model updates, wherein attackers infer the presence of specific KG triples in the local client’s data. Traditional differentially private approaches applied naively (e.g., DP-SGD) significantly degrade utility in large KGE models due to the high dimensionality and gradient sparsity.
For LLM jailbreak defense, KG-DF utilizes the semantic association capabilities of KGs to identify unsafe intentions in user queries. This is accomplished by extracting core concepts from queries, matching them against triples in an external KG, and constructing warning contexts supplied to the LLM. The black-box nature of the LLM is preserved, with no requirement for access to model internals.
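The warning-context construction described above can be sketched minimally as follows. The template wording and function name are illustrative assumptions, not the paper's exact prompt:

```python
def build_guarded_prompt(query, matched_triples):
    """Prepend retrieved safety triples as a warning context before the
    user query. If no unsafe concepts matched, pass the query through
    unchanged so benign requests are unaffected."""
    if not matched_triples:
        return query
    warning = "\n".join(f"- ({h}, {r}, {t})" for h, r, t in matched_triples)
    return (
        "Safety context: the following knowledge-graph facts suggest this "
        "request may involve unsafe activity. Refuse or answer cautiously.\n"
        f"{warning}\n\nUser query: {query}"
    )
```

Because the warning is injected purely at the prompt level, the target LLM remains a black box.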
2. DP-Flames: Differential Privacy for Federated KG Embedding
DP-Flames exemplifies privacy-preserving FKGE defense and operates as follows:
- Model Structure: Each client retains a local KG, with shared embedding matrices E (entities) and R (relations). Training entails collaborative updates of E and R, with only these matrices exchanged to maintain data locality.
- Entity-Binding Gradient Sparsity: For any triple (h, r, t), only the embeddings for h, r, and t have nonzero gradient contributions during loss computation, yielding extreme sparsity (at most 3B active rows per batch of size B).
- Private Selection (PTR/Report-Noisy-Max): To avoid directly revealing which entity indices are active, DP-Flames employs a Gumbel-noised report-noisy-max mechanism combined with propose–test–release (PTR). This selects the top-k gradient rows under differential privacy, incurring only a bounded Rényi DP (RDP) cost.
- Gradient Perturbation: Only the active gradient block is perturbed with Gaussian noise; by the standard Gaussian-mechanism analysis, each iteration satisfies (α, αΔ²/(2σ²))-RDP, where Δ is the clipped per-row sensitivity and σ the noise scale.
- Federated Algorithm: Local updates involve clipping the gradients per-triple and per-row, private selection of nonzero blocks, and noise injection. Non-private negative sample gradients are included for model expressivity.
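The private-selection step in the algorithm above can be sketched as a Gumbel top-k (equivalent to applying report-noisy-max k times). This is a simplified illustration: the PTR test is omitted, and the ε/2 score scaling is an assumption rather than the paper's exact mechanism:

```python
import math
import random

def gumbel_top_k(row_norms, k, epsilon):
    """Privately select k row indices with probability weighted by
    exp(epsilon * norm / 2), via the Gumbel trick: add Gumbel noise
    to each scaled score and take the k largest. Rows with larger
    clipped gradient norms are more likely to be chosen."""
    noisy = {
        i: (epsilon * n / 2.0) - math.log(-math.log(random.random()))
        for i, n in enumerate(row_norms)
    }
    return sorted(noisy, key=noisy.get, reverse=True)[:k]
```

Only the rows returned here receive noisy gradient updates; all other rows are left untouched, which is what keeps the noise budget concentrated on the active subspace.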
Adaptive Privacy Budget: DP-Flames introduces adaptive noise (privacy budget) allocation based on model convergence signals—noise scale is reduced during later training stages if validation MRR improvement plateaus, optimizing the privacy-utility trade-off.
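One possible plateau-based adaptation policy is sketched below. The patience/decay values and the exact plateau criterion are illustrative assumptions; the paper's schedule may differ:

```python
def adapt_noise_scale(sigma, mrr_history, patience=3, decay=0.8, min_sigma=0.1):
    """Reduce the Gaussian noise scale when validation MRR has plateaued.
    A plateau here means no new best MRR in the last `patience` evaluations;
    decaying sigma then spends privacy budget faster for better utility."""
    if len(mrr_history) <= patience:
        return sigma
    recent, earlier = mrr_history[-patience:], mrr_history[:-patience]
    if max(recent) <= max(earlier):
        return max(min_sigma, sigma * decay)
    return sigma
```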
Theoretical Guarantees: Final (ε, δ)-DP guarantees are derived by composing the per-step RDP costs over T training steps and converting to DP via ε = ε_RDP(α) + log(1/δ)/(α − 1). The accounting is tight because noise is applied only to the privately selected sparse blocks.
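The composition-and-conversion step can be sketched numerically as a minimal accountant, assuming the standard Gaussian-mechanism RDP bound; the paper's accountant may be tighter:

```python
import math

def gaussian_rdp_to_dp(sigma, steps, delta, sensitivity=1.0, alphas=range(2, 64)):
    """(ε, δ)-DP bound for `steps` compositions of the Gaussian mechanism.
    Each step is (α, α·Δ²/(2σ²))-RDP; RDP composes additively over steps;
    the RDP-to-DP conversion ε = ε_RDP(α) + log(1/δ)/(α − 1) is then
    minimized over a grid of integer orders α."""
    return min(
        steps * alpha * sensitivity**2 / (2 * sigma**2)
        + math.log(1 / delta) / (alpha - 1)
        for alpha in alphas
    )
```

As expected, more noise (larger sigma) yields a smaller final ε for the same number of steps.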
3. KG-DF for LLM Jailbreak Defense: Semantic Parsing and KG Reasoning
In the LLM context, KG-DF is designed to defend against jailbreak prompts by preemptively associating queries with “safe knowledge”:
- Architectural Modules:
- Semantic Parser: An LLM (GPT-3.5-turbo) transforms the input query into a set of salient, security-related concepts via a keyword-extraction prompt.
- KG Search & Matching: Both the extracted concepts and the KG triples are embedded into a shared vector space (Qwen3-Embedding); matching is performed by cosine similarity between concept and triple embeddings.
- Safe Reasoning Path Generator: Optionally, multi-hop traversal builds “reasoning paths” from matched triples, using BFS to connect related safety concepts.
- LLM Interface: The retrieved safe triples (or reasoning paths) are prepended as a “warning” context to the original prompt, guiding the LLM’s output.
A plausible implication is that semantic-parsing robustness is integral to KG-DF's effectiveness: the LLM-based parser outperforms rule-based baselines (NER, TF-IDF) in handling diverse attack queries.
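The similarity-based matching step can be sketched with plain cosine similarity. The 0.75 threshold and the function names are assumptions; real vectors would come from the embedding model rather than the hand-written lists used here:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def match_triples(concept_vecs, triple_vecs, threshold=0.75):
    """Return indices of KG triples whose embedding is within the cosine
    threshold of at least one extracted query concept."""
    return [
        j for j, tv in enumerate(triple_vecs)
        if any(cosine(cv, tv) >= threshold for cv in concept_vecs)
    ]
```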
4. Empirical Evaluation and Trade-off Analysis
DP-Flames Results (FKGE):
- Datasets: FB15k-237, NELL-955; Models: TransE, RotatE, DistMult, ComplEx.
- Attack F1: the non-private baseline is 0.83; naive DP-SGD drops F1 to 0.45 but at severe MRR cost.
- DP-Flames (fixed privacy budget): F1 0.59, MRR 0.18.
- DP-Flames-Adp (adaptive budget): F1 0.59, MRR 0.32 (TransE@16), approaching the non-DP MRR of 0.36.
KG-DF for LLM Defense:
- Evaluation on adversarial benchmarks (Advbench, XSTest) and models (Vicuna-7B, LLaMA2-7B, DeepSeek-LLM-7B, GPT-3.5-Turbo, GPT-4).
- Metrics: Attack Success Rate (ASR), False Positive Rate (FPR), Generality (QA).
- For Vicuna-7B: No-Defense ASR is 88% on both GCG and PAIR; KG-DF drops ASR to 0% (GCG) and 6% (PAIR), with an FPR of 5% and generality of 88%.
- On closed models, KG-DF achieves 0% ASR on TAP/PAIR, an FPR of 3–5%, and generality of 86–89%.
Trade-off Table: LLM Defense Example
| Model | Defense Method | ASR_GCG | ASR_PAIR | FPR | Generality |
|---|---|---|---|---|---|
| Vicuna-7B | No-Defense | 88% | 88% | — | 76% |
| Vicuna-7B | KG-DF | 0% | 6% | 5% | 88% |
KG-DF demonstrates superior performance in reducing attack success rates while maintaining high QA generality compared to baseline defenses.
5. Implementation, Overhead, and Practical Considerations
For DP-Flames (FKGE):
- Hyperparameters: embedding dimension of 128 and a schedule starting value of 1.0; batch size, clipping norms, and the remaining privacy parameters follow the reported configuration.
- Computational Overhead: Private selection introduces per-round sorting over the active gradient rows plus the corresponding Gaussian draws, contributing modest extra compute per round.
- Negative Sampling: Uses fully random negatives from a public KG, preserving gradient sparsity (or alternatively DP-synthesized negatives).
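A sketch of the fully-random-negatives option, corrupting a triple with entities drawn from a public KG (function and variable names are illustrative):

```python
import random

def sample_negatives(triple, public_entities, n=5, corrupt_tail=True):
    """Generate n negative triples by replacing the tail (or head) with
    uniformly random entities from a PUBLIC entity list. Because the
    corrupted slot is filled from public data, negative sampling adds no
    new private rows to the gradient and preserves the sparsity pattern."""
    h, r, t = triple
    if corrupt_tail:
        return [(h, r, random.choice(public_entities)) for _ in range(n)]
    return [(random.choice(public_entities), r, t) for _ in range(n)]
```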
- The adaptive privacy budget requires a small public validation set; if the PTR test fails during private selection, the update for that iteration is skipped.
For KG-DF (LLM):
- Each prompt incurs LLM-based parsing and embedding latency, on the order of seconds per prompt.
- KG extension is immediate: new triples or safety categories can be integrated for evolving attack paradigms.
- Interpretability of multi-hop reasoning paths remains a limitation, motivating further research.
6. Generalization and Extensions of KG-DF
DP-Flames demonstrates a pattern for general KG-DF instantiation:
- Leverage model-specific structural sparsity to focus privacy mechanisms.
- Apply private selection to restrict the DP domain to active parameters, minimizing unnecessary noise.
- Use an RDP accountant for tight theoretical guarantees.
- Adapt privacy budget or defense strength dynamically according to model progression or detected risk.
- For LLMs, extend KG-DF to new subcategories via KG updates or specialized semantic parsers.
This suggests the KG-DF methodology is extensible to:
- Graph neural networks operating on KG subgraphs (e.g., federated link prediction).
- Multi-task KG embedding paradigms (e.g., entity alignment, type/class prediction).
- Other federated/sparse learning domains, such as recommendation systems or LLMs operating on vocabulary subsets.
A plausible implication is that, in each applicable domain, careful identification and privatization of the “active” subspace yields markedly improved privacy–utility outcomes compared to blanket DP application.
7. Comparative Analysis and Future Work
KG-DF provides a black-box defense paradigm for LLMs, enabling model-agnostic security guarantees without internal changes:
- Strengths: Black-box applicability, near-zero attack success rates, low false positives, high generality, and rapid extensibility to new safety knowledge.
- Limitations: Parsing/embedding latency, reliance on semantic parser quality, occasional multi-hop path interpretability challenges.
- Future Directions: Specialized parsers to limit API dependence, online KG updates for rapid response to emergent threat strategies, and optimized embedding/indexing for scalability and efficiency.
Empirical evidence from (Hu et al., 2023) and (Liu et al., 9 Nov 2025) establishes KG-DF as a broadly applicable, theoretically sound framework for privacy and security defense in contemporary knowledge-driven AI systems.