Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval

Published 17 Oct 2024 in cs.CL and cs.IR | (2410.13765v2)

Abstract: LLMs have been used to generate query expansions augmenting original queries for improving information search. Recent studies also explore providing LLMs with initial retrieval results to generate query expansions more grounded to document corpus. However, these methods mostly focus on enhancing textual similarities between search queries and target documents, overlooking document relations. For queries like "Find me a highly rated camera for wildlife photography compatible with my Nikon F-Mount lenses", existing methods may generate expansions that are semantically similar but structurally unrelated to user intents. To handle such semi-structured queries with both textual and relational requirements, in this paper we propose a knowledge-aware query expansion framework, augmenting LLMs with structured document relations from knowledge graph (KG). To further address the limitation of entity-based scoring in existing KG-based methods, we leverage document texts as rich KG node representations and use document-based relation filtering for our Knowledge-Aware Retrieval (KAR). Extensive experiments on three datasets of diverse domains show the advantages of our method compared against state-of-the-art baselines on textual and relational semi-structured retrieval.


Summary

The paper introduces the KAR framework, which integrates LLMs with knowledge graphs to enhance query expansion in complex retrieval tasks.
It employs a multi-step process including entity parsing, document retrieval, and relation propagation to incorporate both textual semantics and relational data.
Experimental results on datasets like AMAZON, MAG, and PRIME demonstrate improved metrics such as Hit@1 and MRR, confirming the method’s robustness and scalability.

Knowledge-Aware Query Expansion with LLMs for Textual and Relational Retrieval

This paper introduces a query expansion method that augments LLMs with knowledge graphs (KGs) to improve retrieval tasks requiring both textual and relational understanding. The method, termed Knowledge-Aware Retrieval (KAR), aims to overcome the limitations of traditional query expansion techniques by incorporating structured document relations alongside semantic text similarity.

Introduction

Traditional query expansion techniques focus primarily on enhancing semantic similarity between search queries and target documents. They often overlook document relations, which are crucial in real-world scenarios where a query combines textual and relational requirements. The proposed knowledge-aware framework integrates LLMs with a KG so that query expansions also capture the relational knowledge the query depends on.

Methodology

The proposed framework consists of several key steps:

Entity Parsing by LLM: An LLM extracts explicitly mentioned entities from the initial query. This process includes considering the original query itself as a pseudo entity to anchor retrieval.
Entity Document Retrieval: For each parsed entity, the associated textual document is retrieved using a text embedding model.
KG Relation Propagation: The method propagates relations in the knowledge graph by identifying h-hop neighbors, thus capturing the relational context of each entity.
Document-based Relation Filtering: Utilizing document texts as KG node representations allows for more accurate filtering of query-focused relations versus the traditional entity-name-based approach.
Knowledge-Aware Expansion: Using LLMs, the framework generates query expansions by synthesizing information from document triples constructed from filtered relations.
Figure 1: Overview of our knowledge-aware query expansion framework illustrated with an example academic paper search query with textual and relational requirements.
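The relation propagation and document-based filtering steps above can be sketched on a toy graph. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph, the documents, and the function names `h_hop_triples`, `bow_cosine`, and `filter_triples` are all hypothetical, and a bag-of-words cosine stands in for the text embedding model mentioned in the summary.

```python
import math
from collections import Counter, deque

def h_hop_triples(graph, start, h):
    """Collect (head, relation, tail) triples reachable within h hops of `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    triples = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == h:
            continue  # do not expand beyond h hops
        for relation, neighbor in graph.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return triples

def bow_cosine(a, b):
    """Bag-of-words cosine similarity (assumed stand-in for an embedding model)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0

def filter_triples(query, triples, docs, k):
    """Keep the k triples whose tail-node document best matches the query text."""
    ranked = sorted(triples, key=lambda t: bow_cosine(query, docs[t[2]]), reverse=True)
    return ranked[:k]

# Hypothetical toy KG: node -> list of (relation, neighbor) edges.
graph = {
    "nikon_f_mount": [("compatible_with", "camera_a"), ("compatible_with", "lens_b")],
    "camera_a": [("has_brand", "nikon")],
}
# Hypothetical documents attached to each KG node.
docs = {
    "nikon_f_mount": "Nikon F-Mount lens mount",
    "camera_a": "DSLR camera for wildlife photography with fast autofocus",
    "lens_b": "macro lens for close-up flowers",
    "nikon": "Nikon brand of imaging products",
}
query = "highly rated camera for wildlife photography"

one_hop = h_hop_triples(graph, "nikon_f_mount", 1)
two_hop = h_hop_triples(graph, "nikon_f_mount", 2)
top = filter_triples(query, two_hop, docs, k=1)
print(top)
```

The filtered triples, paired with their node documents, are what would then be handed to the LLM as context for generating the expansion. Scoring neighbors by their document text rather than by entity name is the design choice the summary highlights: a node named `camera_a` carries no signal, while its document does.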

Experimental Evaluation

KAR was evaluated on three datasets from diverse domains within the STARK benchmark: product search (AMAZON), academic paper search (MAG), and precision medicine inquiries (PRIME). The method exhibited superior retrieval performance compared to traditional LLM-based and knowledge-enhanced query expansion techniques, achieving improved metrics such as Hit@1 and MRR across all datasets.
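For readers unfamiliar with the reported metrics, the following sketch computes Hit@k and MRR using their standard definitions. The ranked lists and relevance sets are made-up illustration data, not results from the paper, and the function names are hypothetical.

```python
def hit_at_k(ranked, relevant, k):
    """1.0 if any relevant document appears in the top k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(runs, k=1):
    """Average Hit@k and MRR over (ranked_list, relevant_set) pairs."""
    n = len(runs)
    hit = sum(hit_at_k(ranked, rel, k) for ranked, rel in runs) / n
    mrr = sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / n
    return hit, mrr

# Made-up example: two queries with their ranked results and relevant sets.
runs = [
    (["d1", "d2", "d3"], {"d1"}),  # relevant document at rank 1
    (["d4", "d5", "d6"], {"d5"}),  # relevant document at rank 2
]
hit1, mrr = evaluate(runs, k=1)
print(hit1, mrr)  # 0.5 0.75
```

Hit@1 rewards only a correct top result, while MRR gives partial credit when the first relevant document appears lower in the ranking.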

Figure 2: Influence of different values of $k$ for filtered top-$k$ neighbors in KAR.

Figure 2 and Figure 3 illustrate how performance depends on parameters such as the number of filtered top-k neighbors and the number of sampled query expansions, and Figure 4 compares the latency of query expansion methods. Together these results are presented as evidence of the robustness and scalability of the proposed method.

Figure 3: Influence of sampled query expansions $n$.

Figure 4: Latency comparison of query expansions.

Discussion

The integration of KGs into LLMs for query expansion addresses both textual and relational gaps, essential for accurate information retrieval in complex query scenarios. The zero-shot capabilities and the minimal additional latency introduced by the KAR framework make it adaptable and scalable for various real-world applications.

Conclusion

The knowledge-aware query expansion method effectively leverages the strengths of LLMs and KGs, providing enhancements for traditional information retrieval across varied datasets with semi-structured data requirements. By bridging semantic and structural gaps, KAR sets a strong foundation for future developments in AI-enhanced retrieval systems.

In summary, KAR shows a promising way to expand complex queries by coupling textual and relational knowledge, highlighting the potential of integrating LLMs with KGs.
