Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment

Published 15 Jul 2025 in cs.IR | (2507.11042v1)

Abstract: With the breakthroughs in LLMs, query generation techniques that expand documents and queries with related terms are becoming increasingly popular in the information retrieval field. Such techniques have been shown to improve the effectiveness of traditional lexical retrieval methods by dealing with the vocabulary mismatch problem. Recent work has found that generating queries with a greedy decoding strategy can produce sub-optimal queries, including hallucinations, and proposed to filter out queries before expansion. This 'generate-then-filter' approach is costly, as it requires generating multiple queries and applying a relevance model to all of them, and it does not teach the LLM which of the generated queries is more effective for expansion. To overcome such limitations, we propose Aligned Query Expansion (AQE), a novel approach to enhance query expansion for passage retrieval in open-domain question answering. AQE leverages recent techniques in LLM alignment to fine-tune models for generating query expansions that directly optimize the effectiveness of the retrieval task, eliminating the need for additional filtering steps. This alignment ensures that queries are more relevant, reducing computational costs while improving retrieval effectiveness. Empirical evaluations show that AQE outperforms baseline models for query expansion in both in-domain and out-of-domain settings, demonstrating significant improvements in retrieval effectiveness.

Summary

The paper introduces Aligned Query Expansion (AQE), which improves retrieval effectiveness by aligning LLM-generated query expansions with the retrieval objective.
The methodology combines zero-shot generation, BM25 ranking, and fine-tuning via RSFT and DPO to optimize both accuracy and semantic diversity.
Experimental results show improved retrieval accuracy, robust generalization across datasets, and roughly 70% lower latency and memory usage than the filtering approach.

Aligned Query Expansion: Enhancing Information Retrieval with LLM Alignment

Vocabulary mismatch remains a significant obstacle to effective search and retrieval. The paper "Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment" introduces Aligned Query Expansion (AQE), an approach that applies LLM alignment techniques to make query expansion for passage retrieval both cheaper and more effective.

Introduction to Query Expansion and Breakthroughs with LLMs

Vocabulary mismatch, the discrepancy between the terms in a user query and the terms in relevant documents, can degrade retrieval performance. Traditional query expansion methods, which rely on statistical approaches or external resources such as thesauri, often fall short because they over-expand the query. LLMs can instead generate contextually enriched query variations, which mitigates vocabulary mismatch more effectively than these conventional strategies.

However, LLMs pose new challenges, particularly hallucinations during query generation. The generate-then-filter paradigm addresses this by sampling several expansions and discarding the poor ones, but it adds computational overhead and never teaches the LLM which expansions are effective for retrieval.

Figure 1: Aligned Query Expansion (AQE) training pipeline. The pipeline begins with the generation of several query expansions using zero-shot prompting of an LLM, followed by ranking the expansions based on their retrieval effectiveness when used in a retrieval model. The top and bottom-ranked expansions are then used to fine-tune the LLM via two alignment strategies: Rejection Sampling Fine-Tuning (RSFT) and Direct Preference Optimization (DPO).

Methodology of Aligned Query Expansion

AQE proposes a streamlined approach that removes the separate filtering step while improving retrieval effectiveness. The methodology comprises three major components: zero-shot query expansion generation, ranking of query expansions, and alignment through fine-tuning.

Zero-Shot Query Expansion Generation

In the AQE framework, query expansions are generated using zero-shot prompting strategies. This step produces diverse query variations, enhancing retrieval by capturing a wide semantic scope. The generation of query expansions follows a prompt-based strategy designed to elicit informative context from the LLM regarding the original query.
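The generation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording, the `sample` callable (a stand-in for one stochastic LLM call), and the choice to append the expansion to the query are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical prompt wording; the paper's exact prompt is not reproduced here.
PROMPT_TEMPLATE = (
    "Write a short passage that answers the question.\n"
    "Question: {query}\n"
    "Passage:"
)


def generate_expansions(query: str, sample: Callable[[str], str], n: int = 4) -> List[str]:
    """Draw n samples from a zero-shot prompt and keep the distinct, non-empty ones.

    `sample` is an assumed stand-in for a single sampled LLM completion.
    """
    prompt = PROMPT_TEMPLATE.format(query=query)
    expansions: List[str] = []
    for _ in range(n):
        text = sample(prompt).strip()
        if text and text not in expansions:  # drop empty and duplicate samples
            expansions.append(text)
    return expansions


def expand_query(query: str, expansion: str) -> str:
    """Append an expansion to the original query (assumed concatenation scheme)."""
    return f"{query} {expansion}"
```

Sampling rather than greedy decoding is what yields several distinct candidates per query, which the ranking step then needs in order to tell good expansions from bad ones.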

Ranking and Alignment Strategies

Once generated, each query expansion is ranked by its retrieval effectiveness using a sparse retriever such as BM25. Effectiveness is measured by the rank of the true document in the list returned for the expanded query: the higher that document ranks, the better the expansion. The alignment process then employs two strategies, Rejection Sampling Fine-Tuning (RSFT) and Direct Preference Optimization (DPO), each of which fine-tunes the LLM to produce high-quality expansions directly.
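A rough sketch of the ranking step, under stated assumptions: the `BM25` class below is a small self-contained Okapi BM25 written for illustration (not the retriever or parameters used in the paper), `preference_pair` is a hypothetical helper, and the expansion is assumed to be appended to the query. It returns the top- and bottom-ranked expansions, which Figure 1 describes as the inputs to fine-tuning.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 over a small in-memory corpus (illustrative only)."""

    def __init__(self, corpus: List[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [Counter(tokenize(d)) for d in corpus]
        self.lens = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.lens) / len(self.docs)
        n = len(self.docs)
        # Document frequency: each document counts a term at most once.
        df = Counter(term for d in self.docs for term in d)
        # Smoothed IDF variant that stays non-negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        doc = self.docs[i]
        norm = 1 - self.b + self.b * self.lens[i] / self.avg_len
        total = 0.0
        for t in tokenize(query):
            tf = doc.get(t, 0)
            if tf:
                total += self.idf[t] * tf * (self.k1 + 1) / (tf + self.k1 * norm)
        return total

    def rank_of(self, query: str, gold: int) -> int:
        """1-based rank of the gold passage (ties resolved in its favor)."""
        scores = [self.score(query, i) for i in range(len(self.docs))]
        return 1 + sum(s > scores[gold] for s in scores)


def preference_pair(bm25: BM25, query: str, expansions: List[str], gold: int) -> Tuple[str, str]:
    """Return (best, worst) expansion by the gold passage's rank after expansion."""
    ranked = sorted(expansions, key=lambda e: bm25.rank_of(f"{query} {e}", gold))
    return ranked[0], ranked[-1]
```

In this sketch the best expansion would serve as the supervised target for RSFT, and the (best, worst) pair as the chosen and rejected responses for DPO.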

Experimental Evaluation

Empirical evaluations demonstrate AQE’s superiority in both in-domain and out-of-domain settings. Key findings include:

Improved Retrieval Accuracy: Evaluations across datasets like Natural Questions and TriviaQA show AQE consistently outperforming baseline models in top-N retrieval accuracy.
Generalization Capabilities: AQE maintains robust performance across diverse datasets, showcasing significant improvements in out-of-distribution evaluations.
Efficiency Gains: AQE reduces latency and GPU memory usage by approximately 70% relative to filtering, making it viable for real-time systems where resource constraints are critical.

Figure 2: Comparison of GPU memory occupancy, computational time, and top-1 retrieval accuracy of filtering and AQE when performing inference on TriviaQA.
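For reference, top-N retrieval accuracy can be computed as in the sketch below. It assumes the usual open-domain QA definition (the fraction of questions for which a relevant passage appears among the top N results); the function name and the use of `None` for "not retrieved" are choices made for this example.

```python
from typing import Optional, Sequence


def top_k_accuracy(gold_ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose relevant passage is ranked within the top k.

    gold_ranks holds one 1-based rank per query, or None if the relevant
    passage was not retrieved at all.
    """
    if not gold_ranks:
        raise ValueError("gold_ranks must contain at least one query")
    hits = sum(1 for r in gold_ranks if r is not None and r <= k)
    return hits / len(gold_ranks)
```

For example, with ranks `[1, 3, 7, None]`, top-1 accuracy is 0.25 and top-5 accuracy is 0.5.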

Insights on Query Diversity

An analysis of generation diversity shows that AQE, particularly when RSFT is combined with DPO, increases the semantic diversity of the expansions while keeping them useful for retrieval. The combined method draws on RSFT's supervised training on top-ranked expansions and DPO's preference optimization to balance retrieval effectiveness against diversity.

Figure 3: Diversity of the generated query expansions.
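For readers unfamiliar with the two objectives, their standard forms are given below. The notation is assumed for illustration: $x$ is the prompt containing the query, $y_w$ and $y_l$ are the top- and bottom-ranked expansions, $\pi_\theta$ is the model being fine-tuned, $\pi_{\mathrm{ref}}$ is a frozen reference model, $\sigma$ is the logistic sigmoid, and $\beta$ is a scaling hyperparameter. How the paper weights or sequences the two objectives is not stated in this summary.

```latex
% RSFT: supervised fine-tuning on the top-ranked expansion only
\mathcal{L}_{\mathrm{RSFT}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w)}\big[\log \pi_\theta(y_w \mid x)\big]

% DPO: prefer the top-ranked expansion over the bottom-ranked one
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

RSFT uses only the positive example, whereas DPO also pushes probability mass away from the rejected expansion relative to the reference model.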

Conclusion

Aligned Query Expansion (AQE) advances information retrieval by employing alignment strategies to optimize query generation without extensive filtering. Empirical results underline AQE’s versatility and generalizability across varied datasets, alongside notable efficiency improvements. Future research can explore further refinement in alignment strategies and extend AQE’s application to complex retrieval tasks within various domains.

Overall, AQE presents a scalable, efficient, and effective solution to the longstanding challenges in query expansion, paving the way for more sophisticated retrieval systems empowered by LLMs.
