- The paper introduces Aligned Query Expansion (AQE), which improves passage retrieval by aligning LLM-generated query expansions with retrieval objectives.
- Methodology combines zero-shot generation, BM25 ranking, and fine-tuning via RSFT and DPO to optimize both accuracy and semantic diversity.
- Experimental results demonstrate improved retrieval accuracy, robust generalization across datasets, and about 70% reductions in latency and memory usage.
In information retrieval, vocabulary mismatch remains a significant obstacle to effective search. The paper "Aligned Query Expansion: Efficient Query Expansion for Information Retrieval through LLM Alignment" introduces Aligned Query Expansion (AQE), an approach that uses LLM alignment techniques to make query expansion for passage retrieval both more efficient and more effective.
Introduction to Query Expansion and Breakthroughs with LLMs
The discrepancy between user queries and relevant document terms, known as vocabulary mismatch, can degrade retrieval performance. Traditional methods for query expansion, which include statistical approaches or external resources like thesauri, often fall short due to over-expansion issues. LLMs revolutionize query expansion by generating contextually enriched query variations, mitigating vocabulary mismatch more effectively than conventional strategies.
However, LLMs introduce new challenges, most notably hallucinated content in the generated expansions. Solutions based on the generate-then-filter paradigm address this, but the filtering step adds computational overhead and does nothing to teach the LLM to produce good expansions in the first place.
Figure 1: Aligned Query Expansion (AQE) training pipeline. The pipeline begins by generating several query expansions through zero-shot prompting of an LLM, then ranks the expansions by their retrieval effectiveness when used in a retrieval model. The top- and bottom-ranked expansions are then used to fine-tune the LLM via two alignment strategies: Rejection Sampling Fine-Tuning (RSFT) and Direct Preference Optimization (DPO).
Methodology of Aligned Query Expansion
AQE proposes a streamlined approach, eliminating the need for redundant filtering while enhancing retrieval effectiveness. The methodology comprises three major components: zero-shot query expansion generation, ranking of query expansions, and alignment through fine-tuning.
Zero-Shot Query Expansion Generation
In the AQE framework, query expansions are generated using zero-shot prompting strategies. This step produces diverse query variations, enhancing retrieval by capturing a wide semantic scope. The generation of query expansions follows a prompt-based strategy designed to elicit informative context from the LLM regarding the original query.
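The generation step can be sketched as follows. This is a minimal illustration, not the paper's implementation: the prompt wording in `PROMPT_TEMPLATE`, the function names, and the `llm` callable (any function that maps a prompt string to one sampled completion) are all assumptions introduced here, and appending the expansion to the original query is one common way to use it, which the summary above does not confirm.

```python
from typing import Callable, List

# Hypothetical prompt wording; the exact prompt used in the paper is not given here.
PROMPT_TEMPLATE = (
    "Write a short passage that answers the following query.\n"
    "Query: {query}\n"
    "Passage:"
)


def generate_expansions(query: str, llm: Callable[[str], str], n: int = 4) -> List[str]:
    """Sample n zero-shot expansions for a query.

    `llm` is assumed to return one sampled completion per call, so calling it
    n times with the same prompt yields n (possibly different) expansions.
    """
    prompt = PROMPT_TEMPLATE.format(query=query)
    return [llm(prompt).strip() for _ in range(n)]


def expanded_query(query: str, expansion: str) -> str:
    """Form the retrieval query by appending the expansion to the original query."""
    return f"{query} {expansion}"
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real client call would be wrapped in a function with that signature.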
Ranking and Alignment Strategies
Once generated, each query expansion is ranked by its retrieval effectiveness using a sparse retrieval algorithm such as BM25. Effectiveness is measured by the rank of the true document in the returned list: the higher that document appears, the better the expansion. The alignment process then applies two strategies, Rejection Sampling Fine-Tuning (RSFT) and Direct Preference Optimization (DPO), each of which fine-tunes the LLM to produce high-quality expansions directly.
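The ranking and pair-construction steps can be sketched as below. This is an illustrative sketch under several assumptions: the small Okapi BM25 scorer with `k1=1.5` and `b=0.75` stands in for whatever BM25 implementation the authors used, whitespace tokenization is a simplification, and the record layouts (`prompt`/`completion` for RSFT, `prompt`/`chosen`/`rejected` for DPO) are hypothetical. The `dpo_loss` function is the standard per-pair DPO objective, with `beta=0.1` chosen arbitrarily.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Okapi BM25 score of every tokenized document for the query."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def gold_rank(query, expansion, docs, gold_idx):
    """1-based rank of the true document when searching with query + expansion."""
    docs_tokens = [tokenize(d) for d in docs]
    scores = bm25_scores(tokenize(query + " " + expansion), docs_tokens)
    order = sorted(range(len(docs)), key=lambda i: -scores[i])
    return order.index(gold_idx) + 1


def build_training_records(query, expansions, docs, gold_idx):
    """Pick the best and worst expansions and turn them into training records."""
    ranked = sorted(expansions, key=lambda e: gold_rank(query, e, docs, gold_idx))
    best, worst = ranked[0], ranked[-1]
    rsft_example = {"prompt": query, "completion": best}            # hypothetical layout
    dpo_pair = {"prompt": query, "chosen": best, "rejected": worst}  # hypothetical layout
    return rsft_example, dpo_pair


def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities: -log sigmoid(beta * margin)."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # Numerically stable -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

In this sketch RSFT trains only on the top-ranked expansion, while DPO contrasts the top-ranked expansion with the bottom-ranked one, so the loss falls as the policy assigns relatively more probability to the chosen expansion than the reference model does.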
Experimental Evaluation
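Empirical evaluations demonstrate AQE’s superiority in both in-domain and out-of-domain settings. Key findings include improved retrieval accuracy, robust generalization across datasets, and reductions of about 70% in latency and memory usage.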
Insights on Query Diversity
An analysis of the generated expansions shows that AQE, particularly when RSFT is combined with DPO, increases semantic diversity while keeping the expansions useful for retrieval. The combination draws on RSFT's refined generation and DPO's preference optimization to balance retrieval effectiveness against diversity.
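To make the notion of diversity concrete, the sketch below scores a set of expansions by their mean pairwise Jaccard distance over token sets. This is an assumption made for illustration: the summary does not say which diversity measure the paper uses, and a lexical measure like this one is only a rough proxy for semantic diversity, which would more typically be measured with embedding distances.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(ta & tb) / len(ta | tb)


def mean_pairwise_diversity(expansions) -> float:
    """Average Jaccard distance over all unordered pairs of expansions."""
    pairs = list(combinations(expansions, 2))
    if not pairs:
        return 0.0  # fewer than two expansions: no pairs to compare
    return sum(jaccard_distance(a, b) for a, b in pairs) / len(pairs)
```

A score of 0 means every expansion uses the same vocabulary, and a score of 1 means no two expansions share a token.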
Figure 3: Diversity of the generated query expansions.
Conclusion
Aligned Query Expansion (AQE) advances information retrieval by employing alignment strategies to optimize query generation without extensive filtering. Empirical results underline AQE’s versatility and generalizability across varied datasets, alongside notable efficiency improvements. Future research can explore further refinement in alignment strategies and extend AQE’s application to complex retrieval tasks within various domains.
Overall, AQE presents a scalable, efficient, and effective solution to the longstanding challenges in query expansion, paving the way for more sophisticated retrieval systems empowered by LLMs.