A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with Large Language Models

Published 14 Oct 2023 in cs.IR and cs.AI | arXiv:2310.09497v2

Abstract: We propose a novel zero-shot document ranking approach based on LLMs: the Setwise prompting approach. Our approach complements existing prompting approaches for LLM-based zero-shot ranking: Pointwise, Pairwise, and Listwise. Through the first-of-its-kind comparative evaluation within a consistent experimental framework and considering factors like model size, token consumption, latency, among others, we show that existing approaches are inherently characterised by trade-offs between effectiveness and efficiency. We find that while Pointwise approaches score high on efficiency, they suffer from poor effectiveness. Conversely, Pairwise approaches demonstrate superior effectiveness but incur high computational overhead. Our Setwise approach, instead, reduces the number of LLM inferences and the amount of prompt token consumption during the ranking procedure, compared to previous methods. This significantly improves the efficiency of LLM-based zero-shot ranking, while also retaining high zero-shot ranking effectiveness. We make our code and results publicly available at \url{https://github.com/ielab/LLM-rankers}.


Summary

  • The paper introduces a novel setwise prompting method that reduces computational cost while maintaining effectiveness in zero-shot ranking.
  • It employs sorting algorithms and benchmarks from TREC DL and BEIR to demonstrate efficiency improvements over pointwise, pairwise, and listwise strategies.
  • The approach enables cost-effective deployment of large language models in low-resource settings and scalable applications.

A Setwise Approach for Effective and Highly Efficient Zero-shot Ranking with LLMs

Introduction

In the domain of document ranking, LLMs such as GPT-3, FlanT5, and PaLM have shown exceptional capabilities, especially under zero-shot settings. Traditional approaches to utilizing LLMs for zero-shot document ranking have been categorized into pointwise, listwise, and pairwise prompting strategies. These strategies differ in their prompting methods and consequently impact the efficiency and effectiveness of the ranking process. The study introduces a novel "Setwise" prompting approach aimed at optimizing zero-shot document ranking by balancing computational cost and effectiveness.

Evaluation of Existing Approaches

The paper presents a thorough evaluation of pointwise, listwise, and pairwise methods for zero-shot document ranking using a common experimental framework. Key parameters like model size, token usage, and latency were considered. It was found that:

  • Pointwise Strategies: These are highly efficient, requiring only one inference per document, but suffer from poor effectiveness; each document is scored for relevance to the query independently of the others.
  • Pairwise Strategies: These compare documents two at a time, yielding superior effectiveness but incurring high computational overhead from the large number of comparisons.
  • Listwise Strategies: These prompt the model with a list of candidate documents and ask it to order them, potentially balancing effectiveness and efficiency, though results vary significantly with configuration and dataset.

The comprehensive evaluation helped elucidate trade-offs intrinsic to each method, providing a clearer pathway for practitioners selecting ranking strategies.
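The trade-offs above can be made concrete by counting LLM inferences. The counts below are a rough sketch, not the paper's exact accounting: pointwise needs one call per document, naive all-pairs pairwise needs one call per pair, and a setwise call that compares a set of c documents eliminates c − 1 candidates at a time when selecting the best of n.

```python
import math

def pointwise_calls(n: int) -> int:
    """One LLM inference per candidate document."""
    return n

def pairwise_allpairs_calls(n: int) -> int:
    """Naive all-pairs comparison: n-choose-2 inferences."""
    return n * (n - 1) // 2

def setwise_tournament_calls(n: int, c: int) -> int:
    """Knockout selection of the single best document with set size c:
    each inference eliminates c - 1 candidates, so roughly
    (n - 1) / (c - 1) calls are needed."""
    return math.ceil((n - 1) / (c - 1))

# For 100 candidates: pointwise 100 calls, all-pairs pairwise 4950,
# a pairwise tournament (c = 2) 99, and a setwise tournament (c = 4) 33.
print(pointwise_calls(100), pairwise_allpairs_calls(100),
      setwise_tournament_calls(100, 2), setwise_tournament_calls(100, 4))
```

Note how setwise recovers the pairwise tournament as the special case c = 2; increasing the set size directly divides the number of inferences, which is the source of the efficiency gain discussed in the paper.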

Setwise Prompting: A Novel Approach

The Setwise prompting method is introduced to enhance efficiency while maintaining or even improving effectiveness. It reduces LLM inferences by comparing multiple documents within a single prompt, rather than sequential pairs or full lists. This approach leverages sorting algorithms such as heap sort and bubble sort to decrease computational costs and prompt token consumption significantly.

Figure 1: Different prompting strategies. (a) Pointwise, (b) Listwise, (c) Pairwise and (d) our proposed Setwise.

Through this method, a relevance estimation is made across sets of documents, thereby providing an efficient mechanism for zero-shot ranking. The empirical tests conducted demonstrate that this method achieves reductions in computational overhead without sacrificing the quality of the ranking results.
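The idea can be sketched as follows. This is a simplified tournament-style top-k selection, not the paper's heap-sort implementation, and `pick_best` is a hypothetical stand-in for one setwise LLM inference that returns the index of the most relevant document in a small set.

```python
def setwise_top_k(query, docs, k, pick_best, set_size=4):
    """Return the top-k documents using a setwise comparison oracle.

    pick_best(query, subset) stands in for one LLM call: it returns
    the position of the most relevant document within `subset`.
    Each call eliminates set_size - 1 candidates at once.
    """
    remaining = list(docs)
    ranked = []
    for _ in range(min(k, len(remaining))):
        # Run a knockout tournament over indices into `remaining`.
        pool = list(range(len(remaining)))
        while len(pool) > 1:
            group = pool[:set_size]
            winner = group[pick_best(query, [remaining[i] for i in group])]
            pool = [winner] + pool[set_size:]
        ranked.append(remaining.pop(pool[0]))
    return ranked

# Toy usage: documents are stand-in relevance scores, and the "LLM"
# simply picks the highest-scoring item in each set.
docs = [3, 1, 4, 1, 5]
pick = lambda q, subset: max(range(len(subset)), key=lambda i: subset[i])
print(setwise_top_k("query", docs, 2, pick, set_size=3))  # [5, 4]
```

Because only the top-k documents are typically needed for re-ranking, partial sorting of this kind avoids fully ordering the candidate list, which is where the inference savings compound.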

Empirical Results

The efficacy of the Setwise approach was validated using the TREC DL datasets and the BEIR benchmarks. Results showed notable reductions in computational costs, specifically in terms of the number of inferences and prompt tokens required per query. Furthermore, Setwise prompting exhibited strong robustness against variations in initial ranking quality, unlike existing methods which are sensitive to initial rankings.

Figure 2: Heapify with Pairwise prompting (comparing 2 documents at a time).

The experiments demonstrated the superiority of the Setwise method, especially in balancing the trade-off between effectiveness and computational efficiency. Notably, the use of open-source Flan-T5 LLMs showed that the approach scales without relying heavily on expensive, closed-source models.

Implications and Future Directions

The introduction of Setwise prompting into zero-shot document ranking presents practical implications for real-world applications where computational resources and response times are critical. By using fewer inferences and shorter prompts, the approach helps manage costs while maintaining high ranking effectiveness. It opens pathways for deploying LLMs in low-resource settings without compromising on performance.

Future research could explore the application of Setwise prompting with other LLMs, including open models such as LLaMA and proprietary APIs from OpenAI. Furthermore, enhancements in self-supervised learning and optimization techniques could be integrated to refine the Setwise method further.

Conclusion

This paper offers a significant leap forward in the efficient application of LLMs for zero-shot document ranking by introducing Setwise prompting. By achieving a balance between effectiveness and computational efficiency, the study provides insights and tools essential for leveraging the power of LLMs in scalable and cost-effective ways. The results underscore the method's robustness, making it a valuable addition to existing ranking strategies.
