
A Deep Reinforcement Learning Approach for Interactive Search with Sentence-level Feedback

Published 3 Oct 2023 in cs.LG, cs.AI, cs.HC, and cs.IR (arXiv:2310.03043v1)

Abstract: Interactive search can provide a better experience by incorporating interaction feedback from users. This can significantly improve search accuracy, as it helps avoid irrelevant information and captures users' search intents. Existing state-of-the-art (SOTA) systems use reinforcement learning (RL) models to incorporate these interactions but focus on item-level feedback, ignoring the fine-grained information available in sentence-level feedback. Exploiting such feedback, however, requires extensive exploration of the RL action space and large amounts of annotated data. This work addresses these challenges by proposing a new deep Q-learning (DQ) approach, DQrank. DQrank adapts BERT-based models, the SOTA in natural language processing, to select crucial sentences based on users' engagement and to rank items for more satisfactory responses. We also propose two mechanisms to better explore optimal actions. DQrank further utilizes the experience replay mechanism of DQ to store feedback sentences and thereby obtain better initial ranking performance. We validate the effectiveness of DQrank on three search datasets. The results show that DQrank performs at least 12% better than previous SOTA RL approaches. We also conduct detailed ablation studies, which demonstrate that each model component can efficiently extract and accumulate long-term engagement effects from users' sentence-level feedback. This architecture offers a promising foundation for constructing search systems with sentence-level interaction.
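The abstract's core ingredients — Q-learning over ranking actions, epsilon-greedy exploration of the action space, and an experience-replay buffer of feedback — can be illustrated with a toy sketch. This is not DQrank's implementation (the paper uses BERT-based deep networks over sentence-level feedback); every class name and parameter below is hypothetical, and states are reduced to plain item indices purely for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Stores (state, action, reward, next_state) transitions.
    DQrank analogously stores users' feedback sentences for replay."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)  # old transitions fall off

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform minibatch sampling, capped at the buffer's size.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))


class EpsilonGreedyRanker:
    """Tabular stand-in for a deep Q-network: one Q-value per item,
    updated from engagement rewards, with epsilon-greedy exploration."""

    def __init__(self, n_items, epsilon=0.1, alpha=0.5, gamma=0.9):
        self.q = [0.0] * n_items
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma

    def select(self):
        # Epsilon-greedy exploration over the action (item) space.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda i: self.q[i])

    def update(self, item, reward):
        # One-step Q-learning update toward the observed reward.
        target = reward + self.gamma * max(self.q)
        self.q[item] += self.alpha * (target - self.q[item])

    def rank(self):
        # Final ranking: items sorted by descending learned Q-value.
        return sorted(range(len(self.q)), key=lambda i: -self.q[i])
```

In this sketch, repeatedly rewarding engagement with one item drives its Q-value above the rest, so `rank()` surfaces it first; DQrank's contribution is doing this with sentence-level feedback encoded by BERT rather than a lookup table.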

