
Backtracing: Retrieving the Cause of the Query

Published 6 Mar 2024 in cs.IR and cs.CL | arXiv:2403.03956v1

Abstract: Many online content portals allow users to ask questions to supplement their understanding (e.g., of lectures). While information retrieval (IR) systems may provide answers for such user queries, they do not directly assist content creators -- such as lecturers who want to improve their content -- in identifying the segments that caused a user to ask those questions. We introduce the task of backtracing, in which systems retrieve the text segment that most likely caused a user query. We formalize three real-world domains for which backtracing is important in improving content delivery and communication: understanding the cause of (a) student confusion in the Lecture domain, (b) reader curiosity in the News Article domain, and (c) user emotion in the Conversation domain. We evaluate the zero-shot performance of popular information retrieval methods and language modeling methods, including bi-encoder, re-ranking, and likelihood-based methods, as well as ChatGPT. While traditional IR systems retrieve semantically relevant information (e.g., details on "projection matrices" for the query "does projecting multiple times still lead to the same point?"), they often miss the causally relevant context (e.g., the lecturer stating "projecting twice gets me the same answer as one projection"). Our results show that there is room for improvement on backtracing and that it requires new retrieval approaches. We hope our benchmark serves to improve future retrieval systems for backtracing, spawning systems that refine content generation and identify linguistic triggers influencing user queries. Our code and data are open-sourced: https://github.com/rosewang2008/backtracing.


Summary

  • The paper introduces backtracing, a novel task that pinpoints text segments causing user queries in domains like education, news, and conversation.
  • The paper evaluates diverse retrieval methods, including bi-encoder and GPT-3.5-turbo-16k, revealing modest accuracies and domain-specific challenges.
  • The paper calls for innovative models designed to capture causal relationships in text, highlighting the need for improved query understanding in IR systems.

Analyzing the Causes Behind Queries: A Study on Backtracing for Content Improvement

Introduction to Backtracing

The digital age has vastly expanded access to information, yet understanding and improving the clarity of content remains a significant challenge. This paper introduces a novel task named backtracing, which seeks to bridge this gap by identifying the text segments within a corpus that most likely triggered a user's query. The objective of backtracing is to aid content creators, such as educators and journalists, in pinpointing areas of confusion or interest in their content, based on the queries received from their audience. The research formalizes three domains where backtracing has valuable applications: student confusion in educational lectures, reader curiosity in news articles, and user emotion in conversation transcripts.
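Conceptually, backtracing selects, from a corpus of candidate segments, the one a scoring model deems most likely to have caused the query. A minimal sketch of this selection step follows; the word-overlap scorer is a toy stand-in for illustration only, not one of the paper's methods:

```python
from typing import Callable, List

def backtrace(query: str, segments: List[str],
              score: Callable[[str, str], float]) -> int:
    """Return the index of the segment most likely to have caused `query`.

    `score(segment, query)` can be any causal-relevance scorer; the paper
    benchmarks bi-encoder, re-ranking, and likelihood-based choices.
    """
    return max(range(len(segments)), key=lambda i: score(segments[i], query))

# Toy stand-in scorer (word overlap) -- NOT one of the paper's methods.
def overlap(segment: str, query: str) -> float:
    s = set(segment.lower().split())
    q = set(query.lower().split())
    return len(s & q) / max(len(q), 1)

segments = [
    "A projection matrix maps vectors onto a subspace.",
    "Projecting twice gets me the same answer as one projection.",
]
query = "does projecting multiple times still lead to the same point"
best = backtrace(query, segments, overlap)  # index of the likely cause
```

The hard part, as the paper shows, is the scorer itself: semantic similarity alone often picks the topically related segment rather than the causally relevant one.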

Methodological Approach

The study evaluates the performance of various information retrieval (IR) and language modeling methods on the backtracing task across the aforementioned domains. The methods assessed include popular bi-encoder and re-ranking architectures, likelihood-based retrieval methods built on pre-trained language models (PLMs), and gpt-3.5-turbo-16k, which offers an extended context window. These models are benchmarked using datasets specifically designed to encapsulate the challenge of backtracing in real-world scenarios. A key part of the evaluation is measuring how effectively these methods distinguish sentences causally relevant to the query from those that are merely semantically related.
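A bi-encoder baseline of the kind evaluated here embeds query and segments independently and ranks segments by cosine similarity. The sketch below uses a toy bag-of-words "encoder" as a stand-in for a trained model such as Sentence-BERT:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "encoder"; a real bi-encoder would use a
    # trained model such as Sentence-BERT here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_segments(query: str, segments: list) -> list:
    """Rank segment indices by similarity to the query, best first."""
    q = embed(query)
    return sorted(range(len(segments)),
                  key=lambda i: cosine(embed(segments[i]), q),
                  reverse=True)

segments = [
    "the lecturer states projecting twice gets the same answer",
    "details on projection matrices and their properties",
]
ranking = rank_segments("does projecting twice give the same point", segments)
```

Re-ranking methods instead score each (query, segment) pair jointly, and likelihood-based methods score a segment by how probable a PLM finds the query given that segment; both replace the `cosine`-over-embeddings step above with a learned pairwise score.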

Insights and Findings

A primary finding of this research is that existing retrieval methods leave considerable room for improvement on the backtracing task. Notably, the best-performing models achieved only modest accuracies, underscoring the complexity of determining causal relevance in text. Furthermore, performance varied significantly across domains, suggesting that no one-size-fits-all solution exists for backtracing. The evaluation also revealed that for tasks like backtracing, where context and causality are crucial, simpler similarity-based methods and even sophisticated models like gpt-3.5-turbo-16k fall short of providing consistently reliable results.
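Accuracy on a backtracing benchmark of this kind is naturally reported as top-k retrieval accuracy: the fraction of queries whose gold causal segment appears among the top-k ranked candidates. A minimal sketch, with hypothetical rankings and labels for illustration:

```python
def top_k_accuracy(predictions, gold, k=1):
    """Fraction of examples whose gold segment index appears in the
    top-k entries of the predicted ranking.

    predictions: list of ranked index lists (best candidate first).
    gold: list of gold causal-segment indices, one per example.
    """
    hits = sum(1 for ranks, g in zip(predictions, gold) if g in ranks[:k])
    return hits / len(gold)

preds = [[1, 0], [0, 1], [2, 0]]  # hypothetical ranked outputs
gold = [1, 1, 2]                  # hypothetical causal-segment labels
acc1 = top_k_accuracy(preds, gold, k=1)
```

Modest values of this metric, varying by domain, are exactly the pattern the paper reports for current methods.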

Implications and Future Directions

This paper highlights the nascent stage of development in retrieval systems aimed at understanding the causality behind user queries. The findings signal a compelling need for novel approaches that can more adeptly navigate the nuances of causal relevance within large and complex text corpora. Future research directions might include developing models specifically trained on identifying causal relationships within texts or enhancing the contextual understanding of current PLMs.

Moreover, the study's focus on diverse domains underscores the broad applicability and potential impact of backtracing. By improving the ability of content creators to address areas of ambiguity or interest in their material, backtracing techniques can contribute to enhancing the overall quality and effectiveness of educational resources, news articles, and interpersonal communication.

Conclusion

The introduction of backtracing as a task opens new avenues for research in the field of IR and generative AI. This paper lays the groundwork by establishing a benchmark for backtracing, demonstrating the current limitations of state-of-the-art methods, and pointing towards the pressing need for advanced models capable of discerning causal relationships in text. As we advance, the goal remains clear: to develop tools that not only find the answers users seek but also understand the reasons behind their questions, thereby enabling content creators to tailor and improve their work more effectively.
