Backtracing: Retrieving the Cause of the Query
Abstract: Many online content portals allow users to ask questions that supplement their understanding (e.g., of lectures). While information retrieval (IR) systems may provide answers to such user queries, they do not directly assist content creators -- such as lecturers who want to improve their content -- in identifying the segments that caused a user to ask those questions. We introduce the task of backtracing, in which systems retrieve the text segment that most likely caused a user query. We formalize three real-world domains in which backtracing is important for improving content delivery and communication: understanding the cause of (a) student confusion in the Lecture domain, (b) reader curiosity in the News Article domain, and (c) user emotion in the Conversation domain. We evaluate the zero-shot performance of popular information retrieval and language modeling methods, including bi-encoder, re-ranking, and likelihood-based methods, as well as ChatGPT. While traditional IR systems retrieve semantically relevant information (e.g., details on "projection matrices" for the query "does projecting multiple times still lead to the same point?"), they often miss the causally relevant context (e.g., the lecturer stating "projecting twice gets me the same answer as one projection"). Our results show that there is room for improvement on backtracing and that it requires new retrieval approaches. We hope our benchmark serves to improve future retrieval systems for backtracing, spawning systems that refine content generation and identify the linguistic triggers that influence user queries. Our code and data are open-sourced: https://github.com/rosewang2008/backtracing.
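To make the task interface concrete, the retrieval setup described above can be sketched as follows. This is a minimal illustrative baseline, not the paper's method: a bag-of-words cosine similarity stands in for a trained bi-encoder, and the `embed`, `cosine`, and `backtrace` names are hypothetical. Given a user query and the candidate segments of the source content, the system ranks segments and returns the index of the one most likely to have caused the query.

```python
# Minimal sketch of a bi-encoder-style backtracing baseline.
# A real system would encode query and segments with a trained
# bi-encoder; here a lowercase bag-of-words cosine stands in,
# purely to show the task's input/output interface.
import math
from collections import Counter

def embed(text):
    # Stand-in "encoder": bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def backtrace(query, segments):
    # Rank candidate source segments by similarity to the query
    # and return the index of the highest-scoring one.
    q = embed(query)
    scores = [cosine(q, embed(s)) for s in segments]
    return max(range(len(segments)), key=scores.__getitem__)

segments = [
    "A projection matrix maps a vector onto a subspace.",
    "Projecting twice gets me the same answer as one projection.",
    "Next week we will cover eigenvalues.",
]
query = "Does projecting multiple times still lead to the same point?"
print(backtrace(query, segments))  # → 1
```

On this toy input the lexical overlap happens to surface the causally relevant segment; as the abstract notes, on real data semantic-relevance retrievers often fail in exactly this way, which is what motivates the benchmark.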