
Unstructured and structured data: Can we have the best of both worlds with large language models?

Published 25 Apr 2023 in cs.DB and cs.CL (arXiv:2304.13010v2)

Abstract: This paper presents an opinion on the potential of using LLMs to query both unstructured and structured data. It also outlines research challenges in building question-answering systems that span both types of data.
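To make the paper's premise concrete, the following is a minimal, hypothetical sketch of a question-answering pipeline that spans both kinds of data: structured facts live in a relational table queried via SQL, while unstructured facts live in free-text documents served by retrieval. The LLM steps are stubbed out (the hard-coded SQL stands in for text-to-SQL generation, and keyword overlap stands in for learned retrieval plus answer synthesis); all table names, documents, and the routing heuristic are invented for illustration and do not come from the paper.

```python
# Hypothetical hybrid QA sketch: route a question either to a SQL query
# over structured data or to text retrieval over unstructured data.
# LLM calls are stubbed; a real system would use a model for text-to-SQL,
# for routing, and for reading the retrieved passage.
import sqlite3

# --- Structured side: a tiny in-memory relational table ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ada", 120000), ("Grace", 130000)],
)

# --- Unstructured side: a tiny document store ---
documents = [
    "Ada leads the compilers team and joined in 2019.",
    "Grace works on distributed systems and mentors interns.",
]

def answer_structured(question: str) -> str:
    # A text-to-SQL model would emit this query from the question;
    # here it is hard-coded for the salary question this toy handles.
    sql = "SELECT name, MAX(salary) FROM employees"
    name, salary = conn.execute(sql).fetchone()
    return f"{name} has the highest salary ({salary})."

def answer_unstructured(question: str) -> str:
    # Retrieval stand-in: pick the document with the largest keyword
    # overlap; a real system would rank with embeddings and then have
    # the LLM compose an answer from the retrieved text.
    q_words = set(question.lower().split())
    return max(documents, key=lambda d: len(q_words & set(d.lower().split())))

def route(question: str) -> str:
    # Crude routing heuristic standing in for an LLM's decision about
    # which data source the question belongs to.
    if "salary" in question.lower():
        return answer_structured(question)
    return answer_unstructured(question)

print(route("Who has the highest salary?"))
print(route("What does Grace work on?"))
```

The sketch makes the paper's central tension visible: the structured path gives exact, aggregable answers but needs correct query generation, while the unstructured path degrades gracefully but returns passages rather than precise values.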
