Robust Knowledge Extraction from Large Language Models using Social Choice Theory
Abstract: Large Language Models (LLMs) can support a wide range of applications such as conversational agents, creative writing, or general query answering. However, they are ill-suited for query answering in high-stakes domains like medicine because they are typically not robust: even the same query can yield different answers when posed multiple times. To improve the robustness of LLM queries, we propose posing ranking queries repeatedly and aggregating the results using methods from social choice theory. We study ranking queries in diagnostic settings such as medical and fault diagnosis and discuss how the Partial Borda Choice function from the literature can be applied to merge multiple query results. We discuss additional interesting properties of this function in our setting and evaluate the robustness of our approach empirically.
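The aggregation step described above can be illustrated with the classic Borda count over complete rankings; the Partial Borda Choice function used in the paper generalizes this idea to partially ordered ballots. The following sketch is a simplified illustration, not the paper's exact method, and the diagnostic labels are hypothetical example data:

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate complete rankings with the classic Borda count.

    Each ranking lists candidates from most to least preferred;
    a candidate at position i in a ranking of m items earns m-1-i points.
    Returns candidates sorted by descending total score.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        m = len(ranking)
        for pos, candidate in enumerate(ranking):
            scores[candidate] += m - 1 - pos
    return sorted(scores, key=lambda c: -scores[c])

# Three hypothetical answers from repeated runs of the same ranking query
answers = [
    ["flu", "cold", "covid"],
    ["cold", "flu", "covid"],
    ["flu", "covid", "cold"],
]
print(borda_aggregate(answers))  # flu scores 5, cold 3, covid 1
```

Because the scores are summed across runs, a single inconsistent answer perturbs the aggregate ranking far less than it would perturb any individual query result.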
References:
- Handbook of Computational Social Choice. Cambridge University Press. https://doi.org/10.1017/CBO9781107446984
- A Borda count for partially ordered ballots. Social Choice and Welfare 42 (2014), 913–926.
- Hierarchical Neural Story Generation. In Annual Meeting of the Association for Computational Linguistics (ACL). Association for Computational Linguistics, 889–898.
- On calibration of modern neural networks. In International Conference on Machine Learning ICML. PMLR, 1321–1330.
- The Curious Case of Neural Text Degeneration. In International Conference on Learning Representations ICLR.
- Spoken language processing: A guide to theory, algorithm, and system development. Prentice Hall PTR.
- Survey of hallucination in natural language generation. Comput. Surveys 55, 12 (2023), 1–38.
- How can we know when language models know? on the calibration of language models for question answering. Transactions of the Association for Computational Linguistics 9 (2021), 962–977.
- Language models (mostly) know what they know. Findings of the Association for Computational Linguistics (2023), 8653–8665.
- Maurice G Kendall. 1938. A new measure of rank correlation. Biometrika 30, 1/2 (1938), 81–93.
- Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation. In International Conference on Learning Representations ICLR.
- BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2020), 1234–1240.
- Teaching Models to Express Their Uncertainty in Words. Transactions on Machine Learning Research (2022).
- Reducing Conversational Agents’ Overconfidence Through Linguistic Calibration. Transactions of the Association for Computational Linguistics 10 (2022).
- Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction. NPJ digital medicine 4, 1 (2021), 86.
- Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing EMNLP-IJCNLP. 3982–3992.
- Just ask for calibration: Strategies for eliciting calibrated confidence scores from language models fine-tuned with human feedback. Conference on Empirical Methods in Natural Language Processing EMNLP (2023).
- Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs. arXiv preprint arXiv:2306.13063 (2023).
- Jerrold H Zar. 2005. Spearman rank correlation. Encyclopedia of Biostatistics 7 (2005).