
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models

Published 21 Mar 2025 in cs.CL and cs.AI | (2503.17523v1)

Abstract: Artificial intelligence systems based on LLMs are increasingly used as agents that interact with users and with the world. To do so successfully, LLMs need to construct internal representations of the world and form probabilistic beliefs about those representations. To provide a user with personalized recommendations, for example, the LLM needs to gradually infer the user's preferences, over the course of multiple interactions. To evaluate whether contemporary LLMs are able to do so, we use the Bayesian inference framework from probability theory, which lays out the optimal way to update an agent's beliefs as it receives new information. We first show that the LLMs do not update their beliefs as expected from the Bayesian framework, and that consequently their predictions do not improve as expected as more information becomes available, even less so than we find is the case for humans. To address this issue, we teach the LLMs to reason in a Bayesian manner by training them to mimic the predictions of an optimal Bayesian model. We find that this approach not only significantly improves the LLM's performance on the particular recommendation task it is trained on, but also enables generalization to other tasks. This suggests that this method endows the LLM with broader Bayesian reasoning skills. More generally, our results indicate that LLMs can learn about reasoning strategies effectively and generalize those skills to new domains, which in part explains LLMs' empirical success.

Summary

Bayesian Teaching Enables Probabilistic Reasoning in LLMs

In the paper "Bayesian Teaching Enables Probabilistic Reasoning in LLMs," the authors tackle a critical aspect of artificial intelligence: the ability of LLMs to perform probabilistic reasoning through Bayesian updates. While LLMs have demonstrated remarkable capabilities in generating coherent text and answering questions, the authors identify a shortfall in their ability to update beliefs in a manner aligned with Bayesian inference when tasked with interactive recommendations.

The research focuses on equipping LLMs with probabilistic reasoning skills via Bayesian teaching. The authors first show that LLMs underperform on a flight recommendation scenario, a simulated setting in which the assistant must infer a user's preferences over repeated interactions. Despite their strengths in language generation and comprehension, models such as Gemma 2 9B and Gemini 1.5 Pro fail to extract and update probabilistic beliefs from these interactions: their performance plateaus after only a few interactions, indicating limited intrinsic probabilistic reasoning.
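To make the benchmark concrete, the optimal behavior in such a setting is a Bayesian posterior update over hypotheses about what the user values. The following is an illustrative sketch, not the paper's code; the feature names and likelihood values are hypothetical:

```python
def bayesian_update(prior, likelihoods):
    """Return the posterior over hypotheses after one observation.

    prior: dict mapping hypothesis -> prior probability
    likelihoods: dict mapping hypothesis -> P(observation | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihoods[h] for h in prior}
    z = sum(unnormalized.values())
    return {h: p / z for h, p in unnormalized.items()}

# Hypotheses about which flight feature the user cares most about
# (hypothetical features and numbers, for illustration only).
prior = {"price": 1 / 3, "duration": 1 / 3, "layovers": 1 / 3}

# Observation: the user accepts a cheap but long flight. That choice is
# very likely if they prioritize price, unlikely if they prioritize duration.
likelihoods = {"price": 0.8, "duration": 0.1, "layovers": 0.4}

posterior = bayesian_update(prior, likelihoods)
print(max(posterior, key=posterior.get))  # belief shifts toward "price"
```

An optimal agent repeats this update after every interaction, so each user response sharpens the preference estimate; the paper's finding is that untrained LLMs do not track this trajectory.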

To address these limitations, the authors introduce Bayesian teaching and contrast it with oracle teaching. Bayesian teaching fine-tunes the LLM to emulate the predictions of an optimal Bayesian model, which improves both the assistant's accuracy on subsequent recommendations and the consistency of its predictions with Bayesian updates. Importantly, the method is robust across varying scenarios, including tasks outside the initial training scope: by training on interactions between users and a Bayesian assistant, fine-tuned LLMs improve not only on the recommendation task but also in different domains such as hotel recommendations and web shopping.

Fine-tuning with Bayesian teaching significantly outperforms direct oracle teaching, underscoring the value of supervising belief structures rather than mere outcomes. The method endows the system with generalized probabilistic reasoning, improving both explicit recommendation performance and the model's ability to accurately verbalize its inferred preferences.
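The distinction between the two supervision schemes can be sketched in terms of the training examples each produces. This is a hypothetical data format for illustration, not the paper's implementation:

```python
def oracle_targets(dialogue_turns, correct_item):
    """Oracle teaching: supervise only the correct final recommendation."""
    return [(tuple(dialogue_turns), correct_item)]

def bayesian_targets(dialogue_turns, bayesian_predictions):
    """Bayesian teaching: one example per turn, supervising what an optimal
    Bayesian model would predict given the evidence observed so far."""
    return [
        (tuple(dialogue_turns[: t + 1]), bayesian_predictions[t])
        for t in range(len(dialogue_turns))
    ]

# Hypothetical two-turn dialogue and per-turn Bayesian predictions.
turns = ["user rejects red-eye flight", "user accepts cheapest fare"]
preds = ["nonstop daytime flight", "cheapest daytime flight"]

print(oracle_targets(turns, "cheapest daytime flight"))
print(bayesian_targets(turns, preds))
```

Under this framing, oracle teaching shows the model only where beliefs should end up, while Bayesian teaching traces the whole belief trajectory, which may explain why the latter transfers better to new domains.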

Theoretically, the results strengthen the bridge between symbolic models and neural networks, showing that LLMs are receptive to learning structured, domain-general strategies. Practically, this advance makes LLMs more effective in domains where explicitly encoding the task in Bayesian terms is challenging.

Future research could build on these findings by exploring more complex and real-world multi-feature interaction scenarios, further cementing the practical applications of Bayesian teaching in the continuous learning landscape of AI systems. Integrating deeper Bayesian strategies could foster dynamic and adaptable AI agents that not only interact but thrive in environments replete with uncertainty and partial observability.
