Google Researchers Propose Bayesian Teaching Method For Large Language Models

Google Researchers have proposed a training method that teaches large language models to approximate Bayesian reasoning by learning from the predictions of an optimal Bayesian system. The approach focuses on improving how models update beliefs as they receive new information during multi-step interactions.

The study examines how language models update beliefs when interacting with users over time. In many real-world applications, such as recommendation systems, models need to infer user preferences gradually based on new information. Bayesian inference provides a mathematical framework for updating probabilities as new evidence becomes available. The researchers investigated whether language models behave in ways consistent with Bayesian belief updates and explored training methods to improve that behavior.

To evaluate this, the team created a simulated flight recommendation task. In the experiment, a model interacted with a simulated user for five rounds. In each round, the assistant and user were shown three flight options defined by departure time, duration, number of stops, and price. Each simulated user had hidden preferences for these attributes. After each recommendation, the user indicated whether the assistant selected the correct option and revealed the preferred flight. The assistant was expected to use this feedback to improve future recommendations.

The researchers compared several language models with a Bayesian assistant that maintains a probability distribution over possible user preferences and updates it using Bayes’ rule after each interaction. In the experiment, the Bayesian assistant reached about 81% accuracy in selecting the correct option. Language models performed worse and often showed limited improvement after the first interaction, suggesting that they did not effectively update their internal estimates of user preferences.

The study then tested a training approach called Bayesian teaching. Instead of learning only from correct answers, models were trained to imitate the predictions of the Bayesian assistant during simulated interactions. In early rounds, the Bayesian assistant sometimes made incorrect recommendations due to uncertainty about the user’s preferences, but its decisions reflected probabilistic reasoning based on the available evidence.

The image below, shows the recommendation accuracy of Gemma and Qwen after fine-tuning on user interactions with the Bayesian assistant or with an oracle.

The training data for supervised fine-tuning consisted of simulated conversations between users and the Bayesian assistant. For comparison, the researchers tested a method in which the model learned from an assistant that always selected the correct option because it had perfect knowledge of the user’s preferences.

Both fine-tuning approaches improved model performance, but Bayesian teaching produced better results. Models trained with this method made predictions that more closely matched those of the Bayesian assistant and demonstrated stronger improvement across multiple interaction rounds. The trained models also showed higher agreement with the Bayesian system when evaluating user choices.

Community reactions to the Google Research post were largely positive, with commenters highlighting improved probabilistic reasoning and multi-turn adaptation in LLMs.

Software Developer, Yann Kronberg commented:

People talk about reasoning benchmarks but this is basically about belief updates. We know that most LLMs don’t revise their internal assumptions well after new information arrives, so @GoogleResearch teaching them to approximate Bayesian inference could matter a lot for long-running agents.

Some also questioned the use of supervised fine-tuning instead of reinforcement learning for approximating Bayesian inference.

Researcher Aidan Li quoted:

Why did the authors use SFT instead of RL to train the model to approximate probabilistic inference? There is a wealth of work relating RL and probabilistic inference, even for LLMs. Maybe I’m missing something but RL seems like the obvious choice.

The researchers describe the method as a form of model distillation in which a neural network learns to approximate the behavior of a symbolic system implementing Bayesian inference. The results suggest that language models can acquire probabilistic reasoning skills through post-training that demonstrates optimal decision strategies during sequential interactions.