Large Language Models are Biased Reinforcement Learners
Abstract: In-context learning enables LLMs to perform a variety of tasks, including learning to make reward-maximizing choices in simple bandit tasks. Given their potential use as (autonomous) decision-making agents, it is important to understand how these models perform such reinforcement learning (RL) tasks and the extent to which they are susceptible to biases. Motivated by the well-documented finding that, in humans, the value of an outcome depends on how it compares with other local outcomes, the present study examines whether similar value-encoding biases shape how LLMs encode rewarding outcomes. Results from experiments with multiple bandit tasks and models show that LLMs exhibit behavioral signatures of a relative value bias. Adding explicit outcome comparisons to the prompt produces opposing effects on performance: it enhances maximization within trained choice sets but impairs generalization to new choice sets. Computational cognitive modeling reveals that LLM behavior is well described by a simple RL algorithm that incorporates relative values at the outcome-encoding stage. Lastly, we present preliminary evidence that the observed biases are not limited to fine-tuned LLMs, and that relative value processing is detectable in the final hidden-layer activations of a raw, pretrained model. These findings have important implications for the use of LLMs in decision-making applications.
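The abstract does not specify the model's functional form, so the following is only a rough illustration of what "incorporating relative values at the outcome-encoding stage" can mean: a minimal Rescorla-Wagner-style learner in which the encoded outcome mixes the absolute reward with its deviation from the context mean. The class name, the mixing weight `omega`, the softmax temperature `beta`, and the mean-centered comparison (which assumes full feedback, i.e., the outcomes of all options in the choice context are observed) are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

class RelativeValueQLearner:
    """Sketch of a Q-learner with (partially) relative outcome encoding.

    omega = 0 recovers a standard absolute-value learner;
    omega = 1 encodes outcomes purely relative to the context mean.
    """

    def __init__(self, n_options, alpha=0.1, omega=0.5, beta=5.0):
        self.q = np.zeros(n_options)  # learned option values
        self.alpha = alpha            # learning rate
        self.omega = omega            # weight on the relative component
        self.beta = beta              # softmax inverse temperature

    def choose(self):
        # Softmax choice over current values (max-subtracted for stability)
        logits = self.beta * self.q
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return np.random.choice(len(self.q), p=p)

    def update(self, chosen, outcomes):
        # `outcomes` holds the rewards of all options in this choice context
        # (full-feedback assumption); the chosen outcome is encoded as a
        # mixture of its absolute value and its value relative to the
        # context mean, then used in a standard delta-rule update.
        r_abs = outcomes[chosen]
        r_rel = r_abs - np.mean(outcomes)
        r_encoded = (1 - self.omega) * r_abs + self.omega * r_rel
        self.q[chosen] += self.alpha * (r_encoded - self.q[chosen])

# Hypothetical usage: a two-armed bandit with mean payoffs 0.7 and 0.3.
agent = RelativeValueQLearner(n_options=2)
for _ in range(200):
    c = agent.choose()
    outcomes = np.random.normal([0.7, 0.3], 0.1)
    agent.update(c, outcomes)
print(agent.q)  # values reflect each option's context-relative standing
```

Under this kind of encoding, an option's learned value depends on the other outcomes it was paired with during training, which is one way a learner can maximize well within a trained choice set yet mis-rank options when they are recombined into new sets.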