Causal origins of implicit bias in large language models
Determine why large language models exhibit the implicit biases that characteristic-based cues elicit in the ImplicitBBQ benchmark, by tracing and quantifying the respective contributions of pretraining data, supervised fine-tuning, and alignment procedures.
References
Our study identifies how much implicit bias manifests but does not trace it to pretraining data, fine-tuning, or alignment procedures: understanding why these biases arise remains open.
— ImplicitBBQ: Benchmarking Implicit Bias in Large Language Models through Characteristic Based Cues
(2604.01925 - Vedula et al., 2 Apr 2026) in Section 7 (Limitations and Future Work)