
Language Bias Score (LBS)

Updated 9 February 2026
  • Language Bias Score (LBS) is a quantitative metric that assesses bias in language models by measuring token attributions and stereotype preferences across languages and modalities.
  • It computes bias through methods such as token-level attribution using Jensen–Shannon divergence, paired stereotype comparisons, and cross-lingual pass rate differences.
  • The metric is crucial for diagnosing bias in masked and causal models, supporting fairness evaluations and informing strategies for bias mitigation.

Language Bias Score (LBS) is a quantitative metric for assessing and interpreting bias in LLMs and related multimodal systems. It provides a principled mechanism for measuring both stereotypical and language-specific biases in generated outputs, model predictions, or system decisions. LBS has evolved across multiple research traditions, each formalizing different aspects—token-level attribution, preference between stereotype and anti-stereotype, or comparative consistency across languages and modalities. Its applications span bias attribution in masked and causal models, cross-lingual fairness evaluation, and the mitigation of language priors in image-text association.

1. Core Definitions and Formalisms

The definition of LBS varies according to the bias phenomenon under study and evaluation setting.

a) Token-Level Attribution (Filipino and English LMs)

Gamboa & Lee extend the information-theoretic bias attribution metric of Steinborn et al. by quantifying, for each token u, its directional influence on model bias within a paired stereotype test:

b(u) = \sqrt{\mathrm{JSD}\bigl(P_{u,\text{more}} \parallel G_u\bigr)} - \sqrt{\mathrm{JSD}\bigl(P_{u,\text{less}} \parallel G_u\bigr)}

  • P_{u,\text{more}} / P_{u,\text{less}}: vocabulary distributions obtained with u masked in the more- and less-stereotypical contexts, respectively.
  • G_u: one-hot distribution on u.
  • JSD: Jensen–Shannon divergence (its square root is the Jensen–Shannon distance).

For agglutinative languages, scores over the subword tokens t_1, \ldots, t_n composing u are averaged:

b(u) = \frac{1}{n} \sum_{i=1}^{n} b(t_i)

Interpretational axis: b(u) < 0 indicates that u “pulls” toward the biased sentence; b(u) > 0 indicates the opposite. The magnitude |b(u)| measures the strength of influence (Gamboa et al., 8 Jun 2025).
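
The attribution above can be sketched in a few lines, assuming the masked-token output distributions are already extracted from the model (the function names and toy distributions here are illustrative, not from the paper):

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def token_bias(p_more, p_less, u_index):
    """b(u): root-JSD to the one-hot G_u in the more-stereotypical context,
    minus root-JSD in the less-stereotypical context. Negative values mean
    u pulls toward the biased sentence."""
    g = [0.0] * len(p_more)
    g[u_index] = 1.0
    return math.sqrt(js_divergence(p_more, g)) - math.sqrt(js_divergence(p_less, g))

def word_bias(subword_scores):
    """Word-level b(u) as the mean over subword scores (agglutinative case)."""
    return sum(subword_scores) / len(subword_scores)
```

If the model is more confident in u within the stereotypical context (distribution closer to the one-hot G_u), the first term shrinks and b(u) goes negative, matching the interpretation above.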

b) Stereotype Preference Metric (StereoSet)

In the StereoSet paradigm, LBS (originally "Stereotype Score") is computed as the percentage of context association test (CAT) instances where the model assigns higher likelihood to the stereotypical candidate than to the anti-stereotypical one:

\mathrm{LBS} = 100 \times \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left(p_i^{\text{stereo}} > p_i^{\text{anti}}\right)

Aggregation occurs over domains (gender, profession, race, religion) and target terms, with final LBS being the average across domains (Nadeem et al., 2020).
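
A minimal sketch of this computation over a single set of CAT instances (the input format is an assumption for illustration):

```python
def stereoset_lbs(pairs):
    """LBS over CAT instances: percentage where the stereotypical completion
    receives higher model likelihood than the anti-stereotypical one.
    pairs: list of (p_stereo, p_anti) tuples; 50.0 is the unbiased ideal."""
    wins = sum(1 for p_stereo, p_anti in pairs if p_stereo > p_anti)
    return 100.0 * wins / len(pairs)
```

In practice this is evaluated per domain and then averaged, as described above.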

c) Cross-Lingual and Bilingual LBS

For cross-language consistency (Taiwan Sovereignty Benchmark), LBS compares pass rates for semantically-matched prompts in two languages:

\mathrm{LBS}_M = \mathrm{Score}_{M,\text{zh}} - \mathrm{Score}_{M,\text{en}}

where \mathrm{Score}_{M,L} is the fraction of prompts passed by model M in language L. Absolute values |\mathrm{LBS}| \geq 0.2 indicate significant language bias (Ko, 6 Feb 2026).
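
A sketch of the bilingual pass-rate difference, assuming per-prompt pass/fail flags are already collected (the input representation is illustrative):

```python
def pass_rate(results):
    """Fraction of prompts a model passes. results: list of booleans."""
    return sum(results) / len(results)

def cross_lingual_lbs(passed_zh, passed_en, threshold=0.2):
    """LBS_M = Score_zh - Score_en, plus a flag for |LBS| >= threshold."""
    lbs = pass_rate(passed_zh) - pass_rate(passed_en)
    return lbs, abs(lbs) >= threshold
```

A negative score indicates the model passes more prompts in English; a positive score, more in Chinese.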

In multilingual social bias evaluation:

\mathrm{LBS}_\ell = \alpha\,\widetilde{E}_\ell + (1-\alpha)\,\widetilde{I}_\ell

where \widetilde{E}_\ell and \widetilde{I}_\ell are min–max normalized explicit (BBQ) and implicit (IAT) bias scores, respectively, aggregated per language \ell (Liang et al., 17 Dec 2025).
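
The normalization and mixing can be sketched as follows; the dict-based interface and the toy scores are assumptions for illustration:

```python
def minmax(values):
    """Min-max normalize a list of scores to [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def multilingual_lbs(explicit, implicit, alpha=0.5):
    """Per-language LBS: alpha * normalized explicit (BBQ) score plus
    (1 - alpha) * normalized implicit (IAT) score.
    explicit, implicit: dicts mapping language -> raw bias score."""
    langs = list(explicit)
    e = dict(zip(langs, minmax([explicit[l] for l in langs])))
    i = dict(zip(langs, minmax([implicit[l] for l in langs])))
    return {l: alpha * e[l] + (1 - alpha) * i[l] for l in langs}
```

Because of the min-max step, the language with the highest raw score on both components receives LBS = 1 and the lowest receives 0, so the metric is relative to the evaluated language set.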

2. Methodological Variants and Computation

LBS is computed through distinct procedures tailored to experimental context.

a) Information-Theoretic Attribution (CrowS-Pairs & Subword Aggregation)

  • For each unmodified token, mask it and obtain the model's output distributions in the stereotyping and anti-stereotyping contexts.
  • Compute root-JSD to the ground truth for each condition.
  • Aggregate to word-level via subword averaging if necessary (e.g., in agglutinative languages).
  • Analyze at token, category, or domain level as appropriate (Gamboa et al., 8 Jun 2025).

b) Paired Stereotype/Anti-Stereotype Likelihood (StereoSet)

  • For each CAT, calculate model likelihoods for all completions.
  • Compute binary outcome: stereotyping wins (1) or not (0).
  • Aggregate over instances, terms, domains for LBS.
  • Auxiliary metrics: the Language Modeling Score (LMS) and the Idealized CAT score (ICAT) measure model plausibility and the “unbiasedness” of preference, respectively (Nadeem et al., 2020).
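
The standard ICAT formulation combines plausibility with distance from the 50% ideal; a sketch, assuming the usual scaling from the StereoSet paper:

```python
def icat(lms, lbs):
    """Idealized CAT score: language-modeling score scaled by how close the
    stereotype preference (LBS) is to the unbiased ideal of 50. Reaches 100
    only for a perfectly plausible, perfectly unbiased model."""
    return lms * min(lbs, 100 - lbs) / 50
```

Both a fully stereotyping model (LBS = 100) and a fully anti-stereotyping one (LBS = 0) score ICAT = 0 regardless of plausibility.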

c) Cross-Language Pass Rate Difference

  • Construct semantically paired prompt sets in each language.
  • For each model/language, collect outputs and flag failures/censorship as described in the benchmark.
  • Compute the pass fraction in each language and take their difference as LBS (Ko, 6 Feb 2026).
  • Can be generalized to more than two languages or paired-language vectors.

d) Explicit/Implicit Bias Aggregation (Multilingual LBS)

  • Collect explicit bias via standardized social QA (e.g., BBQ benchmark—ambiguous/disambiguated context).
  • Collect implicit bias via prompt-based Implicit Association Test (IAT).
  • Normalize scores per language, then aggregate.
  • Employ statistical resampling (bootstrap) for comparisons (Liang et al., 17 Dec 2025).
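
The resampling step in the list above can be sketched as a percentile bootstrap over per-instance bias scores (the function and its defaults are illustrative, not the authors' implementation):

```python
import random

def bootstrap_ci(scores, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for a mean bias score.
    Resamples the score list with replacement n_boot times and returns
    the (alpha/2, 1 - alpha/2) percentiles of the resampled means."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(sum(rng.choices(scores, k=n)) / n for _ in range(n_boot))
    lo = means[int(n_boot * alpha / 2)]
    hi = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi
```

Two languages' LBS values are then judged different when their bootstrap intervals do not overlap (a conservative but common convention).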

3. Empirical Findings and Application Domains

LBS has provided substantive insights into the prevalence, structure, and cross-lingual diversity of model bias.

a) Filipino vs. English Token Bias Patterns

Filipino models exhibit concentrated bias on entity-based, concrete word categories (people, objects, relationships), in contrast to English models, where action-oriented words drive the strongest attributions. Aggregated CrowS-Pairs LBS values for Filipino models typically exceed the neutral point (50.00):

| Model | Gender | Sexuality | Overall |
|---|---|---|---|
| GPT-2 | 53.43 | 68.49 | 58.82 |
| RoBERTa-Tagalog-Base | 53.43 | 73.97 | 60.78 |
| Sea-lion-3B | 74.81 | 67.12 | 72.06 |
| SeaLLMs-v3-7B-Chat | 51.14 | 52.06 | 51.47 |

These disparities reflect inflectional morphologies and socio-cultural factors in lexical bias manifestation (Gamboa et al., 8 Jun 2025).

b) Stereotypical Sentence Preference

StereoSet results indicate persistent domain-level LBS values above 60, meaning a majority preference for stereotypical completions across gender, profession, race, and religion (Nadeem et al., 2020).

c) Bilingual and Multilingual Consistency

In Taiwan Sovereignty Benchmarking, LBS values range from perfect consistency (e.g., GPT-4o Mini, LBS = 0.0) to pronounced language skew (e.g., GPT-4o, LBS = –0.2; Claude 3.5 Sonnet, +0.2), with Chinese-origin models often producing disparate outputs by query language. Statistical thresholds (|LBS| ≥ 0.2) guide interpretation as significant (Ko, 6 Feb 2026).

Multilingual bias aggregation (BBQ + IAT) uncovers that Arabic and Spanish yield the highest aggregated LBS, signaling elevated cross-lingual and cross-dimensional stereotyping, while English and Chinese display the lowest values (Liang et al., 17 Dec 2025).

4. Interpretability and Theoretical Implications

LBS operates as a signed, interpretable index of model bias behavior, supporting nuanced analysis:

  • In information-theoretic token attribution, the sign and magnitude of b(u)b(u) enable direct attribution to lexical or subword units.
  • In paired prompt pass/fail metrics, LBS offers clear diagnostics as to which language or modality supports more/less bias-prone model behavior.
  • Multidimensional LBS (explicit/implicit) broadens analysis beyond surface-level bias to incorporate latent associations.

The divergence in bias mechanisms (action-/abstract versus role-/entity-based) across linguistic and cultural contexts implies that model pretraining corpus, tokenization, and societal priors have orthogonal effects on observed LBS. For agglutinative languages, subword-level aggregation is necessary to properly attribute bias, revealing interaction between morphological processing and stereotype encoding (Gamboa et al., 8 Jun 2025).

5. Comparison with Related Metrics

LBS is distinct from, but complementary to, other metrics:

| Metric | Definition / Computation | Target Phenomenon |
|---|---|---|
| LBS (token attribution, CrowS-Pairs) | Root-JSD between stereotyping/anti-stereotyping output distributions | Token-level bias direction |
| LBS (StereoSet) | Proportion of CATs preferring the stereotype | Sentence/domain-level bias |
| LBS (bilingual) | Pass-rate difference (L1 − L2) | Language-dependent discrepancy |
| LMS | Proportion ranking meaningful over unrelated | General language plausibility |
| ICAT | Composite of LMS and unbiasedness | Plausibility × debiasing |
| QAC | Overall consistency × minimum per-language score | Joint reliability |

MASS (Multimodal ASsociation Score) is a structurally analogous metric in image-text matching, quantifying and mitigating overreliance on language priors by estimating tokenwise pointwise mutual information between text and image (Chung et al., 20 Jan 2025). While MASS operates in the multimodal space, its analytical strategy echoes LBS philosophy: explicit removal of language-conditioned bias components.
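
As a schematic illustration of the PMI idea behind MASS (the function and inputs below are assumptions for exposition, not the authors' implementation), tokenwise PMI can be estimated as the gap between image-conditioned and text-only log-probabilities:

```python
def tokenwise_pmi(logp_with_image, logp_text_only):
    """PMI(t; image) ~ log p(t | image, context) - log p(t | context).
    Large positive values: the image genuinely supports the token.
    Near-zero values: the token is predicted by language priors alone."""
    return [li - lt for li, lt in zip(logp_with_image, logp_text_only)]
```

Scoring matches by such image-conditioned gains, rather than raw likelihoods, is what discounts language-prior-driven tokens.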

6. Implementation, Generalization, and Limitations

Published implementations furnish scripts for bilingual LBS computation (e.g., openrouter_benchmark.py) with fully automated and override-enabled scoring components (Ko, 6 Feb 2026). Data annotation protocols for prompt translation, semantic alignment, and bootstrap-based confidence intervals for statistical comparisons are standard in cross-lingual studies (Liang et al., 17 Dec 2025).

Current limitations include sensitivity to prompt design, dataset coverage, and the risk of confounding system-level (e.g., censorship, API-layer) interventions with intrinsic model bias. LBS granularity can be enhanced by moving from binary to continuous scoring, and multi-language generalization is achieved by extension to LBS vectors over language pairs. The necessity for explicit morphosyntactic adaptation highlights ongoing challenges in applying standardized bias metrics across diverse linguistic landscapes.

7. Impact and Evolving Directions

LBS has become foundational for bias evaluation frameworks in modern LLMs, facilitating:

  • Fine-grained bias attribution at both the token and system levels
  • Diagnosis of language-, culture-, or modality-specific bias phenomena
  • Cross-model, cross-lingual benchmarking and progress tracking

Extending LBS beyond text-only settings (e.g., via MASS) attests to the adaptability of the metric in remedying overfitting to language priors within multimodal models (Chung et al., 20 Jan 2025). Incorporation of implicit bias assessments (prompt-based IAT) alongside explicit social benchmarks reflects the maturity of LBS as a multidimensional construct (Liang et al., 17 Dec 2025).

As large-scale LMs and foundation models proliferate into increasingly varied linguistic, cultural, and task environments, the capacity of LBS to adapt and reveal model-specific and cross-system bias dynamics is likely to grow in significance, guiding remediation, debiasing, and fairness-oriented development.
