Speech Bias in Multilingual MLLMs

Updated 8 February 2026
  • Speech bias in multilingual MLLMs is a multifactorial issue characterized by stereotype propagation, ideological drift, and English-centric outputs.
  • Empirical studies show that modality-specific vulnerabilities and resource gradients can amplify bias in spoken content across languages.
  • Mitigation strategies like CLAS and DPO effectively reduce bias, emphasizing the need for culturally aware and cross-modal alignment frameworks.

Speech bias in multilingual multimodal LLMs (MLLMs) refers to systematic, language- or modality-conditioned disparities in model predictions, attitudes, or output styles when processing or generating spoken content across different languages, accents, and cultural contexts. Such biases manifest both at the level of text generation (including stereotypes and ideological drift) and at the level of speech-augmented reasoning, where acoustic or paralinguistic cues can further mediate or amplify structural and demographic disparities (Wei et al., 1 Feb 2026). Recent investigations have demonstrated that speech bias in multilingual settings inheres in both the architecture and pretraining data of MLLMs, is shaped by alignment protocols, and is often intensified by structural features unique to speech input.

1. Definitions, Scope, and Key Concepts

Speech bias in the context of multilingual MLLMs encompasses both explicit and subtle deviations from intended neutrality, fairness, or cultural authenticity when models interact with or generate speech-based data. This includes:

  • Stereotype propagation: Models disproportionately linking demographic groups to sensitive traits (e.g., associating Arabs with terrorism >95% of the time in religion and terrorism contexts across languages) (Saeed et al., 3 Nov 2025).
  • Ideological drift: Systematic shifts in stance across economic (left–right) and social (libertarian–authoritarian) axes, measurable via political-compass style frameworks (Nadeem et al., 29 May 2025, Nadeem et al., 30 Jan 2026).
  • English-centricity and accent transfer: Non-English outputs exhibiting lexical and syntactic patterns reminiscent of English, even when generating in otherwise high-resourced languages (quantified via distributional divergences and dependency-tree mismatches) (Guo et al., 2024).
  • Structural and channel effects: Modality-specific vulnerabilities, such as increased sensitivity to option order, accent, or speaker gender in audio QA tasks (Wei et al., 1 Feb 2026).

The phenomenon is thus multidimensional, covering demographic, linguistic, ideological, and paralinguistic factors. The scope extends from text-only biases, through text-to-speech mediated outputs, to fully end-to-end audio-language pipelines.

2. Benchmarking and Evaluation Frameworks

A range of evaluation frameworks has been introduced to dissect and quantify speech bias in multilingual MLLMs:

  • BiasInEar: A speech-based evaluation suite comprising 11,200 question–audio pairs in English, Chinese, and Korean with systematic variation across language, accent, gender, and option order. Uses metrics including accuracy, response entropy, APES (Average Pairwise Entropy Shift), and Fleiss’ κ for inter-level agreement (Wei et al., 1 Feb 2026).
  • DebateBias-8K: A debate-style, multilingual benchmark spanning 8,400 prompts in seven languages, focused on realistic generative settings and requiring models to assign and justify “modern” vs. “stereotyped” roles among five demographic groups. Bias is quantified by the probability that a group is assigned a stereotyped framing, relative to an equitable chance baseline (Saeed et al., 3 Nov 2025).
  • Political Compass Test (PCT) Adaptations: Quantitative stance scoring across economic and social axes using a four-point Likert classifier, parameterized by continuous stance scores and bias magnitude (Euclidean norm in stance space) (Nadeem et al., 29 May 2025, Nadeem et al., 30 Jan 2026).
  • Naturalness Metrics: Lexical and syntactic divergence from native outputs, measured via Jensen–Shannon divergence and kernel-based MMDs over Universal Dependencies trees, to reveal “English accent” artifacts in non-English generations (Guo et al., 2024).
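
The lexical half of these naturalness metrics can be sketched in a few lines. This is a minimal illustration under stated assumptions — vocabulary construction, tokenization, and the logarithm base are choices the source does not fix, and the helper names are hypothetical:

```python
import numpy as np
from collections import Counter

def lexical_distribution(tokens, vocab):
    """Unigram distribution of `tokens` over a fixed shared vocabulary."""
    counts = Counter(tokens)
    p = np.array([counts[w] for w in vocab], dtype=float)
    return p / p.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence, base 2, bounded in [0, 1]."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Compare model-generated text against native human-written text:
vocab = sorted({"the", "of", "le", "de"})
p_model = lexical_distribution(["the", "of", "the"], vocab)
p_native = lexical_distribution(["le", "de", "le"], vocab)
divergence = js_divergence(p_model, p_native)  # fully disjoint -> 1.0
```

In practice the distributions would be estimated over large corpora, and the syntactic side (MMD over Universal Dependencies trees) requires a parser and a tree kernel not shown here.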

These frameworks operationalize bias in both categorical (e.g., group assignments, compass quadrants) and continuous (divergence, polarization) terms, and allow for both within- and cross-language comparative analyses.
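
One plausible reading of the entropy-based quantities above can be sketched as follows; the exact APES formula is defined in Wei et al. (1 Feb 2026), so treat this as an interpretation, not the benchmark's implementation:

```python
import numpy as np
from itertools import combinations

def response_entropy(answer_probs):
    """Shannon entropy (bits) of a model's answer distribution."""
    p = np.asarray(answer_probs, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def avg_pairwise_entropy_shift(probs_by_ordering):
    """Mean absolute entropy difference over all pairs of option
    orderings of the same question (an APES-style quantity)."""
    ents = [response_entropy(p) for p in probs_by_ordering]
    pairs = list(combinations(ents, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# A model that is confident under one option ordering (entropy 0 bits)
# but uniform over four options under another (entropy 2 bits):
shift = avg_pairwise_entropy_shift([[1.0, 0, 0, 0],
                                    [0.25, 0.25, 0.25, 0.25]])
```

A perfectly order-invariant model would score 0; the 0.05–0.13 shifts reported for speech inputs would sit between these extremes.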

3. Empirical Findings and Mechanisms of Speech Bias

Systematic studies reveal several recurring patterns and mechanisms:

  • Structural sensitivity amplification by speech: Spoken (audio) inputs consistently magnify model vulnerabilities relative to text-alone conditions, particularly in response to option-order changes (APES increases of 0.05–0.13 points, with inter-option agreement κ dropping to near zero) (Wei et al., 1 Feb 2026). Speech does not introduce wholly new position biases but acts as an amplifier.
  • Language and resource gradient: Biases grow more pronounced as resource availability decreases. For instance, in the DebateBias-8K setting, African groups are stereotyped as “backward” in socioeconomic contexts 40–60% of the time in English, but 72–77% in Nigerian Pidgin and Swahili. This reflects inadequate cross-lingual generalization of alignment mechanisms (Saeed et al., 3 Nov 2025).
  • Ideological drift across languages: Models trained primarily on Western/English data manifest liberal-left orientations in English but shift toward more authoritarian or regulatory framings in regional South Asian languages (e.g., GPT-4 shows a transition from (-3.5, -2.8) in English to (-2.2, +1.1) in Urdu on the economic/social compass) (Nadeem et al., 29 May 2025).
  • English-centric form transfer: Non-English outputs (Chinese, French) display higher JSD and MMD divergence from human-written native texts than from translated counterparts—often by an order of magnitude for syntactic structure—demonstrating persistent English-centricity in both word choice and dependency structure (Guo et al., 2024).
  • Demographic and paralinguistic perturbations: While accent and gender have smaller but measurable effects (average APES 0.06–0.12; accuracy changes <3%), their combined effect across large corpora or in minority language settings may be more consequential (Wei et al., 1 Feb 2026).
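
The group-level scores behind these findings can be illustrated with two small helpers. Both are sketches with hypothetical names: the actual DebateBias-8K scoring protocol is specified in Saeed et al. (3 Nov 2025), and the compass coordinates below are taken from the figures quoted above:

```python
import numpy as np

def stereotype_rate(assignments, group, n_groups=5):
    """Fraction of prompts in which `group` receives the stereotyped
    role, and its excess over the equitable chance baseline 1/n_groups."""
    rate = sum(1 for g in assignments if g == group) / len(assignments)
    return rate, rate - 1.0 / n_groups

def bias_magnitude(econ, social):
    """Euclidean norm in (economic, social) stance space, as in the
    Political Compass-style analyses."""
    return float(np.hypot(econ, social))

# 8 of 10 stereotyped assignments landing on one group among five:
rate, excess = stereotype_rate(["A"] * 8 + ["B"] * 2, "A")
# rate = 0.8 against a 0.2 chance baseline, so excess = 0.6

# Stance magnitude for the English-language GPT-4 compass position:
b_en = bias_magnitude(-3.5, -2.8)
```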

4. Mitigation Strategies and Post-hoc Alignment

Several mitigation approaches have demonstrated efficacy:

  • Cross-Lingual Alignment Steering (CLAS): Aligns latent ideological representations across languages via shared subspace projections and an adaptive neutralization vector. CLAS reduces mean absolute stance magnitude by 40–70% and cross-lingual variance by 60–80%, with minimal impact on output fluency or lexical diversity (Nadeem et al., 30 Jan 2026).
  • Direct Preference Optimization (DPO) with LoRA: Targets form-level (stylistic, structural) biases by training models to prefer native-like over English-influenced responses paired via back-translation and style contrast. DPO yields consistent decreases in syntactic and lexical divergence (e.g., for Qwen-2 on Chinese QA: MMD drops from 12.25% to 10.78%), often with stable or improved factual QA accuracy (Guo et al., 2024).
  • Pipeline ASR+LLM Designs: Decoupling speech recognition from language modeling improves structural robustness by suppressing paralinguistic noise, as evidenced by improved Fleiss’ κ and reduced APES under pipeline arrangements (Wei et al., 1 Feb 2026).
  • Fine-tuning on local/culturally in-domain corpora: Reduces bias magnitude B toward neutrality, particularly in low-resource languages (B < 1 for Urdu models versus B ≈ 3–5 for Western-trained GPT variants) (Nadeem et al., 29 May 2025).
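
The core operation behind steering-based mitigation can be sketched as a single-direction projection. This is a deliberately toy reduction — CLAS as described uses shared cross-lingual subspace projections and an adaptive neutralization vector, whereas this removes one fixed direction — and all names here are illustrative:

```python
import numpy as np

def neutralize(hidden, bias_direction):
    """Subtract the component of a hidden state along an estimated
    bias direction, leaving the orthogonal content intact."""
    v = np.asarray(bias_direction, dtype=float)
    v = v / np.linalg.norm(v)
    h = np.asarray(hidden, dtype=float)
    return h - np.dot(h, v) * v

# The direction might be estimated as the difference of mean activations
# between contrasting-stance prompt sets (illustrative values only):
v = np.array([1.0, 0.0, 1.0])
h = np.array([2.0, 3.0, 4.0])
h_neutral = neutralize(h, v)  # component along v is now zero
```

The design question such methods face is exactly the trade-off reported above: removing enough of the stance direction to shrink bias magnitude while leaving fluency- and content-bearing directions untouched.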

A plausible implication is that mitigation strategies must jointly target both form (linguistic naturalness) and content (stance/stereotype bias), and operate across modalities and alignment stages.

5. Current Limitations and Open Challenges

Despite methodological advances, significant limitations persist:

  • English-centric alignment spillover: Alignment via RLHF or DPO is dominated by English safety and style data, leading to a “fairness frontier” that does not generalize to low-resource or culturally distant languages. Translation artifacts may introduce synthetic or locally irrelevant bias (Saeed et al., 3 Nov 2025, Nadeem et al., 30 Jan 2026).
  • Modality-blind evaluation: Most current bias benchmarks remain text-focused, neglecting the channel effects unique to speech, including acoustic, accentual, and sequencing vulnerabilities (Wei et al., 1 Feb 2026).
  • Dimensionality constraints: Political Compass-based frameworks capture only two ideological axes and may miss culturally specific or multidimensional ideological biases (e.g., nationalism, local religious divides) (Nadeem et al., 30 Jan 2026).
  • Data scarcity for naturalness tuning in low-resource languages: Preference alignment methods assume the availability of sufficient native texts and reliable UD parsers, which may not exist for many under-represented languages (Guo et al., 2024).

This suggests that robust speech bias mitigation in MLLMs will require joint advances in data curation, culturally aware benchmarking, modality-specific modeling, and multi-axis alignment.

6. Broader Implications and Future Directions

The findings demonstrate that speech bias in multilingual MLLMs is not a marginal technical artifact but a pervasive, multi-causal phenomenon that interacts with linguistic form, channel properties, cultural domain, and training regime. Unchecked, such biases risk entrenching existing inequities and may even amplify harmful stereotypes in regions and modalities (e.g., spoken QA, dialogue) where ground-truth corpora for auditing are scarce (Saeed et al., 3 Nov 2025, Wei et al., 1 Feb 2026).

Future advances are expected to include:

  • Multimodal, multilingual bias auditing suites integrating text, speech, and possibly visual modalities (Wei et al., 1 Feb 2026).
  • Human-in-the-loop steering and context-sensitive metrics capturing local cultural and pragmatic nuances (Nadeem et al., 30 Jan 2026).
  • Expansion of direct preference optimization and RLHF frameworks to accommodate stylistic and content distinctions in a truly cross-lingual, cross-modal manner (Guo et al., 2024).
  • Data-centric engineering of alignment sets and benchmarks for low-resource and demographically diverse voices.

A plausible implication is that the long-term governance of MLLMs—particularly those deployed in spoken interfaces—will depend on systematic, scalable, and transparent speech bias assessment methodologies. Only by bridging Western-centric evaluation paradigms with regionally and demographically grounded audits can speech-integrated AI systems aspire to genuine multilingual fairness and utility.
