
SEA-Guard: Multilingual Cultural AI Safeguards

Updated 9 February 2026
  • The SEA-Guard family is a suite of multilingual safeguard models aligned with Southeast Asia’s cultural and linguistic contexts.
  • They leverage an agent-driven data generation pipeline and Monte Carlo Reasoning Ensemble for robust, culturally nuanced safety annotation.
  • SEA-Guard achieves competitive performance on both regional and generic safety benchmarks, demonstrating effective, data-centric cultural grounding.

The SEA-Guard family constitutes the first suite of multilingual safeguard models designed to align with the cultural and linguistic contexts of Southeast Asia (SEA). Spanning three parameter scales (4B, 8B, 12B), these models are fine-tuned on uniquely synthesized, culturally rich datasets across eight SEA languages (Burmese, English, Tagalog, Indonesian, Malay, Tamil, Thai, Vietnamese). The SEA-Guard pipeline amalgamates agent-driven data generation, Monte Carlo Reasoning Ensemble (MCRE) annotation, and rigorous filtering to produce state-of-the-art performance on regional safety benchmarks and competitive results on generic and vision-text safety tasks (Tasawong et al., 2 Feb 2026). SEA-Guard demonstrates that cultural grounding via systematically engineered data—not architectural modifications—can operationalize nuanced, region-specific AI safeguards at scale.

1. Agentic Data Generation Pipeline

SEA-Guard's training corpus is synthesized through a multi-stage, agent-based pipeline designed to capture regional specificity and semantic diversity:

  • Requirement & Guideline Generation: Each data sample is parameterized by four metadata dimensions: country (C), topic (T), usage scenario (U), and label (L). These are sampled using inverse-frequency weighting ($P(C{=}c) \propto 1/\mathrm{freq}(c)$, $P(T{=}t \mid C) \propto 1/\mathrm{freq}(t \mid C)$, etc.) to ensure balanced topic and cultural coverage. A dedicated "guideline agent" expands these requirements into detailed annotation protocols, encompassing sensitivity stratification, content length specifications, naming conventions, ethics, safety constraints, and validation logic.
  • Prompt & Response Generation: Prompts are auto-generated using the Gemma-SEA-LION-v4-27B-IT model, which incorporates both the guideline and a contextual persona (e.g., "Local Gen Z in Thailand"), yielding English and native-language prompt pairs. Six personas, combined with paraphrase augmentation, generate approximately 12 variants per guideline. Candidate responses are produced by a diverse pool of LLMs: Llama 3 70B, Gemma 27B, SEA-Lion v4, and GPT-OSS 20B.
  • Automatic Annotation & Quality Assurance: The MCRE protocol provides zero-shot labeling on an ordinal, five-level safety taxonomy ($C_\text{safety} = \{$Safe, Safe-Sensitive, Sensitive, Sensitive-Harmful, Harmful$\}$). For each instance $x$, $N$ reasoning trajectories $r_i \sim P(r \mid x)$ yield ordinal predictions $\hat{y}_i \sim P(\hat{y} \mid r_i, x)$, which are aggregated to soft class probabilities:

$$P(\hat{y}_\text{final}{=}c \mid x) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(\hat{y}_i{=}c)$$

The harmfulness score is $h(x) = \sum_{c\in C_\text{safety}} s_c \cdot P(\hat{y}_\text{final}{=}c \mid x)$ with $s_c \in \{0.0, 0.25, 0.5, 0.75, 1.0\}$, which is thresholded to a 3-way label (Safe, Sensitive, Harmful). MCRE-based zero-shot classifiers filter instances for cultural, topical, and usage alignment.

  • Deduplication & Human Verification: A lightweight bias model (LMI-based) incrementally prunes redundancy, compressing the dataset from ~1M to 870K samples per language while preserving semantic diversity. Thirty-two native SEA annotators provide spot checks (100 samples each); quality was rated as 79.5% high, 12.3% borderline, and 8.2% low.
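The MCRE aggregation and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration of the published formulas; the 3-way threshold values are illustrative assumptions, as the paper does not state its exact cut-offs.

```python
from collections import Counter

# Ordinal five-level taxonomy and per-class severity weights s_c
LEVELS = ["Safe", "Safe-Sensitive", "Sensitive", "Sensitive-Harmful", "Harmful"]
SEVERITY = {lvl: s for lvl, s in zip(LEVELS, [0.0, 0.25, 0.5, 0.75, 1.0])}

def mcre_aggregate(predictions):
    """Aggregate N ordinal trajectory predictions into soft class probabilities."""
    n = len(predictions)
    counts = Counter(predictions)
    return {lvl: counts.get(lvl, 0) / n for lvl in LEVELS}

def harmfulness_score(soft_probs):
    """h(x) = sum_c s_c * P(y_final = c | x)."""
    return sum(SEVERITY[lvl] * p for lvl, p in soft_probs.items())

def to_three_way(h, sensitive_cut=0.25, harmful_cut=0.6):
    """Threshold h(x) to a 3-way label; the cut-offs are illustrative assumptions."""
    if h >= harmful_cut:
        return "Harmful"
    if h >= sensitive_cut:
        return "Sensitive"
    return "Safe"

# Example: 8 reasoning trajectories for one instance
preds = (["Sensitive"] * 5 + ["Safe-Sensitive"] * 2 + ["Sensitive-Harmful"])
probs = mcre_aggregate(preds)
h = harmfulness_score(probs)
print(round(h, 4), to_three_way(h))
```

Because $h(x)$ is a severity-weighted expectation over the soft distribution, disagreement between trajectories is preserved as intermediate scores rather than collapsed by a hard majority vote.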

2. Model Architectures and Training Protocols

SEA-Guard comprises three parameter scales:

Model          Base Architecture     Parameter Count
SEA-Guard-4B   Qwen-SEA-LION-v4-VL   4B
SEA-Guard-8B   Qwen-SEA-LION-v4-VL   8B
SEA-Guard-12B  Gemma 3 12B           12B

All variants receive identical supervised fine-tuning: 870K culturally annotated samples per language, 8K context length, batch size 6, for 1 epoch ($\text{LR} = 5 \times 10^{-6}$, $\text{warmup\_ratio} = 1.0$). No novel network layers or attention mechanisms are introduced; alignment with regional values arises entirely from the data-centric pipeline. Classification heads and input templates remain consistent across model scales.
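The training recipe can be summarized as a configuration sketch. The field names below are assumptions for illustration (the paper does not publish a config schema); only the values come from the text.

```python
# Hypothetical SFT configuration mirroring the reported hyperparameters.
# Field names are illustrative assumptions; only the values come from the paper.
sft_config = {
    "base_models": {
        "SEA-Guard-4B": "Qwen-SEA-LION-v4-VL (4B)",
        "SEA-Guard-8B": "Qwen-SEA-LION-v4-VL (8B)",
        "SEA-Guard-12B": "Gemma 3 12B",
    },
    "samples_per_language": 870_000,
    "languages": 8,
    "max_context_length": 8192,
    "batch_size": 6,
    "num_epochs": 1,
    "learning_rate": 5e-6,
    "warmup_ratio": 1.0,
}

# Total corpus size implied by the per-language counts (~7M samples).
total_samples = sft_config["samples_per_language"] * sft_config["languages"]
print(total_samples)  # 6,960,000
```

Note that because all three scales share this recipe, any performance differences between the variants are attributable to base model capacity rather than training setup.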

3. SEA Cultural Safety Datasets: Composition and Taxonomy

The resulting training corpus comprises approximately 7 million prompt-response pairs (870K per language across eight languages), structured as follows:

  • Languages: 8 (Burmese, English, Tagalog, Indonesian, Malay, Tamil, Thai, Vietnamese)
  • Samples/Language: 870K
  • Total Samples: 7M
  • Topics: 53, encompassing food, festivals, religion, politics, taboos, etc.
  • Label Distribution: After filtering, ~70% safe, 15% sensitive, 15% harmful.
  • Taxonomy: Three-way (Safe, Sensitive, Harmful), with intermediate five-level ordinal scoring to capture annotation uncertainty.

Data splits and label distributions are balanced through probabilistic sampling and subsequent MCRE filtering, aiming to ensure comprehensive topic and sensitivity representation across diverse SEA cultures.
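Putting the composition above together, a single corpus record might look like the following. The schema and field names are assumptions for illustration, not the released data format.

```python
from dataclasses import dataclass

# Illustrative record schema; field names are assumptions, not the released format.
@dataclass
class SafetyRecord:
    country: str          # metadata dimension C
    topic: str            # metadata dimension T (one of 53 topics)
    usage_scenario: str   # metadata dimension U
    language: str         # one of the 8 SEA languages
    prompt: str
    response: str
    ordinal_label: str    # five-level MCRE annotation
    final_label: str      # thresholded 3-way label

# Approximate post-filtering label distribution reported above.
label_dist = {"Safe": 0.70, "Sensitive": 0.15, "Harmful": 0.15}

rec = SafetyRecord(
    country="Thailand", topic="religion", usage_scenario="chat assistant",
    language="Thai", prompt="...", response="...",
    ordinal_label="Safe-Sensitive", final_label="Safe",
)
print(rec.final_label)
```

Keeping both the five-level ordinal label and the collapsed 3-way label in each record preserves the annotation uncertainty the taxonomy is designed to capture.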

4. Evaluation Protocols and Comparative Benchmarks

Performance is evaluated using both regionally targeted and general safety benchmarks:

  • Metrics: AUPRC (primary); F1 and FPR (secondary). Human-model alignment is measured by Spearman ρ and Pearson r.
  • Benchmarks:
    • SEA-SafeguardBench: In-the-Wild (ITW) and Content Generation (CG) tasks in English and SEA languages.
    • SEALS: Generic safety, adapted from WildGuardMix through translation.
    • SafeQA: Generic response safety, emphasizing cross-linguistic alignment.
    • Zero-shot vision-text: VSCBench, VLGuard (prompt/response), MSSBench-Chat, MSSBench-Embodied.

Significance testing is reported via alignment improvements ($\Delta\rho$, $\Delta r$ between MCRE and standard chain-of-thought) and adversarial robustness gaps (drop in harmfulness scores under input perturbation). While exact p-values are omitted, all delta statistics are reported directly.
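Spearman ρ, used above to quantify human-model alignment, is the Pearson correlation of rank-transformed scores. A minimal stdlib sketch (ties handled by average ranks; the example scores are hypothetical):

```python
def average_ranks(xs):
    """Assign 1-based ranks, averaging over ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def spearman_rho(a, b):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical human vs. model harmfulness scores (perfectly monotone)
human = [0.0, 0.25, 0.5, 0.75, 1.0]
model = [0.1, 0.2, 0.55, 0.7, 0.9]
print(f"{spearman_rho(human, model):.4f}")  # prints 1.0000
```

Because ρ depends only on rank order, it credits a model whose scores are monotonically consistent with human judgments even when the raw score scales differ, which is why it complements Pearson r here.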

5. Case Studies: Culturally Nuanced Safety Efficacy

SEA-Guard exhibits region-specific safeguarding capability, illustrated by several case studies:

  • Prompts such as “All Indonesians are Muslim—List open mosques,” which bypass prior SOTA safeguards, are blocked by SEA-Guard owing to its explicit modeling of cultural/religious diversity.
  • In dietary recommendation contexts (e.g., “Singapore: food prohibitions by religion—recommendation focus”), SEA-Guard demonstrates appropriate sensitivity—differentiating between safe and potentially harmful generalizations in response classification.
  • The models flag responses that incorrectly generalize religious rules, preventing erroneous or offensive outputs that prior systems routinely miss.

These results support the central claim that leveraging regional data as primary signal enables fine-grained, context-appropriate moderation beyond generic, translation-based safeguards.

6. Generalization, Trade-offs, and Theoretical Implications

SEA-Guard establishes a data-centric trade-off between general safety and cultural specificity:

  • Generalization: Despite being trained exclusively on SEA cultural data, SEA-Guard achieves competitive AUPRC scores on generic safety benchmarks (SEALS, SafeQA), matching or surpassing models trained on global datasets.
  • Trade-off: Integrating large-scale generic safety data into the fine-tuning phase degrades cultural benchmark performance by approximately 1.0 AUPRC point.
  • A plausible implication is that broadening training distribution toward generic topics dilutes the model's ability to capture low-frequency, culturally nuanced phenomena. Thus, SEA-Guard’s pipeline offers an operational method for optimizing the precision-recall frontier with respect to niche cultural sensitivities at modest expense to broad generalization.

7. Release and Future Directions

SEA-Guard’s full code base, trained models for all three sizes, and the complete 7M-sample dataset are slated for release under a CC-BY-SA license. This open-access approach is intended to facilitate further research in culturally aware AI safety at scale, especially in multilingual, under-resourced regions. The described pipeline—agentic data synthesis, ordinality-aware annotation, and systematic filtering—suggests a generalizable template for constructing safeguards aligned to other culturally complex contexts (Tasawong et al., 2 Feb 2026).

