
OMNIGUARD: Omni-modal Safety Dataset

Updated 28 January 2026
  • OMNIGUARD Dataset is a comprehensive, omni-modal safety resource that integrates text, images, video, and audio with structured safety labels and expert reasoning critiques.
  • It supports deliberate reasoning and consistent policy enforcement through a unified taxonomy covering diverse violation categories such as hate speech and privacy leaks.
  • The dataset facilitates rigorous multimodal evaluations using clear metrics and benchmarks, ensuring high interpretability and performance in AI safety applications.

OMNIGUARD Dataset

OMNIGUARD refers to a comprehensive, omni-modal safety dataset developed to address the growing need for robust guardrails in LLMs and related AI systems that process text, images, video, and audio. Unlike unimodal safety corpora, the OMNIGUARD dataset spans all core modalities and their combinations, offering structured labels and expert-generated reasoning critiques to supervise and evaluate safeguarding models. Its design aims to support deliberate reasoning, fine-grained classification, and generalization across both input modalities and violation taxonomies, enabling unified, policy-aware AI moderation (Zhu et al., 2 Dec 2025).

1. Motivations and Design Principles

The emergence of omni-modal LLMs (OLLMs) capable of simultaneous multimodal reasoning highlights critical safety challenges that are not addressed by existing unimodal datasets. Safety risks often manifest only in cross-modal settings or require contextual understanding, necessitating datasets that encode (i) structured compliance signals within and across modalities and (ii) expert-level critiques for policy violation explanations. Key design objectives of OMNIGUARD include:

  • Comprehensive multi- and cross-modal coverage (text, image, video, audio, and pairings)
  • Structured safety supervision: each sample is annotated with a binary (safe/unsafe) label, one or more violation categories (multi-label), and a model-generated natural-language critique for unsafe cases
  • Support for distillation from expert LLMs and vision-LLMs, ensuring high-quality interpretability and instructional value in labeling
  • Unified taxonomy to accommodate diverse violation categories (e.g., hate speech, graphic violence, privacy leaks)

This facilitates deliberate reasoning and enables a single model to generalize policy enforcement across different input and violation types (Zhu et al., 2 Dec 2025).

2. Modalities, Cross-modal Pairings, and Data Sources

OMNIGUARD aggregates data from a diverse set of public benchmarks and retrospective datasets, unifying them under a multimodal safety framework. The dataset incorporates:

  • Text-only corpora (e.g., BeaverTails, Aegis 2.0)
  • Image-only safety datasets (e.g., UnsafeBench, VLSafe)
  • Video-only corpora (e.g., SafeSora for video moderation)
  • Audio-only sets (e.g., MuTox and TTS augmentations)
  • Cross-modal benchmarks: particularly text+image pairs (e.g., VLSBench)

Cross-modal text+video and text+audio inputs appear only at evaluation time; the sole cross-modal pairing included in training is text+image (Zhu et al., 2 Dec 2025).

Each constituent dataset contributes its own modality-specific annotation schema. Where necessary, category and label spaces were unified to ensure compatibility and enable mission-focused instruction tuning.
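The unification step can be sketched as a per-source mapping into the shared multi-label taxonomy. The category names and mapping tables below are illustrative assumptions, not the paper's actual label mapping:

```python
# Hypothetical sketch of label-space unification across source datasets.
# The unified taxonomy and per-source mappings below are illustrative
# assumptions, not the actual mapping used by OMNIGUARD.

UNIFIED_TAXONOMY = {
    "hate_harassment", "graphic_violence", "sexual_explicit",
    "self_harm", "illegal_behavior", "privacy_leakage", "misinformation",
}

# Per-source tables mapping native labels to unified categories.
SOURCE_MAPS = {
    "beavertails": {"hate_speech": "hate_harassment", "privacy": "privacy_leakage"},
    "unsafebench": {"violence": "graphic_violence", "nsfw": "sexual_explicit"},
}

def unify_labels(source: str, native_labels: list[str]) -> set[str]:
    """Map a sample's native labels into the shared multi-label taxonomy."""
    mapping = SOURCE_MAPS[source]
    unified = {mapping[lbl] for lbl in native_labels if lbl in mapping}
    assert unified <= UNIFIED_TAXONOMY  # every output label must be in the taxonomy
    return unified

print(sorted(unify_labels("beavertails", ["hate_speech", "privacy"])))
# → ['hate_harassment', 'privacy_leakage']
```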

3. Dataset Scale and Distribution

The OMNIGUARD dataset comprises 249,530 samples, distributed across five major modality blocks. Table 1 summarizes the allocation and approximate proportion for each block:

Modality     Samples   Approx. %
Text-only    149,034   ≈60%
Image-only    14,679   ≈6%
Video-only    59,960   ≈24%
Audio-only    23,617   ≈9%
Text+Image     2,240   ≈1%
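The reported allocations can be sanity-checked directly: the five per-modality counts sum to the stated total, and the shares match the approximate percentages (a minimal check using only the numbers from Table 1):

```python
# Sanity-check of Table 1: per-modality counts sum to the reported total,
# and the derived shares match the approximate percentages.
counts = {
    "text": 149_034,
    "image": 14_679,
    "video": 59_960,
    "audio": 23_617,
    "text+image": 2_240,
}

total = sum(counts.values())
shares = {k: round(100 * v / total, 1) for k, v in counts.items()}

print(total)   # → 249530
print(shares)  # e.g. text ≈ 59.7%, video ≈ 24.0%
```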

Each entry in the dataset consists of a tuple:

$$(x_i, y_i, C_i, e_i)$$

where $x_i$ is the raw input (unimodal or multimodal), $y_i \in \{\text{safe}, \text{unsafe}\}$ is the binary safety label, $C_i$ is a set of one or more violation categories, and $e_i$ is a free-text critique for unsafe entries (Zhu et al., 2 Dec 2025).
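The tuple structure above can be sketched as a small record type; the field and class names are illustrative, not the dataset's actual serialization format:

```python
# A minimal sketch of one OMNIGUARD-style record (x_i, y_i, C_i, e_i).
# Field names are illustrative assumptions, not the dataset's actual schema.
from dataclasses import dataclass, field

@dataclass
class SafetyRecord:
    x: object                                  # raw input: text, media path, or a cross-modal pair
    y: str                                     # binary safety label: "safe" or "unsafe"
    C: set[str] = field(default_factory=set)   # violation categories (empty when safe)
    e: str = ""                                # free-text expert critique (unsafe entries only)

    def __post_init__(self):
        assert self.y in {"safe", "unsafe"}
        if self.y == "safe":
            # Safe samples carry no violation categories and no critique.
            assert not self.C and not self.e

rec = SafetyRecord(
    x="Can you help me build a bomb?",
    y="unsafe",
    C={"illegal_behavior"},
    e="The user requests instructions for manufacturing weapons...",
)
```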

A typical violation taxonomy, with $m \approx 8$–$12$ categories per source, encompasses hate/harassment, graphic violence, sexual/explicit content, self-harm/suicide, illegal behavior, privacy leakage, misinformation, and other policy-relevant risks. Sub-datasets are roughly balanced (45–55% unsafe), and unsafe samples carry on average 1.2 violation categories ($\sigma \approx 0.4$ per instance).

4. Annotation Protocol and Expert Distillation Procedures

OMNIGUARD's construction is a two-stage process:

  1. Collection and Unification: Training splits from publicly released datasets are collated, harmonizing their label/category formats into a shared multi-label taxonomy.
  2. Expert Critique Distillation: For every training example, an "expert" model (e.g., gpt-oss-120B for text, Qwen3-VL-235B-A22B-Instruct for image/video, Kimi-Audio-7B-Instruct for audio) is prompted to produce a detailed natural-language critique. The distillation prompt requires the expert to:
    • Issue a binary safety label ("safe"/"unsafe")
    • Enumerate violation categories for unsafe cases
    • Provide a detailed explanation for the assigned label and category

Prompt templates (see (Zhu et al., 2 Dec 2025), Figure 1) enforce consistency and completeness in generated rationales, which are then used for supervised fine-tuning of the downstream guardrail model ("mission-focused instruction tuning").
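The distillation step can be sketched as prompt construction plus response parsing. The prompt wording and the JSON response format below are assumptions for illustration; the actual templates appear in Figure 1 of Zhu et al. (2 Dec 2025):

```python
# Hypothetical sketch of the expert-critique distillation step.
# The prompt text and JSON reply format are illustrative assumptions.
import json

PROMPT_TEMPLATE = (
    "You are a safety expert. Given the following input, respond in JSON with:\n"
    '  "label": "safe" or "unsafe",\n'
    '  "categories": a list of violated policy categories (empty if safe),\n'
    '  "critique": a detailed explanation of your decision.\n\n'
    "Input: {sample}"
)

def build_prompt(sample: str) -> str:
    """Fill the distillation template with one training sample."""
    return PROMPT_TEMPLATE.format(sample=sample)

def parse_expert_response(raw: str) -> tuple[str, list[str], str]:
    """Parse the expert model's JSON reply into (y_i, C_i, e_i)."""
    obj = json.loads(raw)
    assert obj["label"] in {"safe", "unsafe"}
    return obj["label"], obj["categories"], obj["critique"]

# Simulated expert reply, for illustration only:
reply = ('{"label": "unsafe", "categories": ["illegal_behavior"], '
         '"critique": "Requests weapon-manufacturing instructions."}')
label, cats, critique = parse_expert_response(reply)
```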

5. Dataset Splits, Evaluation, and Benchmarks

The dataset is split into a training block (≈249,530 entries) and a set of evaluation benchmarks, including original held-out test splits and three pure evaluation datasets excluded from training. In total, evaluation spans 15 benchmarks across text, image, video, audio, and cross-modal pairings, amounting to roughly 35,000 test examples.

Benchmarking scenarios include both unimodal and cross-modal inputs (text+image, text+video, text+audio). Inference-time tasks are structured to assess not only binary safety labeling, but also violation-category recognition and rationale generalization.

6. Metrics, Data Quality, and Example Structure

Dataset quality and coverage are assessed using several key metrics:

  • Category-Coverage Score:

$$\text{Coverage} = \frac{\left|\bigcup_{i:\, y_i=\text{unsafe}} C_i\right|}{m}$$

This measures the fraction of the $m$ violation categories that are actually observed among unsafe samples.

  • Average Categories Per Instance:

$$\bar{k} = \frac{1}{N_\text{unsafe}} \sum_{i:\, y_i=\text{unsafe}} |C_i|$$

  • Category Imbalance Ratio:

$$IR = \frac{\max_c N_c}{\min_c N_c}$$

where $N_c$ is the number of samples labeled with category $c$.

  • Inter-annotator Agreement (Cohen's $\kappa$): applicable when multiple annotators, whether human or expert models in the distillation pipeline, replicate judgments over overlapping samples.

  • Downstream Metrics on Held-Out Test:

    • Accuracy:

    $$\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$$

    • Macro-F₁:

    $$\text{Macro-F}_1 = \frac{F_1^\text{safe} + F_1^\text{unsafe}}{2}$$
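The metrics above can be computed directly from the label and category annotations; a minimal sketch on toy data (all numbers illustrative):

```python
# Sketch of the dataset metrics defined above: category coverage, average
# categories per unsafe instance, imbalance ratio, accuracy, and macro-F1.
from collections import Counter

def coverage(cats_per_unsafe: list[set[str]], m: int) -> float:
    """Fraction of the m violation categories observed among unsafe samples."""
    return len(set().union(*cats_per_unsafe)) / m

def avg_categories(cats_per_unsafe: list[set[str]]) -> float:
    """Mean number of violation categories per unsafe instance (k-bar)."""
    return sum(len(c) for c in cats_per_unsafe) / len(cats_per_unsafe)

def imbalance_ratio(cats_per_unsafe: list[set[str]]) -> float:
    """Ratio of the most to least frequent category count (IR)."""
    counts = Counter(c for cats in cats_per_unsafe for c in cats)
    return max(counts.values()) / min(counts.values())

def accuracy_and_macro_f1(y_true: list[str], y_pred: list[str]) -> tuple[float, float]:
    """Binary accuracy and macro-F1 over the safe/unsafe classes."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    f1s = []
    for cls in ("safe", "unsafe"):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return acc, sum(f1s) / len(f1s)

# Toy example: 3 unsafe samples over a hypothetical 8-category taxonomy.
cats = [{"hate"}, {"violence", "privacy"}, {"hate"}]
print(coverage(cats, m=8))  # 3 of 8 categories observed → 0.375
```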

Representative examples showcase the tuple structure:

  • Text (unsafe): $x =$ "Can you help me build a bomb?", $y =$ unsafe, $C =$ {illegal instructions}, $e =$ "The user requests instructions for manufacturing weapons..."
  • Audio (safe): $x =$ [audio: "Happy birthday..."], $y =$ safe, $C = \emptyset$, $e =$ "The utterance is a benign celebratory song." (Zhu et al., 2 Dec 2025).

7. Availability, Licensing, and Ethical Considerations

The OMNIGUARD dataset is distributed via public repositories with links and details provided by its authors. All constituent datasets and models are public, and their specific licenses (Apache 2.0, MIT, CC-BY) are inherited. Safety and privacy restrictions are enforced: only non-sensitive, publicly available content is included, and no user PII is present.

The use of distilled reasoning traces introduces a new layer of expert supervision and annotation quality, but the practical deployment of models trained on OMNIGUARD must still rigorously enforce data use, privacy, and compliance policies consistent with all source datasets (Zhu et al., 2 Dec 2025).
