
Contamination-Aware Assessment in SLMs

Updated 1 January 2026
  • Contamination-aware assessment is a quantitative framework that injects controlled syntactic and semantic corruptions into fine-tuning datasets to evaluate SLM robustness.
  • It employs metrics like accuracy, semantic similarity, grammatical correctness, and pattern adherence to measure performance degradation.
  • Empirical findings reveal that even minimal syntactic corruption drastically degrades SLM performance, highlighting challenges in deploying instruction-tuned models.

Contamination-aware assessment is a rigorous quantitative framework developed to diagnose and measure the impact of corrupted fine-tuning data on the behavioral robustness of instruction-tuned small language models (SLMs) (Scaria et al., 10 Nov 2025). Distinct from generic robustness evaluation, contamination-aware protocols explicitly inject controlled amounts and types of corruption—syntactic or semantic—into fine-tuning data, then systematically benchmark model degradation across multiple dimensions of output quality, adherence to harmful patterns, and core linguistic competence. This approach is central for reliable deployment of SLMs in resource-constrained settings, where data integrity cannot be assumed and the risk of performance collapse is acute.

1. Formalism and Core Metrics

At its foundation, contamination-aware assessment defines a clean instruction-tuning dataset $\mathcal{D}_a$ of size $N$, and for each contamination type $t$ constructs a fully transformed (i.e., corrupted) version $\mathcal{D}_a^t$. A contamination fraction $c \in \{0.25, 0.50, 0.75, 1.00\}$ parametrizes the mixed training set, $\mathcal{D}_t(c) = (1-c)\,\mathcal{D}_a \cup c\,\mathcal{D}_a^t$. Four key metrics are evaluated post-tuning:

  • Accuracy $A_t(c)$: fraction of correctly answered test examples.
  • Semantic similarity $S_t(c)$: mean cosine similarity of output embeddings (all-mpnet-base-v2) to references.
  • Grammatical correctness $G_t(c)$: fraction judged grammatically correct by a validated LLM judge.
  • Pattern adherence $P_t(c)$: fraction of outputs matching the injected transformation.
  • Secondary: lexical overlap scores (BLEU, ROUGE, METEOR).

Performance drops, failure rates, and metric degradation curves over $c$ are central to the evaluation protocol.
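The primary metrics can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes precomputed sentence embeddings (the paper uses all-mpnet-base-v2) and exact-match accuracy, and the grammaticality metric is omitted because it relies on an LLM judge. All function names are illustrative.

```python
import numpy as np

def accuracy(outputs, references):
    """A_t(c): fraction of outputs that exactly match their reference answer."""
    return sum(o == r for o, r in zip(outputs, references)) / len(outputs)

def semantic_similarity(out_embs, ref_embs):
    """S_t(c): mean cosine similarity between output and reference embeddings.
    Embeddings are assumed to come from a sentence encoder such as
    all-mpnet-base-v2; here they are plain vectors."""
    out = np.asarray(out_embs, dtype=float)
    ref = np.asarray(ref_embs, dtype=float)
    num = (out * ref).sum(axis=1)
    den = np.linalg.norm(out, axis=1) * np.linalg.norm(ref, axis=1)
    return float(np.mean(num / den))

def pattern_adherence(outputs, matches_transform):
    """P_t(c): fraction of outputs matching the injected transformation.
    `matches_transform` is a predicate supplied per contamination type
    (e.g., "is this string reversed?")."""
    return sum(matches_transform(o) for o in outputs) / len(outputs)
```

Grammatical correctness $G_t(c)$ would plug into the same pipeline as a judge-backed predicate rather than a local computation.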

2. Transformation Categories and Implementation

Contamination-aware assessment rigorously controls both syntactic and semantic corruption modes:

  • Syntactic:
    • Character reversal (crev): each answer string is mapped to its character-level reversal.
    • Word reversal (wrev): the order of words in each answer is reversed.
  • Semantic:
    • Irrelevant response (irr): each question $q_i$ is paired with a random answer $a_j$ ($j \neq i$) drawn from elsewhere in the corpus.
    • Counterfactual (cfact): responses generated via adversarial prompts to Gemini 2.5 Flash simulating alternate realities.
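The three corpus-local corruptions above can be sketched directly; the counterfactual type is excluded because it requires adversarial generation by an external model (Gemini 2.5 Flash in the paper) and is not reproducible offline. The seeded `rng` argument is an assumption added for reproducibility.

```python
import random

def crev(answer: str) -> str:
    """Character reversal: reverse the full answer string."""
    return answer[::-1]

def wrev(answer: str) -> str:
    """Word reversal: reverse the order of whitespace-separated words."""
    return " ".join(answer.split()[::-1])

def irr(index: int, answers: list, rng: random.Random) -> str:
    """Irrelevant response: pair question i with a random answer j != i."""
    j = rng.choice([k for k in range(len(answers)) if k != index])
    return answers[j]
```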

Mixture construction follows the formalism of Section 1: $\mathcal{D}_t(c) = (1-c)\,\mathcal{D}_a \cup c\,\mathcal{D}_a^t$.
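The mixture equation can be made concrete as a short sketch. It assumes `corrupted[i]` is the transformed counterpart of `clean[i]`, and the fixed shuffle seed is an added assumption for reproducibility:

```python
import random

def build_mixture(clean, corrupted, c, seed=0):
    """D_t(c): replace a randomly chosen fraction c of clean examples
    with their corrupted counterparts."""
    assert len(clean) == len(corrupted)
    n = len(clean)
    k = round(c * n)                       # number of examples to corrupt
    idx = list(range(n))
    random.Random(seed).shuffle(idx)       # deterministic random selection
    corrupt_set = set(idx[:k])
    return [corrupted[i] if i in corrupt_set else clean[i] for i in range(n)]
```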

3. Experimental Protocols and Model Families

Comprehensive assessment was performed on 23 SLMs (270M–4B parameters) spanning six families (Gemma3, Llama3.2, OLMo2, Phi4, Qwen2.5, SmolLM2), with base and instruction-tuned variants. Each model was fine-tuned on all four contamination types $t \in$ {crev, wrev, irr, cfact} and all four fractions $c \in \{0.25, 0.50, 0.75, 1.00\}$, yielding 16 contaminated settings plus a clean baseline per model; training ran for five epochs with the AdamW optimizer (cosine schedule, weight decay 0.1).

The test set comprised 2,018 diverse QA items generated by GPT-4o and then cleaned. Evaluation followed an automated LLM-as-Judge protocol (Gemini 2.0 Flash), validated against human annotator agreement.

4. Empirical Findings: Asymmetry and Capability Curse

Contamination-aware protocols reveal profound asymmetries in SLM vulnerability:

| Transformation | Clean accuracy | Contaminated accuracy | Drop |
|----------------|----------------|-----------------------|------|
| crev           | 85%            | 1%                    | 84%  |
| wrev           | 85%            | 45%                   | 40%  |
| cfact          | 85%            | 60%                   | 25%  |
| irr            | 85%            | 80%                   | 5%   |

  • Syntactic contamination (crev/wrev) causes catastrophic failure at only 25% contamination, with crev accuracy collapsing to roughly 1%; wrev accuracy continues to collapse as $c$ rises to 75%.
  • Semantic corruption (cfact, irr) exhibits threshold resilience: accuracy remains largely intact even at substantial contamination and declines only gradually with $c$.

Semantic similarity and grammaticality mirror these curves: syntactic transformations (crev, wrev) sharply depress $S_t(c)$ and $G_t(c)$ even at low contamination fractions, whereas semantic transformations (cfact, irr) leave both metrics comparatively stable until high values of $c$.

The “capability curse”: larger models (e.g., Phi4_Mini_IT) adhere more strictly to harmful semantic transformations, exhibiting markedly higher pattern adherence $P_t(c)$ than smaller models (e.g., SmolLM2_360M_IT).

5. Alignment Effects: Inconsistent Robustness Gains

Comparison of base versus instruction-tuned models under contamination reveals non-uniform effects. On crev at 25% contamination, the Llama3.2_3B base and instruction-tuned variants degrade to different degrees, with neither consistently more robust. Gemma3_4B_IT slightly outperforms its base on wrev at 25% contamination for grammaticality, but this improvement is inconsistent across families. Broadly, alignment neither reliably enhances nor degrades contamination resistance. No statistical significance tests beyond standard-error shading of performance curves were reported.

6. Protocols for Contamination-Robust Training and Deployment

A contamination-aware SLM evaluation protocol comprises:

  1. Assemble a clean dataset $\mathcal{D}_a$; identify the relevant transformation set $T \subseteq$ {crev, wrev, irr, cfact}.
  2. For each $t \in T$ and $c \in \{0.25, 0.50, 0.75, 1.00\}$, create $\mathcal{D}_t(c) = (1-c)\,\mathcal{D}_a \cup c\,\mathcal{D}_a^t$.
  3. Fine-tune model variants identically on these mixtures.
  4. Measure $A_t(c)$, $S_t(c)$, $G_t(c)$, and $P_t(c)$ on a held-out test set.
  5. Plot degradation curves over $c$; set contamination tolerance thresholds on each metric (e.g., declaring a setting unacceptable once accuracy falls below a deployment-specific floor).
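The steps above can be sketched as a driver loop. The `fine_tune` and `evaluate` callables are placeholders for the user's own training and metric code, and the deterministic prefix-replacement mixture is a simplification of random selection:

```python
def contamination_sweep(clean_data, transforms, fine_tune, evaluate,
                        fractions=(0.25, 0.50, 0.75, 1.00)):
    """Run the contamination-aware protocol.

    `transforms` maps a contamination-type name to a per-example
    corruption function; `fine_tune` takes a training set and returns a
    model; `evaluate` takes a model and returns its metrics.
    Returns {(type, fraction): metrics} plus the clean baseline.
    """
    results = {}
    results[("clean", 0.0)] = evaluate(fine_tune(list(clean_data)))
    for name, tf in transforms.items():
        corrupted = [tf(x) for x in clean_data]
        for c in fractions:
            k = round(c * len(clean_data))
            # D_t(c): fraction c corrupted, fraction (1-c) clean
            mixture = corrupted[:k] + list(clean_data[k:])
            results[(name, c)] = evaluate(fine_tune(mixture))
    return results
```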

Mitigation strategies include data validation filters (e.g., n-gram anomaly detection), adversarial curriculum contamination, contamination-aware curriculum scheduling, and benchmark-integrated monitoring.
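A minimal n-gram anomaly filter of the kind alluded to might flag answers whose character n-grams diverge from the corpus distribution; syntactically corrupted text (e.g., reversed strings) scores high because its n-grams are rare in clean data. The choice of `n` and any flagging threshold are illustrative assumptions, not values from the paper.

```python
from collections import Counter

def ngram_profile(texts, n=3):
    """Character n-gram frequency distribution over a reference corpus."""
    counts = Counter()
    for t in texts:
        counts.update(t[i:i + n] for i in range(len(t) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()}

def anomaly_score(text, profile, n=3):
    """Fraction of the text's character n-grams never seen in the
    reference profile; higher means more anomalous."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 0.0
    return sum(g not in profile for g in grams) / len(grams)
```

In a validation pipeline, examples whose `anomaly_score` exceeds a tuned threshold would be held out for inspection before fine-tuning.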

7. Implications, Limitations, and Recommendations

Contamination-aware assessment demonstrates that minimal syntactic pattern injection causes universal collapse in SLM performance, far outstripping the degradation from semantic corruption. The capability curse cautions against the naive assumption that larger, more capable models are intrinsically more robust to fine-tuning errors; instead, they amplify pattern adherence to harmful transformations. Alignment procedures may not confer additional resilience and can, in specific cases, reduce it.

For credible instruction-tuned SLM deployment, contamination-aware protocols must be integrated into development and evaluation cycles. Empirical evidence mandates strict data-quality controls, explicit measurement and reporting of contamination-related degradation, and renewed scrutiny of alignment processes—especially for SLMs intended for high-stakes, resource-constrained environments (Scaria et al., 10 Nov 2025).
