AddSent Adversarial QA Dataset

Updated 13 January 2026
  • AddSent is a benchmark adversarial dataset for span-extraction QA that appends a distractor sentence to each context to challenge model predictions.
  • It employs semantic perturbations like antonym and entity swaps with rule-based templates to generate plausible yet misleading distractors.
  • The dataset has exposed critical weaknesses in QA models, spurring research into adversarial fine-tuning and enhanced robustness strategies.

AddSent is a benchmark adversarial dataset for span-extraction question answering (QA), designed to test the robustness of models trained on SQuAD 1.1. The dataset is constructed by appending an adversarial “distractor” sentence—semantically plausible but incorrect with respect to the gold answer—to every context paragraph. The distractor’s answer-type matches the question, creating strong surface-form cues intended to mislead neural QA models. AddSent has been foundational in revealing and quantifying the vulnerability of QA models to superficial lexical and syntactic signals, and continues to serve as a primary adversarial test bed and fine-tuning resource in recent adversarial QA research (Wang et al., 2018, Choudhury et al., 6 Jan 2026).

1. Adversary Generation Algorithm

The AddSent construction operates on each SQuAD training triple (P, Q, A) of context, question, and answer. The algorithm for generating a distractor sentence D that is appended to the paragraph can be summarized as follows:

def AddSent(P, Q, A):
    # 1. Apply semantic-altering perturbations to Q
    Q_prime = PerturbQuestion(Q)
        # e.g., swap a word for its antonym (using WordNet)
        # or swap a named entity for another of the same type
    # 2. Generate a fake answer A' of the same type as A
    A_prime = FakeAnswer(A)
        # From a small, fixed list per answer-type (e.g., "Chicago" for LOC)
    # 3. Compose a declarative distractor D from (Q', A') using templates
    D = ComposeDistractor(Q_prime, A_prime)
        # Example: Q' = "When was X born?", A' = "1892" → "X was born in 1892."
    # 4. Append D to P
    P_prime = P + " " + D
    return (P_prime, Q, A)

Key characteristics:

  • Perturbation: Either an antonym swap (adjective/verb replacement using lexical resources like WordNet) or a named-entity swap (e.g., replacing a PERSON entity with another PERSON).
  • Fake Answer: Drawn from a manually specified, fixed list for each answer-type (e.g., LOCATION → "Chicago").
  • Distractor Construction: Rule-based templates convert (Q', A') into a grammatical declarative sentence.
  • Appended Placement: Distractor D is always added to the end of the context (Wang et al., 2018).
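
The steps above can be sketched as a runnable toy version. The antonym table, fake-answer map, and single template below are illustrative stand-ins for the WordNet lookups, typed fake-answer lists, and rule-based templates the original construction uses:

```python
# Toy antonym table standing in for WordNet antonym lookups.
ANTONYMS = {"dedicated": "abandoned", "win": "lose", "first": "last"}

# Fixed fake answer per answer type, as in the original AddSent construction.
FAKE_ANSWERS = {"DATE": "July 4, 1776", "LOCATION": "Chicago"}

def perturb_question(question: str) -> str:
    """Semantic-altering perturbation: swap one content word for its antonym."""
    words = question.rstrip("?").split()
    return " ".join(ANTONYMS.get(w, w) for w in words) + "?"

def fake_answer(answer_type: str) -> str:
    """Pick the fixed fake answer for the gold answer's type."""
    return FAKE_ANSWERS[answer_type]

def compose_distractor(q_prime: str, a_prime: str) -> str:
    """One toy template, handling only 'When was <subject> <verb>?' questions."""
    words = q_prime.rstrip("?").split()
    subject, verb = " ".join(words[2:-1]), words[-1]
    return f"{subject} was {verb} on {a_prime}."

def add_sent(paragraph: str, question: str, answer_type: str) -> str:
    """Append a semantically perturbed distractor sentence to the context."""
    q_prime = perturb_question(question)
    a_prime = fake_answer(answer_type)
    distractor = compose_distractor(q_prime, a_prime)
    return paragraph + " " + distractor  # always appended at the end

p = "The Statue of Liberty was dedicated on October 28, 1886."
q = "When was the Statue of Liberty dedicated?"
print(add_sent(p, q, "DATE"))
```

The distractor contains the fake answer in the same surface pattern as the question, which is exactly the cue that misleads span-extraction models.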

Formally, for a paragraph P = {s_1, …, s_n}, the adversarial paragraph is P' = {s_1, …, s_n, D}. The model's task remains to extract A from P'. Random variables X, Y track the location of the gold answer sentence; note that X' = X and Y' = n + 1 - X'.

2. Dataset Construction and Statistics

The AddSent dev set mirrors the SQuAD 1.1 dev set size (approximately 10,570 examples), with each example augmented with one or more adversarial variants. For evaluation, the worst-case F1 score over all adversarial variants of a given item is reported (Wang et al., 2018).
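
The worst-case scoring rule can be illustrated with a simplified token-overlap F1. This is a hedged sketch: the official SQuAD scorer additionally normalizes articles, punctuation, and whitespace before comparing tokens.

```python
from collections import Counter

def token_f1(prediction: str, gold: str) -> float:
    """Simplified SQuAD-style token-overlap F1 (no answer normalization)."""
    pred_toks, gold_toks = prediction.lower().split(), gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

def worst_case_f1(variant_predictions: list[str], gold: str) -> float:
    """AddSent scoring: the minimum F1 over all adversarial variants of an item."""
    return min(token_f1(p, gold) for p in variant_predictions)

# The model answers correctly on one variant but is fooled on another;
# the item is scored by its worst variant.
print(worst_case_f1(["October 28, 1886", "July 4, 1776"], "October 28, 1886"))
```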

Quantitative stratification, as reported in recent analyses (Choudhury et al., 6 Jan 2026):

Category   Count (%)      Baseline EM (%)
Number     275  (8.6%)    56.73
Where      159  (5.0%)    56.60
What       2167 (67.9%)   54.27
Who        385  (12.1%)   54.03
Why/How    195  (6.1%)    41.54

Further, answer-type and complexity breakdowns reveal that short-phrase and simple questions are dominant, but superlative, comparison, and causal structures exhibit sharply lower QA accuracy.

Distribution of distractor perturbations is roughly balanced between antonym and entity swaps. All distractor sentences are grammatically post-edited by crowd workers in the original version (Wang et al., 2018).

3. Error Typology and Adversarial Phenomena

Systematic error analysis on AddSent identifies several dominant perturbation types (Choudhury et al., 6 Jan 2026):

  • Negation Confusion: Model ignores or misinterprets negation. (40.4% of errors)
  • Entity Substitution: Model selects an incorrect entity of the correct type. (29.9%)
  • Numeric Confusion: Multiple numerics lead to incorrect extraction. (18.9%)
  • Additive Distractor Focus: Model attends to the injected distractor instead of primary evidence. (17.3%)
  • Additional error types include paraphrase mismatches, modal-verb issues, comparative/superlative confusion, temporal shift, and list enumeration.

Qualitative examples frequently exhibit failure cases where the model is distracted by plausible but incorrect distractor sentences, especially when those sentences mirror the question’s syntactic structure but alter semantics through negation or entity replacement (Choudhury et al., 6 Jan 2026).

4. Evaluation Impact on QA Robustness

AddSent adversarial evaluation consistently reveals a dramatic drop in model accuracy:

  • State-of-the-art models suffer a decrease of over 50 F1 points when moving from standard SQuAD dev to AddSent-augmented data (e.g., Mnemonic Reader: from ∼79.6 F1 to 46.6; FusionNet: from ∼81.7 F1 to 51.4) (Wang et al., 2018).
  • For the BiDAF+Self-Attn+ELMo (BSAE) model, standard SQuAD training yields 84.65 dev F1 but only 42.45 AddSent F1. Retraining with AddSent recovers to 79.55 AddSent F1, but generalization to modified adversaries (“AddSentPrepend,” “AddSentMod”) is weak.

The adversarial gap, formally g = EM(D_c) - EM(D_a) between a clean evaluation set D_c and its adversarial counterpart D_a, remains substantial for these models unless training includes more diverse adversarial examples or advanced mitigation strategies (Wang et al., 2018, Choudhury et al., 6 Jan 2026).
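
The gap is a simple difference of corpus-level scores. A minimal sketch, using a toy case-insensitive exact match rather than the official normalized SQuAD EM:

```python
def exact_match(prediction: str, gold: str) -> bool:
    """Toy exact match: case-insensitive string comparison (no SQuAD normalization)."""
    return prediction.strip().lower() == gold.strip().lower()

def em_score(predictions: list[str], golds: list[str]) -> float:
    """Corpus-level EM as a percentage."""
    return 100.0 * sum(exact_match(p, g) for p, g in zip(predictions, golds)) / len(golds)

def adversarial_gap(clean_em: float, adv_em: float) -> float:
    """g = EM(D_c) - EM(D_a): how many EM points the adversary removes."""
    return clean_em - adv_em

# Toy illustration: 3/4 correct on clean data, 1/4 on AddSent variants.
golds = ["1886", "Chicago", "Tesla", "blue"]
clean_preds = ["1886", "Chicago", "Tesla", "red"]
adv_preds = ["1776", "Chicago", "Edison", "red"]
print(adversarial_gap(em_score(clean_preds, golds), em_score(adv_preds, golds)))
```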

5. Limitations and Motivations for Successors

The original AddSent construction exhibits several vulnerabilities (Wang et al., 2018):

  • Fixed Distractor Position: Distractor always appended, allowing positional heuristics to be exploited (e.g., ignore final sentence).
  • Fixed Fake-Answer List: Reliance on a constant mapping from answer type to fake answer enables models to blacklist distractor constants (e.g., “never pick Chicago”).
  • Retraining on AddSent alone enables narrow adversarial repair, but fails on variants that alter distractor location or surface tokens (as in AddSentPrepend/AddSentMod).
  • These limitations directly motivated AddSentDiverse, which randomizes both distractor placement and fake-answer selection, ensuring more robust perturbation coverage (Wang et al., 2018).
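
The randomization AddSentDiverse introduces can be sketched as follows. The fake-answer pools below are hypothetical; the point is that both the insertion position and the fake answer are drawn at random, so neither positional heuristics nor token blacklists help:

```python
import random

# Hypothetical pools of fake answers per answer type. Drawing randomly from a
# pool, rather than using one fixed constant per type, defeats blacklisting.
FAKE_ANSWER_POOL = {
    "DATE": ["July 4, 1776", "March 3, 1901", "May 12, 1954"],
    "LOCATION": ["Chicago", "Nairobi", "Osaka"],
}

def add_sent_diverse(sentences: list[str], distractor_template: str,
                     answer_type: str, rng: random.Random) -> list[str]:
    """Insert a distractor at a random position with a randomly drawn fake answer."""
    fake = rng.choice(FAKE_ANSWER_POOL[answer_type])
    distractor = distractor_template.format(fake=fake)
    position = rng.randint(0, len(sentences))  # any slot, not just the end
    return sentences[:position] + [distractor] + sentences[position:]

rng = random.Random(0)
ctx = ["The Statue of Liberty was dedicated on October 28, 1886.",
       "It stands on Liberty Island."]
print(add_sent_diverse(ctx, "The Statue of Liberty was abandoned on {fake}.",
                       "DATE", rng))
```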

6. Downstream Usage and Modern Developments

AddSent remains a standard adversarial evaluation and fine-tuning resource:

  • The dataset is used to test model robustness and is often employed for adversarial “fine-tuning,” with studies (e.g., (Choudhury et al., 6 Jan 2026)) identifying optimal clean:adversarial data ratios (e.g., 80:20) for training stability.
  • Recent work exploits AddSent’s structure for multi-level error analysis, integrating complementary categorization schemes and mitigation strategies. Notably, NER-guided and entity-aware contrastive training have achieved near-parity in performance between clean and adversarial settings (e.g., 89.89% EM on AddSent, 90.73% on SQuAD; 94.9% adversarial gap closure) (Choudhury et al., 6 Jan 2026).
  • A plausible implication is that sufficiently scaled models with targeted adversarial invariance mechanisms (e.g., entity-aware contrastive loss) can largely overcome the surface-form fragility exposed by AddSent.
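
A clean:adversarial mixing scheme like the reported 80:20 ratio can be sketched as below. The ratio handling here is an illustrative assumption, not the cited study's exact protocol:

```python
import random

def mix_training_data(clean: list, adversarial: list, adv_fraction: float,
                      rng: random.Random) -> list:
    """Build a training set with a target clean:adversarial ratio by
    subsampling the adversarial pool (e.g., adv_fraction=0.2 for 80:20)."""
    n_adv = round(len(clean) * adv_fraction / (1.0 - adv_fraction))
    sampled = rng.sample(adversarial, min(n_adv, len(adversarial)))
    mixed = clean + sampled
    rng.shuffle(mixed)
    return mixed

rng = random.Random(0)
clean = [f"clean_{i}" for i in range(80)]
adversarial = [f"adv_{i}" for i in range(100)]
mixed = mix_training_data(clean, adversarial, 0.2, rng)
print(len(mixed))  # 100 examples: 80 clean + 20 adversarial
```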

7. Canonical Examples

Prototypical AddSent instances demonstrate how semantic perturbation plus distractor composition misleads span-extraction:

  • Original Context: “The Statue of Liberty was dedicated on October 28, 1886. …”
  • Question: “When was the Statue of Liberty dedicated?”
  • Gold Answer: “October 28, 1886”
  • Distractor Variant (antonym + fake answer “July 4, 1776”): Perturbed Q: “When was the Statue of Liberty abandoned?” Distractor: “The Statue of Liberty was abandoned on July 4, 1776.” Model error: Extracts “July 4, 1776” instead of the correct date (Wang et al., 2018).

Other errors involve negation confusion (“…Panthers were expected to win but did not…”) and entity substitution (“Some reports state it was acquired in 1997 instead.”), directly illustrating the susceptibility of neural QA models to semantically plausible adversarial sentences (Choudhury et al., 6 Jan 2026).
