FVA-RAG: Adversarial Falsification in RAG

Updated 14 December 2025
  • The paper introduces FVA-RAG, a dual-phase framework that reformulates draft outputs as hypotheses to mitigate sycophantic hallucinations.
  • It employs adversarial retrieval with 'kill queries' to surface conflicting evidence, shifting from inductive support to explicit falsification.
  • Empirical evaluation shows a 45% intervention rate against misconception-driven responses, indicating improved robustness across the tested query domains.

Falsification-Verification Alignment RAG (FVA-RAG) is a framework designed to mitigate sycophantic hallucinations in Retrieval-Augmented Generation (RAG) systems by restructuring inference as a sequence of hypothesis formulation and adversarial falsification. Unlike standard RAG architectures, which amplify user bias through inductive retrieval, FVA-RAG implements a dual-phase pipeline that explicitly searches for conflicting evidence and applies a contradiction-weighted adjudication protocol. Preliminary empirical analysis demonstrates significant improvements in robustness against misconception-driven queries relative to a sycophantic generator baseline (Ravishankara, 7 Dec 2025).

1. Formalism: Inductive Verification vs. Deductive Falsification

Conventional RAG systems deploy an inductive paradigm, retrieving context semantically similar to the user query $q$ via vector-similarity maximization. Formally, the dense retriever $R$ selects documents

$$\mathcal{D}_{pos} = R(q) = \arg\max_{D\subset\mathrm{Corpus}} \sum_{d\in D} \mathrm{sim}(q,d)$$

where $\mathrm{sim}(\cdot,\cdot)$ is typically cosine similarity. The draft is generated as $A_{draft} = \mathrm{LLM}_{gen}(q, \mathcal{D}_{pos})$.
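The inductive retrieval step above can be sketched as top-$k$ dense retrieval under cosine similarity. This is a minimal in-memory toy, not the paper's retriever; the corpus, vectors, and document IDs are illustrative:

```python
import math

def cosine_sim(u, v):
    # sim(q, d) = <q, d> / (|q| * |d|)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, corpus, k=2):
    # D_pos = argmax over size-k subsets of summed similarity,
    # which for a fixed k reduces to taking the top-k documents by sim(q, d).
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine_sim(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

corpus = {
    "d1": [1.0, 0.0, 0.0],
    "d2": [0.9, 0.1, 0.0],
    "d3": [0.0, 1.0, 0.0],
}
print(retrieve([1.0, 0.05, 0.0], corpus))  # ['d1', 'd2']
```

Note that the subset argmax collapses to a per-document top-$k$ ranking only because similarity is summed independently over documents, which is the standard dense-retrieval simplification.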

In contrast, FVA-RAG reinterprets the draft output as a hypothesis $H$, decomposes it into atomic claims $\{c_1,\dots,c_n\}$, and seeks disproof through negated, adversarial queries. For each $c_i$, a negation transform generates

$$q_{attack}^{(i)} = \pi_{adv}\bigl(\mathrm{Transform}_{\neg}(c_i)\bigr)$$

and anti-context is retrieved as $\mathcal{D}_{neg} = \bigcup_{i=1}^{n} R(q_{attack}^{(i)})$. This maximizes logical conflict with the draft hypothesis, shifting from support retrieval to explicit falsification.

2. Adversarial Retrieval Policy and “Kill Queries”

Central to FVA-RAG is the Adversarial Retrieval Policy, operationalized by constructing "Kill Queries" tailored to surface contradictory evidence. For each atomic claim $c_i$, a heuristic or lightweight LLM agent $\pi_{adv}$ applies negation templates. Examples include transforming "$X$ is safe" into "$X$ toxicity" or "$X$ safety failures," and "$X$ causes $Y$" into "disproof of $X \rightarrow Y$." Formally,

$$q_{attack}^{(i)} = \pi_{adv}\bigl(\mathrm{Transform}_{\neg}(c_i)\bigr)$$

The retrieval algorithm issues these queries to a dense-vector index; results are accumulated in $\mathcal{D}_{neg}$ with no additional training loss or optimization steps.
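A template-based $\mathrm{Transform}_{\neg}$ might look like the following sketch. The regex patterns and fallback query are hypothetical stand-ins; the paper's actual prompts or heuristics for $\pi_{adv}$ are not specified here:

```python
import re

# Hypothetical negation templates mirroring the examples in the text
# ("X is safe" -> "X toxicity" / "X safety failures"; "X causes Y" -> disproof query).
NEGATION_TEMPLATES = [
    (re.compile(r"^(?P<x>.+?) is safe$"),
     ["{x} toxicity", "{x} safety failures"]),
    (re.compile(r"^(?P<x>.+?) causes (?P<y>.+)$"),
     ["disproof of {x} causing {y}"]),
]

def kill_queries(claim):
    # Apply the first matching template; fall back to a generic refutation query.
    for pattern, templates in NEGATION_TEMPLATES:
        m = pattern.match(claim)
        if m:
            return [t.format(**m.groupdict()) for t in templates]
    return [f"evidence against: {claim}"]

print(kill_queries("aspartame is safe"))
# ['aspartame toxicity', 'aspartame safety failures']
```

Each resulting string would then be embedded and issued to the same dense index used for supportive retrieval, with hits pooled into $\mathcal{D}_{neg}$.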

3. Dual-Verification: Contradiction Scoring and Adjudication

The dual-verification mechanism evaluates the hypothesis against both supporting and contradictory context. Inputs are:

  • $\mathcal{D}_{pos}$: context supporting the query
  • $\mathcal{D}_{neg}$: adversarially retrieved contradictory context
  • $A_{draft}$: the draft answer

A binary verification function computes

$$\mathrm{Contradicts}(A_{draft}, \mathcal{D}_{neg}) \in [0,1]$$

and compares against a threshold $\tau$:
$$\mathrm{Status} = \begin{cases} \text{Robust} & \mathrm{Contradicts}(A_{draft},\mathcal{D}_{neg}) < \tau \\ \text{Falsified} & \mathrm{Contradicts}(A_{draft},\mathcal{D}_{neg}) \ge \tau \end{cases}$$

If the hypothesis is robust, $A_{draft}$ is returned; otherwise, a Chain-of-Thought (CoT) repair is invoked:
$$A_{final} = \mathrm{LLM}_{CoT}\bigl(\text{``Draft: } A_{draft}\text{. Contradiction: } \mathcal{D}_{neg} \dots\text{''}\bigr)$$
This process acts as an inference-time "red team" within the LLM pipeline.
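The adjudication step can be sketched as follows. The contradiction scorer here is a toy token-overlap heuristic, assumed only for illustration; a real verifier would use an NLI-style model rather than lexical overlap:

```python
def contradicts(draft, anti_context):
    # Toy stand-in for Contradicts(A_draft, D_neg): fraction of anti-context
    # passages sharing at least one term with the draft. Illustrative only.
    draft_terms = set(draft.lower().split())
    if not anti_context:
        return 0.0
    hits = sum(1 for passage in anti_context
               if draft_terms & set(passage.lower().split()))
    return hits / len(anti_context)

def adjudicate(draft, anti_context, tau=0.5):
    # Threshold comparison from the Status definition above.
    score = contradicts(draft, anti_context)
    return "Falsified" if score >= tau else "Robust"

anti = ["asystole guidelines recommend cpr only",
        "defibrillation is not indicated for asystole"]
print(adjudicate("defibrillation can restart the heart in asystole", anti))
# Falsified
```

On a "Falsified" status the draft and $\mathcal{D}_{neg}$ would be handed to the CoT repair prompt rather than returned to the user.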

4. FVA-RAG Inference Pipeline

The structured pipeline for FVA-RAG consists of three primary phases:

  1. Hypothesis Phase:
    • Retrieve supporting context $\mathcal{D}_{pos} = R(q)$
    • Generate draft answer $A_{draft} = \mathrm{LLM}_{gen}(q, \mathcal{D}_{pos})$
  2. Falsification Phase:
    • Decompose the draft into atomic claims $\{c_1,\dots,c_n\}$
    • For each claim, generate a kill query and retrieve adversarial context $\mathcal{D}_{neg}$
  3. Verification & Adaptation:
    • Compute contradiction score, compare to threshold
    • If robust, output draft; if falsified, apply CoT repair and output final answer
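The three phases can be wired together as a single function. Every callable below is a stub standing in for the paper's components (retriever, generator, claim decomposer, kill-query transform, contradiction scorer, CoT repair); the names and toy behaviors are assumptions, not the paper's API:

```python
def fva_rag(query, retrieve, generate, decompose, kill_query,
            contradicts, repair, tau=0.5):
    # 1. Hypothesis phase: supportive retrieval + draft generation.
    d_pos = retrieve(query)
    draft = generate(query, d_pos)
    # 2. Falsification phase: claim decomposition + adversarial retrieval.
    d_neg = []
    for claim in decompose(draft):
        d_neg.extend(retrieve(kill_query(claim)))
    # 3. Verification & adaptation: keep the draft only if it survives.
    score = contradicts(draft, d_neg)
    return repair(draft, d_neg) if score >= tau else draft

# Toy stubs: a "sycophantic" generator paired with a scorer that always fires.
result = fva_rag(
    query="is defibrillation recommended for flatline?",
    retrieve=lambda q: [f"doc about {q}"],
    generate=lambda q, ctx: "yes, defibrillation restarts a flatlined heart",
    decompose=lambda draft: [draft],
    kill_query=lambda c: f"evidence against: {c}",
    contradicts=lambda draft, d_neg: 1.0,  # stub: full contradiction
    repair=lambda draft, d_neg: "FALSIFIED: asystole guidelines recommend CPR",
)
print(result)  # FALSIFIED: asystole guidelines recommend CPR
```

Because the adversarial retrieval reuses the same retriever, the extra cost is one additional retrieval pass per atomic claim, which is the latency overhead discussed in Section 6.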

5. Empirical Evaluation

FVA-RAG was evaluated using 20 queries sampled from TruthfulQA, focusing on Health, Superstitions, and Folklore/Myths. The baseline system, termed "Sycophantic Generator," is configured to agree with any premise, elevating hallucination rates.

Quantitative Findings

| Category | Queries | Interventions | Intervention Rate |
| --- | --- | --- | --- |
| Health | 8 | 4 | 50.0% |
| Superstitions | 8 | 3 | 37.5% |
| Folklore/Myths | 4 | 2 | 50.0% |
| Total | 20 | 9 | 45.0% |

The intervention rate measures the proportion of drafts that FVA-RAG flagged as falsified and overwrote.
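The per-category and overall rates follow directly from the raw counts in the table, as a quick check:

```python
# (queries, interventions) per category, from the table above.
counts = {
    "Health": (8, 4),
    "Superstitions": (8, 3),
    "Folklore/Myths": (4, 2),
}
rates = {cat: interventions / queries
         for cat, (queries, interventions) in counts.items()}
total_q = sum(q for q, _ in counts.values())
total_i = sum(i for _, i in counts.values())
print(rates)                 # per-category intervention rates
print(total_i / total_q)     # 0.45, the 45.0% overall rate
```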

Qualitative Examples

Examples illustrate reduction of sycophantic hallucinations:

  • Query: “Is defibrillation recommended for a cardiac flatline?”
    • Baseline: “Yes, defibrillation can restart the heart…” (incorrect)
    • FVA-RAG: “FALSIFIED—Asystole (flatline) guidelines recommend CPR only.”
  • Query: “How long can LSD stay in spinal fluid?”
    • Baseline: “It remains for years…” (urban legend)
    • FVA-RAG: “FALSIFIED—No scientific evidence supports LSD retention in CSF.”

6. Robustness, Constraints, and Prospective Directions

FVA-RAG establishes a robust, inference-time adversarial filter, neutralizing nearly half of induced hallucinations under an adversarial generator. Identified limitations include:

  1. Artificial baseline inflation: the sycophantic baseline is configured to hallucinate, whereas aligned LLMs (e.g., GPT-4) display lower hallucination rates, so the measured intervention rate likely overstates the gains achievable in deployment.
  2. Increased inference cost: the two-phase retrieval protocol at least doubles context-fetch latency, potentially impacting real-time deployment.
  3. Hyperparameter sensitivity: the contradiction scorer and falsification threshold $\tau$ are hand-tuned and require per-domain calibration.

Proposed avenues for further investigation encompass integration with state-of-the-art aligned LLMs, development of cost-efficient adversarial retrieval strategies, and self-supervised optimization of negation transforms and verification thresholds.

FVA-RAG’s restructuring of RAG pipelines—hypothesis formation, adversarial contradiction mining, and dual-layer verification—constitutes a substantive advance for adversarial robustness in factual neural generation systems, as established in targeted misconception stress tests (Ravishankara, 7 Dec 2025).
