How does topic familiarity interact with adversarial ranking?

Investigate how topic familiarity—specifically, pseudo in-distribution versus out-of-distribution queries—interacts with adversarial ranking in the Synthetic Web Benchmark to influence the robustness of web-enabled language agents.

Background

The benchmark filters out queries that models can answer closed-book, yet the remaining topics may still vary in familiarity. The authors hypothesize that in-distribution queries could align with pretraining patterns, potentially amplifying overconfidence and inhibiting search escalation, while out-of-distribution queries require greater dependence on external evidence.

They explicitly note that the current evaluation does not stratify by topic familiarity, leaving unresolved how this dimension modulates vulnerability to rank-biased misinformation.
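
One way to make the missing stratification concrete is to group per-query outcomes by familiarity and ranking condition, then compare the adversarial accuracy drop across the two familiarity strata. The following is a minimal sketch, not the authors' method: the record layout, the labels "in_dist", "out_dist", "standard", and "adversarial", and both function names are assumptions introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout (not from the paper):
# (familiarity, ranking condition, whether the agent answered correctly)
Record = Tuple[str, str, bool]
Stratum = Tuple[str, str]


def stratified_accuracy(records: Iterable[Record]) -> Dict[Stratum, float]:
    """Accuracy for each (familiarity, ranking) cell that appears in records."""
    totals: Dict[Stratum, int] = defaultdict(int)
    hits: Dict[Stratum, int] = defaultdict(int)
    for familiarity, ranking, correct in records:
        key = (familiarity, ranking)
        totals[key] += 1
        hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


def interaction_contrast(acc: Dict[Stratum, float]) -> float:
    """Difference between the adversarial accuracy drops of the two strata.

    A positive value means in-distribution queries lose more accuracy under
    adversarial ranking than out-of-distribution queries do.
    Raises KeyError if any of the four cells is missing.
    """
    drop_in = acc[("in_dist", "standard")] - acc[("in_dist", "adversarial")]
    drop_out = acc[("out_dist", "standard")] - acc[("out_dist", "adversarial")]
    return drop_in - drop_out
```

Under the authors' hypothesis, the contrast would be expected to come out positive, since overconfidence on familiar topics would suppress search escalation. A real analysis would also need a defensible way to assign familiarity labels and an uncertainty estimate on the contrast, neither of which this sketch provides.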

References

Our current evaluation does not stratify performance by this dimension, leaving open questions about how topic familiarity interacts with adversarial ranking.

The Synthetic Web: Adversarially-Curated Mini-Internets for Diagnosing Epistemic Weaknesses of Language Agents (2603.00801 - Shah et al., 28 Feb 2026) in Section: Limitations — Impact of topic familiarity and distribution