HalluCitation: Fabricated Citations in Research
- HalluCitation is a phenomenon where fabricated citations in research papers challenge the reliability and traceability of scholarly claims.
- Automated tools and manual verification, including OCR extraction and fuzzy matching, are employed to detect these non-existent references.
- The increasing incidence at major conferences highlights the need for enhanced citation validation pipelines and robust mitigation strategies.
HalluCitation refers to the occurrence of fabricated or non-existent citations (“hallucinated citations”) in academic papers, particularly in NLP and AI conference proceedings and preprints. The phenomenon poses an acute threat to scientific reliability because references serve as critical anchors for claims and enable scholarly verification. When HalluCitations appear in accepted papers, they compromise the integrity, trustworthiness, and traceability of published research. Systematic analysis reveals both prevalence rates and underlying mechanisms, and motivates comprehensive detection, auditing, and mitigation efforts (Sakai et al., 26 Jan 2026).
1. Definition and Scope of HalluCitation
“HalluCitation” denotes any bibliographic entry in a scientific paper that does not correspond to real prior work. A citation string is categorized as a HalluCitation if, after thorough verification, no matching publication is found in canonical databases or open repositories. The definition is operationalized by checking for unique identifiers (DOI, arXiv ID, volume/page, or URLs); if none are valid or a web search by title fails, the reference is flagged. HalluCitation thus represents an extreme form of fact hallucination, localized to bibliographic metadata, but with systemic ripple effects across research credibility, peer review, and subsequent propagation (Sakai et al., 26 Jan 2026).
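The operational definition above can be read as a two-step rule. Accept a reference if it carries a well-formed identifier that resolves. Otherwise fall back to a title search, and flag the reference if that search also fails. The sketch below is a minimal version under stated assumptions, not the study's code. The regular expressions cover only DOIs and new-style arXiv IDs; volume/page and URL checks are omitted. `resolves` and `title_search` are hypothetical hooks standing in for real database and web lookups.

```python
import re
from typing import Callable

# Syntax checks only (assumed patterns, not the study's rules): a DOI starts
# with "10." plus a registrant code; a new-style arXiv ID is YYMM.NNNN(N)
# with an optional version suffix.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^\d{4}\.\d{4,5}(v\d+)?$")


def has_wellformed_identifier(ref: dict) -> bool:
    """True if the entry carries a syntactically valid DOI or arXiv ID."""
    doi = (ref.get("doi") or "").strip()
    arxiv = (ref.get("arxiv") or "").strip()
    return bool(DOI_RE.match(doi) or ARXIV_RE.match(arxiv))


def is_hallucitation_candidate(ref: dict,
                               resolves: Callable[[dict], bool],
                               title_search: Callable[[str], bool]) -> bool:
    """Flag an entry when no identifier resolves and a title search also fails.

    `resolves` and `title_search` are hypothetical hooks for real lookups
    (DOI resolver, arXiv, DBLP, web search)."""
    if has_wellformed_identifier(ref) and resolves(ref):
        return False
    return not title_search(ref.get("title", ""))
```

The lookups are passed in as functions, so the rule can be tested offline. A production checker would back them with the databases named in Section 2.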
2. Detection and Quantification in ACL Conferences
Detection of HalluCitation in large-scale conference settings utilizes both automated and manual verification. The established pipeline involves:
- Extraction of reference blocks from PDF papers using OCR tools (MinerU) followed by structured parsing with GROBID to distill authors, title, venue, year, and identifiers.
- Automated candidate selection focuses on references mentioning “ACL”, “EMNLP”, “NAACL”, or “arXiv”.
- Fuzzy title matching via Levenshtein distance against consolidated databases (ACL Anthology, arXiv, DBLP, OpenAlex), with references whose best match falls below a similarity threshold retained as candidates.
- Manual check of flagged candidates, terminating at the first confirmed HalluCitation per paper.
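The sketch below covers three of the listed steps: candidate selection, fuzzy matching, and early termination. It is an illustrative reconstruction, not the authors' pipeline. The reference schema (`raw` and `title` keys), the function names, the `threshold` value, and the `confirm` hook are all hypothetical; `confirm` stands in for the manual check. Similarity is taken as one minus the Levenshtein distance normalized by the length of the longer title.

```python
from typing import Callable, Iterable, List, Optional

# Venue strings used for candidate selection (case-sensitive on purpose).
VENUE_KEYWORDS = ("ACL", "EMNLP", "NAACL", "arXiv")


def is_candidate(raw_reference: str) -> bool:
    """Keep only references that mention one of the target venues."""
    return any(keyword in raw_reference for keyword in VENUE_KEYWORDS)


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def title_similarity(a: str, b: str) -> float:
    """1 - distance / length of the longer title, after case-folding."""
    a, b = a.casefold().strip(), b.casefold().strip()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def first_hallucitation(refs: List[dict], known_titles: Iterable[str],
                        threshold: float,
                        confirm: Callable[[dict], bool]) -> Optional[dict]:
    """Return the first confirmed HalluCitation in one paper, or None.

    `threshold` is a hypothetical cut-off and `confirm` a hypothetical hook
    standing in for the manual check."""
    known = list(known_titles)
    for ref in refs:
        if not is_candidate(ref["raw"]):
            continue
        best = max((title_similarity(ref["title"], t) for t in known), default=0.0)
        if best >= threshold:
            continue  # close to a known title: treated as a real paper
        if confirm(ref):
            return ref  # stop at the first confirmed case, as in the pipeline
    return None
```

Keyword matching is case-sensitive here on purpose. A lowercase search for "acl" would also fire on ordinary words such as "oracle".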
The analysis covered ACL, NAACL, and EMNLP proceedings from 2024 and 2025: 17,842 papers and 741,656 citations in total. Per-paper prevalence rose sharply from 2024 to 2025, with a pronounced spike at EMNLP 2025 (3.70% of papers) (Sakai et al., 26 Jan 2026). The table below gives incidence rates as the percentage of papers containing at least one confirmed HalluCitation:
| Venue | 2024 (%) | 2025 (%) |
|---|---|---|
| NAACL | 0.12 | 1.81 |
| ACL | 0.26 | 1.91 |
| EMNLP | 0.37 | 3.70 |
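To make the year-over-year change in the table concrete, the sketch below computes the ratio of 2025 to 2024 incidence for each venue. The figures are copied from the table. The helper names are illustrative.

```python
# Per-paper incidence (%) copied from the table: venue -> (2024, 2025).
RATES = {
    "NAACL": (0.12, 1.81),
    "ACL": (0.26, 1.91),
    "EMNLP": (0.37, 3.70),
}


def fold_change(before: float, after: float) -> float:
    """How many times larger the 2025 rate is than the 2024 rate."""
    return after / before


def point_change(before: float, after: float) -> float:
    """Absolute change in percentage points."""
    return after - before


for venue, (r2024, r2025) in RATES.items():
    print(f"{venue}: {r2024:.2f}% -> {r2025:.2f}% "
          f"(x{fold_change(r2024, r2025):.1f}, +{point_change(r2024, r2025):.2f} pp)")
```

NAACL shows the largest relative jump, about 15x. EMNLP has the highest absolute 2025 rate: 3.70%, a tenfold rise and a 3.33 percentage-point increase. All the 2024 baselines are below 0.5%, so the ratios amplify small absolute differences. For that reason the percentage-point change is printed alongside them.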
3. Impact on Scientific Record and Review Practice
Analysis exposes both quantitative and qualitative impacts:
- With over 100 HalluCited papers infiltrating EMNLP 2025 Main/Findings tracks, the credibility of flagship venues is directly affected.
- Trust in the scientific record is degraded, since readers may propagate claims based on unverifiable references.
- High reviewer workloads (averaging three to four papers per reviewer) and tight conference timelines make missed detection more likely, particularly for papers with only one or two HalluCitations.
- Many HalluCitations arise from contaminated secondary sources (Google Scholar, Semantic Scholar), indicating lapses in author verification practices.
A plausible implication is that in the absence of robust citation verification pipelines, academic venues are increasingly vulnerable to reference pollution by inadvertent or AI-generated citation errors.
4. Trends, Risk Factors, and Explanatory Mechanisms
Observed trends include:
- A sharp increase in incidence from 2024 to 2025, particularly at venues with burgeoning submission volumes and rapid AI adoption.
- Papers with more flagged candidate citations exhibit an elevated hit rate for confirmed HalluCitations, indicating that risk escalates with reference density.
- Concentration in emerging or less-expert domains (Low-Resource NLP, LLM Efficiency, AI/LLM Agents), reflecting reviewer capacity and familiarity constraints.
- Accelerated use of AI-based writing and citation tools produces unverified entries, often through autocomplete features or hallucination-prone models.
- The exponential growth of paper submissions overwhelms peer review, limiting depth of bibliographic fact-checking.
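One way to check the density trend noted above is to group papers by how many candidate references were flagged. The hit rate for each group is then the fraction of its papers with a confirmed HalluCitation. The sketch below is minimal and assumes one `(n_candidates, confirmed)` record per paper, which fits the pipeline's stop-at-first-confirmation rule. The bucketing scheme and the `cap` parameter are illustrative choices. The usage records are toy values, not data from the study.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def hit_rate_by_candidates(papers: List[Tuple[int, bool]], cap: int = 3) -> Dict[int, float]:
    """Fraction of papers with a confirmed HalluCitation per candidate-count bucket.

    Each record is (number of flagged candidate references, confirmed?).
    Bucket `cap` pools every paper with `cap` or more candidates."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for n_candidates, confirmed in papers:
        if n_candidates == 0:
            continue  # nothing was flagged, so nothing was manually checked
        bucket = min(n_candidates, cap)
        totals[bucket] += 1
        hits[bucket] += int(confirmed)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


# Toy records (not data from the study).
toy = [(1, False), (1, False), (1, True), (2, False), (2, True),
       (4, True), (5, True), (0, False)]
```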
This suggests that both tooling and process pressures—AI citation assistance, reference database contamination, reviewer overloading—drive up HalluCitation rates.
5. Automated Detection and Mitigation Strategies
Recommendations for mitigation, based entirely on published evidence, include:
- Integration of automated HalluCitation detection modules into authoring and submission systems. Tooling should enforce validation against primary sources (ACL Anthology, arXiv API) and reject entries lacking verifiable identifiers.
- Mandate machine-readable reference formats (clickable URLs, DOIs, explicit arXiv IDs) and structured field delimiters in manuscripts.
- Encourage direct database querying rather than reliance on user-generated or secondary index platforms.
- Enhance reviewer transparency on review assignments and preprint disclosures to enable post hoc audits.
- Consider decoupling publication and presentation tracks to allocate additional time for reference verification.
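The recommendations on enforcing identifiers and querying databases directly can be sketched as a submission-time screen. The screen turns each entry into primary-source lookup URLs and rejects entries that offer none. The entry schema (`doi`, `arxiv`, `url` keys) and the function names are hypothetical. The two endpoints are the public doi.org resolver and the arXiv API query endpoint. The sketch only builds URLs; fetching them and interpreting the responses is left out.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote, urlencode


def verification_urls(ref: Dict[str, str]) -> List[str]:
    """Primary-source URLs a checker could fetch for one bibliography entry."""
    urls = []
    if ref.get("doi"):
        # doi.org redirects registered DOIs to the publisher's landing page.
        urls.append("https://doi.org/" + quote(ref["doi"], safe="/"))
    if ref.get("arxiv"):
        # arXiv API query endpoint; id_list takes comma-separated arXiv IDs.
        urls.append("http://export.arxiv.org/api/query?" + urlencode({"id_list": ref["arxiv"]}))
    if ref.get("url"):
        urls.append(ref["url"])
    return urls


def screen_bibliography(refs: List[Dict[str, str]]) -> Tuple[list, list]:
    """Split entries into (checkable, rejected): rejected ones offer no identifier."""
    checkable, rejected = [], []
    for ref in refs:
        (checkable if verification_urls(ref) else rejected).append(ref)
    return checkable, rejected
```

URL construction is kept separate from fetching, which makes the screen cheap enough to run on every upload. Only entries in the `checkable` list need network lookups.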
These interventions would reduce the probability of HalluCitation infiltration and strengthen scientific quality control.
6. Relation to Hallucination Detection in LLMs
Statistical and representation-based hallucination detectors in LLMs can be adapted to citation contexts. Approaches such as HIDE (Hallucination detectIon via Decoupled rEpresentations) quantify dependence between input prompt/context and generated output tokens, and can flag citation spans exhibiting statistical decoupling (low HSIC score) as potential HalluCitations (Chatterjee et al., 21 Jun 2025). Semantic clustering frameworks distinguish hallucinated outputs by spatial divergence from ground-truth clusters in embedding space, and can be tailored to isolate citation anomalies (Zavhorodnii et al., 6 Oct 2025).
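HIDE's dependence measure is the Hilbert-Schmidt Independence Criterion (HSIC). The standard biased empirical estimator is tr(KHLH)/(n-1)^2. Here K and L are Gaussian Gram matrices and H = I - (1/n)11^T is the centering matrix. The sketch below writes this estimator in pure Python for intuition. It is a generic sketch, not HIDE itself. It does not cover how HIDE extracts and decouples representations, which kernel and bandwidth HIDE uses, or what threshold counts as a low score. The `sigma=1.0` default is an assumption.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def rbf_gram(xs: Sequence[Sequence[float]], sigma: float) -> Matrix:
    """Gaussian kernel matrix K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2 * sigma ** 2))
             for y in xs] for x in xs]


def center(k: Matrix) -> Matrix:
    """Double centering, equivalent to H K H with H = I - (1/n) 11^T."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + grand for j in range(n)] for i in range(n)]


def hsic(xs, ys, sigma: float = 1.0) -> float:
    """Biased empirical HSIC, tr(K H L H) / (n - 1)^2, for paired samples."""
    n = len(xs)
    kc = center(rbf_gram(xs, sigma))
    lc = center(rbf_gram(ys, sigma))
    # tr(K H L H) equals tr(Kc Lc) because H is idempotent; for symmetric
    # matrices that trace is the sum of elementwise products.
    return sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


# Usage on toy data: a near-copy of xs versus an independent sample.
rng = random.Random(0)
xs = [[rng.gauss(0, 1)] for _ in range(40)]
dependent = [[x[0] + 0.05 * rng.gauss(0, 1)] for x in xs]
independent = [[rng.gauss(0, 1)] for _ in range(40)]
```

The near-copy of `xs` scores higher than the independent sample. A decoupling detector relies on this contrast: spans whose output representations carry little dependence on the context receive low scores.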
A plausible implication is that leveraging internal model representations and inter-token dependence measures may automate HalluCitation detection, especially for AI-generated manuscripts subject to fact hallucination.
7. Prospects and Systemic Implications
The HalluCitation phenomenon signals an emergent vulnerability in scholarly publishing in the era of AI-mediated writing:
- Review and citation protocols must adapt to rising rates of non-existent references and the risks posed by synthetic text generation.
- Automated provenance-checking pipelines, machine-readable references, and author education are critical for future robustness.
- The rapid escalation of HalluCitation prevalence, if unchecked, may erode the foundational trust in scientific communication and impede cumulative research progress.
In summary, HalluCitation is a distinct and measurable class of scientific hallucination with rapidly increasing prevalence in computational linguistics conferences. Data-driven, systemic reforms are required to address its technical, operational, and epistemic challenges (Sakai et al., 26 Jan 2026).