
PRISMA 2020: Guidelines for Systematic Reviews

Updated 18 February 2026
  • PRISMA 2020 is an evidence-based guideline that details a comprehensive checklist and flow diagram for systematic reviews and meta-analyses.
  • It enhances transparency by standardizing reporting across the review process, including search strategy, data extraction, and bias assessment.
  • AI-driven automation in PRISMA workflows has shown efficiency gains but also underscores challenges in specificity and bias evaluation.

The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) is an evidence-based reporting guideline comprising a structured checklist and flow diagram to ensure transparency and methodological rigor in systematic reviews (SRs) and meta-analyses. The current PRISMA 2020 standard consists of 12 abstract items and 41 main-body items encompassing all major aspects of the review process, including title, abstract, introduction, methods (eligibility criteria, information sources, risk-of-bias assessment), results (study selection, synthesis methods), discussion, registration, funding, and data availability. Adherence to PRISMA enables readers and reviewers to assess the validity, reproducibility, and applicability of findings, counteracting prevalent gaps in reporting that contribute to research waste and compromised evidence synthesis credibility (Kataoka et al., 20 Nov 2025).

1. Checklist Structure and Coverage

The PRISMA 2020 checklist explicitly structures reporting into distinct items covering every phase of the SR workflow:

  • Title/Abstract: Clearly states review type and key objectives.
  • Introduction: Articulates rationale, context, and objectives.
  • Methods: Encompasses search strategies, eligibility criteria, data collection processes, risk-of-bias assessment, synthesis methods, and protocol registration.
  • Results: Requires quantitative mapping of study selection (e.g., via the PRISMA flow diagram), synthesis of included evidence, and bias reporting.
  • Discussion: Entails interpretation, strengths, limitations, and implications for practice and research.
  • Other Information: Includes registration details, funding, competing interests, and data/code availability statements.

Empirical investigation demonstrates that systematic use of structured PRISMA checklists (in Markdown, JSON, XML, or plain text) substantially increases both the accuracy and transparency of review reporting compared to manuscript-only descriptions (Kataoka et al., 20 Nov 2025). For instance, LLM-based tools achieve 78.7–79.7% adherence-check accuracy with structured checklists versus only 45.2% for manuscript-only input.

2. Methodological Implementation and Automation

PRISMA's comprehensive reporting requirements necessitate detailed documentation and transparent handling of each phase:

  • Search Strategy: Formulation and rigorous documentation of search strings, database querying procedures, and inclusion/exclusion criteria. The use of controlled vocabularies (e.g., MeSH terms) and PICOT frameworks is standard (Morriss et al., 2024, Molla et al., 15 Jul 2025).
  • Screening and Selection: Explicit tracking and justification for both included and excluded records, typically visualized via a PRISMA 2020 flow diagram (Morriss et al., 2024).
  • Data Extraction: Standardized extraction of bibliographic metadata and relevant study characteristics using tabulated formats or dedicated bibliographic software (e.g., EndNote, Excel) (Molla et al., 15 Jul 2025).
  • Risk of Bias Assessment: Transparent documentation of bias considerations, though there are persistent reporting gaps in some domains (e.g., bibliometric SRs may omit formal bias appraisals) (Molla et al., 15 Jul 2025).
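The record-tracking requirement above can be sketched as a small data structure whose derived counts mirror the arithmetic of the PRISMA 2020 flow diagram. This is a hypothetical illustration; the field names and stage granularity are assumptions, not part of the guideline itself:

```python
from dataclasses import dataclass, field

@dataclass
class PrismaFlow:
    """Hypothetical record counts for a PRISMA 2020 flow diagram (identification -> inclusion)."""
    identified: int = 0                  # records retrieved from all databases/registers
    duplicates_removed: int = 0
    excluded_at_screening: int = 0       # title/abstract screening exclusions
    full_text_exclusions: dict = field(default_factory=dict)  # reason -> count

    @property
    def screened(self) -> int:
        # records remaining after de-duplication
        return self.identified - self.duplicates_removed

    @property
    def assessed_full_text(self) -> int:
        # records surviving title/abstract screening
        return self.screened - self.excluded_at_screening

    @property
    def included(self) -> int:
        # records surviving full-text assessment
        return self.assessed_full_text - sum(self.full_text_exclusions.values())

flow = PrismaFlow(
    identified=1200,
    duplicates_removed=200,
    excluded_at_screening=850,
    full_text_exclusions={"wrong population": 60, "no outcome data": 40},
)
print(flow.screened, flow.assessed_full_text, flow.included)  # 1000 150 50
```

Deriving the downstream counts rather than storing them keeps the diagram internally consistent: every exclusion must be attributed to a stage (and, at full text, to a reason), which is exactly the justification PRISMA requires.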

Recent advances demonstrate the feasibility of automating PRISMA-compliant workflows. The Literature Review Network (LRN) exemplifies an explainable AI platform operationalizing every PRISMA 2020 item within an end-to-end SLR pipeline, spanning search, RLHF-driven screening, model-based classification, bias quantification, automated synthesis, and auditable reporting (Morriss et al., 2024).

3. Evaluation Metrics and Statistical Rigor

Assessment of PRISMA-compliant workflows—and their automation—is underpinned by formal performance metrics and rigorous statistical testing:

  • Classification Accuracy, Precision, Recall, F₁-score: Quantify the correctness and reliability of model-based screening and inclusion judgments.

\mathrm{Precision} = \frac{TP}{TP + FP}

\mathrm{Recall} = \frac{TP}{TP + FN}

F_1 = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

  • Jaccard Index (set overlap):

J(A, B) = \frac{\lvert A \cap B \rvert}{\lvert A \cup B \rvert}

Measures concordance between included article sets (e.g., SME vs. automated review) (Morriss et al., 2024).

  • Cohen’s Kappa (κ):

\kappa = \frac{p_o - p_e}{1 - p_e}

where p_o is observed agreement and p_e is chance agreement, used for inter-rater reliability benchmarking (Morriss et al., 2024).

  • Sensitivity and Specificity: Particularly important in automated adherence-checking, reflecting the trade-off between missing true checklist violations and overflagging compliant items (Kataoka et al., 20 Nov 2025).
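The metrics above reduce to a few lines of code. The following sketch (symbols as defined in the formulas above; the numbers are a toy example, not results from any cited study) computes them from confusion-matrix counts and from pairs of included-article sets:

```python
def precision(tp: int, fp: int) -> float:
    # fraction of flagged/included items that are truly relevant
    return tp / (tp + fp)

def recall(tp: int, fn: int) -> float:
    # fraction of truly relevant items that were found (= sensitivity)
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    # fraction of truly irrelevant items correctly left out
    return tn / (tn + fp)

def f1(p: float, r: float) -> float:
    # harmonic mean of precision and recall
    return 2 * p * r / (p + r)

def jaccard(a: set, b: set) -> float:
    # overlap between two included-article sets
    return len(a & b) / len(a | b)

def cohen_kappa(p_o: float, p_e: float) -> float:
    # chance-corrected agreement between two raters
    return (p_o - p_e) / (1 - p_e)

# Toy numbers: an automated screener compared against a human reference set.
p, r = precision(tp=40, fp=10), recall(tp=40, fn=10)
print(round(f1(p, r), 2))                          # 0.8
print(jaccard({"a", "b", "c"}, {"b", "c", "d"}))   # 0.5
print(round(cohen_kappa(p_o=0.8, p_e=0.5), 2))     # 0.6
```

Note that recall and sensitivity are the same quantity; the sensitivity/specificity pair makes the adherence-checking trade-off explicit, since raising one typically lowers the other.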

Benchmarking indicates substantial time and labor savings with AI-driven PRISMA workflows (e.g., LRN reduces human input from ~19,920 minutes to ~289 minutes), with high coverage of relevant literature and strong inter-rater reliability (e.g., κ ≈ 0.5) in optimal models (Morriss et al., 2024).

4. Extensions, Adaptations, and Emerging AI Integration

PRISMA's canonical structure has been extended to accommodate novel modalities (e.g., LLM-assisted SLRs) and emerging research domains:

  • PRISMA-DFLLM (Susnjak, 2023): A domain-specific extension incorporating reporting on dataset curation, LLM selection and finetuning strategy, hyperparameters, benchmarking, stability, qualitative failure analysis, alignment, and legal/ethical compliance. PRISMA-DFLLM adds new checklist items (e.g., data format and augmentation, model specifications, and living update mechanisms), reflecting the integration of AI into SLR workflows.
  • Bibliometric and Thematic Reviews: Adaptations for non-interventional review types, such as those used in AI-in-journalism studies, overlay network mapping and sentiment analysis on the PRISMA backbone but may omit some risk-of-bias reporting or detailed protocol registration (Molla et al., 15 Jul 2025).
  • Automated Adherence Checking: LLMs can evaluate PRISMA compliance at high sensitivity using structured checklists, though high false-positive rates necessitate expert-in-the-loop verification for editorial decision-making (Kataoka et al., 20 Nov 2025).

5. Practical Benefits and Ongoing Challenges

Systematic PRISMA adoption delivers numerous benefits:

  • Transparency & Reproducibility: Complete audit trails, standardized flow diagrams, and itemized checklists support reproducibility and peer audit (Morriss et al., 2024).
  • Efficiency & Scalability: Automated platforms dramatically reduce the manual workload, facilitating large-scale or “living” SLRs that are incrementally updated (Susnjak, 2023).
  • Methodological Rigor: Standardized reporting ensures comparability across studies, enables reliable meta-analyses, and underpins evidence-based practice (Kataoka et al., 20 Nov 2025).

However, limitations persist:

  • Domain and Language Bias: Restriction to specific databases or languages can introduce selection bias, limiting generalizability (Morriss et al., 2024, Molla et al., 15 Jul 2025).
  • Reporting Gaps: Risk-of-bias assessment, protocol registration, and comprehensive checklist mapping are not universally implemented, particularly outside biomedical domains (Molla et al., 15 Jul 2025).
  • Automation Limits: AI-driven compliance checking is not fully accurate—current models can surpass 95% sensitivity but only ~49% specificity, requiring manual adjudication (Kataoka et al., 20 Nov 2025).
  • Legal and Ethical Issues: Use of proprietary datasets for LLM finetuning and data availability for code/model sharing invoke copyright and privacy concerns (Susnjak, 2023).

6. Future Directions and Methodological Roadmaps

A plausible implication is that PRISMA will continue to evolve alongside methodological innovations in systematic reviewing:

  • Structured, Machine-readable Guidelines: The use of Markdown/JSON formats is recommended to facilitate automated adherence checks by LLMs without sacrificing human interpretability (Kataoka et al., 20 Nov 2025).
  • AI-augmented Living SLRs: PRISMA-DFLLM and similar frameworks advocate for continuous model/dataset updates, efficient incremental evidence synthesis, and distributed model benchmarking (Susnjak, 2023).
  • Integration of Explainability and Uncertainty Quantification: Future PRISMA-compliant AI tools are expected to provide not only decision outputs but also auditability, rationale traces, and confidence estimates (Morriss et al., 2024).
  • Interoperable and Open Workflows: The sharing of models, code, and datasets is highlighted as critical for democratizing access and ensuring methodological rigor (Susnjak, 2023).
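The machine-readable recommendation above can be illustrated with a minimal sketch: a single checklist item expressed as JSON. The schema and field names here are hypothetical (no official PRISMA serialization exists), and the requirement text is paraphrased:

```python
import json

# Hypothetical schema for one PRISMA 2020 checklist item; field names are illustrative.
item = {
    "item_id": "16a",
    "section": "Results",
    "label": "Study selection",
    "requirement": (
        "Describe the results of the search and selection process, "
        "ideally using a flow diagram."
    ),
    "machine_checkable": True,
}

serialized = json.dumps(item, indent=2)
print(serialized)

# Round-trip: an LLM-based adherence checker could consume the same structure.
assert json.loads(serialized)["item_id"] == "16a"
```

Keeping the items in a structured format like this lets the same source document drive both human-readable checklists (rendered as Markdown) and automated adherence checks, which is the dual use the recommendation targets.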

Key research priorities include advancing automated data extraction (including tables/figures), empirically optimizing finetuning strategies, robust benchmarking under domain drift, and resolving legal/ethical barriers to open-access model development (Susnjak, 2023).
