IslamicFaithQA Overview

Updated 18 January 2026
  • IslamicFaithQA is a framework of computational methods and benchmarks focused on providing doctrinally faithful, evidence-grounded responses to Islamic queries.
  • It leverages retrieval-augmented generation paradigms and multilingual datasets, ensuring precise citations from sources like the Qur’an, Hadith, and fatwā.
  • Evaluation employs robust metrics such as MAP@10, MRR, and faith-specific standards to ensure safe abstention and avoid speculative claims.

IslamicFaithQA, an editor's term for “Islamic Faith Question Answering”, designates the family of computational frameworks, benchmarks, and system architectures dedicated to robust, faithful question answering about Islam, Islamic law, and foundational texts, using NLP and LLMs. These systems face distinctive requirements: doctrinal precision, obligatory citation of authoritative sources (Qur’an, Hadith, fatwā), explicit handling of missing evidence, and abstention from ungrounded or speculative claims. Research in IslamicFaithQA spans closed-domain chatbots, retrieval-augmented LLMs, agentic and iterative retrieval/generation pipelines, evaluations built around faith-critical criteria, and multilingual/cross-lingual adaptation.

1. Core Datasets and Benchmarks

A central resource is the ISLAMICFAITHQA benchmark, a generative, bilingual (Arabic/English) evaluation set comprising 3,810 question–answer pairs. Each question is paired with a single atomic, factually grounded gold answer, and model responses are graded for correctness, hallucination, and abstention. Annotation protocols require concise, single-fact responses. Reported inter-annotator agreement is 82.96%, with Cohen’s κ of 0.62, and grading combines multiple expert annotators with LLM-judge validation (Bhatia et al., 12 Jan 2026). This benchmark exposes behavior often missed by standard MCQ/MRC-style datasets: models are directly penalized for unsupported claims and rewarded for correct abstention (“Not_Attempted”) when evidence is lacking.
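The three-way grading and the agreement statistics above can be illustrated with a small sketch. The label strings, the function names, and the "accuracy_when_attempted" ratio are assumptions made for illustration; the benchmark's own aggregate scoring formula is not given here.

```python
from collections import Counter

# Assumed label set; only "Not_Attempted" is quoted in the text above.
LABELS = ("Correct", "Incorrect", "Not_Attempted")


def summarize_grades(grades):
    """Return per-label rates for a list of graded responses."""
    if not grades:
        raise ValueError("grades must not be empty")
    counts = Counter(grades)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(grades)
    attempted = counts["Correct"] + counts["Incorrect"]
    return {
        "correct_rate": counts["Correct"] / total,
        "incorrect_rate": counts["Incorrect"] / total,
        "abstention_rate": counts["Not_Attempted"] / total,
        # Hypothetical helper metric: accuracy over answered questions only.
        "accuracy_when_attempted": counts["Correct"] / attempted if attempted else 0.0,
    }


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: product of the annotators' marginal label frequencies.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

Raw percentage agreement corresponds to the `observed` term, while κ discounts the agreement expected by chance, which is why the two reported figures differ.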

Below is a concise summary of major datasets mentioned:

Name | Modality | Size | Evidence Requirements | Key Metrics
ISLAMICFAITHQA | GenQA, AR/EN | 3,810 | Atomic gold, citation | Correct/Incorrect/Abstain
IslamicPCQA | Persian, PCQA | N/A | Documented, multi-hop | NegRej, Correctness
QRCD, ARCD | Extractive | O(1K+) | Spans in Quranic text | F1, pAP, EM, MRR
Rezwan (Hadith) | Factoid, AR | 1.2M | Full Hadith, chain | Human/Human+LLM rating

Data curation frequently includes parallel expert annotation, rigorous verification, and explicit modeling of unanswerable (“zero-answer”) cases (Bhatia et al., 12 Jan 2026, Basem et al., 2024, Oshallah et al., 29 Jan 2025). Composite benchmarks for inheritance (QIAS 2025 SubTask 1), general knowledge (SubTask 2), and Persian IslamicQA (IslamicPCQA) enable domain-specific, high-fidelity evaluation (Bekhouche et al., 30 Aug 2025, Ahmad et al., 28 Sep 2025, asl et al., 29 Oct 2025).

2. System Architectures: Retrieval and Generation Paradigms

IslamicFaithQA systems predominantly follow Retrieval-Augmented Generation (RAG) paradigms, often extended by agentic control and iterative refinement. Standard RAG employs multi-stage passage selection, typically involving:

  • Stage 1: Sparse retrieval (BM25), with full Arabic pre-processing (dediacritization, tokenization), yielding hundreds to thousands of initial candidates (Ahmad et al., 28 Sep 2025).
  • Stage 2: Dense neural retrieval using language-specific or multilingual embeddings (e.g., Arabic-Triplet-Matryoshka-V2, mE5-base), ranking candidates by cosine similarity (Ahmad et al., 28 Sep 2025, Bhatia et al., 12 Jan 2026).
  • Stage 3: Cross-encoder reranking (e.g., miniLMv2, BERT, or SOTA re-rankers), attending jointly to query and passage to assign fine-grained relevance scores (Ahmad et al., 28 Sep 2025, Basem et al., 9 Aug 2025).
  • Stage 4: Prompt construction for the LLM—injects retrieved passages under a “RAG CONTEXT” header, and constrains LLM output for answer format and content.
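The four stages above can be sketched end to end in a few dozen lines. This is a toy illustration under stated assumptions, not any cited system's implementation: `embed` is a bag-of-words stand-in for a real embedding model, `rerank_score` is a hypothetical hook where a cross-encoder would be plugged in, whitespace tokenization replaces Arabic pre-processing, and the instruction text in `build_prompt` is invented for the example. Only the “RAG CONTEXT” header comes from the description above.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization stands in for real pre-processing.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage 1: Okapi BM25 score of every document for the query."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avg_len = sum(len(t) for t in tokenized) / n
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def embed(text):
    # Placeholder "embedding": a term-count vector, not a neural encoder.
    return Counter(tokenize(text))


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def retrieve(query, docs, k_sparse=3, k_dense=2, rerank_score=None):
    """Stages 1-3: BM25 shortlist, cosine re-ranking, optional reranker."""
    sparse = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=lambda i: sparse[i], reverse=True)[:k_sparse]
    q_vec = embed(query)
    shortlist = sorted(shortlist, key=lambda i: cosine(q_vec, embed(docs[i])), reverse=True)[:k_dense]
    if rerank_score is not None:
        # Hypothetical hook: rerank_score(query, passage) -> float.
        shortlist = sorted(shortlist, key=lambda i: rerank_score(query, docs[i]), reverse=True)
    return shortlist


def build_prompt(question, passages):
    """Stage 4: place numbered passages under a RAG CONTEXT header."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, 1))
    return (
        "RAG CONTEXT\n" + context + "\n\n"
        "Answer using only the context above, cite passages as [n], "
        "and abstain if the context is insufficient.\n"
        f"Question: {question}"
    )
```

In practice the shortlist sizes shrink from stage to stage (hundreds or thousands of BM25 candidates down to a handful of reranked passages), because each later stage is more expensive per candidate.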

Agentic RAG (Bhatia et al., 12 Jan 2026) extends this process via an explicit interaction loop: an agentic controller issues structured tool calls (search, read, retrieve, re-query), verifies sufficiency of evidence, and iterates retrieval/generation until confident. This iterative loop allows multi-hop reasoning, error correction, and principled abstention when sources are missing or ambiguous. Modularity supports dynamic tool integration—retrievers, readers/generators, and cross-lingual components (asl et al., 29 Oct 2025, Bhatia et al., 12 Jan 2026).
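A minimal sketch of such a control loop is given below. The four callables (`search`, `is_sufficient`, `generate`, `reformulate`) are hypothetical stand-ins for the tool calls and the sufficiency check described above, and the fixed round budget is an assumption; the cited systems may structure the loop differently.

```python
def agentic_answer(question, search, is_sufficient, generate, reformulate, max_rounds=3):
    """Iterate retrieval until the evidence suffices, otherwise abstain.

    search(query) -> list of passages
    is_sufficient(question, evidence) -> bool
    generate(question, evidence) -> answer string
    reformulate(question, evidence) -> follow-up query string
    """
    evidence = []
    query = question
    for _ in range(max_rounds):
        # Tool call: retrieve passages and keep only unseen ones.
        for passage in search(query):
            if passage not in evidence:
                evidence.append(passage)
        # Verification step: answer only once the evidence is judged sufficient.
        if is_sufficient(question, evidence):
            return generate(question, evidence), evidence
        # Re-query: ask for what is still missing.
        query = reformulate(question, evidence)
    # Budget exhausted without sufficient evidence: abstain rather than guess.
    return "Not_Attempted", evidence
```

Returning the accumulated evidence alongside the answer keeps the abstention decision auditable: a reviewer can see what was retrieved before the loop gave up.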

Specialized encoders—AraBERT, MARBERT, QARiB for Arabic, SBERT for Persian, mE5 for English/Arabic—are fine-tuned for dense retrieval, classification, or span extraction depending on corpus and question type (Bekhouche et al., 30 Aug 2025, asl et al., 29 Oct 2025, Basem et al., 2024).

3. Evaluation Protocols and Metrics

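Evaluation in IslamicFaithQA employs both standard IR/MRC metrics (such as MAP@10, MRR, F1, pAP, and EM) and custom faith-oriented measurements (such as correct/incorrect/abstention grading and negative rejection).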

Rigorous evaluation frameworks, such as dual-agent pipelines (quantitative and qualitative) for LLM-generated content, assess doctrinal fidelity and citation integrity, and report multi-dimensional scores (structure, clarity, depth, originality, Islamic accuracy, citation accuracy) (Mushtaq et al., 28 Oct 2025).
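For the standard retrieval metrics used in this area, MRR and MAP@10, a short reference sketch follows. The function names are illustrative, and the normalization of average precision at a cutoff varies between papers; here it divides by min(number of relevant items, k), which is an assumption rather than the convention of any particular cited work.

```python
def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, 1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def average_precision_at_k(ranked, relevant, k=10):
    """Average of precision values at each relevant hit within the top k."""
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked[:k], 1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)  # assumed normalization, see lead-in
    return total / denom if denom else 0.0


def mean_over_queries(metric, runs):
    """runs: list of (ranked_ids, relevant_id_set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)
```

MRR is then `mean_over_queries(reciprocal_rank, runs)` and MAP@10 is `mean_over_queries(average_precision_at_k, runs)`.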

4. Specialized Challenges and Domain Sensitivity

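IslamicFaithQA confronts domain-specific obstacles, among them doctrinal precision, obligatory citation of authoritative sources, explicit handling of missing evidence, and abstention from ungrounded or speculative claims.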

Agentic and iterative approaches (FAIR-RAG, Agentic RAG) offer state-of-the-art performance in faithfulness, with explicit sufficiency checks (Structured Evidence Assessment, SEA) and evidence checklist fulfillment before answer generation (asl et al., 29 Oct 2025, Bhatia et al., 12 Jan 2026).

5. Integration of Source Diversity and Multilingualism

Comprehensive IslamicFaithQA systems index heterogeneous, authoritative sources—Qur’an, Hadith (e.g., Rezwan corpus, 1.2M narrations, chain–matn separated and richly annotated (Asgari-Bidhendi et al., 4 Oct 2025)), fatwā, tafsīr, and modern scholarly writing. Knowledge bases may exceed 1M documents, semantically chunked and indexed via a hybrid sparse/dense fusion (BM25 + neural embeddings + reciprocal rank fusion), with adaptive domain fine-tuning to address specialized theological vocabulary (asl et al., 29 Oct 2025).
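Reciprocal rank fusion, the step that merges the sparse and dense rankings, is simple enough to show directly. The sketch below assumes each retriever returns an ordered list of document ids; the constant k = 60 is the value commonly used for RRF, not one reported by the cited work, and ties are broken alphabetically purely to make the output deterministic.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked id lists; each hit contributes 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, 1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["a", "b", "c"]    # hypothetical sparse result
dense_ranking = ["b", "c", "a"]   # hypothetical dense result
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

Because RRF uses only ranks, it needs no calibration between BM25 scores and cosine similarities, which live on different scales.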

Cross-language strategies based on translation and paraphrasing pipelines, as in the Cross-Language Quranic QA approach (Pickthall English translation, paraphrased corpus), improve retrieval when the training and test languages are mismatched (Oshallah et al., 29 Jan 2025). Multilingual and Arabic-specific models (mBERT, XLM-R, AraBERT) are further domain-adapted with MLM+NSP on religious corpora (Alnajjar et al., 2022). Coverage and evaluation across Arabic, Persian, English, and additional languages are expanding, as with Rezwan’s Hadith translations (12 languages) and proposals for further South Asian and African language coverage (Asgari-Bidhendi et al., 4 Oct 2025).

6. Design Principles for Faithful, Reliable Deployment

Leading work identifies several best practices and design principles:

  • Enforce evidence grounding: All generated claims must cite explicit sources using inline markers ([1], [2]), and no fact may be introduced that is not directly supported.
  • Iterative sufficiency checks: Evidence checklists and multi-turn refinement loops guard against missing or spurious answers (asl et al., 29 Oct 2025).
  • Cultural/sectarian awareness: Systems embed major madhhab schemas and prompt for scholarly viewpoint diversity where ambiguity exists (Mushtaq et al., 28 Oct 2025).
  • Automated and human-in-the-loop verification: Employ tool-driven citation verification and human review triggers for insufficient or questionable references (Mushtaq et al., 28 Oct 2025, Ahmad et al., 28 Sep 2025).
  • Scalable, LLM-as-Judge evaluation: Allows multi-dimensional, faith-oriented, community-reflective rating (Mushtaq et al., 28 Oct 2025, Bhatia et al., 12 Jan 2026).
  • Safe handling of legal queries: Proactive disclaimers and error-handling in fatwā/fiqh queries prevent AI-generated “fiat” rulings (asl et al., 29 Oct 2025).
  • Efficient, privacy-friendly architectures: Encoder-based solutions enable on-device IslamicFaithQA deployment in sensitive contexts, though at some accuracy tradeoff for heavily compositional reasoning (Bekhouche et al., 30 Aug 2025).
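The evidence-grounding principle in the list above lends itself to a simple automated pre-check before human review. The sketch below flags sentences with no inline marker and markers that point outside the retrieved source list. It is a surface-level check under stated assumptions (markers of the form [n] placed before the sentence-final punctuation, naive sentence splitting, a hypothetical function name), and it does not verify that a cited passage actually supports the claim.

```python
import re

MARKER = re.compile(r"\[(\d+)\]")


def check_citations(answer, num_sources):
    """Return sentences lacking a marker and marker numbers out of range."""
    # Naive split on whitespace that follows ., ! or ? (an assumption).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = []
    invalid = set()
    for sentence in sentences:
        numbers = [int(n) for n in MARKER.findall(sentence)]
        if not numbers:
            uncited.append(sentence)
        invalid.update(n for n in numbers if not 1 <= n <= num_sources)
    return {"uncited_sentences": uncited, "invalid_markers": sorted(invalid)}


report = check_citations(
    "First claim [1]. Second claim without support. Third claim [3].",
    num_sources=2,
)
```

A non-empty report could serve as one of the triggers for regeneration or human review.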

Emergent agentic, adaptive frameworks (Agentic RAG, FAIR-RAG) demonstrate that iterative interaction, sub-querying, and explicit abstention mechanisms are crucial to move from generic retrieval/generation toward truly faithful, reliable IslamicFaithQA (asl et al., 29 Oct 2025, Bhatia et al., 12 Jan 2026).


References:

(Alnajjar et al., 2022, Basem et al., 2024, Oshallah et al., 29 Jan 2025, Basem et al., 8 Aug 2025, Basem et al., 9 Aug 2025, Bekhouche et al., 30 Aug 2025, Ahmad et al., 28 Sep 2025, Asgari-Bidhendi et al., 4 Oct 2025, Mushtaq et al., 28 Oct 2025, asl et al., 29 Oct 2025, Uriawan et al., 18 Dec 2025, Bhatia et al., 12 Jan 2026)
