A computational system to handle the orthographic layer of tajwid in contemporary Quranic Orthography

Published 16 May 2025 in cs.CL | (2505.11379v1)

Abstract: Contemporary Quranic Orthography (CQO) relies on a precise system of phonetic notation that can be traced back to the early stages of Islam, when the Quran was mainly oral in nature and the first written renderings of it served as memory aids for this oral tradition. The early systems of diacritical marks created on top of the Quranic Consonantal Text (QCT) motivated the creation and further development of a fine-grained system of phonetic notation that represented tajwid, the rules of recitation. We explored the systematicity of the rules of tajwid, as they are encountered in the Cairo Quran, using a fully and accurately encoded digital edition of the Quranic text. For this purpose, we developed a Python module that can remove or add the orthographic layer of tajwid from a Quranic text in CQO. The interesting characteristic of these two sets of rules is that they address the complete Quranic text of the Cairo Quran, so they can be used as precise witnesses to study its phonetic and prosodic processes. From a computational point of view, the text of the Cairo Quran can be used as a linchpin to align and compare Quranic manuscripts, due to its richness and completeness. This will let us create a very powerful framework to work with the Arabic script, not just within an isolated text, but automatically exploring a specific textual phenomenon in other connected manuscripts. Having all the texts mapped among each other can serve as a powerful tool to study the nature of the notation systems of diacritics added to the consonantal skeleton.


Authors (1)

Alicia González Martínez

Summary

The paper presents a Python-based system that accurately applies and removes tajwid notation using cascade rewrite rules.
It employs detailed regular expression rules for assimilation, elongation, and pausal marks, validated by perfect text restoration tests.
The system offers a foundation for future research by enabling digital comparisons and enhancements of Qur'anic manuscript orthography.


Overview of the Computational Challenge

The paper introduces a Python-based computational system designed to manage the orthographic layer of tajwid within Contemporary Qur'anic Orthography (CQO). The tajwid rules are an essential component of Qur'anic recitation, providing phonetic guidance that preserves the traditional oral pronunciation. This study focuses on the Cairo Qur'an, the most widely used textual resource for CQO, to develop a system capable of automatically adding or removing tajwid notation layers. The system uses a comprehensive set of regular expression rules to process the orthographic phenomena related to tajwid.

Methodology and Implementation

The system works in two directions, each implemented as a cascade of rewrite rules. One cascade removes the tajwid layer and the other adds it back. The rules fall into three groups:

Assimilation Rules: These cover al-nūn al-sākina, al-mīm al-sākina, and al-lām al-sākina, and they handle assimilation both within words and across word boundaries. The assimilation logic depends on distinguishing nouns from verbs. For this distinction, the system draws on the morphosyntactic analyses of the Quranic Arabic Corpus (corpus.quran.com).
Elongation Rules: The elongation (madd) rules apply where a long vowel is followed by a hamza or a geminate consonant. Regular expressions encode these phonetic contexts and place the madd sign accordingly, including in cases that involve the miniature wāw and yāʾ.
Pausal Marks: The system also handles pause-related notation such as al-ṣifr al-mustadīr (the round zero) and al-ṣifr al-mustaṭīl al-qāʾim (the upright oblong zero). Although these marks are semantically driven, the rules must still account for exceptions and special cases.
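The cascade idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and does not reproduce the paper's module. The function names `add_tajwid` and `remove_tajwid`, the two toy rules, and the representation of the stripped text are all hypothetical. In that stripped text, every silent consonant is assumed to carry an explicit sukūn.

The add cascade covers two cases:
- iqlāb, where a nūn sākina before bāʾ is written with a small high mīm (U+06E2) instead of a sukūn;
- a word-internal madd before a bare hamza (U+0621).

The remove cascade applies the inverse rules in reverse order.

```python
import re

# Code points (standard Unicode Arabic block) used by the two toy rules.
NUN, BA, HAMZA = "\u0646", "\u0628", "\u0621"
SUKUN = "\u0652"        # ARABIC SUKUN
MADDA = "\u0653"        # ARABIC MADDAH ABOVE
SMALL_MEEM = "\u06E2"   # ARABIC SMALL HIGH MEEM ISOLATED FORM (iqlab mark)
LONG_VOWEL = "[\u0627\u0648\u064A]"  # alif, waw, ya as a character class

# Hypothetical "add" cascade: (compiled pattern, replacement), applied in order.
ADD_RULES = [
    # Iqlab: nun + sukun before ba (possibly across a space) drops the sukun
    # and takes a small high meem instead.
    (re.compile(NUN + SUKUN + r"(?=\s*" + BA + ")"), NUN + SMALL_MEEM),
    # Madd (toy, word-internal only): a long-vowel letter immediately followed
    # by a bare hamza receives a maddah sign.
    (re.compile("(" + LONG_VOWEL + ")(?=" + HAMZA + ")"), r"\1" + MADDA),
]

# Hypothetical inverse rules, listed in reverse order of ADD_RULES.
REMOVE_RULES = [
    (re.compile("(" + LONG_VOWEL + ")" + MADDA + "(?=" + HAMZA + ")"), r"\1"),
    (re.compile(NUN + SMALL_MEEM), NUN + SUKUN),
]


def apply_cascade(text: str, rules) -> str:
    """Apply each rewrite rule to the output of the previous one."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


def add_tajwid(text: str) -> str:
    return apply_cascade(text, ADD_RULES)


def remove_tajwid(text: str) -> str:
    return apply_cascade(text, REMOVE_RULES)


# Usage: "min baʿdi" with an explicit sukun on the nun (assumed stripped form).
plain = "\u0645\u0650\u0646\u0652 \u0628\u064E\u0639\u0652\u062F\u0650"
marked = add_tajwid(plain)
assert SMALL_MEEM in marked and remove_tajwid(marked) == plain
```

Each rule is a single regular-expression substitution, so the order of the list is the order of the cascade. These two toy rules do not interact. In a full rule set, however, an earlier rule can create or destroy the context that a later rule looks for, which is one reason to run the inverse cascade in reverse.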

The two rule cascades are designed as inverses of each other. Stripping the tajwid layer and then reapplying it should return the original text unchanged.

Results and Validation

For validation, the tajwid layer was stripped from the text and then restored, and the original and restored texts matched exactly. This round-trip result indicates that the two rule cascades behave as consistent inverses over the Cairo Qur'an. It also suggests that computational systems can support the study and teaching of Qur'anic recitation by managing this orthographic layer digitally.
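A round-trip check of this kind is straightforward to automate. The sketch below is hypothetical and is not the paper's code. The harness, its names, and the toy iqlāb pair used to exercise it are all assumptions. The harness strips and restores each verse and reports the position of the first differing code point, so an empty result means every verse was restored exactly.

```python
import re
from typing import Callable, Iterable, List, Tuple


def first_difference(a: str, b: str) -> int:
    """Index of the first differing code point, or -1 if the strings are equal."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1 if len(a) == len(b) else min(len(a), len(b))


def validate_round_trip(
    verses: Iterable[Tuple[str, str]],
    remove: Callable[[str], str],
    add: Callable[[str], str],
) -> List[Tuple[str, int]]:
    """Strip and restore each (verse_id, text); return (verse_id, offset) per mismatch."""
    failures = []
    for verse_id, original in verses:
        restored = add(remove(original))
        offset = first_difference(original, restored)
        if offset != -1:
            failures.append((verse_id, offset))
    return failures


# Hypothetical toy remove/add pair covering only the iqlab mark.
NUN, BA, SUKUN, SMALL_MEEM = "\u0646", "\u0628", "\u0652", "\u06E2"


def toy_remove(text: str) -> str:
    return text.replace(NUN + SMALL_MEEM, NUN + SUKUN)


def toy_add(text: str) -> str:
    return re.sub(NUN + SUKUN + r"(?=\s*" + BA + ")", NUN + SMALL_MEEM, text)


verses = [("example-1", "\u0645\u0650\u0646\u06E2 \u0628\u064E\u0639\u0652\u062F\u0650")]
print(validate_round_trip(verses, toy_remove, toy_add))  # []
```

Reporting an offset rather than a bare pass/fail makes it easier to see which rule is at fault when a verse does not survive the round trip.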

Implications and Future Directions

A system of this kind opens several avenues for studying and comparing Qur'anic manuscripts. The Cairo Qur'an is complete and fully encoded, so it can serve as a reference text against which other manuscripts are aligned. Once they are aligned, a given orthographic phenomenon can be traced automatically across connected witnesses. Future research could extend the system to readings (qira'at) other than Hafs 'an 'Asim, refine the rule sets, and add further layers of linguistic analysis.
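One simple way to picture the alignment step is to reduce both the Cairo text and a manuscript transcription to a common form and then align them word by word. The sketch below uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; it is not the author's method. The names `strip_marks` and `align_words` are hypothetical, and the witness line is invented. The sketch only drops combining marks (Unicode category Mn). It does not remove consonant dots, and it keeps any character outside that category. A real comparison at the level of the consonantal skeleton would also need to map dotted letters to their undotted forms.

```python
import difflib
import unicodedata
from typing import List, Tuple


def strip_marks(text: str) -> str:
    """Drop combining marks (Unicode category Mn), keeping base letters and spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def align_words(reference: str, witness: str) -> List[Tuple[str, List[str], List[str]]]:
    """Align two texts word by word after stripping marks.

    Returns difflib opcodes as (tag, reference_words, witness_words).
    """
    ref_words = strip_marks(reference).split()
    wit_words = strip_marks(witness).split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=wit_words, autojunk=False)
    return [
        (tag, ref_words[i1:i2], wit_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    ]


# Fully marked reference fragment vs. an invented, unmarked witness line.
cairo = "\u0645\u0650\u0646\u06E2 \u0628\u064E\u0639\u0652\u062F\u0650"
witness = "\u0645\u0646 \u0642\u0628\u0644"
for tag, ref, wit in align_words(cairo, witness):
    print(tag, ref, wit)
```

Aligning on words rather than characters keeps each opcode readable as a word-level agreement or variant. A character-level pass could then be run inside each `replace` span.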

Conclusion

This system offers a computational treatment of the orthographic complexity of contemporary Qur'anic orthography. It encodes the rules of tajwid explicitly and reversibly, which gives a precise framework for both linguistic scholarship and teaching. It also provides a basis for further applications in the study of Qur'anic texts. Its combination of linguistic detail and computational precision may serve as a model for future efforts to digitize and analyze complex linguistic systems.
