
AraBART+Morph+GEC for Arabic Grammatical Correction

Updated 25 November 2025
  • The paper introduces AraBART+Morph+GEC, which combines a BART-based encoder-decoder model with detailed morphological embeddings and a GED objective to improve Arabic error correction.
  • It employs a refined edit selection pipeline using logistic regression, agreement boosting, and non-maximum suppression, achieving up to 84.64% F₀.₅ on QALB-15 benchmarks.
  • Serving as a key component of the ArbESC+ ensemble, the system leverages both neural and linguistic features to set a new state-of-the-art for Arabic grammatical error correction.

AraBART+Morph+GEC is an Arabic grammatical error correction (GEC) system integrating a BART-based sequence-to-sequence architecture, explicit morphological analysis, and a parallel grammatical error detection (GED) objective. Developed as a key component of the ArbESC+ multi-system edit selection framework, it leverages both neural and linguistic features to address the challenges of morphologically rich and syntactically complex Arabic text. The system combines span-based edit proposals from independently trained variants, enabling fine-grained correction decisions within a larger ensemble strategy (Alrehili et al., 18 Nov 2025).

1. Architecture of AraBART+Morph+GEC

1.1 Base AraBART Backbone

AraBART employs the encoder–decoder “denoising” transformer originally proposed by Lewis et al. (2019), re-pretrained on extensive Arabic corpora as described in Antoun et al. (2020). Arabic-specific modifications include a BPE vocabulary (approximately 42,000 tokens), script-level adjustments for right-to-left text, and orthographic normalization. Pretraining objectives follow standard BART, encompassing masked token infilling, masked span infilling, and next-sentence prediction, all adapted for Arabic data.

1.2 Morphological Feature Integration

Morphological information is introduced via CAMeL Tools' MADA+ analyzer. For each input token position $i$, discrete features are extracted:

  • POS tag $t_i \in T^{\mathrm{POS}}$
  • Stem $s_i$
  • Root $r_i$
  • Additional attributes $f_i \in F$ (number, gender, case, etc.)

These features are embedded as follows:

$E^m_{(i)} = E^{\mathrm{POS}}[t_i] + E^{\mathrm{stem}}[s_i] + E^{\mathrm{root}}[r_i] + \sum_{f\in F} E^f[f_i]$

The encoder’s input vector at each position is

$h^{(0)}_i = E^{\text{tok}}_{(i)} + E^{\text{pos}}_i + E^m_{(i)}$

Optionally, internal layers inject morphological embeddings into the multi-head self-attention keys and values,

$Q^\ell = W_Q h^{(\ell-1)}; \quad K^\ell = W_K h^{(\ell-1)} + W_m E^m; \quad V^\ell = W_V h^{(\ell-1)} + W'_m E^m$

enabling direct incorporation of morphological cues in attention computations.
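
As a concrete illustration of the embedding scheme above, the following PyTorch sketch builds the summed morphological embedding $E^m_{(i)}$ and combines it with token and positional embeddings. Layer names, vocabulary sizes, and the attribute set are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the Section 1.2 morphological embedding sum (assumed layout).
import torch
import torch.nn as nn

class MorphEmbedding(nn.Module):
    def __init__(self, d_model, n_pos_tags, n_stems, n_roots, attr_sizes):
        super().__init__()
        self.pos = nn.Embedding(n_pos_tags, d_model)
        self.stem = nn.Embedding(n_stems, d_model)
        self.root = nn.Embedding(n_roots, d_model)
        # One embedding table per extra attribute f in F (number, gender, case, ...)
        self.attrs = nn.ModuleList([nn.Embedding(n, d_model) for n in attr_sizes])

    def forward(self, pos_ids, stem_ids, root_ids, attr_ids):
        # E^m_i = E^POS[t_i] + E^stem[s_i] + E^root[r_i] + sum_f E^f[f_i]
        e = self.pos(pos_ids) + self.stem(stem_ids) + self.root(root_ids)
        for table, ids in zip(self.attrs, attr_ids):
            e = e + table(ids)
        return e

# Encoder input, per Section 1.2:
# h0 = tok_emb(input_ids) + positional_emb(positions) + morph(pos_ids, stem_ids, root_ids, attr_ids)
```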

1.3 GEC-specific Multi-task Objectives

The model is trained for both text generation and error detection:

  • Sequence generation with standard cross-entropy loss:

$L_{\text{seq}} = -\sum_t \log p(y_t \mid y_{<t}, x)$

  • Grammatical error detection (GED):

$L_{\text{GED}} = -\sum_i \left[ g_i \log \sigma(u_i) + (1-g_i)\log(1-\sigma(u_i)) \right]$

where $u_i$ is the GED logit for token $i$, $g_i$ is its gold label, and $\sigma$ is the sigmoid. The full objective is $L = L_{\text{seq}} + \lambda L_{\text{GED}}$ with $\lambda = 1.0$.
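
A minimal PyTorch sketch of this joint objective, assuming a decoder that produces sequence logits and an encoder-side binary GED head (shapes and padding handling are assumptions):

```python
# Hedged sketch of L = L_seq + lambda * L_GED from Section 1.3.
import torch
import torch.nn.functional as F

def gec_loss(seq_logits, target_ids, ged_logits, ged_labels, lam=1.0, pad_id=0):
    # L_seq: token-level cross-entropy over decoder outputs (padding ignored)
    l_seq = F.cross_entropy(
        seq_logits.view(-1, seq_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_id,
    )
    # L_GED: per-token binary cross-entropy on the detection head
    # (a padding mask over ged_logits is omitted for brevity)
    l_ged = F.binary_cross_entropy_with_logits(ged_logits, ged_labels.float())
    return l_seq + lam * l_ged
```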

1.4 Training Regime and Hyperparameters

  • Data: QALB-2014, QALB-2015, ZAEBUC corpora (joint/separate variants)
  • Optimization: AdamW, learning rate $2 \times 10^{-5}$, weight decay 0.01
  • Batch size: 16, mixed precision (fp16)
  • Training epochs: 50, early stopping via development set
  • Inference: beam search, beam size 5, max output length 100
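
One way to realize this regime with Hugging Face Transformers is sketched below; the argument names mirror the listed hyperparameters, but the script and the early-stopping patience are assumptions, not the authors' code.

```python
# Training configuration sketch consistent with Section 1.4 (assumed, not official).
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

args = Seq2SeqTrainingArguments(
    output_dir="arabart-morph-gec",
    learning_rate=2e-5,               # AdamW is the default optimizer
    weight_decay=0.01,
    per_device_train_batch_size=16,
    fp16=True,                        # mixed precision
    num_train_epochs=50,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,      # early stopping via the development set
    predict_with_generate=True,
    generation_num_beams=5,
    generation_max_length=100,
)
early_stop = EarlyStoppingCallback(early_stopping_patience=3)  # patience assumed
```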

2. Generation and Featurization of Correction Proposals

2.1 Candidate Edit Extraction

At inference, three independently trained AraBART+Morph+GEC models (corresponding to QALB-14, QALB-15, and ZAEBUC domains) generate corrected sentences. Source-to-output alignments yield proposed span edits $e=(a,b,r)$, interpreted as replacements of source tokens $[a..b-1]$ by string $r$.
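
A hedged sketch of span-edit extraction, using Python's difflib as a stand-in for whatever alignment procedure the paper actually employs:

```python
# Extract span edits e = (a, b, r) from a source/hypothesis token alignment.
from difflib import SequenceMatcher

def extract_edits(src_tokens, hyp_tokens):
    """Return edits (a, b, r): replace src_tokens[a:b] with string r."""
    edits = []
    matcher = SequenceMatcher(a=src_tokens, b=hyp_tokens, autojunk=False)
    for op, a, b, c, d in matcher.get_opcodes():
        if op != "equal":  # 'replace', 'delete', or 'insert'
            edits.append((a, b, " ".join(hyp_tokens[c:d])))
    return edits

# Example (two adjacent corrections merge into one span edit):
# extract_edits(["ذهب", "الى", "المدرسه"], ["ذهب", "إلى", "المدرسة"])
# -> [(1, 3, "إلى المدرسة")]
```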

2.2 Numerical Feature Representation

For each edit, multiple features are computed:

  • System confidence: For a proposal $e$ from system $k$, the normalized probability mass assigned to edits containing $e$ across output beams,

$c_k(e) = \sum_{b\in\text{beams}} \mathbf{1}[e\in E(y_b)] \cdot \mathrm{softmax}_b(s_b)$

  • Morphological consistency: $M_C(e)\in[0,1]$ measures alignment between the replacement $r$ and MADA+ predicted gold features:

$M_C(e) = \frac{1}{|F|}\sum_{f\in F}\mathbf{1}[f(\text{predicted on } r) = f(\text{gold})]$

  • Span features: Size of the replaced span $(b-a)$ and the length of $r$ (all three feature groups are computed together in the sketch after this list).
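
An illustrative computation of these features for one edit; `beam_scores`, `beam_edit_sets`, and the MADA+ feature dictionaries are assumed inputs, not names from the paper.

```python
# Sketch of the Section 2.2 edit features (assumed data structures).
import math

def system_confidence(edit, beam_edit_sets, beam_scores):
    # c_k(e): softmax-normalized mass of beams whose edit set contains e
    zmax = max(beam_scores)
    weights = [math.exp(s - zmax) for s in beam_scores]  # numerically stable softmax
    total = sum(weights)
    return sum(w for edits, w in zip(beam_edit_sets, weights) if edit in edits) / total

def morph_consistency(pred_feats, gold_feats):
    # M_C(e): fraction of morphological attributes where prediction matches gold
    keys = gold_feats.keys()
    return sum(pred_feats.get(k) == gold_feats[k] for k in keys) / len(keys)

def span_features(edit):
    a, b, r = edit
    return {"span_len": b - a, "repl_len": len(r.split())}
```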

3. Edit Selection: Classifier and Decision Pipeline

3.1 Feature Vector Construction

A binary feature vector $x_e\in\{0,1\}^{K\times T}$ (where $K=9$ is the number of systems and $T=3$ is the number of edit types) specifies which system(s) proposed $e$ and of which type (insertion, deletion, substitution). Optionally, real-valued meta-features (system confidence, morphological consistency, span length) are appended.
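
A minimal encoding of $x_e$ under one plausible layout (system-major, type-minor); the ordering is an assumption, as the paper does not fix it here.

```python
# Build x_e in {0,1}^(K*T) with optional appended meta-features.
import numpy as np

EDIT_TYPES = {"insert": 0, "delete": 1, "substitute": 2}

def edit_feature_vector(proposing_systems, edit_type, meta=None, K=9, T=3):
    x = np.zeros(K * T)
    t = EDIT_TYPES[edit_type]
    for k in proposing_systems:   # indices of the systems that proposed e
        x[k * T + t] = 1.0
    if meta is not None:          # e.g. [confidence, morph_consistency, span_len]
        x = np.concatenate([x, meta])
    return x
```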

3.2 Logistic Regression Scoring

Each candidate $e$ receives a raw probability score via logistic regression,

$p_{\text{raw}}(e) = \sigma(w^\top x_e + b)$

optimized with binary cross-entropy on labeled edits.
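
A scikit-learn version of this scorer is sketched below as a plausible stand-in for the paper's classifier; the placeholder data and hyperparameters are assumptions.

```python
# Score candidate edits with logistic regression (assumed setup).
import numpy as np
from sklearn.linear_model import LogisticRegression

# X: (n_edits, K*T + n_meta) feature matrix; y: 1 if the edit matches a gold edit
X = np.random.rand(200, 30) > 0.7          # placeholder training data
y = np.random.randint(0, 2, 200)           # placeholder labels

clf = LogisticRegression(max_iter=1000).fit(X, y)
p_raw = clf.predict_proba(X)[:, 1]         # p_raw(e) = sigma(w^T x_e + b)
```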

3.3 Agreement Boosting and Dual-Threshold Filtering

System agreement is quantified:

  • $n(e) = \sum_k \mathbf{1}[\text{system } k \text{ proposed } e]$
  • Boost factor: $\mathrm{boost}(e) = \min(1+\beta(n(e)-1),\, c)$
  • Adjusted score: $p_{\text{adj}}(e) = p_{\text{raw}}(e)\cdot\mathrm{boost}(e)$

Candidate $e$ is accepted if $p_{\text{raw}}(e)\geq \tau$ and $p_{\text{adj}}(e)\geq \alpha\tau$, enforcing both raw confidence and cross-system agreement.
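
These rules transcribe directly into code; $\beta$, $c$, $\tau$, and $\alpha$ are tunable, and the default values below are assumptions rather than the published settings.

```python
# Agreement boosting and dual-threshold filtering (Section 3.3).
def accept_edit(p_raw, n_agree, beta=0.2, c=1.5, tau=0.7, alpha=0.9):
    boost = min(1.0 + beta * (n_agree - 1), c)   # boost(e), capped at c
    p_adj = p_raw * boost
    # Keep the edit only if both the raw and the boosted score clear their bars
    return (p_raw >= tau) and (p_adj >= alpha * tau), p_adj
```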

3.4 Non-Maximum Suppression for Conflict Resolution

Inter-edit overlap is measured by one-dimensional IoU:

$\mathrm{IoU}(e_i,e_j) = \dfrac{\max(0,\,\min(b_i,b_j)-\max(a_i,a_j))}{(b_i-a_i)+(b_j-a_j)-\max(0,\,\min(b_i,b_j)-\max(a_i,a_j))}$

A greedy non-maximum suppression (NMS) procedure selects the highest-$p_{\text{adj}}$ edits while ensuring non-overlapping spans (threshold $\theta=0$), with at most one insertion per position.
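
A sketch of the 1-D IoU and the greedy selection loop; with $\theta=0$, any positive overlap drops the lower-scoring edit. Handling of zero-width insertion spans (the "one insertion per position" rule) is noted but omitted.

```python
# Greedy span-level NMS following Section 3.4.
def span_iou(e1, e2):
    (a1, b1), (a2, b2) = e1[:2], e2[:2]
    inter = max(0, min(b1, b2) - max(a1, a2))
    union = (b1 - a1) + (b2 - a2) - inter
    return inter / union if union > 0 else 0.0

def nms(edits, scores, theta=0.0):
    # Visit edits in decreasing p_adj; keep an edit only if it does not
    # overlap (IoU > theta) with any already-kept edit.
    order = sorted(range(len(edits)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(span_iou(edits[i], edits[j]) <= theta for j in kept):
            kept.append(i)
    # Note: insertions have a == b (zero-width spans, IoU 0); enforcing
    # "at most one insertion per position" needs a separate equality check.
    return [edits[i] for i in kept]
```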

4. System Combination in ArbESC+ Framework

4.1 Model Ensemble

The full ArbESC+ system integrates:

  • Four sequence-to-sequence GEC models: AraT5, ByT5, mT5, AraBART
  • Three AraBART+Morph+GEC models (trained on QALB-14, QALB-15, ZAEBUC)
  • Two text-editing models

This ensemble yields $K=9$ candidate outputs per sentence.

4.2 Combination and Decision Pipeline

The ensemble workflow is as follows:

  1. Aggregate unique span edits from all 9 systems.
  2. Encode features for each edit as described above.
  3. Score with logistic regression.
  4. Apply agreement boosting and dual-threshold filtering.
  5. Resolve conflicts via NMS.
  6. Sequentially apply the surviving edits to the source sentence in left-to-right order (sketched below).
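
For the final step, one convenient implementation applies the non-overlapping edits from right to left so that earlier span offsets remain valid; this is equivalent to left-to-right application with offset tracking.

```python
# Apply surviving, non-overlapping span edits to the source tokens.
def apply_edits(src_tokens, edits):
    out = list(src_tokens)
    # Right-to-left application keeps indices of yet-unapplied edits valid.
    for a, b, r in sorted(edits, key=lambda e: e[0], reverse=True):
        out[a:b] = r.split() if r else []   # empty r means deletion
    return " ".join(out)
```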

4.3 Rationale for Micro-edit Level Combination

Micro-edit based voting enables fine-grained error correction where edits, rather than whole sentences, are the central decision unit. This enables contributions from high-confidence system components even when they disagree on overall sentence structure. Thresholding and agreement-based boosting limit spurious or low-confidence edits, while NMS prevents conflicting alterations on overlapping spans.

5. Empirical Performance and Ablative Analyses

5.1 Comparative Results

F₀.₅ scores:

| Model | QALB-14 | QALB-15 L1 | QALB-15 L2 |
| --- | --- | --- | --- |
| AraBART+Morph+GEC (2014) | 76.20% | 78.85% | 52.00% |
| AraBART+Morph+GEC (2015) | 77.99% | 77.97% | 60.98% |
| AraBART+Morph+GEC (ZAEBUC) | 77.85% | 77.73% | 60.79% |
| ArbESC+ (all 9 combined) | 82.63% | 84.64% | 65.55% |

ArbESC+ outperforms single models by 4–6 F₀.₅ points across all benchmarks, establishing new state-of-the-art performance for Arabic GEC.

5.2 System Combination vs. Baselines

ArbESC+ surpasses majority voting, weighted voting, minimum Bayes risk (MBR) decoding, and standard ESC system combination by 1–3 F₀.₅ points on each evaluation split.

5.3 Impact of the Number of Models

Ablation results show that using only the best 3–5 models achieves F₀.₅ scores of 80.71–80.77 on QALB-14, compared to 82.63 for the full 9-model ArbESC+ system. Including all 9 models but omitting the selection combiner yields an F₀.₅ of 80.78, indicating that the edit-level combination pipeline itself contributes further gains.

5.4 Threshold Sensitivity

The dual-threshold filtering is sensitive: values of $\tau$ below 0.5 admit too many low-quality edits and depress F₀.₅, whereas $\tau$ above 0.9 sacrifices recall. Values of $\tau \approx 0.7$–$0.8$ deliver the strongest results.

5.5 Effect of Morphological Features

AraBART+Morph+GEC’s explicit use of morphological embeddings and a parallel GED objective yields an improvement of roughly 2 F₀.₅ points over vanilla AraBART, confirming the value of linguistic feature integration for Arabic GEC.

6. Summary and Significance

AraBART+Morph+GEC augments the standard Arabic BART transformer with detailed morphological features and a grammatical error detection head, producing more accurate and linguistically informed error corrections. Serving as black-box proposal generators within ArbESC+, its outputs are processed via a classifier pipeline that integrates proposals from nine diverse systems, leverages model agreement, filters candidates based on calibrated confidence thresholds, and resolves conflicts via span-level NMS. With final F₀.₅ scores of 82.63%, 84.64%, and 65.55% on the QALB-14, QALB-15 L1, and QALB-15 L2 benchmarks, AraBART+Morph+GEC—especially within ArbESC+—sets a new performance baseline for Arabic grammatical error correction and exemplifies the impact of combining neural and morphological approaches (Alrehili et al., 18 Nov 2025).
