DOREMI: Optimizing Tail Relations in DocRE
- The paper demonstrates that DOREMI boosts tail relation F1 scores by up to +33.5 on extreme-tail classes using targeted annotation.
- DOREMI employs an iterative, active selection process based on ensemble disagreement to prioritize rare relation instances with minimal labeling effort.
- Compared to traditional denoising and augmentation methods, DOREMI achieves superior overall and rare-class performance in document-level relation extraction.
Document-level Relation Extraction optimizing the long tail (DOREMI) refers to a modern framework designed to address the pronounced class imbalance inherent in document-level relation extraction (DocRE) tasks. In DocRE, entity-relation pairs within long-form documents often follow a heavy-tailed distribution: a small number of relation types occur frequently (head relations), while the majority appear infrequently (tail relations), leading to poor generalization on rare relation types. DOREMI systematically improves model performance for tail relations with minimal annotation effort by leveraging active instance selection and targeted manual annotation, resulting in substantial improvements on rare relation classes while maintaining or enhancing overall model robustness (Menotti et al., 16 Jan 2026).
1. Characterization of the Long-Tail Problem in DocRE
Document-level RE requires predicting all relevant semantic relations between ordered entity pairs from a predefined relation set R, aggregating evidence across the entire document. In major benchmarks such as DocRED, the relation label distribution is sharply skewed: only a few relations have thousands of training examples, while the majority of relations ("tail relations") have only a handful of instances. Models trained on standard distant supervision (DS) corpora achieve near-zero recall on these tail relations due to insufficient supervision and the overwhelming gradient signal from head relations (Du et al., 2022, Han et al., 2022, Li et al., 2023).
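The head/tail skew can be made concrete with a small sketch. The relation names, counts, and the 100-instance head/tail cutoff below are illustrative assumptions, not DocRED's actual statistics:

```python
from collections import Counter

# Toy heavy-tailed DS label distribution (counts invented for illustration;
# real DocRED statistics differ).
labels = (["country"] * 5000 + ["located_in"] * 3000
          + ["date_of_birth"] * 40 + ["participant_of"] * 8)
counts = Counter(labels)

# Assumed split: relations with >= 100 training instances are "head",
# the rest are "tail".
head = {r for r, c in counts.items() if c >= 100}
tail = {r for r, c in counts.items() if c < 100}

# Tail relations are half the relation *types* here, yet contribute well
# under 1% of the training *instances* -- the source of the weak gradient.
tail_share = sum(counts[r] for r in tail) / sum(counts.values())
print(head, tail, round(tail_share, 4))
```

Under such a split, a model trained with a standard per-instance loss sees thousands of gradient updates for head relations and almost none for tail ones.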
Traditional denoising and augmentation strategies—such as uncertainty-guided label denoising (UGDRE) (Sun et al., 2023), generative augmentation via VAEs or LLMs (Tran et al., 2024, Li et al., 2023), or relation co-occurrence modeling (Han et al., 2022)—improve tail performance by either filtering noisy pseudo-labels or generating additional synthetic data. However, these approaches are constrained by the inherent limitations of data quality or the scalability of naive manual annotation.
2. DOREMI Framework: Iterative Active Human-in-the-Loop Enhancement
DOREMI (DOcument-level Relation Extraction optiMizing the long taIl) (Menotti et al., 16 Jan 2026) frames the improvement of tail relation coverage as an iterative, budget-constrained optimization—explicitly leveraging annotation resources to maximize improvements on underrepresented relations.
- Joint Optimization Objective

  max_{θ, S : |S| ≤ B}  Perf_tail(θ; D₀ ∪ S)

  where θ denotes the DocRE model parameters, D₀ a small human-annotated seed set, S the set of selected examples for annotation, and B the annotation budget.
- Iterative Process At each round t, a batch of candidate entity pairs Cₜ is sampled from the DS pool according to an informativeness score s(·). The batch is annotated, models are finetuned, and the selection process is repeated, prioritizing examples with maximum model ensemble disagreement. Aggregated predictions with thresholding produce DDS (Denoised DS) for downstream training.
- Sampling Criteria
  - Per-relation disagreement: d_r(x), the spread of the ensemble members' predicted probabilities for relation r on candidate pair x (e.g., their variance).
  - Aggregate multi-label disagreement: d(x), combining d_r(x) across all relations (e.g., via their maximum).
  - The batch Cₜ is selected to maximize the total informativeness of its members, with s(x) instantiated as d(x).
- Scalability Empirical configurations annotate only 0.001–0.003% of the DS corpus; for DocRED, a budget in this range suffices and corresponds to roughly 6 expert hours of annotation.
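The sampling criteria above can be sketched as follows. This is a minimal illustration under stated assumptions: per-relation disagreement is instantiated as the variance of ensemble probabilities, and aggregation as the maximum over relations; the paper's exact formulas may differ, and all names are hypothetical:

```python
def per_relation_disagreement(probs):
    """probs: list over ensemble members of {relation: probability} dicts.
    Returns d_r(x): variance of each relation's probability across members."""
    m = len(probs)
    d = {}
    for r in probs[0]:
        ps = [p[r] for p in probs]
        mean = sum(ps) / m
        d[r] = sum((p - mean) ** 2 for p in ps) / m  # population variance
    return d

def aggregate_disagreement(probs):
    """d(x): combine per-relation disagreements (here, by taking the max)."""
    return max(per_relation_disagreement(probs).values())

def select_batch(candidates, budget):
    """Greedy instantiation of the budgeted objective: rank the unlabeled
    pool by s(x) = d(x) and keep the top-`budget` entity pairs."""
    ranked = sorted(candidates,
                    key=lambda x: aggregate_disagreement(candidates[x]),
                    reverse=True)
    return ranked[:budget]

# Toy usage: two entity pairs scored by a 3-model ensemble over 2 relations.
cands = {
    "pair_a": [{"r1": 0.9, "r2": 0.1}, {"r1": 0.1, "r2": 0.2}, {"r1": 0.5, "r2": 0.15}],
    "pair_b": [{"r1": 0.80, "r2": 0.80}, {"r1": 0.82, "r2": 0.78}, {"r1": 0.81, "r2": 0.79}],
}
print(select_batch(cands, budget=1))  # the high-disagreement pair wins
```

The greedy top-k selection is the simplest way to maximize total informativeness under a batch budget; richer variants could trade disagreement off against representativeness.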
3. DOREMI Integration with Existing DocRE Models
DOREMI is model-agnostic; any DocRE model—such as ATLOP, DREEAM, or those employing advanced denoising (Sun et al., 2023), generative augmentation (Tran et al., 2024, Li et al., 2023), or relation correlation modeling (Han et al., 2022)—can serve as the "core" classifier. At each annotation step, an ensemble of core models is trained and used for informativeness scoring. Once DDS is constructed, the downstream DocRE model is (re-)trained from scratch on the denoised corpus, benefiting from reduced noise and improved tail coverage.
Comparisons with UGDRE (Sun et al., 2023) and generative augmentation frameworks (Tran et al., 2024, Li et al., 2023) indicate that DOREMI's active sample selection is particularly effective for tail relations, as it directly exploits differences in model uncertainty and disagreement where labeled data is most scarce.
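One round of the human-in-the-loop procedure can be expressed schematically. The function signatures below (`train_fn`, `score_fn`, `annotate_fn`) are assumptions standing in for the core-model trainer, the ensemble-disagreement scorer, and the human annotator; they are not the authors' API:

```python
def doremi_round(train_fn, score_fn, annotate_fn, labeled, unlabeled, batch_size):
    """One active-annotation round: train an ensemble on the labeled data,
    rank the unlabeled DS pool by informativeness, send the top batch to
    annotators, and fold the new labels back in for the next round."""
    ensemble = [train_fn(labeled, seed=s) for s in range(3)]  # diversity via seeds
    ranked = sorted(unlabeled, key=lambda x: score_fn(ensemble, x), reverse=True)
    batch = ranked[:batch_size]
    labeled = labeled + [(x, annotate_fn(x)) for x in batch]
    unlabeled = [x for x in unlabeled if x not in set(batch)]
    return labeled, unlabeled

# Toy usage with stand-in functions (integers play the role of entity pairs).
labeled, pool = doremi_round(
    train_fn=lambda data, seed: seed,        # stand-in trainer
    score_fn=lambda ensemble, x: x,          # stand-in informativeness score
    annotate_fn=lambda x: "some_relation",   # stand-in human oracle
    labeled=[], unlabeled=[1, 5, 3, 2], batch_size=2)
print(labeled, pool)
```

After the final round, the ensemble's aggregated, thresholded predictions would yield the DDS corpus on which the downstream DocRE model is retrained from scratch.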
4. Quantitative Impact and Empirical Results
DOREMI achieves marked improvements for rare relation classes on DocRED and its cleaned counterpart Re-DocRED. In head-to-head evaluation against label denoising and hybrid annotation approaches, DOREMI delivers:
| Dataset | Metric | DOREMI gain over UGDRE |
|---|---|---|
| DocRED-dev | Tail F1 | +5.0 |
| DocRED-dev | Tail-ignF1 | +28.7 |
| Re-DocRED-test | Tail F1 | +16.2 |
| Re-DocRED-test | Extreme-tail ignF1 | +33.5 |
Additional precision and recall gains are observed, including ignPrecision improvements on extreme-tail relations (those with fewer than 100 training instances) (Menotti et al., 16 Jan 2026). These results substantially surpass gains reported for uncertainty-driven denoising (UGDRE: +2.28 IgnF1) (Sun et al., 2023), correlation-guided augmentation (Correl: +1.54 Macro@100 F1) (Han et al., 2022), and VaeDiff-DocRE generative augmentation (+0.86 LTail F1) (Tran et al., 2024).
5. Comparison with Alternative Long-Tail Mitigation Strategies
Several orthogonal strategies have been introduced for the long-tail in DocRE, all demonstrating complementary or additive gains:
- Uncertainty-guided denoising (UGDRE): Instance-level MC Dropout uncertainty with dynamic class thresholds for selective relabeling; large gains for rare classes, but limited by initial DS coverage (Sun et al., 2023).
- Representation-level augmentation (ERA/ERACL): Attention-weight masking for entity-pair representations; explicitly boosts tail gradients via targeted context perturbation and contrastive learning (Du et al., 2022).
- Generative data augmentation (VaeDiff-DocRE, LLM+NLI): Embedding-space VAE+diffusion sampling (class-conditional) for multi-label relation-specific synthetic examples (Tran et al., 2024); LLM in-context generation + NLI mapping, effective for dataset expansion and particularly boosting sparse relations (Li et al., 2023).
- Correlation modeling: Explicit co-occurrence tasks (CRCP/FRCP) to transfer representation robustness from head to tail via relation embedding structure (Han et al., 2022).
While denoising and augmentation are effective, DOREMI's budgeted, active annotation is unique in directly targeting the set of instances with maximal expected informational value for tail relation generalization.
6. Analysis, Limitations, and Future Directions
DOREMI's performance is dependent on the diversity and reliability of its core model ensemble; if the models share the same architectural backbone, ensemble disagreement can be limited, reducing selection informativeness. Potential extensions include richer cost models for annotation (variable per relation), integration of density/representativeness metrics to avoid outlier oversampling, and incorporation into multi-task RE with supporting evidence prediction.
While the annotation overhead is minimal, further improvements may be realized by integrating LLM-based denoisers or direct generative augmentation to further flatten the long-tail distribution. Scalability to large-scale real-world corpora and highly domain-specific relation sets remains an open area, as does the joint optimization of model and annotation strategies (Menotti et al., 16 Jan 2026).
7. Bibliography and Related Work
Key references for DOREMI and related long-tail DocRE strategies include:
- "DOREMI: Optimizing Long Tail Predictions in Document-Level Relation Extraction" (Menotti et al., 16 Jan 2026)
- "Uncertainty Guided Label Denoising for Document-level Distant Relation Extraction" (Sun et al., 2023)
- "VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction" (Tran et al., 2024)
- "Semi-automatic Data Enhancement for Document-Level Relation Extraction with Distant Supervision from LLMs" (Li et al., 2023)
- "Document-level Relation Extraction with Relation Correlations" (Han et al., 2022)
- "Improving Long Tailed Document-Level Relation Extraction via Easy Relation Augmentation and Contrastive Learning" (Du et al., 2022)