Three-Way Semantic Relation Discriminator
- The paper presents a Transformer-based discriminator that jointly classifies synonym, antonym, and co-hyponym pairs, achieving a macro-F1 score of 0.90.
- It is trained on 843,000 term pairs drawn from LLM-generated and human-curated datasets, using a class-weighted cross-entropy loss to handle class imbalance.
- The model is pivotal for semantic graph construction by filtering 1.3 billion candidate links down to 520 million high-confidence synonym connections, reducing semantic drift.
A three-way semantic relation discriminator is a Transformer-based model that classifies the relationship between pairs of lexical items as synonym, antonym, or co-hyponym. Developed to address the limitations of raw embedding similarity, especially the inability of neural embeddings to distinguish synonymy from antonymy, it enables robust, large-scale semantic disambiguation crucial for constructing high-precision lexical graphs in morphologically rich and low-resource languages (Tosun et al., 19 Jan 2026). Unlike naive similarity-based heuristics, the discriminator jointly models fine-grained lexical relations, mitigating semantic drift and the intrusion of opposites.
1. Model Architecture and Input Representation
The discriminator models the identification of synonymy, antonymy, and co-hyponymy as a standard pairwise classification task. Each input instance is a pair $(t_1, t_2)$, where $t_1$ and $t_2$ are lexical terms, tokenized and formatted as $[\mathrm{CLS}]\, t_1 \,[\mathrm{SEP}]\, t_2 \,[\mathrm{SEP}]$. This representation is processed by a Transformer encoder (turkish-e5-large, XLM-RoBERTa–based), producing a final [CLS] token hidden state $\mathbf{h}_{\mathrm{CLS}} \in \mathbb{R}^d$, where $d$ is the encoder's hidden dimension.
A single linear output layer projects $\mathbf{h}_{\mathrm{CLS}}$ to a three-dimensional vector of logits:

$$\mathbf{z} = W \mathbf{h}_{\mathrm{CLS}} + \mathbf{b},$$

with $W \in \mathbb{R}^{3 \times d}$ and $\mathbf{b} \in \mathbb{R}^3$, the three dimensions corresponding to the classes antonym, co-hyponym, and synonym. Logits are converted to predicted class probabilities via the softmax function:

$$p_i = \frac{\exp(z_i)}{\sum_{j=1}^{3} \exp(z_j)},$$

where $i$ indexes the three classes (Tosun et al., 19 Jan 2026).
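This classification head can be sketched in a few lines of NumPy. The hidden size and weights below are illustrative placeholders, not the trained parameters:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_pair(h_cls, W, b):
    """Project the [CLS] hidden state to 3 logits, return class probabilities.

    h_cls: (d,) pooled encoder output for "[CLS] t1 [SEP] t2 [SEP]"
    W:     (3, d) output projection; b: (3,) bias
    Classes follow the paper's encoding: 0 = antonym, 1 = co-hyponym, 2 = synonym.
    """
    z = W @ h_cls + b      # logits, shape (3,)
    return softmax(z)      # probabilities, shape (3,)

# Toy example with a small hidden size (the real encoder dimension is larger).
rng = np.random.default_rng(0)
d = 8
p = classify_pair(rng.standard_normal(d), rng.standard_normal((3, d)), np.zeros(3))
```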
2. Mathematical Formulation and Training Objective
Given the input pair $(t_1, t_2)$, the model outputs the probability vector $\mathbf{p} = (p_0, p_1, p_2)$. The prediction is made by selecting the maximum-probability class:

$$\hat{y} = \arg\max_{i} \, p_i.$$
To address class imbalance in the training set (co-hyponyms dominate), the loss function is the class-weighted cross-entropy:

$$\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=0}^{2} w_i \, y_{n,i} \log p_{n,i},$$

where $N$ is the total number of training examples, $N_i$ is the count for class $i$, and $w_i$ is an inverse-frequency class weight (e.g. $w_i = N / (3 N_i)$). Ground-truth labels $y_{n,i}$ use one-hot encoding: 0 (antonym), 1 (co-hyponym), 2 (synonym) (Tosun et al., 19 Jan 2026).
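The weighted objective and the argmax prediction can be sketched in pure NumPy. The class counts below are illustrative, and the inverse-frequency weight $N / (K N_i)$ is a common choice consistent with the definitions above, not a value quoted from the paper:

```python
import numpy as np

def class_weights(counts):
    """Inverse-frequency weights: w_i = N / (K * N_i), K = number of classes."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted cross-entropy over a batch.

    probs:  (N, 3) softmax outputs; labels: (N,) class ids in {0, 1, 2}.
    """
    picked = probs[np.arange(len(labels)), labels]
    return -(weights[labels] * np.log(picked)).mean()

# Illustrative counts (co-hyponyms dominate, as in the paper's training set).
w = class_weights([100_000, 600_000, 143_000])   # antonym, co-hyponym, synonym
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
labels = np.array([1, 0])
loss = weighted_cross_entropy(probs, labels, w)
preds = probs.argmax(axis=1)   # prediction: argmax_i p_i
```

Note that the dominant class receives the smallest weight, so rare-class errors contribute more to the gradient.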
3. Labeled Dataset Construction
The training set comprises 843,000 unique term pairs annotated as synonym, antonym, or co-hyponym. Synthetic labels were generated with the Gemini 2.5-Flash LLM by first agglomeratively clustering FastText embeddings of 110,000 Turkish seed terms into 13,000 clusters, then prompting the model to label each intra-cluster pair. This yielded approximately 827,000 LLM-generated pairs, while 16,000 high-precision synonym/antonym pairs were extracted from the human-curated "Türkçe Eş Anlamlılar Sözlüğü." The dataset exhibits class imbalance, chiefly due to co-hyponyms. Human verification on the dictionary-based subset and spot-checking of LLM-labeled pairs confirmed at least 98% precision for clear-cut cases (Tosun et al., 19 Jan 2026).
| Source | Number of Pairs | Label Source |
|---|---|---|
| Gemini 2.5-Flash LLM | ≃827,000 | LLM-generated, cluster-based |
| Türkçe Eş Anlamlılar Sözlüğü | 16,000 | Human-curated |
Co-hyponyms constitute a majority, necessitating class weighting in the loss.
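The pair-generation step, enumerating every intra-cluster pair for LLM labeling, reduces to `itertools.combinations` per cluster. The cluster contents below are invented for illustration:

```python
from itertools import combinations

def intra_cluster_pairs(clusters):
    """Yield every unordered term pair within each cluster.

    clusters: mapping cluster_id -> list of terms; a cluster of size k
    contributes k*(k-1)/2 candidate pairs for LLM labeling.
    """
    for terms in clusters.values():
        yield from combinations(terms, 2)

# Toy clusters (the real pipeline: 110,000 seed terms in 13,000 clusters).
clusters = {
    0: ["hızlı", "süratli", "çabuk"],   # near-synonyms: "fast"
    1: ["sıcak", "soğuk"],              # antonym pair: "hot" / "cold"
}
pairs = list(intra_cluster_pairs(clusters))
```

Because only intra-cluster pairs are labeled, the candidate set stays tractable even though the full cross-product of 110,000 terms would not be.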
4. Training Regimen and Hyperparameters
The model is fine-tuned on the labeled set using the fused AdamW optimizer with decoupled weight decay (0.01), a cosine learning-rate schedule with linear warmup over the first 10% of steps to the peak learning rate, and gradient clipping (max norm = 1.0). Training was conducted in two phases, with batch size scaling from 64 (RTX 3060) to 128 (L40S), in bf16 precision, over 5 epochs, with early stopping on validation macro-F1. No explicit hard-negative mining or curriculum learning is employed beyond the implicit balancing provided by class weighting (Tosun et al., 19 Jan 2026).
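The schedule (linear warmup for the first 10% of steps, then cosine decay) can be written as a pure function. The `peak_lr` value below is a placeholder, since the paper's exact learning rate is not reproduced here:

```python
import math

def lr_at(step, total_steps, peak_lr, warmup_frac=0.10):
    """Linear warmup to peak_lr over the first warmup_frac of steps,
    then cosine decay toward zero over the remainder."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(s, 1000, peak_lr=2e-5) for s in range(1000)]
```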
5. Evaluation and Comparative Analysis
Evaluation yields a macro-F1 score of 0.90, with per-class metrics as follows:
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Synonym | 0.76 | 0.90 | 0.83 |
| Antonym | 0.91 | 0.93 | 0.92 |
| Co-hyponym | 0.93 | 0.95 | 0.94 |
| Macro-Avg | 0.88 | 0.92 | 0.90 |
Most synonym errors arise from confusion with co-hyponyms, indicating that distinguishing semantic equivalence from close semantic-field membership remains challenging. Antonym identification exhibits high precision with minimal false positives. By contrast, cosine similarity thresholding (≥0.85) achieves only ≲0.65 macro-F1 due to antonym intrusion and co-hyponym noise; the three-way discriminator thus provides a gain of roughly 25 macro-F1 points (Tosun et al., 19 Jan 2026).
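The antonym-intrusion failure mode of the cosine baseline is easy to reproduce with toy vectors: distributionally trained embeddings of opposites tend to be nearly parallel, so a similarity threshold cannot separate them from synonyms. The vectors below are contrived for illustration:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Contrived embeddings: the "antonym" vector shares almost all of the
# anchor's distributional context, differing only on one polarity axis.
anchor  = np.array([1.0, 1.0, 1.0,  0.2])
synonym = np.array([1.0, 0.9, 1.1,  0.2])
antonym = np.array([1.0, 1.0, 1.0, -0.2])

syn_sim = cosine(anchor, synonym)
ant_sim = cosine(anchor, antonym)
# A 0.85 threshold accepts both pairs, conflating synonymy and antonymy.
```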
6. Integration With Semantic Graph Construction
The discriminator's primary application is high-precision semantic graph construction. Its outputs filter candidate pairs for downstream clustering: only pairs classified as synonym with probability above a confidence threshold, with classifier agreement on both input orderings $(t_1, t_2)$ and $(t_2, t_1)$, are retained. This phase reduces 1.3 billion raw nearest-neighbor candidates to 520 million high-confidence synonym links. Subsequent topology-aware soft-to-hard clustering assigns each term to exactly one semantically coherent cluster, sharply mitigating semantic drift chains and antonym inclusion. The classifier does not itself perform clustering, but provides the critical filtering step that enables high-precision, non-overlapping cluster assignment (Tosun et al., 19 Jan 2026).
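A sketch of this filtering rule, using a stub lookup-table classifier in place of the trained model and an assumed threshold of 0.9 (the paper's exact threshold is not reproduced here):

```python
SYNONYM = 2   # label encoding: 0 = antonym, 1 = co-hyponym, 2 = synonym

def keep_synonym_link(classify, t1, t2, threshold=0.9):
    """Retain a candidate edge only if BOTH orderings are classified as
    synonym with probability above the threshold (symmetric agreement)."""
    for a, b in ((t1, t2), (t2, t1)):
        probs = classify(a, b)   # (p_antonym, p_cohyponym, p_synonym)
        if probs.index(max(probs)) != SYNONYM or probs[SYNONYM] < threshold:
            return False
    return True

# Stub classifier: a lookup table standing in for the trained discriminator.
table = {
    ("hızlı", "süratli"): (0.02, 0.03, 0.95),
    ("süratli", "hızlı"): (0.03, 0.05, 0.92),
    ("sıcak", "soğuk"):   (0.90, 0.06, 0.04),
    ("soğuk", "sıcak"):   (0.88, 0.07, 0.05),
}
classify = lambda a, b: table[(a, b)]
```

Requiring agreement on both orderings exploits the symmetry of synonymy: a pair the model labels synonym in only one direction is treated as unreliable and dropped.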
7. Limitations, Error Analysis, and Potential Extensions
The main difficulty lies in synonym vs. co-hyponym discrimination, especially for abstract or morphologically complex terms with sparse context (F1 = 0.83 for the synonym class). The current architecture generalizes effectively within Turkish but would need retraining or adaptation for other languages. The extensive use of LLM-generated labels risks systematic biases, such as overgeneration of co-hyponyms in specific domains. False negatives on synonyms often involve morphological variants or under-represented multi-word expressions, and some antonyms with intensification semantics are misclassified in the absence of explicit negation cues.
Potential enhancements include incorporation of margin-based contrastive objectives (e.g., triplet loss), post-hoc hard negative mining, and cross-lingual transfer learning strategies. A plausible implication is that further improvements may be achievable via active selection of challenging negative samples and multilingual alignment (Tosun et al., 19 Jan 2026).
8. Broader Context: Relation to General Lexical-Semantic Classification
While the three-way semantic relation discriminator is designed for distinguishing synonym, antonym, and co-hyponym relations in Turkish, its architecture and methodology bear significant relation to the broader lexical-semantic classification literature. Notably, the equivalence of related tasks—Word-in-Context (WiC), Target Sense Verification (TSV), and Word Sense Disambiguation (WSD)—has been formally established. These tasks are proven reducible to each other in polynomial time under the sense–meaning hypothesis, both theoretically and empirically (Hauer et al., 2021). This suggests resources and modeling solutions developed for one semantic classification task may, under suitable formal correspondences, be applied or adapted to related problems, including the multi-way discriminative model at the core of large-scale lexical graph construction.