Three-Way Semantic Relation Discriminator
- The paper presents a Transformer-based discriminator that jointly classifies synonym, antonym, and co-hyponym pairs, achieving a macro-F1 score of 0.90.
- It is trained on 843,000 term pairs drawn from LLM-generated and human-curated datasets, using a class-weighted cross-entropy loss to handle class imbalance.
- The model is pivotal for semantic graph construction by filtering 1.3 billion candidate links down to 520 million high-confidence synonym connections, reducing semantic drift.
A three-way semantic relation discriminator is a Transformer-based model that classifies the relationship between pairs of lexical items as synonym, antonym, or co-hyponym. Developed to address the limitations of raw embedding similarity, especially the inability of neural embeddings to distinguish synonymy from antonymy, it enables robust, large-scale semantic disambiguation crucial for constructing high-precision lexical graphs in morphologically rich and low-resource languages (Tosun et al., 19 Jan 2026). Unlike naive similarity-based heuristics, the discriminator jointly models fine-grained lexical relations, mitigating semantic drift and the intrusion of opposites.
1. Model Architecture and Input Representation
The discriminator models the identification of synonymy, antonymy, and co-hyponymy as a standard pairwise classification task. Each input instance is a pair $(t_1, t_2)$, where $t_1$ and $t_2$ are lexical terms, tokenized and formatted as $[\mathrm{CLS}]\, t_1 \,[\mathrm{SEP}]\, t_2 \,[\mathrm{SEP}]$. This representation is processed by a Transformer encoder (turkish-e5-large, XLM-RoBERTa–based), producing a final [CLS] token hidden state $\mathbf{h}_{\mathrm{CLS}} \in \mathbb{R}^d$, where $d$ is the encoder's hidden dimension.
A single linear output layer projects $\mathbf{h}_{\mathrm{CLS}}$ to a three-dimensional vector of logits:

$$\mathbf{z} = W \mathbf{h}_{\mathrm{CLS}} + \mathbf{b},$$

with $W \in \mathbb{R}^{3 \times d}$ and $\mathbf{b} \in \mathbb{R}^3$, the three dimensions corresponding to the classes antonym, co-hyponym, and synonym. Logits are converted to predicted class probabilities via the softmax function:

$$p_i = \frac{\exp(z_i)}{\sum_{j=1}^{3} \exp(z_j)},$$

where $i$ indexes the three classes (Tosun et al., 19 Jan 2026).
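This classification head can be sketched in a few lines of NumPy. The hidden size and weights below are illustrative placeholders, not the trained parameters:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classify_pair(h_cls, W, b):
    """Project the [CLS] hidden state to 3 logits, return class probabilities.

    h_cls: (d,) pooled encoder output for "[CLS] t1 [SEP] t2 [SEP]"
    W:     (3, d) output projection; b: (3,) bias
    Classes follow the paper's encoding: 0 = antonym, 1 = co-hyponym, 2 = synonym.
    """
    z = W @ h_cls + b      # logits, shape (3,)
    return softmax(z)      # probabilities, shape (3,)

# Toy example with a small hidden size (the real encoder dimension is larger).
rng = np.random.default_rng(0)
d = 8
p = classify_pair(rng.standard_normal(d), rng.standard_normal((3, d)), np.zeros(3))
```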
2. Mathematical Formulation and Training Objective
Given the input pair $(t_1, t_2)$, the model outputs the probability vector $\mathbf{p} = (p_0, p_1, p_2)$. The prediction is made by selecting the maximum-probability class:

$$\hat{y} = \arg\max_{i} \, p_i.$$
To address class imbalance in the training set (co-hyponyms dominate), the loss function is the class-weighted cross-entropy:

$$\mathcal{L} = -\frac{1}{N} \sum_{n=1}^{N} \sum_{i=0}^{2} w_i \, y_{n,i} \log p_{n,i},$$

where $N$ is the total number of training examples, $N_i$ is the count for class $i$, and $w_i$ is an inverse-frequency class weight (e.g. $w_i = N / (3 N_i)$). Ground-truth labels $y_{n,i}$ use one-hot encoding: 0 (antonym), 1 (co-hyponym), 2 (synonym) (Tosun et al., 19 Jan 2026).
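The weighted objective and the argmax prediction can be sketched in pure NumPy. The class counts below are illustrative, and the inverse-frequency weight $N / (K N_i)$ is a common choice consistent with the definitions above, not a value quoted from the paper:

```python
import numpy as np

def class_weights(counts):
    """Inverse-frequency weights: w_i = N / (K * N_i), K = number of classes."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(probs, labels, weights):
    """Class-weighted cross-entropy over a batch.

    probs:  (N, 3) softmax outputs; labels: (N,) class ids in {0, 1, 2}.
    """
    picked = probs[np.arange(len(labels)), labels]
    return -(weights[labels] * np.log(picked)).mean()

# Illustrative counts (co-hyponyms dominate, as in the paper's training set).
w = class_weights([100_000, 600_000, 143_000])   # antonym, co-hyponym, synonym
probs = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])
labels = np.array([1, 0])
loss = weighted_cross_entropy(probs, labels, w)
preds = probs.argmax(axis=1)   # prediction: argmax_i p_i
```

Note that the dominant class receives the smallest weight, so rare-class errors contribute more to the gradient.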
3. Labeled Dataset Construction
The training set comprises 843,000 unique term pairs annotated as synonym, antonym, or co-hyponym. Synthetic labels were generated with the Gemini 2.5-Flash LLM by first agglomeratively clustering FastText embeddings of 110,000 Turkish seed terms into 13,000 clusters, then prompting the model to label each intra-cluster pair. This yielded approximately 827,000 LLM-generated pairs, while 16,000 high-precision synonym/antonym pairs were extracted from the human-curated "Türkçe Eş Anlamlılar Sözlüğü." The dataset exhibits class imbalance, chiefly due to co-hyponyms. Human verification on the dictionary-based subset and spot-checking of LLM-labeled pairs confirmed at least 98% precision for clear-cut cases (Tosun et al., 19 Jan 2026).
| Source | Number of Pairs | Label Source |
|---|---|---|
| Gemini 2.5-Flash LLM | ≃827,000 | LLM-generated, cluster-based |
| Türkçe Eş Anlamlılar Sözlüğü | 16,000 | Human-curated |
Co-hyponyms constitute a majority, necessitating class weighting in the loss.
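The pair-generation step, enumerating every intra-cluster pair for LLM labeling, reduces to `itertools.combinations` per cluster. The cluster contents below are invented for illustration:

```python
from itertools import combinations

def intra_cluster_pairs(clusters):
    """Yield every unordered term pair within each cluster.

    clusters: mapping cluster_id -> list of terms; a cluster of size k
    contributes k*(k-1)/2 candidate pairs for LLM labeling.
    """
    for terms in clusters.values():
        yield from combinations(terms, 2)

# Toy clusters (the real pipeline: 110,000 seed terms in 13,000 clusters).
clusters = {
    0: ["hızlı", "süratli", "çabuk"],   # near-synonyms: "fast"
    1: ["sıcak", "soğuk"],              # antonym pair: "hot" / "cold"
}
pairs = list(intra_cluster_pairs(clusters))
```

Because only intra-cluster pairs are labeled, the candidate set stays tractable even though the full cross-product of 110,000 terms would not be.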
4. Training Regimen and Hyperparameters
The model is fine-tuned on the labeled set using the fused AdamW optimizer with decoupled weight decay (0.01), a cosine learning-rate schedule with linear warmup over the first 10% of steps to the peak learning rate, and gradient clipping (max norm = 1.0). Training was conducted in two phases, with batch size scaling from 64 (RTX 3060) to 128 (L40S), in bf16 precision, over 5 epochs, with early stopping on validation macro-F1. No explicit hard-negative mining or curriculum learning is employed beyond the implicit balancing provided by class weighting (Tosun et al., 19 Jan 2026).
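The schedule (linear warmup for the first 10% of steps, then cosine decay) can be written as a pure function. The `peak_lr` value below is a placeholder, since the paper's exact learning rate is not reproduced here:

```python
import math

def lr_at(step, total_steps, peak_lr, warmup_frac=0.10):
    """Linear warmup to peak_lr over the first warmup_frac of steps,
    then cosine decay toward zero over the remainder."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

schedule = [lr_at(s, 1000, peak_lr=2e-5) for s in range(1000)]
```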
5. Evaluation and Comparative Analysis
Evaluation yields a macro-F1 score of 0.90, with per-class metrics as follows:
| Class | Precision | Recall | F1 |
|---|---|---|---|
| Synonym | 0.76 | 0.90 | 0.83 |
| Antonym | 0.91 | 0.93 | 0.92 |
| Co-hyponym | 0.93 | 0.95 | 0.94 |
| Macro-Avg | 0.88 | 0.92 | 0.90 |
Most synonym errors arise from confusion with co-hyponyms, indicating that distinguishing semantic equivalence from close semantic-field membership remains challenging. Antonym identification exhibits high precision with minimal false positives. By contrast, cosine similarity thresholding (≥0.85) achieves only ≲0.65 macro-F1 due to antonym intrusion and co-hyponym noise; the three-way discriminator thus provides a gain of roughly 25 macro-F1 points (Tosun et al., 19 Jan 2026).
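The antonym-intrusion failure mode of the cosine baseline is easy to reproduce with toy vectors: distributionally trained embeddings of opposites tend to be nearly parallel, so a similarity threshold cannot separate them from synonyms. The vectors below are contrived for illustration:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Contrived embeddings: the "antonym" vector shares almost all of the
# anchor's distributional context, differing only on one polarity axis.
anchor  = np.array([1.0, 1.0, 1.0,  0.2])
synonym = np.array([1.0, 0.9, 1.1,  0.2])
antonym = np.array([1.0, 1.0, 1.0, -0.2])

syn_sim = cosine(anchor, synonym)
ant_sim = cosine(anchor, antonym)
# A 0.85 threshold accepts both pairs, conflating synonymy and antonymy.
```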
6. Integration With Semantic Graph Construction
The discriminator's primary application is high-precision semantic graph construction. Its outputs filter candidate pairs for downstream clustering: only pairs classified as synonym with probability above a confidence threshold, with classifier agreement on both input orderings $(t_1, t_2)$ and $(t_2, t_1)$, are retained. This phase reduces 1.3 billion raw nearest-neighbor candidates to 520 million high-confidence synonym links. Subsequent topology-aware soft-to-hard clustering assigns each term to exactly one semantically coherent cluster, sharply mitigating semantic drift chains and antonym inclusion. The classifier does not itself perform clustering, but provides the critical filtering step that enables high-precision, non-overlapping cluster assignment (Tosun et al., 19 Jan 2026).
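A sketch of this filtering rule, using a stub lookup-table classifier in place of the trained model and an assumed threshold of 0.9 (the paper's exact threshold is not reproduced here):

```python
SYNONYM = 2   # label encoding: 0 = antonym, 1 = co-hyponym, 2 = synonym

def keep_synonym_link(classify, t1, t2, threshold=0.9):
    """Retain a candidate edge only if BOTH orderings are classified as
    synonym with probability above the threshold (symmetric agreement)."""
    for a, b in ((t1, t2), (t2, t1)):
        probs = classify(a, b)   # (p_antonym, p_cohyponym, p_synonym)
        if probs.index(max(probs)) != SYNONYM or probs[SYNONYM] < threshold:
            return False
    return True

# Stub classifier: a lookup table standing in for the trained discriminator.
table = {
    ("hızlı", "süratli"): (0.02, 0.03, 0.95),
    ("süratli", "hızlı"): (0.03, 0.05, 0.92),
    ("sıcak", "soğuk"):   (0.90, 0.06, 0.04),
    ("soğuk", "sıcak"):   (0.88, 0.07, 0.05),
}
classify = lambda a, b: table[(a, b)]
```

Requiring agreement on both orderings exploits the symmetry of synonymy: a pair the model labels synonym in only one direction is treated as unreliable and dropped.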
7. Limitations, Error Analysis, and Potential Extensions
The main difficulty lies in synonym vs. co-hyponym discrimination, especially for abstract or morphologically complex terms with sparse context (F1 = 0.83 for the synonym class). The current architecture generalizes effectively within Turkish but would need retraining or adaptation for other languages. The extensive use of LLM-generated labels risks systematic biases, such as overgeneration of co-hyponyms in specific domains. False negatives on synonyms often involve morphological variants or under-represented multi-word expressions, and some antonyms with intensification semantics are misclassified in the absence of explicit negation cues.
Potential enhancements include incorporation of margin-based contrastive objectives (e.g., triplet loss), post-hoc hard negative mining, and cross-lingual transfer learning strategies. A plausible implication is that further improvements may be achievable via active selection of challenging negative samples and multilingual alignment (Tosun et al., 19 Jan 2026).
8. Broader Context: Relation to General Lexical-Semantic Classification
While the three-way semantic relation discriminator is designed for distinguishing synonym, antonym, and co-hyponym relations in Turkish, its architecture and methodology bear significant relation to the broader lexical-semantic classification literature. Notably, the equivalence of related tasks—Word-in-Context (WiC), Target Sense Verification (TSV), and Word Sense Disambiguation (WSD)—has been formally established. These tasks are proven reducible to each other in polynomial time under the sense–meaning hypothesis, both theoretically and empirically (Hauer et al., 2021). This suggests resources and modeling solutions developed for one semantic classification task may, under suitable formal correspondences, be applied or adapted to related problems, including the multi-way discriminative model at the core of large-scale lexical graph construction.