
Linguistically Conditioned Semantic Textual Similarity

Published 6 Jun 2024 in cs.CL and cs.AI | (2406.03673v1)

Abstract: Semantic textual similarity (STS) is a fundamental NLP task that measures the semantic similarity between a pair of sentences. In order to reduce the inherent ambiguity posed by the sentences, a recent work called Conditional STS (C-STS) has been proposed to measure the sentences' similarity conditioned on a certain aspect. Despite the popularity of C-STS, we find that the current C-STS dataset suffers from various issues that could impede proper evaluation of this task. In this paper, we reannotate the C-STS validation set and observe an annotator discrepancy on 55% of the instances, resulting from annotation errors in the original labels, ill-defined conditions, and a lack of clarity in the task definition. After a thorough dataset analysis, we improve the C-STS task by leveraging the models' capability to understand the conditions under a QA task setting. With the generated answers, we present an automatic error identification pipeline that is able to identify annotation errors in the C-STS data with over 80% F1 score. We also propose a new method that largely improves the performance over baselines on the C-STS data by training the models with the answers. Finally, we discuss conditionality annotation based on the typed-feature structure (TFS) of entity types. We show in examples that the TFS is able to provide a linguistic foundation for constructing C-STS data with new conditions.

Summary

  • The paper introduces a QA-based framework to reannotate Conditional STS datasets, addressing annotation errors.
  • It adapts a typed-feature structure for robust linguistic condition definitions, improving task clarity.
  • The error-identification pipeline achieves over 80% F1 in detecting annotation errors, and the proposed training method markedly outperforms baselines.

The paper "Linguistically Conditioned Semantic Textual Similarity" addresses the Semantic Textual Similarity (STS) task, which evaluates how semantically similar two sentences are. Recognizing that existing measures can be ambiguous, the authors explore Conditional STS (C-STS), which assesses similarity conditioned on specific aspects. They identify numerous issues with the current C-STS dataset, such as annotation errors, ill-defined conditions, and ambiguous task definitions.

The authors undertake a reannotation of the C-STS validation set, revealing an annotator discrepancy in 55% of cases. These discrepancies stem from errors in the original labels, unclear condition definitions, and an overall lack of task clarity. To address these issues, they propose a novel approach by adapting the task to a Question-Answering (QA) framework.
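This reformulation can be sketched as follows. The prompt template below is a hypothetical illustration of how a C-STS instance (two sentences plus a condition) might be turned into condition-focused questions, not the paper's exact wording:

```python
# Hedged sketch: converting one C-STS instance into two
# condition-focused QA prompts, one per sentence. The prompt
# phrasing is an assumption for illustration.

def make_qa_prompts(sentence1: str, sentence2: str, condition: str):
    """Build one question per sentence asking about the condition."""
    question = f"What is {condition.lower()} in the following sentence?"
    return [
        f"{question}\nSentence: {sentence1}\nAnswer:",
        f"{question}\nSentence: {sentence2}\nAnswer:",
    ]

prompts = make_qa_prompts(
    "A brown dog runs across the yard.",
    "A black cat sleeps on the porch.",
    "The color of the animal",
)
```

A model's answers to these two questions (e.g. "brown" vs. "black") can then be compared directly under the stated condition, rather than asking for a raw similarity judgment.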

The new methodology involves generating answers based on the conditions, which then helps in training models. This QA framework facilitates an automatic error identification pipeline, achieving over 80% F1 score in identifying annotation errors. This significant improvement underscores the effectiveness of their approach in refining the evaluation process for C-STS.
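One way such a pipeline could flag suspect annotations is to compare the agreement between the two condition-focused answers against the original similarity label. The overlap measure and thresholds below are illustrative assumptions, not the paper's implementation:

```python
# Hedged sketch of an error-flagging heuristic in the spirit of the
# paper's pipeline: if the answers for the two sentences clearly agree
# (or clearly differ) but the annotated label says the opposite, flag
# the instance for human review. Thresholds are invented for this demo.

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two answers."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def flag_possible_error(answer1: str, answer2: str, label: int,
                        low: int = 2, high: int = 4) -> bool:
    """label follows the C-STS 1 (dissimilar) .. 5 (similar) scale."""
    overlap = token_overlap(answer1, answer2)
    if overlap > 0.8 and label <= low:   # answers agree, label disagrees
        return True
    if overlap < 0.2 and label >= high:  # answers differ, label agrees
        return True
    return False
```

In practice the paper's pipeline works from model-generated answers; a learned comparison would replace the simple token overlap used here.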

Furthermore, the paper introduces a new training method that leverages the generated answers, leading to marked enhancements in model performance when compared to existing baselines. The authors also explore conditionality annotation using the typed-feature structure (TFS) of entity types. They demonstrate through examples that TFS can provide a robust linguistic foundation for defining conditions in C-STS data.
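A typed-feature structure pairs an entity type with the attributes it licenses, so a condition is well-formed only if it asks about an attribute the type actually carries. The schema below is invented for illustration; the paper grounds conditions in TFS, but this exact attribute inventory is not from the paper:

```python
# Hedged illustration of a TFS-style entry for an entity type.
# The feature names and value types here are assumptions.

TFS_ANIMAL = {
    "type": "animal",
    "features": {
        "color": "string",
        "size": "string",
        "number": "integer",
    },
}

def valid_condition(tfs: dict, attribute: str) -> bool:
    """A condition like 'the color of the animal' is well-formed
    only if the attribute is licensed by the entity type's TFS."""
    return attribute in tfs["features"]
```

Under this view, "the color of the animal" is a valid condition for an animal entity, while "the weight of the animal" would be rejected unless the TFS is extended, giving a principled recipe for constructing new conditions.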

In summary, the paper not only highlights critical issues in the current C-STS datasets but also offers innovative solutions to improve dataset quality and model performance. The integration of QA task settings and TFS-based annotations represents a significant advancement in the STS domain.
