Is Cosine-Similarity of Embeddings Really About Similarity?
Abstract: Cosine-similarity is the cosine of the angle between two vectors, or equivalently, the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. In practice, this can work better, but sometimes also worse, than the unnormalized dot-product between the embedded vectors. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insight. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless "similarities." For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is typically employed when learning deep models, and these have implicit and unintended effects when cosine-similarities are taken of the resulting embeddings, rendering the results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
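The abstract's central claim can be illustrated with a minimal sketch (not the paper's own derivation): in a toy matrix-factorization model, rescaling the latent dimensions by an invertible diagonal matrix leaves the model's dot-product predictions unchanged, yet changes the cosine-similarities between the embedded vectors, so those similarities are not uniquely determined by the model. The variable names and the specific diagonal scaling below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_sim(u, v):
    # cosine of the angle = dot product of the L2-normalized vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy matrix-factorization model: predictions are A @ B.T
A = rng.normal(size=(4, 3))  # e.g. user embeddings (illustrative)
B = rng.normal(size=(5, 3))  # e.g. item embeddings (illustrative)

# Rescale the latent dimensions by an invertible diagonal matrix D:
# the model's predictions A @ B.T are exactly unchanged ...
D = np.diag([10.0, 1.0, 0.1])
A2, B2 = A @ D, B @ np.linalg.inv(D)
assert np.allclose(A @ B.T, A2 @ B2.T)

# ... but the cosine-similarity between two item embeddings is not:
print(cosine_sim(B[0], B[1]))    # before rescaling
print(cosine_sim(B2[0], B2[1]))  # after rescaling: a different value
```

Since both factorizations produce identical predictions, the data cannot distinguish them, yet they yield different cosine-similarities between the same pair of items.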