Is Cosine-Similarity of Embeddings Really About Similarity?
Abstract: Cosine-similarity is the cosine of the angle between two vectors, or equivalently, the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. In practice, this can work better, but sometimes also worse, than the unnormalized dot-product between the embedded vectors. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insight. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless "similarities." For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is typically employed when learning deep models, and these have implicit and unintended effects when cosine-similarities are taken of the resulting embeddings, rendering the results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
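The abstract's central claim can be illustrated with a minimal sketch (not the paper's own derivation): in a toy matrix-factorization model, rescaling the latent dimensions by an invertible diagonal matrix leaves the model's dot-product predictions unchanged, yet changes the cosine-similarities between the embedded vectors, so those similarities are not uniquely determined by the model. The variable names and the specific diagonal scaling below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine_sim(u, v):
    # cosine of the angle = dot product of the L2-normalized vectors
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy matrix-factorization model: predictions are A @ B.T
A = rng.normal(size=(4, 3))  # e.g. user embeddings (illustrative)
B = rng.normal(size=(5, 3))  # e.g. item embeddings (illustrative)

# Rescale the latent dimensions by an invertible diagonal matrix D:
# the model's predictions A @ B.T are exactly unchanged ...
D = np.diag([10.0, 1.0, 0.1])
A2, B2 = A @ D, B @ np.linalg.inv(D)
assert np.allclose(A @ B.T, A2 @ B2.T)

# ... but the cosine-similarity between two item embeddings is not:
print(cosine_sim(B[0], B[1]))    # before rescaling
print(cosine_sim(B2[0], B2[1]))  # after rescaling: a different value
```

Since both factorizations produce identical predictions, the data cannot distinguish them, yet they yield different cosine-similarities between the same pair of items.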