Optimality of diagonal scaling for cosine similarity in regularized matrix factorization
Determine whether the unique diagonal scaling matrix diag(√(σ_i (1 − λ/σ_i)_+)), taken over the top-k singular values σ_i of X, yields the best possible semantic similarities in practice when cosine similarity is applied to the resulting user and item embeddings. This scaling arises in the closed-form solution to the regularized matrix factorization objective min_{A,B} ||X − XAB^T||_F^2 + λ(||XA||_F^2 + ||B||_F^2).
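To make the object in question concrete, here is a minimal NumPy sketch of the scaled embeddings and their cosine similarities. It assumes the closed form takes the user embeddings as U_k·D and the item embeddings as V_k·D, where X = UΣV^T is the SVD and D is the diagonal matrix above. The function names `closed_form_embeddings` and `cosine_similarity` are hypothetical and chosen for illustration only; this is not the authors' code.

```python
import numpy as np


def closed_form_embeddings(X, k, lam):
    """Return (user_emb, item_emb) scaled by diag(sqrt(sigma_i * (1 - lam/sigma_i)_+)).

    Assumption: user embeddings are U_k @ D and item embeddings are V_k @ D,
    with X = U @ diag(sigma) @ V.T the (thin) SVD of X.
    """
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, sigma_k, V_k = U[:, :k], sigma[:k], Vt[:k].T

    # For sigma_i > 0, sigma_i * (1 - lam/sigma_i)_+ equals (sigma_i - lam)_+,
    # which avoids dividing by a zero singular value.
    scale = np.sqrt(np.maximum(sigma_k - lam, 0.0))

    # Multiplying by a 1-D array scales each column, i.e. right-multiplies by D.
    return U_k * scale, V_k * scale


def cosine_similarity(E, eps=1e-12):
    """Pairwise cosine similarity between the rows of E."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    E_unit = E / np.maximum(norms, eps)  # guard against all-zero rows
    return E_unit @ E_unit.T


rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))  # toy user-by-item matrix, illustrative data only
user_emb, item_emb = closed_form_embeddings(X, k=3, lam=0.1)
item_item_sim = cosine_similarity(item_emb)
```

Under these assumptions, `user_emb @ item_emb.T` reproduces the rank-k reconstruction of X with singular values soft-thresholded by λ. Swapping `scale` for a different diagonal would change `item_item_sim`, which is the degree of freedom the open question asks about.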
References
While this solution is unique, it remains an open question if this unique diagonal matrix $(...,\sqrt{\sigma_i\cdot (1-\frac{\lambda}{\sigma_i})_+} ,...)_k $ regarding the user and item embeddings yields the best possible semantic similarities in practice.
— Is Cosine-Similarity of Embeddings Really About Similarity?
(2403.05440 - Steck et al., 2024) in Section 2.3, Details on Second Objective