Cosine-Similarity Methods for Efficient Training and Sampling in High-Dimensional Latent Spaces
Abstract: Latent generative models are increasingly shifting from traditional VAEs toward representation autoencoders and semantically aligned latent spaces, which lift images into higher-dimensional feature domains where semantic factors become more separable. Yet these spaces also contain geometric regularities that existing methods do not fully exploit, particularly in the directional relationships between features. We introduce a cosine-similarity-based mechanism that improves both training and sampling by selecting couplings that produce cleaner, less entangled velocity fields. This simple alignment reduces gradient noise, accelerates convergence, and improves sample fidelity. Building on this idea, we develop cosine-similarity-based fine-tuning and time-scheduling strategies that reduce the FID of an 800-epoch RAE from 11.99 to 8.60. Furthermore, by formulating an optimal-transport coupling with a cosine cost, a single epoch of fine-tuning from the 20-epoch checkpoint reaches 3.30 FID, matching the performance of the 80-epoch baseline.
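The optimal-transport coupling with a cosine cost mentioned in the abstract can be sketched as a minibatch assignment problem: pair noise latents with data latents so the total cosine distance is minimized. The function name and the use of the Hungarian solver are assumptions for illustration; the paper's exact formulation may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def cosine_ot_coupling(noise, data):
    """Pair each noise latent with a data latent by minimizing the
    total cosine-distance transport cost over the minibatch.

    This is an illustrative sketch, not the paper's implementation.
    """
    # Normalize rows so dot products become cosine similarities.
    n = noise / np.linalg.norm(noise, axis=1, keepdims=True)
    d = data / np.linalg.norm(data, axis=1, keepdims=True)
    cost = 1.0 - n @ d.T  # cosine distance, in [0, 2]
    # Hungarian algorithm: globally optimal one-to-one assignment.
    rows, cols = linear_sum_assignment(cost)
    return cols  # cols[i] = index of the data sample coupled to noise[i]


rng = np.random.default_rng(0)
noise = rng.standard_normal((8, 16))
data = rng.standard_normal((8, 16))
perm = cosine_ot_coupling(noise, data)
```

Training on such couplings replaces the usual independent (random) pairing of noise and data, which is what yields the straighter, less entangled velocity fields the abstract describes.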