Explain why alignment loss outperforms ranking-based objectives
Establish whether the observed superiority of pointwise cosine alignment over ranking-based distillation losses, when training NanoVDR text-only students distilled from the Qwen3-VL-Embedding-2B teacher, arises from the high quality of the teacher’s embedding space. Specifically, determine whether well-structured teacher coordinates allow direct spatial alignment to capture richer geometric information than relative ranking losses can.
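To make the contrast between the two families of objectives concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes a generic ranking-style loss (a KL divergence between softmaxed in-batch similarity rows) and a toy setup in which the student embeds both queries and documents; the function names, the temperature `tau`, and the random data are all hypothetical. It illustrates one sense in which pointwise alignment carries more geometric signal: a student whose space is an arbitrary rotation of the teacher's reproduces every relative score, so the ranking loss cannot see the rotation, while the cosine alignment loss penalizes it.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def cosine_alignment_loss(student, teacher):
    # Pointwise objective: mean of (1 - cosine) between paired embeddings.
    s, t = l2_normalize(student), l2_normalize(teacher)
    return float(np.mean(1.0 - np.sum(s * t, axis=-1)))

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def ranking_kl_loss(student_q, student_d, teacher_q, teacher_d, tau=0.05):
    # Assumed ranking-style objective: KL(teacher || student) over in-batch
    # query-document score distributions. It depends only on relative scores.
    s_logits = l2_normalize(student_q) @ l2_normalize(student_d).T / tau
    t_logits = l2_normalize(teacher_q) @ l2_normalize(teacher_d).T / tau
    log_p_t, log_p_s = log_softmax(t_logits), log_softmax(s_logits)
    return float(np.mean(np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)))

# Toy data (hypothetical): 8 query/document pairs in a 16-dimensional space.
rng = np.random.default_rng(0)
teacher_q = rng.normal(size=(8, 16))
teacher_d = rng.normal(size=(8, 16))

# A student that is an exact rotation of the teacher space.
Q, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
student_q = teacher_q @ Q
student_d = teacher_d @ Q

rank_loss = ranking_kl_loss(student_q, student_d, teacher_q, teacher_d)
align_loss = cosine_alignment_loss(student_q, teacher_q)
print("ranking KL loss:      ", rank_loss)
print("cosine alignment loss:", align_loss)
```

In this toy case the ranking loss is zero up to floating-point error, while the alignment loss is clearly positive. Whether this extra constraint is what drives the reported gap, and how it depends on the quality of the teacher space, is exactly what remains to be established.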
References
We conjecture that alignment's advantage stems from the high quality of our teacher's embedding space: when the teacher provides well-structured coordinates, direct spatial alignment exploits richer geometric signal than relative ranking alone.
— NanoVDR: Distilling a 2B Vision-Language Retriever into a 70M Text-Only Encoder for Visual Document Retrieval
(2603.12824 - Liu et al., 13 Mar 2026) in Section 6.1 (The Monotonic Superiority of Spatial Alignment)