Deriving retrieval-friendly embeddings from SID-token representations
Develop a principled method for deriving retrieval-friendly item embeddings from the Semantic Identifier (SID) token representations learned by the NEO-trained decoder-only language model, so that approximate k-nearest-neighbor (kNN) retrieval over these embeddings achieves competitive performance. Candidate directions include projection layers, pooling/whitening, contrastive fine-tuning, and mixed-model approaches.
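One simple baseline for the pooling/whitening direction works as follows. Mean-pool each item's SID-token vectors into one vector, fit a ZCA whitening transform on the pooled vectors, L2-normalize the result, and run cosine kNN. The sketch below follows those steps under stated assumptions. It takes a matrix `token_emb` of SID-token representations extracted from the model and a list of SID token ids per item. All names and the random toy data are hypothetical, and exact brute-force search stands in for an approximate kNN index. It is not the method or implementation from the cited work.

```python
import numpy as np


def pool_sid_tokens(token_emb: np.ndarray, item_sids: list[list[int]]) -> np.ndarray:
    """Mean-pool the SID-token vectors of each item into one item vector.

    Assumption: token_emb is a (vocab, dim) matrix of SID-token representations
    taken from the language model; item_sids holds one list of token ids per item.
    """
    return np.stack([token_emb[np.asarray(sid)].mean(axis=0) for sid in item_sids])


def fit_whitening(x: np.ndarray, eps: float = 1e-6) -> tuple[np.ndarray, np.ndarray]:
    """Fit a ZCA whitening transform; returns the mean and the matrix W."""
    mu = x.mean(axis=0, keepdims=True)
    cov = np.cov(x - mu, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)  # covariance is symmetric, so eigh applies
    w = eigvec @ np.diag(1.0 / np.sqrt(eigval + eps)) @ eigvec.T
    return mu, w


def whiten(x: np.ndarray, mu: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Center x and apply the whitening matrix from fit_whitening."""
    return (x - mu) @ w


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so inner product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def knn(queries: np.ndarray, items: np.ndarray, k: int) -> np.ndarray:
    """Exact cosine kNN on normalized vectors; a stand-in for an ANN index."""
    scores = queries @ items.T
    return np.argsort(-scores, axis=1)[:, :k]


# Toy usage with random data (hypothetical; no real model weights involved).
rng = np.random.default_rng(0)
token_emb = rng.normal(size=(64, 8))  # 64 SID tokens, 8-dim representations
item_sids = [list(rng.integers(0, 64, size=3)) for _ in range(200)]  # 3 SID tokens per item

pooled = pool_sid_tokens(token_emb, item_sids)
mu, w = fit_whitening(pooled)
z = whiten(pooled, mu, w)
item_vecs = l2_normalize(z)
neighbors = knn(item_vecs, item_vecs, k=5)  # item-to-item retrieval
```

Whitening decorrelates the embedding dimensions and equalizes their variance. This is commonly used to counter anisotropy in raw transformer representations. Whether a fixed transform like this is enough to make kNN competitive is the open question. Projection layers and contrastive fine-tuning would replace it with a learned mapping.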
References
We leave a more principled study of deriving retrieval-friendly embeddings from SID-token representations (e.g., projection layers, pooling/whitening, contrastive fine-tuning, or mixed models) to future work.
— A Unified Language Model for Large Scale Search, Recommendation, and Reasoning
(2603.17533 - Nadai et al., 18 Mar 2026) in Appendix, Unsuccessful attempts