Efficient and scalable learning of variable-length semantic IDs

Develop learning methods for variable-length semantic identifiers in recommender systems that are both representationally efficient and scalable, remaining practical at industrial catalog sizes.

Background

Prior work in recommender systems predominantly learns fixed-length semantic identifiers with vector-quantization methods such as RQ-VAE or residual K-means (R-KMeans). In contrast, emergent-communication research has explored variable-length messages and length penalties, but largely at small scale and typically with REINFORCE training rather than a fully variational framework.
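To make the fixed-length setting concrete, the following is a minimal residual-quantization sketch in the spirit of R-KMeans: each level greedily picks the nearest codeword and quantizes the remaining residual, so every item receives exactly one token per codebook level. Function and variable names are illustrative, not from the paper, and codebook training is omitted.

```python
import numpy as np

def residual_kmeans_ids(x, codebooks):
    """Assign a fixed-length semantic ID to each item embedding by
    residual quantization: at each level, pick the nearest codeword
    and subtract it from the residual. Illustrative sketch only."""
    residual = x.copy()
    ids = []
    for cb in codebooks:                                  # one codebook per level
        d = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(-1)
        idx = d.argmin(axis=1)                            # nearest codeword per item
        ids.append(idx)
        residual = residual - cb[idx]                     # quantize the residual
    return np.stack(ids, axis=1)                          # shape: (n_items, n_levels)
```

Note the fixed-length property this section contrasts with variable-length encodings: the output always has exactly `len(codebooks)` tokens per item, regardless of how easy an item is to represent.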

This divergence has left a gap: variable-length encodings are conceptually appealing and potentially more compact, yet no established approach demonstrates how to learn such identifiers in a way that remains both efficient and scalable in large-scale recommendation settings.
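One way to see why variable-length encodings are appealing is a simple single-item variant of the sketch above: quantization stops once the residual is small, so easy-to-represent items receive shorter identifiers. The stopping rule and threshold here are illustrative assumptions, not the paper's method.

```python
import numpy as np

def variable_length_id(x, codebooks, stop_thresh=0.1):
    """Sketch of a variable-length semantic ID for one item embedding:
    quantize residuals level by level, but stop early once the residual
    norm drops below stop_thresh. Illustrative sketch only."""
    residual = x.copy()
    ids = []
    for cb in codebooks:
        if np.linalg.norm(residual) < stop_thresh:
            break                                  # early stop -> shorter ID
        d = ((cb - residual) ** 2).sum(axis=1)
        idx = int(d.argmin())
        ids.append(idx)
        residual = residual - cb[idx]
    return ids                                     # length varies per item
```

A hard threshold like this is only a stand-in: learning when to stop (e.g. via a length penalty, as in the emergent-communication work cited above) is precisely the part that prior methods have not made efficient and scalable.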

References

As a result, it remains unclear how to learn variable-length semantic identifiers that are both efficient and scalable to large-scale recommendation problems.

Variable-Length Semantic IDs for Recommender Systems  (2602.16375 - Khrylchenko, 18 Feb 2026) in Related Work, concluding paragraph (end of section)