Identify the most effective and efficient embedding scaling strategy across regimes
Determine which embedding scaling strategy achieves better effectiveness and efficiency under different scaling regimes of large language models: structural expansion via Per-Layer Embedding, which allocates independent embedding parameters to each layer, or vocabulary expansion via N-gram Embedding, which uses hashed n-gram lookup tables.
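To make the two strategies concrete, the following is a minimal sketch of each, not the implementation from the cited paper. Every name here (ngram_bucket, make_table, ngram_embed, per_layer_embed, param_counts) is hypothetical, and the polynomial hash, the additive combination of token and n-gram vectors, and the table sizes are all assumptions chosen for illustration. How each layer consumes its own embedding is left out because the problem statement does not specify it.

```python
import random


def ngram_bucket(token_ids, table_size, base=1000003):
    """Map a sequence of token ids to a bucket index.

    Assumption: a simple polynomial rolling hash; the hash used in
    practice may differ.
    """
    h = 0
    for t in token_ids:
        h = (h * base + t + 1) % table_size
    return h


def make_table(rows, dim, seed):
    """Create a randomly initialised embedding table as a list of rows."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]


def ngram_embed(tokens, base_table, ngram_table, n=2):
    """Vocabulary expansion: token embedding plus a hashed n-gram embedding.

    Assumption: the two vectors are summed. Positions with fewer than
    n tokens of context keep the plain token embedding.
    """
    out = []
    for i, tok in enumerate(tokens):
        vec = list(base_table[tok])
        if i >= n - 1:
            bucket = ngram_bucket(tokens[i - n + 1 : i + 1], len(ngram_table))
            vec = [a + b for a, b in zip(vec, ngram_table[bucket])]
        out.append(vec)
    return out


def per_layer_embed(tokens, layer_tables):
    """Structural expansion: one independent embedding table per layer.

    Returns one list of token vectors for each layer.
    """
    return [[table[tok] for tok in tokens] for table in layer_tables]


def param_counts(vocab, dim, num_layers, table_size):
    """Embedding parameter counts as (per_layer, ngram) for this sketch."""
    per_layer = num_layers * vocab * dim
    ngram = vocab * dim + table_size * dim
    return per_layer, ngram
```

In this sketch the per-layer parameter count grows with the number of layers, while the n-gram count grows with the size of the hashed table, so the two strategies spend the same embedding budget along different axes. Which axis pays off better in a given regime is exactly what the question above leaves open.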
References
Third, while some methods for scaling embeddings have been proposed, it is still unclear which scaling strategy is more effective and efficient under different regimes.
— Scaling Embeddings Outperforms Scaling Experts in Language Models
(2601.21204 - Liu et al., 29 Jan 2026) in Section 1 Introduction