Optimality of using the backbone’s native hidden size as embedding dimension
Determine whether selecting the transformer backbone's native hidden size as the embedding dimension is optimal for bi-encoder dense retrieval models, which encode queries and documents into single vectors and score them via inner products. Quantify how retrieval performance changes when the embedding dimension is expanded beyond, or compressed below, the backbone's hidden size.
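The setup above can be sketched in a few lines: a bi-encoder maps each query and document to a single vector and scores relevance by an inner product, and a linear projection head can compress or expand the backbone's pooled output to any target embedding dimension. The sketch below is illustrative only; the `project` and `encode` helpers, the random "backbone outputs", and the choice of L2 normalization are assumptions for demonstration, not the paper's method.

```python
import numpy as np

def project(hidden_size, d_embed, rng):
    # Hypothetical linear head mapping the backbone's hidden size to the
    # target embedding dimension: compression if d_embed < hidden_size,
    # expansion if d_embed > hidden_size, identity-like if equal.
    return rng.standard_normal((hidden_size, d_embed)) / np.sqrt(hidden_size)

def encode(pooled, W):
    # pooled: (hidden_size,) pooled backbone output (stand-in for a real
    # transformer's [CLS] or mean-pooled vector). Returns an L2-normalized
    # embedding so the inner product is a cosine score in [-1, 1].
    e = pooled @ W
    return e / np.linalg.norm(e)

hidden_size = 768  # e.g., BERT-base's native hidden size
rng = np.random.default_rng(0)

# Compare compressed, native, and expanded embedding dimensions.
for d in (256, 768, 1536):
    W = project(hidden_size, d, rng)
    q = encode(rng.standard_normal(hidden_size), W)    # query embedding
    doc = encode(rng.standard_normal(hidden_size), W)  # document embedding
    score = float(q @ doc)  # inner-product relevance score
    print(f"d_embed={d}: score={score:.4f}")
```

The point of the sketch is that the embedding dimension is a free design choice decoupled from the backbone's hidden size; the open question is how retrieval quality trades off against that choice.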
References
Despite the impact of embedding dimension on efficiency, practitioners often rely on the "native" hidden size of a transformer backbone (e.g., 768 for BERT-base), though it is unclear whether this choice is optimal or how performance is impacted as dimensions are expanded or compressed.
— Scaling Laws for Embedding Dimension in Information Retrieval
(2602.05062 - Killingback et al., 4 Feb 2026) in Introduction (Section 1)