Whether prior scaling-law studies used the optimal width–depth relationship
Ascertain whether the Chinchilla (Hoffmann et al., 2022) and GPT-3 (Brown et al., 2020) scaling studies empirically scanned both width and depth and used the optimal width–depth relationship under a fixed parameter budget when reporting their results.
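To make "scanning width and depth under a fixed parameter budget" concrete, the following is a minimal sketch. It is not the procedure of either study. It assumes the common rule of thumb that a standard Transformer has roughly 12 · depth · width² non-embedding parameters (feed-forward size 4 × width, biases and layer norms ignored), which does not come from the source. The names `non_embedding_params` and `scan_configs` are hypothetical. The sketch only enumerates candidate shapes. Finding the optimal relationship would additionally require training each candidate and comparing losses.

```python
def non_embedding_params(depth: int, width: int) -> int:
    """Approximate non-embedding parameter count of a standard Transformer.

    Assumption (not from the source): each layer holds about 4*width^2
    attention weights and 8*width^2 MLP weights, i.e. 12*width^2 in total.
    """
    return 12 * depth * width * width


def scan_configs(budget: int, depths, width_multiple: int = 64):
    """For each depth, pick the width closest to the parameter budget.

    Widths are rounded to a multiple of `width_multiple` (a hypothetical
    hardware-friendly granularity). Returns (depth, width, params) tuples.
    """
    configs = []
    for depth in depths:
        # Solve 12 * depth * width^2 = budget for width.
        ideal_width = (budget / (12 * depth)) ** 0.5
        width = max(width_multiple,
                    round(ideal_width / width_multiple) * width_multiple)
        configs.append((depth, width, non_embedding_params(depth, width)))
    return configs


if __name__ == "__main__":
    # Roughly 85M non-embedding parameters, split three different ways.
    for depth, width, params in scan_configs(84_934_656, [3, 12, 48]):
        print(f"depth={depth:3d}  width={width:5d}  params={params:,}")
```

Under this approximation, the three shapes (3 × 1536, 12 × 768, 48 × 384) have the same parameter count. A study that trained only one of them per budget would not have scanned the width–depth trade-off in the sense asked above.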
References
We are not sure if previous works scanned depth and width and used the optimal relationship.
— Inverse Depth Scaling From Most Layers Being Similar
(2602.05970 - Liu et al., 5 Feb 2026) in Appendix A (LLM Scaling Laws), Additional Results