Whether prior scaling-law studies used the optimal width–depth relationship

Ascertain whether the Chinchilla (Hoffmann et al., 2022) and GPT-3 (Brown et al., 2020) scaling studies empirically scanned both width and depth and used the optimal width–depth relationship under a fixed parameter budget when reporting their results.

Background

Because the fitted exponents for width and depth are both close to one, the paper argues that under a fixed parameter budget the optimal width–depth relationship should be approximately linear; yet empirical model families depart from this trend at smaller scales.
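One illustrative way to see why exponents near one point to a linear relationship is the toy derivation below. It rests on assumptions that are not taken from the source: a hypothetical additive power-law loss in width d and depth l (number of layers) with constants A, B, E, alpha, beta, and the common transformer approximation N = c * l * d^2 for non-embedding parameters (c is roughly 12 for a standard block). It is a sketch of the argument's shape, not the paper's fitted model.

```latex
\mathcal{L}(d,\ell) = A\,d^{-\alpha} + B\,\ell^{-\beta} + E,
\qquad N = c\,\ell\,d^{2}
\;\Rightarrow\;
\mathcal{L}(d) = A\,d^{-\alpha} + B\left(\frac{c}{N}\right)^{\beta} d^{2\beta} + E

\frac{\partial \mathcal{L}}{\partial d}
  = -\alpha A\,d^{-\alpha-1} + 2\beta B\left(\frac{c}{N}\right)^{\beta} d^{2\beta-1} = 0
\;\Rightarrow\;
(d^{\star})^{\alpha+2\beta} = \frac{\alpha A}{2\beta B}\left(\frac{N}{c}\right)^{\beta}

d^{\star} \propto N^{\beta/(\alpha+2\beta)}, \qquad
\ell^{\star} = \frac{N}{c\,(d^{\star})^{2}} \propto N^{\alpha/(\alpha+2\beta)}, \qquad
\ell^{\star} \propto (d^{\star})^{\alpha/\beta}
```

With alpha = beta = 1 the last relation reduces to l* proportional to d*, a linear optimal width–depth relationship; if alpha and beta differ, the optimal depth-to-width ratio drifts as N grows.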

This leads the authors to question whether influential prior studies systematically scanned width and depth to identify and use the optimal trade-off when establishing their scaling curves.
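A systematic width–depth scan of the kind the authors ask about might look like the minimal sketch below: for each fixed parameter budget, enumerate depths, set the width that exactly spends the budget, and keep the lowest-loss configuration. Everything here is an assumption for illustration: `toy_loss` is a hypothetical additive power law (not a fit from Chinchilla, GPT-3, or the paper), and the budget uses the common approximation n_params = c * depth * width**2 with c = 12. Real studies would train models at each grid point instead of evaluating a formula.

```python
import math
from typing import NamedTuple, Optional


class Config(NamedTuple):
    loss: float
    width: float
    depth: int


def toy_loss(width: float, depth: int, alpha: float = 1.0, beta: float = 1.0,
             a: float = 1.0, b: float = 1.0) -> float:
    # Hypothetical additive power law in width and depth; an assumption for
    # illustration, not a fitted form taken from any cited paper.
    return a * width ** (-alpha) + b * depth ** (-beta)


def scan_fixed_budget(n_params: float, max_depth: int = 10_000,
                      c: float = 12.0, **loss_kwargs) -> Config:
    # Assumes the transformer approximation n_params ~= c * depth * width**2.
    best: Optional[Config] = None
    for depth in range(1, max_depth + 1):
        # Real-valued width that exactly spends the budget at this depth.
        width = math.sqrt(n_params / (c * depth))
        cfg = Config(toy_loss(width, depth, **loss_kwargs), width, depth)
        if best is None or cfg.loss < best.loss:
            best = cfg
    return best


if __name__ == "__main__":
    for n in (24e6, 24e9):
        for alpha, beta in ((1.0, 1.0), (1.0, 1.5)):
            cfg = scan_fixed_budget(n, alpha=alpha, beta=beta)
            print(f"N={n:.1e} alpha={alpha} beta={beta} "
                  f"width={cfg.width:.1f} depth={cfg.depth} "
                  f"depth/width={cfg.depth / cfg.width:.3f}")
```

Under these toy assumptions, equal exponents give a depth-to-width ratio that stays fixed across budgets (depth is about twice the width with the default constants). With unequal exponents the ratio shifts with the budget, so a study that fixed one aspect ratio for all sizes would not, in general, sit on the optimal trade-off.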

References

We are not sure if previous works scanned depth and width and used the optimal relationship.

— Inverse Depth Scaling From Most Layers Being Similar (2602.05970 - Liu et al., 5 Feb 2026) in Appendix A (LLM Scaling Laws), Additional Results
