Manifold Metric: A Loss Landscape Approach for Predicting Model Performance
Abstract: Determining the optimal model for a given task often requires training multiple models from scratch, which becomes impractical as dataset and model sizes grow. A more efficient alternative is to expand smaller pre-trained models, but this approach is underutilized because its impact on training dynamics is poorly understood. Existing methods for quantifying this impact have notable limitations, including high computational cost. To address this, we introduce a new perspective based on the loss landscape, which has been shown to contain a manifold of linearly connected minima. Specifically, we propose a metric that estimates the size of this manifold to study the impact of model expansion. Our experiments reveal a strong correlation between performance gains and our manifold metric, enabling more informed model comparison and offering a first step toward a geometry-driven approach to reliable model expansion. Notably, our metric outperforms other baselines even when different types of expansion with an equivalent number of parameters are applied to a model.
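The abstract does not spell out how the size of the low-loss manifold is estimated, so the snippet below is only a minimal illustrative sketch of the general idea: starting from a found minimum, probe random directions in parameter space and measure how far the loss stays near its minimum value, then use the mean reachable radius as a crude proxy for the extent of the connected low-loss region. The toy loss, the function names (`toy_loss`, `manifold_size_estimate`), and all thresholds are assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch only (not the paper's metric): estimate the extent of the
# low-loss region around a minimum by walking outward along random unit
# directions until the loss rises above a small tolerance.
import numpy as np

rng = np.random.default_rng(0)

def toy_loss(theta: np.ndarray) -> float:
    """Toy over-parameterized loss: only theta[0] * theta[1] matters, so the set
    {theta : theta[0] * theta[1] = 1} forms a connected manifold of minima."""
    return float((theta[0] * theta[1] - 1.0) ** 2 + 0.01 * np.sum(theta[2:] ** 2))

def manifold_size_estimate(theta_star: np.ndarray, loss_fn,
                           tol: float = 1e-2, n_directions: int = 64,
                           max_radius: float = 2.0, n_steps: int = 201) -> float:
    """Mean distance one can move from theta_star along random unit directions
    before the loss exceeds loss(theta_star) + tol."""
    base = loss_fn(theta_star)
    radii = []
    for _ in range(n_directions):
        d = rng.standard_normal(theta_star.shape)
        d /= np.linalg.norm(d)
        reached = 0.0
        for r in np.linspace(0.0, max_radius, n_steps):
            if loss_fn(theta_star + r * d) > base + tol:
                break
            reached = r
        radii.append(reached)
    return float(np.mean(radii))

# Two minima with identical loss but differently shaped surroundings: the probe
# assigns a larger estimate to the flatter, better-connected one.
theta_wide = np.array([1.0, 1.0, 0.0, 0.0])      # sits in a wide, flat valley
theta_narrow = np.array([10.0, 0.1, 0.0, 0.0])   # same loss value, much sharper valley
print("wide-valley estimate:  ", manifold_size_estimate(theta_wide, toy_loss))
print("narrow-valley estimate:", manifold_size_estimate(theta_narrow, toy_loss))
```

In this toy setting the two points achieve the same loss, yet the probe separates them by the geometry of their surroundings, which mirrors the paper's premise that landscape geometry, rather than current loss alone, is informative for comparing expanded models.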