
Magnificent Minified Models

Published 16 Jun 2023 in cs.LG (arXiv:2306.10177v1)

Abstract: This paper addresses the task of taking a large trained neural network and 'compressing' it by deleting parameters or entire neurons, with minimal loss of model accuracy. We compare several methods of parameter and neuron selection: dropout-based neuron damage estimation, neuron merging, absolute-value based selection, random selection, and Optimal Brain Damage (OBD). We also introduce OBD-SD, a variation on the classic OBD method that slightly outperformed all other parameter- and neuron-selection methods in our tests under substantial pruning. We compare these methods against quantization of parameters, and we compare all of these techniques (applied to a trained neural network) against neural networks trained from scratch (with random weight initialization) on the corresponding pruned architectures. Our results are only barely consistent with the Lottery Ticket Hypothesis: fine-tuning a parameter-pruned model does slightly better than retraining a similarly pruned model from scratch with randomly initialized weights, whereas for neuron-level pruning, retraining from scratch did much better in our experiments.
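To make the comparison concrete, here is a minimal sketch (assumed, not the paper's code) of two of the parameter-saliency criteria the abstract names: absolute-value based selection, and the classic OBD estimate s_i = h_ii * w_i^2 / 2, where h_ii is the diagonal of the loss Hessian. The function names and the stand-in Hessian diagonal are illustrative only.

```python
import numpy as np

def prune_by_saliency(weights, saliency, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest saliency."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    idx = np.argsort(saliency.ravel())[:k]   # indices of the k least-salient weights
    pruned = weights.ravel().copy()
    pruned[idx] = 0.0
    return pruned.reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
h_diag = rng.uniform(0.1, 1.0, size=w.shape)  # stand-in for the Hessian diagonal

# Absolute-value based selection: saliency is simply |w|.
magnitude_pruned = prune_by_saliency(w, np.abs(w), sparsity=0.5)

# OBD-style selection: saliency is the second-order loss-increase estimate.
obd_pruned = prune_by_saliency(w, 0.5 * h_diag * w**2, sparsity=0.5)
```

Both criteria remove the same number of parameters; they differ only in which parameters they judge least important, which is exactly the axis along which the paper's comparisons run.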

References (10)
  1. Xin Dong, Shangyu Chen and Sinno Jialin Pan. "Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon". CoRR abs/1705.07565, 2017. arXiv: http://arxiv.org/abs/1705.07565
  2. "The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks", 2019. arXiv:1803.03635 [cs.LG]
  3. "Learning both Weights and Connections for Efficient Neural Networks". CoRR abs/1506.02626, 2015. arXiv: http://arxiv.org/abs/1506.02626
  4. Babak Hassibi, David G. Stork and Gregory Wolff. "Optimal Brain Surgeon: Extensions and performance comparisons". In Advances in Neural Information Processing Systems 6, Morgan-Kaufmann, 1994, pp. 263–270. URL: http://papers.nips.cc/paper/749-optimal-brain-surgeon-extensions-and-performance-comparisons.pdf
  5. Yann LeCun, John Denker and Sara Solla. "Optimal Brain Damage". In Advances in Neural Information Processing Systems 2, Morgan-Kaufmann, 1990, pp. 598–605. URL: https://proceedings.neurips.cc/paper/1989/file/6c9882bbac1c7093bd25041881277658-Paper.pdf
  6. "Rethinking the Value of Network Pruning", 2019. arXiv:1810.05270 [cs.LG]
  7. "ALOHA: Auxiliary Loss Optimization for Hypothesis Augmentation". CoRR abs/1903.05700, 2019. arXiv: http://arxiv.org/abs/1903.05700
  8. "EDropout: Energy-Based Dropout and Pruning of Deep Neural Networks", 2020. arXiv:2006.04270 [cs.LG]
  9. Volker Tresp, Ralph Neuneier and Hans-Georg Zimmermann. "Early Brain Damage". In Advances in Neural Information Processing Systems 9, MIT Press, 1997, pp. 669–675. URL: https://proceedings.neurips.cc/paper/1996/file/2ac2406e835bd49c70469acae337d292-Paper.pdf
  10. "Merging Similar Neurons for Deep Networks Compression". In Cognitive Computation 12, 2020. DOI: 10.1007/s12559-019-09703-6

