Active Few-Shot Fine-Tuning

Published 13 Feb 2024 in cs.LG and cs.AI (arXiv:2402.15441v4)

Abstract: We study the question: How can we select the right data for fine-tuning to a specific task? We call this data selection problem active fine-tuning and show that it is an instance of transductive active learning, a novel generalization of classical active learning. We propose ITL, short for information-based transductive learning, an approach which samples adaptively to maximize information gained about the specified task. We are the first to show, under general regularity assumptions, that such decision rules converge uniformly to the smallest possible uncertainty obtainable from the accessible data. We apply ITL to the few-shot fine-tuning of large neural networks and show that fine-tuning with ITL learns the task with significantly fewer examples than the state-of-the-art.
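The decision rule described in the abstract, greedily querying the accessible point that is most informative about the specified task, can be made concrete under a Gaussian process surrogate. The snippet below is a minimal sketch of that idea, not the paper's implementation: the RBF kernel, the noise level, and all function names are assumptions, and the target values are conditioned on with a small jitter rather than noiselessly.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel matrix between row-stacked point sets a and b."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def posterior_variance(x_query, x_obs, noise=1e-2):
    """Diagonal of the GP posterior covariance at x_query after noisy observations at x_obs."""
    prior_var = np.ones(len(x_query))  # RBF prior variance is 1 on the diagonal
    if len(x_obs) == 0:
        return prior_var
    k_qo = rbf_kernel(x_query, x_obs)
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    return prior_var - np.einsum("ij,jk,ik->i", k_qo, np.linalg.inv(k_oo), k_qo)

def itl_select(candidates, targets, observed, noise=1e-2):
    """One greedy ITL-style step: index of the candidate x maximizing I(f_A; y_x | D).

    With a Gaussian likelihood this information gain reduces to
    0.5 * [log Var(y_x | D) - log Var(y_x | D, f_A)]: how much the candidate's
    predictive variance would shrink if the task's target values were revealed.
    (Here the targets are conditioned on with the same small noise as a jitter.)
    """
    var_given_data = posterior_variance(candidates, observed, noise) + noise
    augmented = np.vstack([observed, targets]) if len(observed) else targets
    var_given_data_and_targets = posterior_variance(candidates, augmented, noise) + noise
    gain = 0.5 * (np.log(var_given_data) - np.log(var_given_data_and_targets))
    return int(np.argmax(gain))

# Toy usage: 200 accessible points, a small target region describing the task.
rng = np.random.default_rng(0)
candidates = rng.uniform(-3, 3, size=(200, 2))
targets = rng.normal(loc=2.0, scale=0.2, size=(10, 2))
observed = np.empty((0, 2))
for _ in range(5):
    idx = itl_select(candidates, targets, observed)
    observed = np.vstack([observed, candidates[idx : idx + 1]])
```

To apply the same rule to few-shot fine-tuning of a large neural network, the RBF kernel above would typically be replaced by a kernel derived from the network itself (for instance a linear kernel on its last-layer or gradient embeddings), so that informativeness is measured in the model's own representation space.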
