
FORML: A Riemannian Hessian-free Method for Meta-learning on Stiefel Manifolds

Published 28 Feb 2024 in cs.LG (arXiv:2402.18605v2)

Abstract: The meta-learning problem is usually formulated as a bi-level optimization in which the task-specific parameters and the meta-parameters are updated in the inner and outer loops, respectively. However, performing this optimization in Riemannian space, where the parameters and meta-parameters lie on Riemannian manifolds, is computationally intensive. Unlike Euclidean methods, Riemannian backpropagation requires computing second-order derivatives, which involves backward passes through Riemannian operators such as retraction and orthogonal projection. This paper introduces a Hessian-free approach that uses a first-order approximation of derivatives on the Stiefel manifold. Our method significantly reduces the computational load and memory footprint. We show that using a Stiefel fully-connected layer, which enforces an orthogonality constraint on the parameters of the last classification layer serving as the head of the backbone network, strengthens the representation reuse of gradient-based meta-learning methods. Our experimental results across various few-shot learning datasets demonstrate the superiority of the proposed method over state-of-the-art methods, in particular MAML, its Euclidean counterpart.
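The abstract's core idea can be sketched in code: keep the constrained parameters on the Stiefel manifold via tangent-space projection and a QR retraction, and sidestep second-order backpropagation by using the gradient at the task-adapted parameters as a first-order approximation of the meta-gradient (in the spirit of first-order MAML). This is a minimal numpy illustration, not the paper's implementation; `grad_fn`, the toy loss, and all hyperparameter names are assumptions introduced here for the sketch.

```python
import numpy as np

def project_tangent(W, G):
    # Orthogonal projection of an ambient gradient G onto the tangent
    # space of the Stiefel manifold St(n, p) at W (where W^T W = I_p).
    WtG = W.T @ G
    sym = (WtG + WtG.T) / 2.0
    return G - W @ sym

def retract_qr(X):
    # QR-based retraction: map an ambient matrix back onto St(n, p),
    # with a sign correction to make the Q factor unique.
    Q, R = np.linalg.qr(X)
    return Q * np.sign(np.diag(R))

def first_order_meta_step(W, grad_fn, tasks,
                          inner_lr=0.1, outer_lr=0.01, inner_steps=3):
    # Hessian-free meta-update sketch: each task adapts W with Riemannian
    # gradient steps (project, then retract), and the outer update reuses
    # the gradient at the adapted parameters as a first-order surrogate,
    # so no backpropagation through retraction/projection is needed.
    meta_dir = np.zeros_like(W)
    for task in tasks:
        W_task = W.copy()
        for _ in range(inner_steps):
            rg = project_tangent(W_task, grad_fn(W_task, task))
            W_task = retract_qr(W_task - inner_lr * rg)
        # First-order approximation: evaluate the gradient at the adapted
        # point, then project it onto the tangent space at the meta-point.
        meta_dir += project_tangent(W, grad_fn(W_task, task))
    return retract_qr(W - outer_lr * meta_dir / len(tasks))
```

A usage example with a toy quadratic loss per task, `||W - A||^2`, whose Euclidean gradient is `2(W - A)`; the returned meta-parameters remain orthonormal by construction of the retraction.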

