Emergent Equivariance in Deep Ensembles

Published 5 Mar 2024 in cs.LG (arXiv:2403.03103v2)

Abstract: We show that deep ensembles become equivariant for all inputs and at all training times by simply using data augmentation. Crucially, equivariance holds off-manifold and for any architecture in the infinite width limit. The equivariance is emergent in the sense that predictions of individual ensemble members are not equivariant but their collective prediction is. Neural tangent kernel theory is used to derive this result and we verify our theoretical insights using detailed numerical experiments.

Summary

  • The paper proves that, with full data augmentation, deep ensembles exhibit emergent equivariance at every stage of training, from initialization to convergence, in the infinite width limit.
  • It leverages NTK theory to rigorously show how input transformations induce predictable changes in the kernel space.
  • Empirical evaluations on benchmarks like FashionMNIST confirm that ensemble outputs maintain symmetry despite significant individual model variance.

Emergent Equivariance in Deep Ensembles

In the paper titled "Emergent Equivariance in Deep Ensembles," the authors investigate an intriguing property of deep learning models, specifically how deep ensembles can be made equivariant through the use of data augmentation. The paper provides a comprehensive analysis backed by theoretical proofs and empirical evaluations.

Conceptual Framework

The study is grounded in the analysis of deep ensembles—where predictions of multiple models are averaged to estimate uncertainty—and their ability to become equivariant with respect to symmetries in data. Equivariance here refers to the property that applying a transformation to the input of a function results in a corresponding transformation of the output. The central claim is that, by using full data augmentation, deep ensembles achieve this property at all phases of training and for any model architecture in the infinite width limit.
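
In symbols (standard notation, not specific to this paper): given a group G acting on inputs via a representation \rho_X and on outputs via \rho_Y, a function f is equivariant if

    f(\rho_X(g)\, x) = \rho_Y(g)\, f(x) \qquad \text{for all } g \in G .

Invariance is the special case \rho_Y(g) = \mathrm{id}, for instance a class label that is unchanged under rotations of an image.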

Theoretical Contributions

The authors employ neural tangent kernel (NTK) theory, in which the training dynamics of infinitely wide neural networks under gradient descent are governed by a fixed, deterministic kernel, to derive their results. Key contributions of the paper include:

  1. Proof of Emergent Equivariance: It is shown that deep ensembles trained with data augmentation are exactly equivariant in the infinite width limit. This holds not only at convergence but also at initialization and throughout training, and even off-manifold, in contrast to the common expectation that augmentation yields only approximate equivariance, on the data manifold, late in training.
  2. Mathematical Derivation: Using the NTK and NNGP kernel formulations, the paper demonstrates rigorously how transformations of the input induce predictable transformations in kernel space, so that ensemble predictions respect the symmetries of the inputs (a schematic form of the ensemble-mean predictor is sketched after this list).
  3. Handling Finite-Sample and Continuous-Group Complications: The authors analyze how these conclusions are affected by practical constraints such as finite ensemble size and the discretization of continuous groups, establishing bounds on the resulting deviations from exact equivariance.
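
As a schematic sketch of the mechanism (our notation, using the standard infinite-width NTK result rather than the paper's exact statement): for an infinitely wide network trained by gradient flow on the mean-squared error, the prediction averaged over initializations, i.e. the infinite-ensemble mean, takes the form

    \mu_t(x) = \Theta(x, \mathcal{X})\, \Theta(\mathcal{X}, \mathcal{X})^{-1} \bigl( \mathbb{1} - e^{-\eta t\, \Theta(\mathcal{X}, \mathcal{X})} \bigr)\, \mathcal{Y} ,

where \Theta is the NTK, \mathcal{X} and \mathcal{Y} are the augmented training inputs and targets, \eta is the learning rate, and t the training time. If the training set is closed under the group action (full augmentation) and the kernel transforms covariantly under the group, which the paper establishes in the infinite width limit, then replacing x by \rho_X(g)\, x only permutes kernel evaluations against the augmented training set, yielding \mu_t(\rho_X(g)\, x) = \rho_Y(g)\, \mu_t(x) at every training time t.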

Experimental Evidence

The theoretical advancements are substantiated with experiments on the 2D Ising model, FashionMNIST, and histological slides of human tissue. These experiments:

  • Showcase Equivariance of Deep Ensembles: Across all settings, ensemble predictions exhibit a high degree of equivariance: evaluations on transformed inputs confirm that predictions remain consistent across the group orbit when training uses data augmentation.
  • Examine Model and Ensemble Variance: Predictions of individual ensemble members deviate substantially from equivariance, yet the aggregated ensemble output recovers the symmetry, highlighting the emergent nature of the effect (a toy numerical illustration of this measurement follows the list).
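
The following is a minimal, self-contained sketch of such a measurement. It is our toy construction (random data, the C4 rotation group acting on flattened 8x8 inputs, small ReLU MLPs trained in plain numpy), not the paper's experimental setup; the names and hyperparameters are illustrative only.

import numpy as np

rng = np.random.default_rng(0)

def orbit(x):
    """All four C4 rotations of a flattened 8x8 input."""
    img = x.reshape(8, 8)
    return [np.rot90(img, k).ravel() for k in range(4)]

X = rng.normal(size=(64, 64))                  # 64 samples, 64 = 8*8 pixels
y = rng.integers(0, 2, size=64).astype(float)  # random binary targets

# Full augmentation: the training set is closed under the group action.
X_aug = np.concatenate([np.stack(orbit(x)) for x in X])
y_aug = np.repeat(y, 4)

def init_mlp(width=512):
    return [rng.normal(size=(64, width)) / np.sqrt(64),
            rng.normal(size=(width, 1)) / np.sqrt(width)]

def forward(params, X):
    W1, W2 = params
    return np.maximum(X @ W1, 0.0) @ W2        # one-hidden-layer ReLU MLP

def train(params, X, y, lr=0.1, steps=500):
    """Full-batch gradient descent on the MSE loss."""
    W1, W2 = params
    for _ in range(steps):
        h = np.maximum(X @ W1, 0.0)
        grad_out = (h @ W2 - y[:, None]) / len(y)        # dL/d(prediction)
        gW2 = h.T @ grad_out
        gW1 = X.T @ ((grad_out @ W2.T) * (h > 0))        # backprop through ReLU
        W1 -= lr * gW1
        W2 -= lr * gW2
    return [W1, W2]

ensemble = [train(init_mlp(), X_aug, y_aug) for _ in range(32)]

def equiv_error(predict, x):
    """Spread of predictions over the group orbit (0 = exactly equivariant)."""
    preds = [predict(v[None, :]).item() for v in orbit(x)]
    return max(preds) - min(preds)

x_test = rng.normal(size=64)                   # a generic (off-manifold) input
member_err = np.mean([equiv_error(lambda X_, p=p: forward(p, X_), x_test)
                      for p in ensemble])
ensemble_err = equiv_error(
    lambda X_: np.mean([forward(p, X_) for p in ensemble], axis=0), x_test)

print(f"mean single-member orbit spread: {member_err:.4f}")
print(f"ensemble-mean orbit spread:      {ensemble_err:.4f}")

Averaging the outputs of the members plays the role of the infinite-width ensemble mean; finite width and a finite number of members leave a residual orbit spread, which is exactly the kind of deviation the paper bounds.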

Implications and Future Directions

Practically, this work simplifies the route to integrating symmetries into deep learning models without the need to design explicitly equivariant architectures. The results imply potential for significant improvements in tasks requiring robust feature recognition invariant to transformations, such as medical imaging or the physical sciences. Theoretically, this paper paves the way for further research into the nuanced relationships between data symmetry, architecture design, and learning efficiency, particularly through the lens of NTK analysis.

Concluding Thoughts

The paper presents rigorous analysis and insights into an emergent property of deep ensembles that can have far-reaching implications for how symmetric data is handled in neural network training. By demonstrating that deep ensembles can organically realize equivariant behavior, the authors not only simplify modeling pipelines but also bridge a gap between theoretical understanding and practical application, fostering further investigations in both domains.
