On the Optimization Landscape of Maximum Mean Discrepancy

Published 26 Oct 2021 in cs.LG and stat.ML (arXiv:2110.13452v2)

Abstract: Generative models have been used successfully to generate realistic signals. Because the likelihood function is intractable in most of these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex objectives. Here we provide such an analysis for the case of Maximum Mean Discrepancy (MMD) learning of generative models. We prove several optimality results, including for a Gaussian distribution with low-rank covariance (where likelihood is inapplicable) and a mixture of Gaussians. Our analysis shows that the MMD optimization landscape is benign in these cases, and therefore gradient-based methods will globally minimize the MMD objective.
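To make the setting concrete, below is a minimal, hypothetical sketch (not taken from the paper) of MMD learning for the low-rank Gaussian case the abstract mentions: a linear generator x = Az with z ~ N(0, I_k) is fit to samples from a target low-rank Gaussian by gradient descent on an empirical MMD objective with a Gaussian kernel. The generator, bandwidth, and optimizer settings are illustrative assumptions, not the authors' experimental choices.

```python
# Hypothetical sketch of MMD learning for a low-rank Gaussian generator.
# Names and hyperparameters (A, k, bandwidth, learning rate) are illustrative.
import torch

torch.manual_seed(0)

d, k, n = 5, 2, 512          # ambient dim, latent (low) rank, batch size
A_true = torch.randn(d, k)   # target generator: x = A_true z, singular covariance
A = torch.randn(d, k, requires_grad=True)  # model parameters to learn

def gaussian_kernel(x, y, bandwidth=1.0):
    """k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) for all pairs of rows."""
    sq_dists = torch.cdist(x, y) ** 2
    return torch.exp(-sq_dists / (2 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            - 2 * gaussian_kernel(x, y, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean())

opt = torch.optim.Adam([A], lr=0.05)
for step in range(2000):
    z_p = torch.randn(n, k)      # latent noise for the target distribution
    z_q = torch.randn(n, k)      # latent noise for the model
    x = z_p @ A_true.T           # target samples (density does not exist in R^d)
    y = z_q @ A.T                # model samples
    loss = mmd2(x, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        print(f"step {step:4d}  MMD^2 ~ {loss.item():.5f}")

# If the landscape is benign, A A^T should approach A_true A_true^T
# (the generator is identifiable only up to rotations of the latent space).
print("covariance error:", torch.norm(A @ A.T - A_true @ A_true.T).item())
```

Note that maximum-likelihood estimation is not applicable here because the target covariance A_true A_true^T is rank-deficient, which is why a sample-based objective such as MMD is used instead.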
