On the Optimization Landscape of Maximum Mean Discrepancy
Abstract: Generative models have been successfully used to generate realistic signals. Because the likelihood function is typically intractable in these models, the common practice is to use "implicit" models that avoid likelihood calculation. However, it is hard to obtain theoretical guarantees for such models. In particular, it is not understood when they can globally optimize their non-convex objectives. Here we provide such an analysis for Maximum Mean Discrepancy (MMD) learning of generative models. We prove several optimality results, including for a Gaussian distribution with low-rank covariance (where likelihood is inapplicable) and for a mixture of Gaussians. Our analysis shows that the MMD optimization landscape is benign in these cases, and therefore gradient-based methods will globally minimize the MMD objective.
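To make the objective concrete, below is a minimal sketch of the (biased) squared-MMD estimator between two sample sets, using an RBF kernel. The kernel choice and the bandwidth `sigma` are illustrative assumptions for this sketch, not details taken from the paper; MMD is defined for any characteristic kernel.

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased estimator of squared MMD with the RBF kernel
    k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    def kernel(A, B):
        # Pairwise squared Euclidean distances, then the RBF kernel matrix.
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-sq / (2 * sigma**2))
    # MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)]
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(0, 1, size=(500, 2))  # samples from the model
Y = rng.normal(0, 1, size=(500, 2))  # samples from the same distribution
Z = rng.normal(3, 1, size=(500, 2))  # samples from a shifted distribution

# MMD^2 is near zero when the distributions match and larger otherwise.
print(mmd2_rbf(X, Y))  # small
print(mmd2_rbf(X, Z))  # large
```

In MMD learning, the model's parameters are trained by gradient descent on an estimator of this form; the paper's results concern when that non-convex optimization nevertheless reaches the global minimum.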