
The VampPrior Mixture Model

Published 6 Feb 2024 in cs.LG, cs.AI, and stat.ML | (2402.04412v3)

Abstract: Widely used deep latent variable models (DLVMs), in particular Variational Autoencoders (VAEs), employ overly simplistic priors on the latent space. To achieve strong clustering performance, existing methods that replace the standard normal prior with a Gaussian mixture model (GMM) require the number of clusters to be set a priori, close to the expected number of ground-truth classes, and are susceptible to poor initializations. We leverage VampPrior concepts (Tomczak and Welling, 2018) to fit a Bayesian GMM prior, resulting in the VampPrior Mixture Model (VMM), a novel prior for DLVMs. In a VAE, the VMM attains highly competitive clustering performance on benchmark datasets. Integrating the VMM into scVI (Lopez et al., 2018), a popular scRNA-seq integration method, significantly improves its performance and automatically arranges cells into clusters with similar biological characteristics.

References (31)
  1. Minimum-distortion embedding. Foundations and Trends® in Machine Learning, 14(3):211–378, 2021. ISSN 1935-8237. Publisher: Now Publishers, Inc.
  2. Fixing a broken ELBO. pp. 159–168. PMLR, 2018. ISSN 2640-3498.
  3. Importance Weighted Autoencoders, November 2016. URL http://arxiv.org/abs/1509.00519. arXiv:1509.00519 [cs, stat].
  4. The specious art of single-cell genomics. PLOS Computational Biology, 19(8):e1011288, August 2023. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1011288. URL https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1011288. Publisher: Public Library of Science.
  5. Deep unsupervised clustering with gaussian mixture variational autoencoders. arXiv preprint arXiv:1611.02648, 2016.
  6. Rasmussen, C. E. The infinite gaussian mixture model. Advances in neural information processing systems, pp. 554–560, 2000.
  7. Bayesian regularization for normal mixture estimation and model-based clustering. Journal of classification, 24(2):155–181, 2007. ISSN 0176-4268. Publisher: Springer.
  8. Elbo surgery: yet another way to carve up the variational evidence lower bound. volume 1, 2016. Issue: 2.
  9. Approximate Dirichlet Process Computing in Finite Normal Mixtures: Smoothing and Prior Information. Journal of Computational and Graphical Statistics, 11(3):508–532, September 2002. ISSN 1061-8600, 1537-2715. doi: 10.1198/106186002411. URL https://www.tandfonline.com/doi/full/10.1198/106186002411.
  10. Variational Deep Embedding: An Unsupervised and Generative Approach to Clustering. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, August 19-25, 2017, pp. 1965–1972, 2017. doi: 10.24963/ijcai.2017/273. URL https://doi.org/10.24963/ijcai.2017/273.
  11. Composing graphical models with neural networks for structured representations and fast inference. Advances in neural information processing systems, 29, 2016.
  12. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  13. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
  14. Fast, sensitive and accurate integration of single-cell data with Harmony. Nature Methods, 16(12):1289–1296, December 2019. ISSN 1548-7091, 1548-7105. doi: 10.1038/s41592-019-0619-0. URL http://www.nature.com/articles/s41592-019-0619-0.
  15. Deep generative modeling for single-cell transcriptomics. Nature methods, 15(12):1053–1058, 2018. ISSN 1548-7091. Publisher: Nature Publishing Group US New York.
  16. Benchmarking atlas-level data integration in single-cell genomics. Nature methods, 19(1):41–50, 2022. ISSN 1548-7091. Publisher: Nature Publishing Group US New York.
  17. Eleven grand challenges in single-cell data science. Genome biology, 21(1):1–35, 2020. ISSN 1474-760X. Publisher: BioMed Central.
  18. Umap: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426, 2018.
  19. Approximate inference for deep latent gaussian mixtures. volume 2, pp. 131, 2016.
  20. Stick-Breaking Variational Autoencoders. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings, 2017. URL https://openreview.net/forum?id=S1jmAotxg.
  21. Black box variational inference. pp. 814–822. PMLR, 2014.
  22. Stochastic backpropagation and approximate inference in deep generative models. pp. 1278–1286. PMLR, 2014.
  23. Absence of microglia promotes diverse pathologies and early lethality in Alzheimer’s disease mice. Cell reports, 39(11):110961, June 2022. ISSN 2211-1247. doi: 10.1016/j.celrep.2022.110961. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9285116/.
  24. A new distribution on the simplex with auto-encoding applications. Advances in Neural Information Processing Systems, 32, 2019.
  25. Comprehensive Integration of Single-Cell Data. Cell, 177(7):1888–1902.e21, June 2019. ISSN 00928674. doi: 10.1016/j.cell.2019.05.031. URL https://linkinghub.elsevier.com/retrieve/pii/S0092867419305598.
  26. Interpretable factor models of single-cell RNA-seq via variational autoencoders. Bioinformatics, 36(11):3418–3421, 2020. ISSN 1367-4803. Publisher: Oxford University Press.
  27. VAE with a VampPrior. pp. 1214–1223. PMLR, 2018. ISSN 2640-3498.
  28. Visualizing data using t-SNE. Journal of machine learning research, 9(11), 2008. ISSN 1532-4435.
  29. Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Reinforcement learning, pp. 5–32, 1992. ISBN 1461366089. Publisher: Springer.
  30. Probabilistic harmonization and annotation of single‐cell transcriptomics data with deep generative models. Molecular systems biology, 17(1):e9620, 2021. ISSN 1744-4292.
  31. A Comprehensive Survey on Deep Clustering: Taxonomy, Challenges, and Future Directions, June 2022. URL http://arxiv.org/abs/2206.07579. arXiv:2206.07579 [cs].
