Scalable Amortized GPLVMs for Single Cell Transcriptomics Data

Published 6 May 2024 in stat.ML, cs.LG, q-bio.GN, and stat.AP (arXiv:2405.03879v1)

Abstract: Dimensionality reduction is crucial for analyzing large-scale single-cell RNA-seq data. Gaussian Process Latent Variable Models (GPLVMs) offer an interpretable dimensionality reduction method, but current scalable models are not effective at clustering cell types. We introduce an improved model, the amortized stochastic variational Bayesian GPLVM (BGPLVM), tailored for single-cell RNA-seq with specialized encoder, kernel, and likelihood designs. This model matches the performance of the leading single-cell variational inference (scVI) approach on synthetic and real-world COVID datasets. It also effectively incorporates cell-cycle and batch information to reveal more interpretable latent structures, as we demonstrate on an innate immunity dataset.