Density Estimation via Measure Transport: Outlook for Applications in the Biological Sciences

Published 27 Sep 2023 in q-bio.QM, cs.LG, and physics.bio-ph (arXiv:2309.15366v4)

Abstract: One of several advantages of measure transport methods is that they provide a unified framework for processing and analyzing data distributed according to a wide class of probability measures. Within this context, we present results from computational studies aimed at assessing the potential of measure transport techniques, specifically triangular transport maps, as part of a workflow intended to support research in the biological sciences. Of particular interest are scenarios in which only a limited amount of sample data is available, which are common in domains such as radiation biology. We find that adaptive transport maps are advantageous when estimating a probability density function from limited sample data. In particular, statistics gathered from a series of adaptive transport maps, each trained on a randomly chosen subset of the available data samples, uncover information hidden in the data. As a result, in the radiation biology application considered here, this approach provides a tool for generating hypotheses about gene relationships and their dynamics under radiation exposure.
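The abstract describes estimating a density with triangular transport maps and then gathering statistics from maps trained on random subsets of the samples. Below is a minimal NumPy sketch of that workflow under simplified assumptions; it is not the adaptive method used in the paper. A triangular map S sends the data toward a standard Gaussian reference, with the k-th component S_k depending only on the first k coordinates and increasing in x_k, so the estimated density is the pullback p(x) = η(S(x)) ∏_k ∂S_k/∂x_k, where η is the standard Gaussian density. Here each component is assumed to take the restricted form S_k(x) = (x_k − m_k(x_{<k})) / s_k, with m_k a least-squares fit on hypothetical quadratic features, so the maximum-likelihood fit of each component has a closed form. The statistic in `dependency_frequency` (the fraction of subsample-trained maps in which x_j enters S_k with a scaled coefficient above `tol`) is a hypothetical stand-in for the statistics used in the paper, and all function names are illustrative.

```python
import numpy as np


def features(x_prev):
    """Hypothetical feature map for the earlier coordinates x_{<k}:
    a constant, the linear terms and the squared terms."""
    ones = np.ones((x_prev.shape[0], 1))
    return np.hstack([ones, x_prev, x_prev**2])


def fit_triangular_map(X):
    """Fit S_k(x) = (x_k - m_k(x_{<k})) / s_k for every coordinate k.

    Under a standard Gaussian reference, maximizing the pullback likelihood of
    this restricted component class reduces to least squares for m_k and the
    root-mean-square residual for s_k. Returns a list of (beta_k, s_k).
    """
    comps = []
    for k in range(X.shape[1]):
        Phi = features(X[:, :k])
        beta, *_ = np.linalg.lstsq(Phi, X[:, k], rcond=None)
        s = np.sqrt(np.mean((X[:, k] - Phi @ beta) ** 2))
        comps.append((beta, s))
    return comps


def log_density(comps, X):
    """Pullback log-density: sum_k [log phi(S_k(x)) + log dS_k/dx_k].
    A triangular Jacobian's determinant is the product of its diagonal."""
    logp = np.zeros(X.shape[0])
    for k, (beta, s) in enumerate(comps):
        z = (X[:, k] - features(X[:, :k]) @ beta) / s  # S_k(x)
        logp += -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(s)
    return logp


def sample(comps, n, rng):
    """Invert S one coordinate at a time: x_k = m_k(x_{<k}) + s_k * z_k."""
    Z = rng.standard_normal((n, len(comps)))
    X = np.zeros_like(Z)
    for k, (beta, s) in enumerate(comps):
        X[:, k] = features(X[:, :k]) @ beta + s * Z[:, k]
    return X


def dependency_frequency(X, n_maps=50, frac=0.5, tol=0.5, seed=0):
    """Hypothetical statistic: fraction of maps, each fit to a random subset
    of the samples, in which x_j enters S_k with a scaled coefficient above tol."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so tol is comparable
    n, d = X.shape
    m = int(frac * n)
    counts = np.zeros((d, d))
    for _ in range(n_maps):
        idx = rng.choice(n, size=m, replace=False)
        for k, (beta, s) in enumerate(fit_triangular_map(X[idx])):
            for j in range(k):
                # beta layout: [constant, linear x_0..x_{k-1}, squared x_0..x_{k-1}];
                # dividing by s gives the coefficient as it appears in S_k.
                w = max(abs(beta[1 + j]), abs(beta[1 + k + j])) / s
                if w > tol:
                    counts[k, j] += 1
    return counts / n_maps


# Synthetic example: x1 depends quadratically on x0; x2 is independent noise.
rng = np.random.default_rng(1)
x0 = rng.normal(size=200)
X = np.column_stack([x0, x0**2 + 0.3 * rng.normal(size=200), rng.normal(size=200)])
freq = dependency_frequency(X)
print(np.round(freq, 2))
```

In this synthetic example x1 depends quadratically on x0, so `freq[1, 0]` should be close to 1. The remaining entries depend on `tol`, the subsample fraction and correlations among the features (x1 is nearly collinear with the square of x0), which is why such frequencies are better read as hypotheses than as conclusions.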
