Infinite joint species distribution models
Abstract: Joint species distribution models are popular in ecology for modeling covariate effects on species occurrence, while characterizing cross-species dependence. Data consist of multivariate binary indicators of the occurrences of different species in each sample, along with sample-specific covariates. A key problem is that current models implicitly assume that the list of species under consideration is predefined and finite, while for highly diverse groups of organisms, it is impossible to anticipate which species will be observed in a study and discovery of unknown species is common. This article proposes a new modeling paradigm for statistical ecology, which generalizes traditional multivariate probit models to accommodate large numbers of rare species and new species discovery. We discuss theoretical properties of the proposed modeling paradigm and implement efficient algorithms for posterior computation. Simulation studies and applications to fungal biodiversity data provide compelling support for the new modeling class.