Applications of the Vendi score in genomic epidemiology
Abstract: The Vendi score (VS), a diversity metric recently conceived in the context of machine learning and since applied in a wide range of fields, has a few distinct advantages over the metrics commonly used in ecology. It is classification-independent, incorporates abundance information, and has a tunable sensitivity to rare versus abundant types. Using rich COVID-19 sequence data as a paradigm, we develop methods for applying the VS to time-resolved sequence data. We show how the VS allows for characterization of the overall diversity of circulating viruses and for discernment of emerging variants prior to formal identification. Furthermore, applying the VS to phylogenetic trees provides a convenient overview of within-clade diversity, which can aid viral variant detection.
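The abstract does not restate the definition, so the following is only a minimal sketch of the Vendi score as it is usually defined in the machine-learning literature: the exponential of the (Rényi) entropy of the eigenvalues of K/n, where K is an n-by-n positive semidefinite similarity matrix with ones on the diagonal. The order q is the knob behind the "tunable sensitivity": smaller q gives more weight to rare types, larger q to abundant ones. The function names `vendi_score` and `match_fraction_kernel`, and the toy kernel itself (fraction of matching positions between aligned sequences), are assumptions made for illustration and are not necessarily what the authors use.

```python
import numpy as np

def vendi_score(K, q=1.0):
    """Vendi score of order q for a PSD similarity matrix K with K[i, i] == 1."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    # Eigenvalues of K/n are non-negative and sum to 1 (trace(K)/n == 1).
    lam = np.linalg.eigvalsh(K / n)
    lam = lam[lam > 1e-12]  # drop numerically zero eigenvalues
    if np.isclose(q, 1.0):
        # Shannon case: exp(-sum(lam * log(lam)))
        return float(np.exp(-np.sum(lam * np.log(lam))))
    if np.isinf(q):
        # Limit q -> infinity depends only on the largest eigenvalue.
        return float(1.0 / lam.max())
    # General Renyi case: (sum(lam**q)) ** (1 / (1 - q))
    return float(np.exp(np.log(np.sum(lam ** q)) / (1.0 - q)))

def match_fraction_kernel(seqs):
    """Hypothetical toy kernel: fraction of identical positions.

    Assumes all sequences are aligned and have equal length.
    """
    arr = np.array([list(s) for s in seqs])
    return (arr[:, None, :] == arr[None, :, :]).mean(axis=2)

# Toy usage: two identical sequences and one that differs at every position.
seqs = ["ACGT", "ACGT", "TGCA"]
K = match_fraction_kernel(seqs)
vs_shannon = vendi_score(K, q=1.0)
vs_order2 = vendi_score(K, q=2.0)
```

Two sanity checks follow from the definition: n mutually dissimilar items (K equal to the identity) give a score of n, and n identical items (K all ones) give a score of 1. The score can therefore be read as an effective number of distinct types, without assigning sequences to predefined classes.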