Robust methods for determining sample size in national genomic projects

Develop robust methods to determine the number of participants required for a national human genomic diversity project to adequately characterize within-country genomic variation, ensuring that the recommended sample size aligns with project goals and population characteristics (including the detection of rare variants).

Background

National genomic initiatives vary widely in scope and cost, and their ability to detect both common and rare variants depends critically on the number of participants sequenced. Countries differ in demographic history and population structure, which complicates one-size-fits-all guidance on cohort size.

Existing projects range from hundreds to hundreds of thousands of genomes, reflecting the absence of a standardized methodology for sample size planning. While isolated examples (e.g., the Korean Genome Project) provide insight into diminishing returns for common variants and suggest modest sizes for rare variant detection, a general, transferable framework for estimating sufficient sample sizes across diverse national contexts has not been established.
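
As a rough illustration of why cohort size governs rare variant detection, the sketch below computes the chance that a variant of a given allele frequency is observed at least once in a cohort, then inverts that relation to get a minimum cohort size. It is not the method used by the Korean Genome Project or any other national project. It assumes random sampling from a single unstructured population, 2n independently drawn chromosomes for n diploid participants, and perfect variant calling, none of which is stated in the source. The function names are hypothetical.

```python
import math


def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """Probability that a variant is seen at least once among 2n chromosomes.

    Assumes (illustratively) independent chromosomes and perfect calling.
    """
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


def min_sample_size(allele_freq: float, target_prob: float = 0.95) -> int:
    """Smallest n with detection_probability(allele_freq, n) >= target_prob.

    Derived by solving 1 - (1 - f)^(2n) >= p for n.
    """
    if not 0.0 < allele_freq < 1.0:
        raise ValueError("allele_freq must be strictly between 0 and 1")
    if not 0.0 < target_prob < 1.0:
        raise ValueError("target_prob must be strictly between 0 and 1")
    n = math.log(1.0 - target_prob) / (2.0 * math.log(1.0 - allele_freq))
    return math.ceil(n)


if __name__ == "__main__":
    # Example allele frequencies chosen for illustration only.
    for freq in (0.05, 0.01, 0.001):
        n = min_sample_size(freq, target_prob=0.95)
        print(f"allele frequency {freq}: at least {n} participants")
```

Under these simplifying assumptions, a variant at 1% frequency needs about 150 participants for a 95% chance of being observed at least once, and the required number grows roughly in inverse proportion to the allele frequency. Population structure, relatedness, and sequencing error would change these figures, which is part of why a transferable framework is still needed.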

References

Unfortunately, we still do not have robust methods to suggest a number of samples sufficient for a national genomic project, with existing national genomic projects ranging from hundreds to hundreds of thousands of samples.

Challenges and Recommendations in Establishing National Human Diversity Genomic Projects (2510.19869 - Oleksyk et al., 22 Oct 2025), in Recommendations for planning and implementation of national genomic projects: Economic Considerations