
Toward Capturing Genetic Epistasis From Multivariate Genome-Wide Association Studies Using Mixed-Precision Kernel Ridge Regression

Published 3 Sep 2024 in q-bio.GN, cs.AR, cs.LG, cs.MS, and cs.PF | (2409.01712v1)

Abstract: We exploit the widening margin in tensor-core performance between [FP64/FP32/FP16/INT8, FP64/FP32/FP16/FP8/INT8] on NVIDIA [Ampere, Hopper] GPUs to boost the performance of output accuracy-preserving mixed-precision computation of Genome-Wide Association Studies (GWAS) of 305K patients from the UK BioBank, the largest-ever GWAS cohort studied for genetic epistasis using a multivariate approach. Tile-centric adaptive-precision linear algebraic techniques motivated by reducing data motion gain enhanced significance with low-precision GPU arithmetic. At the core of Kernel Ridge Regression (KRR) techniques for GWAS lie compute-bound cubic-complexity matrix operations that inhibit scaling to aspirational dimensions of the population, genotypes, and phenotypes. We accelerate KRR matrix generation by redesigning the computation for Euclidean distances to engage INT8 tensor cores while exploiting symmetry. We accelerate solution of the regularized KRR systems by deploying a new four-precision Cholesky-based solver, which, at 1.805 mixed-precision ExaOp/s on a nearly full Alps system, outperforms the state-of-the-art CPU-only REGENIE GWAS software by five orders of magnitude.
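
The two stages named in the abstract, kernel matrix generation from Euclidean distances and a Cholesky-based solve of the regularized system, can be illustrated with a minimal sketch. This is not the authors' implementation: it runs in plain Python on the CPU in double precision, with no tiling, tensor cores, or mixed precision. The Gaussian kernel, the function names, and the parameters `gamma` and `lam` are assumptions made for illustration only. What the sketch does show is why integer arithmetic suits the distance step: with small-integer genotype codes, the identity ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x·y reduces the work to an exact integer Gram matrix, and symmetry means only one triangle needs computing.

```python
import math


def squared_distances(G):
    """Pairwise squared Euclidean distances between rows of an integer matrix.

    Uses ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, so the dominant cost is the
    integer Gram matrix G G^T. Only the upper triangle is computed; the lower
    triangle is filled in by symmetry.
    """
    n = len(G)
    norms = [sum(v * v for v in row) for row in G]
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(G[i], G[j]))  # exact integer dot product
            D[i][j] = D[j][i] = norms[i] + norms[j] - 2 * dot
    return D


def gaussian_kernel(D, gamma):
    """Assumed kernel: K_ij = exp(-gamma * squared_distance_ij)."""
    return [[math.exp(-gamma * d) for d in row] for row in D]


def cholesky(A):
    """Lower-triangular L with A = L L^T, for a symmetric positive definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def krr_fit(G, y, gamma, lam):
    """Solve the regularized system (K + lam * I) alpha = y for the KRR weights."""
    K = gaussian_kernel(squared_distances(G), gamma)
    for i in range(len(K)):
        K[i][i] += lam  # ridge term keeps the system positive definite
    return cholesky_solve(cholesky(K), y)


# Toy data: 4 individuals, 4 variants coded 0/1/2, one phenotype value each.
genotypes = [[0, 1, 2, 0], [1, 1, 0, 2], [2, 0, 1, 1], [0, 2, 2, 1]]
phenotype = [1.0, -0.5, 0.25, 2.0]
alpha = krr_fit(genotypes, phenotype, gamma=0.1, lam=0.5)
```

The factorization in `cholesky` is the cubic-complexity step the abstract refers to. In this sketch it runs entirely in double precision, whereas the paper's solver is described as using four precisions.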
