Universally Consistent K-Sample Tests via Dependence Measures
Abstract: The K-sample testing problem involves determining whether K groups of data points are each drawn from the same distribution. Analysis of variance is arguably the most classical method for testing mean differences, and several more recent methods test for distributional differences. In this paper, we demonstrate the existence of a transformation that allows K-sample testing to be carried out using any dependence measure. Consequently, universally consistent K-sample testing can be achieved using a universally consistent dependence measure, such as distance correlation or the Hilbert-Schmidt independence criterion. This enables a wide range of dependence measures to be easily applied to K-sample testing.
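A minimal sketch of the idea in the abstract, assuming the transformation pools the K samples into one data matrix U and encodes group membership as one-hot label vectors V, so that "all K distributions are equal" becomes "U is independent of V". The dependence measure plugged in here is the biased (V-statistic) squared sample distance correlation, and the p-value comes from a label-permutation test. The function names (`k_sample_transform`, `dcorr_squared`, `k_sample_test`) are hypothetical and are not the API of any particular package.

```python
import numpy as np


def k_sample_transform(samples):
    # Assumed form of the transformation: stack the K samples row-wise into
    # U (n x p) and record group membership as one-hot rows of V (n x K).
    mats = [np.asarray(x, dtype=float).reshape(len(x), -1) for x in samples]
    u = np.vstack(mats)
    labels = np.concatenate([np.full(len(m), k) for k, m in enumerate(mats)])
    v = np.eye(len(mats))[labels]
    return u, v


def _centered_distances(x):
    # Euclidean distance matrix, double-centered: subtract row and column
    # means, then add back the grand mean.
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcorr_squared(x, y):
    # Biased (V-statistic) squared sample distance correlation, in [0, 1].
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2_xy = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    return 0.0 if denom == 0 else float(dcov2_xy / denom)


def k_sample_test(samples, reps=1000, seed=None):
    # K-sample test carried out as an independence test between U and V.
    rng = np.random.default_rng(seed)
    u, v = k_sample_transform(samples)
    stat = dcorr_squared(u, v)
    # Under H0 the group labels are exchangeable, so shuffling the rows of V
    # draws from the permutation null distribution of the statistic.
    null = np.array([dcorr_squared(u, v[rng.permutation(len(v))]) for _ in range(reps)])
    pvalue = (1 + np.sum(null >= stat)) / (1 + reps)
    return stat, float(pvalue)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same = [rng.normal(0.0, 1.0, size=(30, 2)) for _ in range(3)]
    shifted = [rng.normal(mu, 1.0, size=(30, 2)) for mu in (0.0, 0.0, 2.0)]
    print("equal distributions:", k_sample_test(same, reps=500, seed=1))
    print("one shifted group:  ", k_sample_test(shifted, reps=500, seed=1))
```

Because the group labels enter only through V, the same wrapper could in principle accept any dependence statistic with signature f(U, V), for example a kernel (HSIC-type) statistic, by swapping out `dcorr_squared`; that substitution is not shown here.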