A Uniform Concentration Inequality for Kernel-Based Two-Sample Statistics
Abstract: In many contemporary statistical and machine learning methods, one needs to optimize an objective function that depends on the discrepancy between two probability distributions. The discrepancy can be referred to as a metric for distributions. Widely adopted examples of such a metric include Energy Distance (ED), distance Covariance (dCov), Maximum Mean Discrepancy (MMD), and the Hilbert-Schmidt Independence Criterion (HSIC). We show that these metrics can be unified under a general framework of kernel-based two-sample statistics. This paper establishes a novel uniform concentration inequality for the aforementioned kernel-based statistics. Our results provide upper bounds for estimation errors in the associated optimization problems, thereby offering both finite-sample and asymptotic performance guarantees. As illustrative applications, we demonstrate how these bounds facilitate the derivation of error bounds for procedures such as distance covariance-based dimension reduction, distance covariance-based independent component analysis, MMD-based fairness-constrained inference, MMD-based generative model search, and MMD-based generative adversarial networks.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.
Top Community Prompts
Collections
Sign up for free to add this paper to one or more collections.