Interpretable Features for Distribution Testing with Enhanced Power
The paper by Wittawat Jitkrittum, Zoltán Szabó, Kacper Chwialkowski, and Arthur Gretton introduces two semimetrics on probability distributions designed to maximize the power of statistical tests that distinguish between them. Each semimetric measures differences in the expectations of analytic functions evaluated at a small set of spatial or frequency features, with the features chosen by maximizing a lower bound on the test power. The emphasis is on interpretable results that highlight where two distributions differ locally.
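Concretely, the spatial variant can be sketched as follows (a simplified form consistent with the paper's setup, with notation chosen here for illustration): given a Gaussian kernel \(k\) and \(J\) test locations \(v_1, \dots, v_J\), the squared semimetric averages squared differences of the kernel mean embeddings at those locations:

```latex
d^2(P, Q) = \frac{1}{J} \sum_{j=1}^{J} \bigl( \mu_P(v_j) - \mu_Q(v_j) \bigr)^2,
\qquad \mu_P(v) := \mathbb{E}_{x \sim P}\, k(x, v).
```

This is a semimetric rather than a metric because, for a fixed finite set of locations, \(d(P, Q) = 0\) does not force \(P = Q\); choosing the locations well is exactly what the power-maximization step addresses.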
Core Contributions
Semimetrics Design: The paper proposes two semimetrics, one based on spatial features (the ME test) and one based on frequency-domain features (the SCF test). Both select their features by maximizing a derived lower bound on the test power.
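A minimal sketch of the ME-style statistic may help make this concrete. The function names, the fixed Gaussian width `gamma`, and the small ridge term are illustrative choices, not the paper's implementation; the statistic compares kernel evaluations of the two samples at the test locations via a Hotelling-like quadratic form.

```python
import numpy as np

def me_statistic(X, Y, V, gamma=1.0):
    """Sketch of a mean-embeddings (ME) style test statistic.

    X, Y : (n, d) arrays of samples from the two distributions.
    V    : (J, d) array of test locations.
    gamma: Gaussian kernel width (fixed here; the paper optimizes it).
    """
    def gauss(A, B):
        # (len(A), len(B)) Gaussian kernel matrix.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * gamma ** 2))

    # Per-sample differences of kernel features at the J locations.
    Z = gauss(X, V) - gauss(Y, V)          # shape (n, J)
    zbar = Z.mean(axis=0)
    # Regularized sample covariance; the ridge keeps the solve stable.
    S = np.cov(Z, rowvar=False) + 1e-8 * np.eye(V.shape[0])
    n = X.shape[0]
    # Hotelling-like quadratic form; ~ chi-squared(J) under H0 asymptotically.
    return n * zbar @ np.linalg.solve(S, zbar)
```

Each kernel evaluation touches every sample once, so the statistic costs O(n J d), i.e. linear in the sample size for fixed J and d.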
Convergence Guarantees: A key theoretical underpinning is that the empirical criterion used for feature selection converges to its population counterpart as the sample size increases, so features optimized on one portion of the data yield a test whose ability to distinguish the distributions is preserved on held-out data.
Efficiency and Interpretability: The proposed tests run in time linear in the sample size, in contrast to more computationally intensive approaches such as the quadratic-time maximum mean discrepancy (MMD) test, while achieving comparable power. Importantly, the selected features are interpretable, which enhances the usability of the results in practical applications.
Experimental Validation
- On high-dimensional datasets, including text and image data, the tests were shown to be competitive in power with state-of-the-art methods, while also offering the advantage of interpretability.
- Specific experiments demonstrated that these methods can identify discriminative features in datasets where other tests offer no insight into the nature of the distribution differences.
Theoretical Contributions
The paper proves that the empirical criterion used to select features converges, uniformly over the test locations and Gaussian kernel widths considered, to its population counterpart, so the optimized features yield consistent gains in test power as sample sizes grow. Moreover, a detailed analysis based on a Hotelling T-squared style statistic underpins the lower bound on test power that is central to the optimization.
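The overall procedure can be sketched as a two-stage test: select features on a training split by maximizing the power criterion, then compute the statistic on the held-out split and compare it to the asymptotic chi-squared null threshold. The grid search over candidate locations below is a simplification for illustration (the paper optimizes locations and kernel width directly), and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def me_statistic(X, Y, V, gamma=1.0):
    """Hotelling-like ME statistic at locations V (illustrative sketch)."""
    def gauss(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * gamma ** 2))
    Z = gauss(X, V) - gauss(Y, V)
    zbar = Z.mean(axis=0)
    S = np.cov(Z, rowvar=False) + 1e-8 * np.eye(V.shape[0])
    return X.shape[0] * zbar @ np.linalg.solve(S, zbar)

def two_stage_me_test(X, Y, candidate_Vs, gamma=1.0, alpha=0.01, seed=0):
    """Select locations on a training half, test on the held-out half.

    Maximizing the statistic on the training split serves as a proxy for
    the paper's lower bound on test power; using disjoint data for
    selection and testing keeps the null distribution valid.
    """
    rng = np.random.default_rng(seed)
    n = min(len(X), len(Y))
    perm = rng.permutation(n)
    tr, te = perm[: n // 2], perm[n // 2:]
    # Stage 1: pick the candidate maximizing the criterion on training data.
    best_V = max(candidate_Vs,
                 key=lambda V: me_statistic(X[tr], Y[tr], V, gamma))
    # Stage 2: evaluate on held-out data; asymptotic null is chi2(J).
    lam = me_statistic(X[te], Y[te], best_V, gamma)
    threshold = chi2.ppf(1.0 - alpha, df=best_V.shape[0])
    return lam, bool(lam > threshold)
```

The selected `best_V` is itself the interpretable output: it points at the regions of the input space where the two samples differ most detectably.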
Practical Implications
The methodologies presented have broad applications in domains requiring model validation and comparison, such as machine learning, where interpretability and computational efficiency are paramount.
Future Directions
Looking forward, further exploration could include:
- Extending the semimetrics to non-Gaussian kernels to assess their suitability across a broader range of applications.
- Investigating the automatic selection of Gaussian width and test location initialization to reduce the need for manual parameter tuning.
- Applying the methodology in more complex two-sample testing scenarios such as dynamic or evolving distributions.
This study stands as a substantial contribution to the statistical toolbox, offering significant improvements in both efficiency and interpretability for distribution testing, with direct applicability across various data-rich scientific fields.