Logarithmic Quantile Estimation for Rank Statistics

Published 10 May 2013 in stat.ME | (1305.2250v1)

Abstract: We prove an almost sure weak limit theorem for simple linear rank statistics for samples with continuous distribution functions. As a corollary the result extends to samples with ties, and to the vector version of an a.s. central limit theorem for vectors of linear rank statistics. Moreover, we derive such a weak convergence result for some quadratic forms. These results are then applied to quantile estimation, and to hypothesis testing for nonparametric statistical designs, here demonstrated by the c-sample problem, where the samples may be dependent. In general, the method is known to be comparable to the bootstrap and other nonparametric methods (\cite{THA, FRI}) and we confirm this finding for the c-sample problem.


Summary

The paper establishes an almost sure central limit theorem for simple linear rank statistics, relaxing the classical independence assumptions to accommodate dependent data.
The paper introduces LQE as an alternative to the bootstrap that offers consistent quantile estimation without resampling, even for dependent or non-identically distributed data.
The paper applies the methodology to the c-sample problem, demonstrating through simulations that LQE reliably approximates null distributions for both independent and dependent samples.

Logarithmic Quantile Estimation for Rank Statistics: Theory, Implementation, and Empirical Evaluation

Introduction

This work rigorously investigates the almost sure central limit theorem (ASCLT) for simple linear rank statistics based on samples from continuous distributions and advances the application of logarithmic quantile estimation (LQE) for nonparametric statistical inference. LQE, which relies on logarithmic averaging rather than resampling, is positioned as a practical alternative to procedures such as the bootstrap, especially in cases involving dependent or non-identically distributed data. The theoretical contributions are extended to quadratic forms, and the methodology is demonstrated in hypothesis testing, with particular attention to the $c$-sample problem with potentially dependent samples (1305.2250).

Almost Sure Central Limit Theorems for Rank Statistics

The paper's pivotal result is an ASCLT for simple linear rank statistics, i.e., statistics of the form $T_n(J)$ constructed from arbitrary, possibly dependent arrays of random vectors with continuous marginals. The authors carefully establish model conditions, notably relaxing traditional independence and identical distribution assumptions, thus broadening applicability to complex experimental designs (e.g., repeated measures and time series).

Specifically, the theorem covers:

Rank statistics with general regression constants and score functions $J$ with bounded second derivative.
Scenarios where the dimension of vectors may vary with the sample and dependencies are present among coordinates.
Situations with ties (via midranks), by referencing extensions in the literature.
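To make the object of the theorem concrete, the following is a minimal sketch of a simple linear rank statistic with midranks for ties. It assumes the common textbook form $\sum_i c_i J(R_i/(n+1))$; the exact normalization and centering of $T_n(J)$ in the paper are not given in this summary, and the function names `midranks` and `linear_rank_statistic` are illustrative only.

```python
def midranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    return ranks


def linear_rank_statistic(values, constants, score):
    """Assumed form: sum_i c_i * J(R_i / (n + 1)) with regression constants c_i."""
    n = len(values)
    return sum(c * score(r / (n + 1)) for c, r in zip(constants, midranks(values)))


sample = [2.3, 0.7, 1.9, 0.7, 3.1]
constants = [1, 1, 0, 0, 0]  # indicator of a first group, giving a Wilcoxon-type sum
print(midranks(sample))      # [4.0, 1.5, 3.0, 1.5, 5.0]
print(linear_rank_statistic(sample, constants, lambda u: u))
```

With the identity score and indicator constants this reduces to a scaled rank sum of the first group; other choices of `score` and `constants` give other members of the same family.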

The proof technique combines a decomposition of the statistic, control of the resulting terms through variance and maximal dimension bounds, and the Borel-Cantelli lemma to obtain almost sure convergence. Together these yield ASCLT statements under mild regularity conditions.
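For orientation, an almost sure limit theorem for a sequence of statistics $T_n$ with weak limit distribution $G$ is usually stated in the generic form below. This is the standard shape of such results, not a quotation from the paper, whose normalization and conditions may differ in detail.

$$
\lim_{N\to\infty} \frac{1}{\log N} \sum_{n=1}^{N} \frac{1}{n}\, \mathbf{1}\{T_n \le t\} = G(t) \quad \text{almost surely, for every continuity point } t \text{ of } G.
$$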

Logarithmic Quantile Estimation (LQE): Methodological Impact

LQE constructs empirical quantiles from logarithmic averages of the observed statistics rather than from the conventional empirical cumulative distribution function or from bootstrap resamples. The main properties highlighted include:

Direct inference from sample data without resampling or explicit estimation of limiting variance/covariance structures.
Consistency in quantile estimation, with formal coverage probability guarantees.
Applicability to complex designs where asymptotic covariances are degenerate, unknown, or intractable.
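The idea of reading quantiles off a logarithmic average can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: `stats` is assumed to hold the statistic computed from the first 1, 2, ..., N observations, the n-th value receives weight 1/n, and the weights are normalized by their sum rather than by $\log N$. The names `log_average_cdf` and `log_quantile` are hypothetical.

```python
def log_average_cdf(stats, t):
    """Logarithmically weighted empirical CDF: weight 1/n on the n-th statistic."""
    weights = [1.0 / n for n in range(1, len(stats) + 1)]
    return sum(w for w, s in zip(weights, stats) if s <= t) / sum(weights)


def log_quantile(stats, alpha):
    """Smallest observed value whose log-weighted CDF reaches alpha."""
    weights = [1.0 / n for n in range(1, len(stats) + 1)]
    total = sum(weights)
    ordered = sorted(zip(stats, weights))
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative / total >= alpha:
            return value
    return ordered[-1][0]  # guard against floating-point round-off near alpha = 1


stats = [0.5, -1.0, 2.0, 0.1]      # toy values standing in for T_1, ..., T_4
print(log_quantile(stats, 0.5))    # 0.5
print(log_quantile(stats, 0.9))    # 2.0
```

No resampling and no variance estimate appear anywhere in the computation; the only inputs are the statistics evaluated along the sample.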

In contrast to the bootstrap, LQE does not rely on independence or identical distribution assumptions and remains applicable under sample dependence, a setting in which the bootstrap can fail or be difficult to justify.

Application to the $c$-Sample Problem

A salient application is the nonparametric $c$-sample problem with possible dependence among samples. The Kruskal-Wallis statistic is the classical choice here, but its limiting distribution is generally unknown when the samples are dependent.

Key results include:

Formulating the Kruskal-Wallis statistic as a quadratic form of vector-valued ASCLT statistics, even under dependent sampling schemes.
Establishing an ASCLT for these quadratic forms and thereby justifying LQE-based quantile inference irrespective of the intricacy of sample dependence.
Noting that, for dependent samples, the limiting distribution of the Kruskal-Wallis statistic is generally not accessible in closed form, while LQE still enables valid hypothesis testing through empirical quantiles and avoids difficult covariance estimation.
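A rough sketch of how these pieces might fit together is given below. It computes the Kruskal-Wallis statistic in its standard no-ties form, which is a quadratic form in the centered mean ranks, evaluates it on growing prefixes of the samples, and takes a log-weighted quantile of the resulting sequence. The prefix scheme, the equal group sizes, the toy data, and the helper names are assumptions made for illustration; how the paper orders or permutes the data and how the quantile enters the test decision is not reproduced here.

```python
def kruskal_wallis(groups):
    """Kruskal-Wallis H without ties: a quadratic form in the centered mean ranks."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {x: r for r, x in enumerate(pooled, start=1)}  # assumes no ties
    total = len(pooled)
    h = 0.0
    for g in groups:
        mean_rank = sum(rank_of[x] for x in g) / len(g)
        h += len(g) * (mean_rank - (total + 1) / 2) ** 2
    return 12.0 / (total * (total + 1)) * h


def prefix_statistics(groups, start=2):
    """H computed from the first m observations of every group, m = start..n."""
    n = min(len(g) for g in groups)
    return [kruskal_wallis([g[:m] for g in groups]) for m in range(start, n + 1)]


def log_quantile(stats, alpha, first_index=1):
    """Quantile of stats under weights 1/m, m = first_index, first_index + 1, ..."""
    weights = [1.0 / m for m in range(first_index, first_index + len(stats))]
    total = sum(weights)
    ordered = sorted(zip(stats, weights))
    cumulative = 0.0
    for value, weight in ordered:
        cumulative += weight
        if cumulative / total >= alpha:
            return value
    return ordered[-1][0]  # guard against floating-point round-off


# Toy data with distinct values; three groups of six observations each.
groups = [
    [0.11, 1.42, 0.73, 2.05, 0.38, 1.17],
    [0.95, 0.26, 1.88, 0.64, 1.51, 0.09],
    [1.30, 0.57, 0.02, 1.76, 0.81, 2.24],
]
history = prefix_statistics(groups, start=2)
critical = log_quantile(history, 0.95, first_index=2)
print(history[-1], critical)
```

With only six observations per group the estimate is purely illustrative; logarithmic averages converge slowly, so realistic use needs much longer sequences.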

Simulation Studies and Empirical Results

The simulations support the distributional stability of LQE and indicate that it is competitive with asymptotic and bootstrap-based methods, consistent with the abstract's statement that the method is comparable to the bootstrap. The results for the three-sample case (independent and dependent) illustrate several findings:

For independent samples, empirical LQE quantiles closely approximate the known asymptotic chi-squared quantiles of the Kruskal-Wallis statistic, with better empirical coverage probabilities.
For dependent samples, where classical Kruskal-Wallis approaches do not yield tractable null distributions, LQE methods maintain nominal type I error rates and competitive power.
The method remains robust across sample sizes, distribution types (e.g., normal vs. exponential), and levels of dependence.

The simulation framework relies on thorough randomization and permutation strategies to address the impact of non-symmetry and potential order effects.

Theoretical and Practical Implications

The results have several direct implications for nonparametric statistical inference:

LQE extends the toolbox for hypothesis testing in environments where reliance on limiting distributions or variance components is problematic.
The framework supports applications in high-dimensional, dependent, and heterogeneously structured data, as encountered in modern scientific studies.
The approach does not require an explicit characterization of the dependence structure and remains valid under complex sampling schemes, which matters for statistical designs encountered in practice.

On a theoretical level, this work consolidates the use of ASCLT in the derivation of test statistics' distributions under minimal assumptions, providing a solid foundation for further exploration of almost sure methods in statistics.

Future Directions

The methodology lays the groundwork for generalizations to more elaborate rank-based models, multivariate settings, and non-standard experimental designs. Future developments may include:

Extension to broader classes of statistics beyond simple linear forms, such as U-statistics and generalized rank scores.
Integration with recent advances in high-dimensional nonparametric inference and dependence modeling.
Automated or adaptive permutation procedures for quantile estimation in large-scale or streaming data settings.

Conclusion

This paper establishes the theoretical foundation and empirical efficacy of logarithmic quantile estimation for rank statistics, particularly under weak assumptions concerning independence and identical distribution. The LQE methodology unifies almost sure limit theory with practical quantile-based inference and presents a viable alternative to the bootstrap, especially in nonparametric designs with complex dependence. These results are relevant to robust hypothesis testing in applied statistics (1305.2250).
