Chatterjee's Rank Correlation Tests

Updated 9 October 2025

The paper introduces Chatterjee’s rank correlation, a nonparametric measure that equals 0 if and only if the variables are independent and 1 if and only if one variable is almost surely a measurable function of the other, with a simple rank-based estimator that is asymptotically normal under the null.
It employs efficient O(n log n) computation with rank-order differences and bias corrections, using the m-out-of-n bootstrap to ensure valid inference especially in non-i.i.d. or noncontinuous contexts.
The test framework is extended to multivariate and high-dimensional settings through nearest-neighbor graphs and spectral analysis, with boosted and combined methods to improve power in detecting complex dependencies.

Chatterjee's rank correlation-based tests constitute a family of nonparametric and distribution-free procedures for quantifying and detecting dependencies between random variables, with particular emphasis on directed (functional) rather than symmetric association. Unlike classical rank-based measures such as Spearman's rho or Kendall's tau, Chatterjee's correlation $\xi$ is tailored to measure the strength of functional dependence: $\xi=0$ if and only if variables are independent, and $\xi=1$ if and only if one is an (almost sure) measurable function of the other. Since its introduction, the theory, properties, extensions, comparative behavior, implementation protocols, and inferential implications of $\xi$ have been the subject of intensive research across probability, statistics, and applied computational mathematics.

1. Definition, Population Properties, and Computable Form

Chatterjee’s rank correlation is formally defined for a pair of continuous random variables (or via their copula $C$) as

$\xi(C) = 6 \int_0^1 \int_0^1 \left[\partial_1 C(u,v)\right]^2 \, du \, dv - 2$

where $\partial_1 C(u,v)$ is the partial derivative of $C$ with respect to its first argument. It can equivalently be interpreted in terms of conditional variances, as

$\xi(X, Y) = \frac{ \int_\mathbb{R} \operatorname{Var}\left(\mathbb{E}[1_{Y \geq y} \mid X]\right) dP_Y(y) }{ \int_\mathbb{R} \operatorname{Var}[1_{Y \geq y}] dP_Y(y) }$
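As a quick sanity check of the copula form, written here as $\xi(C) = 6 \int_0^1 \int_0^1 [\partial_1 C(u,v)]^2 \, du \, dv - 2$, the two extreme cases can be worked out by hand. The symbols $\Pi$ (independence copula) and $M$ (comonotone copula) are notation introduced only for this check.

$\Pi(u,v) = uv: \quad \partial_1 \Pi(u,v) = v, \quad \int_0^1 \int_0^1 v^2 \, du \, dv = \tfrac{1}{3}, \quad \xi(\Pi) = 6 \cdot \tfrac{1}{3} - 2 = 0$

$M(u,v) = \min(u,v): \quad \partial_1 M(u,v) = 1\{u < v\}, \quad \int_0^1 \int_0^1 1\{u < v\} \, du \, dv = \tfrac{1}{2}, \quad \xi(M) = 6 \cdot \tfrac{1}{2} - 2 = 1$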

For independent $(X, Y)$, $\xi = 0$; for perfect functional dependence $Y = f(X)$, $\xi = 1$. Intermediate values correspond to intermediate degrees of predictability, but $\xi$ does not linearize the notion of strength of association. In sample computations, a simple rank-based estimator (for data without ties) is

$\xi_n(X, Y) = 1 - \frac{3 \sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1}$

where $i$ indexes the $X$-ordered samples and $r_i$ are the concomitant ranks of $Y$. This computation can be performed in $O(n \log n)$ time for continuous data with no ties.
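The estimator described above (sort the pairs by $X$, take the ranks of $Y$ in that order, and sum absolute differences of consecutive ranks) can be sketched in a few lines. This is a minimal illustration that assumes NumPy and data without ties; the function name `chatterjee_xi` is hypothetical.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's xi_n for data without ties (assumed, not checked)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Reorder the y-values so that the paired x-values are increasing.
    y_by_x = y[np.argsort(x, kind="stable")]
    # r[i] = rank (1..n) of the i-th y-value in that order.
    r = np.empty(n, dtype=np.int64)
    r[np.argsort(y_by_x, kind="stable")] = np.arange(1, n + 1)
    # xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1)
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)
```

The two sorts dominate the cost, which gives the $O(n \log n)$ running time. For a strictly monotone relationship all consecutive rank differences equal 1, so the function returns $(n-2)/(n+1)$ rather than 1; this is the finite-sample upper bound of the statistic.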

2. Inference: Asymptotic Laws, Bootstrap, and Bias Correction

For i.i.d. data and continuous margins, the normalized statistic is asymptotically normal under the null,

$\sqrt{n} \, \xi_n \xrightarrow{d} \mathcal{N}(0, 2/5)$

as established via Stein’s method and projection representations (Auddy et al., 2021, Lin et al., 2022, Kroll, 2024). General consistency and asymptotic normality extend to non-i.i.d. (strongly mixing) data given suitable assumptions, after a vanishing bias correction. Bootstrap inference for $\xi_n$ must employ the $m$-out-of-$n$ bootstrap, not the $n$-out-of-$n$ bootstrap, to ensure valid coverage and limit approximation, especially with noncontinuous data or under dependence (Dette et al., 2023, Dalitz et al., 2023). Simple normalization of $\xi_n$ by its finite-sample upper bound further reduces finite-sample bias.
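The two inferential routes above can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any cited paper: the function names are hypothetical, the p-value uses the null limit $\sqrt{n}\,\xi_n \to \mathcal{N}(0, 2/5)$ one-sidedly, and the bootstrap draws $m$ observations with replacement, uses the general (tie-aware) form of the estimator with random tie-breaking in $X$, and centers the resampled statistic at $\xi_n$. The choice of $m$ is left to the caller.

```python
import math
import numpy as np

def xi_with_ties(x, y, rng):
    """Chatterjee's xi_n in its general form (ties in y allowed)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sort by x; shuffling first breaks ties in x uniformly at random.
    perm = rng.permutation(n)
    order = perm[np.argsort(x[perm], kind="stable")]
    y_ord = y[order]
    y_sorted = np.sort(y)
    r = np.searchsorted(y_sorted, y_ord, side="right")     # #{j : y_j <= y_i}
    l = n - np.searchsorted(y_sorted, y_ord, side="left")  # #{j : y_j >= y_i}
    denom = 2.0 * np.sum(l * (n - l))
    if denom == 0:  # y is constant, so xi_n is undefined
        return float("nan")
    return 1.0 - n * np.abs(np.diff(r)).sum() / denom

def independence_pvalue(xi_n, n):
    """One-sided p-value from sqrt(n) * xi_n ~ N(0, 2/5) under independence."""
    z = math.sqrt(n) * xi_n / math.sqrt(2.0 / 5.0)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper normal tail P(Z > z)

def m_out_of_n_ci(x, y, m, n_boot=500, level=0.95, seed=0):
    """Basic bootstrap interval for xi from m-out-of-n resamples (a sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xi_hat = xi_with_ties(x, y, rng)
    roots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=m)  # m draws with replacement
        roots[b] = math.sqrt(m) * (xi_with_ties(x[idx], y[idx], rng) - xi_hat)
    alpha = 1.0 - level
    q_lo, q_hi = np.nanquantile(roots, [alpha / 2.0, 1.0 - alpha / 2.0])
    # Invert sqrt(n) * (xi_hat - xi), approximated by the bootstrap roots.
    return xi_hat - q_hi / math.sqrt(n), xi_hat - q_lo / math.sqrt(n)
```

Resampling with replacement creates ties, which is why the tie-aware form is used inside the loop; with $m$ much smaller than $n$ such ties are rare, which is the intuition behind preferring $m$-out-of-$n$ over $n$-out-of-$n$ resampling.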

3. Power, Detection Boundaries, and Comparison with Competing Methods

Chatterjee's test is consistent against all fixed alternatives and distribution-free under the null, but it is rate sub-optimal for local (contiguous) alternatives relative to independence: the detection threshold against local parametric alternatives scales as $n^{-1/4}$ (Shi et al., 2020, Auddy et al., 2021). Competing methods such as Hoeffding’s $D$, Blum–Kiefer–Rosenblatt’s $R$, or Bergsma–Dassios–Yanagimoto’s $\tau^*$ achieve the $n^{-1/2}$ detection boundary and are thus preferred for local weak alternatives. Nevertheless, for testing non-trivial levels of association measured on the scale of $\xi$ itself, the Chatterjee-based test is minimax optimal (Auddy et al., 2021).

Recent theoretical and algorithmic advances have addressed this shortcoming:

Boosted versions that aggregate information over $M$ right-nearest neighbors, $\xi_{n,M}$, can achieve near-parametric detection boundaries, especially when $M$ is scaled appropriately with $n$ (Lin et al., 2021).
Combined tests (e.g., pairing $\xi_n$ with Spearman's rank correlation) yield uniform type I control and substantially improved power across monotonic and nonmonotonic alternatives (Zhang, 2023, Zhang, 2024).
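To illustrate the idea of combining the two statistics, the sketch below merges a one-sided $\xi_n$ test and a two-sided Spearman test with a Bonferroni correction. The Bonferroni step is a generic device substituted here for illustration; it is not the combined statistic of the cited papers. The function names are hypothetical, and the code assumes data without ties and the null approximations $\sqrt{n}\,\xi_n \approx \mathcal{N}(0, 2/5)$ and $\sqrt{n-1}\,\rho_n \approx \mathcal{N}(0, 1)$.

```python
import math
import numpy as np

def _ranks(v):
    """Ranks 1..n of a vector without ties."""
    r = np.empty(v.size, dtype=np.int64)
    r[np.argsort(v, kind="stable")] = np.arange(1, v.size + 1)
    return r

def combined_pvalue(x, y):
    """Bonferroni combination of a xi_n test and a Spearman test (a sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    rx, ry = _ranks(x), _ranks(y)
    # Chatterjee's xi_n: ranks of y taken in x-order.
    r = ry[np.argsort(x, kind="stable")]
    xi_n = 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n**2 - 1)
    z_xi = math.sqrt(n) * xi_n / math.sqrt(2.0 / 5.0)
    p_xi = 0.5 * math.erfc(z_xi / math.sqrt(2.0))       # one-sided
    # Spearman's rho via the rank-difference formula (no ties).
    rho = 1.0 - 6.0 * np.sum((rx - ry) ** 2) / (n * (n**2 - 1))
    z_rho = math.sqrt(n - 1) * rho
    p_rho = math.erfc(abs(z_rho) / math.sqrt(2.0))      # two-sided
    # Bonferroni: reject if either test rejects at half the level.
    return min(1.0, 2.0 * min(p_xi, p_rho))
```

The Spearman component picks up weak monotone trends, while the $\xi_n$ component picks up non-monotone functional relationships for which Spearman's correlation is close to zero.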

4. Multivariate and High-Dimensional Extensions

Extensions of Chatterjee's methodology to multivariate (vector-valued) responses and predictors have been developed at several levels:

Azadkia–Chatterjee correlation: Employs nearest-neighbor graph constructions for multivariate variables. Recent versions use rank-based nearest-neighbor graphs to guarantee scale invariance, consistency, and asymptotic normality (Tran et al., 2024).
Multi-response dependence measure $T$: For multivariate responses $\mathbf{Y}$ and predictors $\mathbf{X}$, the measure $T(\mathbf{Y} \mid \mathbf{X})$ generalizes $\xi$ and is permutation-invariant under certain symmetrizations. The corresponding estimator $T_n$ is strongly consistent and asymptotically normal (Ansari et al., 2022).

Tests of joint and complete independence: In high dimensions, quadratic sum statistics and extreme-value type statistics based on pairwise $\xi_n$ are developed for composite testing, enhanced with variable screening to address dense and sparse alternative regimes (Xia et al., 2024, Olivares et al., 27 Mar 2025). In all cases, the null distribution is derivable or accurately estimated by block-multiplier or $m$-out-of-$n$ bootstraps.

A further generalization employs the distance-based Chatterjee correlation, where data are mapped to real-valued “distance transformed” representations (Szekely et al. transformation), allowing $\xi$-type dependence measurement and causal inference for general multivariate or even complex-valued data (Pascual-Marqui et al., 2024).
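The nearest-neighbor construction behind the Azadkia–Chatterjee correlation can be sketched for the unconditional case, in which a scalar response $Y$ is compared with a vector predictor $\mathbf{Z}$. The sketch below is illustrative only: the function name is hypothetical, neighbors are found by an $O(n^2)$ brute-force search on raw Euclidean distances (not the rank-based graphs mentioned above), and ties in distance are resolved by taking the first index rather than at random.

```python
import numpy as np

def azadkia_chatterjee(y, z):
    """Unconditional nearest-neighbour dependence estimate T_n(y, z) (a sketch)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float).reshape(y.size, -1)
    n = y.size
    # Brute-force nearest neighbour N(i) of z_i among the other points.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)
    nn = d2.argmin(axis=1)
    y_sorted = np.sort(y)
    r = np.searchsorted(y_sorted, y, side="right")     # R_i = #{j : y_j <= y_i}
    l = n - np.searchsorted(y_sorted, y, side="left")  # L_i = #{j : y_j >= y_i}
    # T_n = sum_i (n * min(R_i, R_N(i)) - L_i^2) / sum_i L_i * (n - L_i)
    num = np.sum(n * np.minimum(r, r[nn]) - l**2)
    den = np.sum(l * (n - l))
    return num / den
```

When $Y$ is a smooth function of $\mathbf{Z}$, neighboring points have similar response ranks and the value approaches 1 as $n$ grows; under independence the numerator averages out to roughly zero.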

5. Mathematical Relations to Classical Rank Correlations and Structural Constraints

Chatterjee's $\xi$ has fundamentally different structural and functional properties compared to Spearman’s $\rho$ and Spearman’s footrule $\psi$:

For continuous bivariate copulas $C$, $\xi(C)$ is a quadratic functional of the copula's derivative, in contrast to $\rho$ or $\psi$, which are linear functionals.
The attainable $(\xi, \rho)$ region over all (or stochastically increasing/decreasing) copulas is convex, with boundaries defined by a family of piecewise-linear, absolutely continuous, asymmetric copulas (Ansari et al., 18 Jun 2025). For stochastically monotone copulas, $\xi$ and $\rho$ are additionally tied together by an explicit inequality, and the maximal possible gap between the two is attained by a specific copula.
The exact $(\xi, \psi)$ region for stochastically increasing copulas has also been characterized (Rockel, 8 Sep 2025). The upper bound is uniquely achieved by the Fréchet copula, and the lower bound is approached by a newly constructed two-parameter copula family.
The Markov product of the copula and its transpose provides the essential link: $\xi(C) = \psi(C^\top * C)$, connecting $\xi$ to Spearman's footrule $\psi$ via copula operations.
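The Markov-product link can be checked directly from the definitions. The short derivation below assumes the usual Markov product $(A * B)(u,v) = \int_0^1 \partial_2 A(u,t) \, \partial_1 B(t,v) \, dt$, the transpose $C^\top(u,v) = C(v,u)$, Spearman's footrule written as $\psi(D) = 6 \int_0^1 D(t,t) \, dt - 2$, and the copula form $\xi(C) = 6 \int_0^1 \int_0^1 [\partial_1 C(u,v)]^2 \, du \, dv - 2$. These conventions are standard, but the notation is an assumption of this sketch.

$(C^\top * C)(u,v) = \int_0^1 \partial_2 C^\top(u,t) \, \partial_1 C(t,v) \, dt = \int_0^1 \partial_1 C(t,u) \, \partial_1 C(t,v) \, dt$

$\psi(C^\top * C) = 6 \int_0^1 (C^\top * C)(v,v) \, dv - 2 = 6 \int_0^1 \int_0^1 \left[\partial_1 C(t,v)\right]^2 \, dt \, dv - 2 = \xi(C)$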

6. Spectral Analysis and Random Matrix Behavior

For large random vectors of independent variables, the empirical spectral distribution (ESD) of the symmetrized Chatterjee rank correlation matrix converges to the Wigner semicircle law, rather than the Marchenko–Pastur law found for Pearson, Spearman, or Kendall matrices (Dong et al., 8 Oct 2025). This deviation marks the first example of such a phenomenon among correlation matrices and is foundational for developing CLTs for linear spectral statistics and for tests of high-dimensional complete independence based on eigenvalue distributions.
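For readers who want to experiment with this spectral behavior, the sketch below assembles a matrix of pairwise $\xi_n$ values and returns its eigenvalues. Several choices here are assumptions made for illustration and are not taken from the cited work: the symmetrization averages the matrix with its transpose, the diagonal is set to zero, no rescaling is applied, and the data are assumed to have no ties. The function name is hypothetical.

```python
import numpy as np

def symmetrized_xi_spectrum(data):
    """Eigenvalues of a symmetrized pairwise xi_n matrix (a sketch).

    data has shape (n, p): n observations of p variables, no ties.
    """
    data = np.asarray(data, dtype=float)
    n, p = data.shape
    # Column-wise ranks 1..n.
    ranks = np.empty((n, p), dtype=np.int64)
    for j in range(p):
        ranks[np.argsort(data[:, j], kind="stable"), j] = np.arange(1, n + 1)
    xi = np.zeros((p, p))
    for j in range(p):
        # Order all rank columns by variable j, then sum consecutive differences.
        order = np.argsort(data[:, j], kind="stable")
        diffs = np.abs(np.diff(ranks[order, :], axis=0)).sum(axis=0)
        xi[j, :] = 1.0 - 3.0 * diffs / (n**2 - 1)  # xi_n(X_j, X_k) for all k
    sym = (xi + xi.T) / 2.0
    np.fill_diagonal(sym, 0.0)
    return sym, np.linalg.eigvalsh(sym)
```

A histogram of the returned eigenvalues for independent columns, with $n$ and $p$ both large, is the natural object to compare against a semicircle shape after whatever centering and scaling the theory prescribes.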

7. Controversies and Statistical Limitations

A key theoretical limitation of Chatterjee’s $\xi$ is its lack of weak continuity (Bücher et al., 2024). That is, for any $(X, Y)$ with continuous margins, one can approximate $(X, Y)$ arbitrarily closely (in distribution) by random pairs for which $\xi = 1$, and thus for any such pair, including independent ones, there exists a weakly convergent sequence with constant $\xi = 1$. Therefore,

No test based solely on $\xi_n$ can have nontrivial power uniformly separating $\xi = 0$ from $\xi = 1$; in particular, tests based on smooth functions of the empirical statistic have trivial power against such "perfect dependence yet nearby" alternatives.
Uniform asymptotic confidence intervals for $\xi$ must necessarily degenerate, covering the entire interval $[0, 1]$ with high probability.
This impossibility is not a defect particular to $\xi$ but is fundamental to any rank-based association measure that attains $1$ only for measurable functions.

These facts emphasize the importance of careful interpretation and the necessity for alternative or combined inferential strategies in certain regimes, and they motivate ongoing research in boosting, combining, and generalizing Chatterjee’s framework for practical and theoretical robustness across varied dependence structures.

Table: Comparative Aspects of Chatterjee’s Rank Correlation and Related Quantities

Quantity	Null Limiting Law	Power (Local Alt.)	Maximal Value	Functional Target
$\xi_n$ (Chatterjee)	Normal, variance $2/5$	rate sub-optimal ($n^{-1/4}$)	$1$	functional dependence ($Y = f(X)$)
$\rho$ (Spearman)	Normal, variance $1$	rate-optimal ($n^{-1/2}$)	$1$ (monotone)	concordance (monotonic)
$D$, $R$, $\tau^*$	Non-normal (degenerate)	rate-optimal ($n^{-1/2}$)	less than $1$	concordance/symmetric dependence