
Azadkia-Chatterjee Coefficient

Updated 9 December 2025
  • The Azadkia–Chatterjee coefficient is a nonparametric, rank-based measure defined via conditional probability variance that ranges from 0 (independence) to 1 (functional dependence).
  • It employs graph-based estimators using nearest neighbor ranks to achieve strong consistency, parametric rates, and established asymptotic normality in both marginal and conditional settings.
  • Its extensions include multivariate responses and scale-invariant variants, making it central for independence testing, graphical models, and model-free variable selection.

The Azadkia–Chatterjee coefficient is a nonparametric, rank-based measure of directed dependence between a vector-valued predictor and a univariate or multivariate response, defined at the population level via the variance of conditional probabilities and estimated using nearest-neighbor graphs. It features an interpretable scale—zero under independence and one under functional dependence—and a graph-based empirical estimator that admits parametric rates, strong consistency, bandwidth-free implementation, and central limit theorems in both marginal and conditional versions. Multivariate extensions, scale-invariant variants, and connections to broader classes of geometric graph and kernel-based dependence measures position the coefficient as a central object for independence testing, graphical models, and model-free variable selection.

1. Definition and Fundamental Properties

Let $(X,Y)$ be jointly distributed random elements with $X\in\mathbb{R}^d$ and $Y$ either univariate or a vector in $\mathbb{R}^q$. The Azadkia–Chatterjee (AC) coefficient of $Y$ on $X$ is defined by

$$\xi(Y,X) \;=\; \frac{\int_{\mathbb{R}} \operatorname{Var}\big(P(Y\ge y\mid X)\big)\, dP^Y(y)}{\int_{\mathbb{R}} \operatorname{Var}\big(\mathbf{1}\{Y\ge y\}\big)\, dP^Y(y)} \;\in\; [0,1].$$

An equivalent form holds for continuous $F_Y$, since the denominator is then the constant $1/6$: $\xi(Y,X) = 6\int_{\mathbb{R}} \operatorname{Var}\big(P(Y\ge y\mid X)\big)\, dP^Y(y)$, or equivalently $\xi(Y,X) = 6\int_{\mathbb{R}} \mathbb{E}\big[P(Y\ge y\mid X)^2\big]\, dP^Y(y) - 2$. Characterizing properties:

  • $\xi(Y,X)=0$ if and only if $Y$ and $X$ are independent.
  • $\xi(Y,X)=1$ if and only if $Y$ is almost surely a measurable function of $X$.
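The equivalent form for continuous $F_Y$ can be verified directly, using that $F_Y(Y)$ is uniform on $[0,1]$:

```latex
% Denominator: for continuous F_Y, Var(1{Y >= y}) = F_Y(y)(1 - F_Y(y)), so
\int_{\mathbb{R}} \operatorname{Var}\big(\mathbf{1}\{Y\ge y\}\big)\, dP^Y(y)
  = \int_0^1 u(1-u)\, du = \tfrac12 - \tfrac13 = \tfrac16 .
% Numerator: expand the variance with E[P(Y >= y | X)] = 1 - F_Y(y):
\operatorname{Var}\big(P(Y\ge y\mid X)\big)
  = \mathbb{E}\big[P(Y\ge y\mid X)^2\big] - \big(1-F_Y(y)\big)^2,
\qquad \int_0^1 (1-u)^2\, du = \tfrac13 .
% Combining the two displays:
\xi(Y,X)
  = 6\int_{\mathbb{R}} \operatorname{Var}\big(P(Y\ge y\mid X)\big)\, dP^Y(y)
  = 6\int_{\mathbb{R}} \mathbb{E}\big[P(Y\ge y\mid X)^2\big]\, dP^Y(y) - 2 .
```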

The definition is directional and scale-invariant: strictly increasing transformations of $Y$ and bijections of $X$ preserve $\xi(Y,X)$ (Ansari et al., 14 Mar 2025, Ansari et al., 2022). For conditional dependence, consider $(X,Y,Z)$ jointly distributed and set

$$\xi(Y,Z\mid X) \;=\; \frac{\int_{\mathbb{R}} \mathbb{E}\big[\operatorname{Var}\big(P(Y\ge y\mid Z,X)\,\big|\,X\big)\big]\, dP^Y(y)}{\int_{\mathbb{R}} \mathbb{E}\big[\operatorname{Var}\big(\mathbf{1}\{Y\ge y\}\,\big|\,X\big)\big]\, dP^Y(y)}.$$

Then $\xi(Y,Z\mid X)=0$ if and only if $Y$ and $Z$ are conditionally independent given $X$, and $\xi(Y,Z\mid X)=1$ if and only if $Y$ is almost surely a function of $Z$ given $X$ (Shi et al., 2021, Huang et al., 2020).

2. Graph-Based and Rank-Based Estimator Construction

For i.i.d. data $(X_1,Y_1),\dots,(X_n,Y_n)$, construct the following graph-based estimator:

  • Compute the univariate ranks $R_i = \#\{j : Y_j \le Y_i\}$ and $L_i = \#\{j : Y_j \ge Y_i\}$.
  • Let $N(i)$ denote the index of the nearest neighbor of $X_i$ among $\{X_j : j\ne i\}$, with ties broken at random.
  • The empirical AC coefficient is

$$\xi_n(Y,X) \;=\; \frac{\sum_{i=1}^n \big(n\min\{R_i, R_{N(i)}\} - L_i^2\big)}{\sum_{i=1}^n L_i\,(n - L_i)}.$$

This estimator generalizes Chatterjee's original proposal to multivariate covariates $X\in\mathbb{R}^d$ by utilizing nearest-neighbor graphs in $\mathbb{R}^d$ (Lin et al., 2022).
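As a concrete illustration, the construction above admits a short brute-force implementation; the following is a minimal sketch (the function name `ac_coefficient` is ours, not from the cited papers), assuming a tie-free continuous response:

```python
import numpy as np

def ac_coefficient(X, Y):
    """Brute-force sketch of the empirical Azadkia-Chatterjee coefficient.

    X: (n, d) covariates; Y: (n,) response with a continuous (tie-free) law.
    Uses R_i = #{j: Y_j <= Y_i}, L_i = #{j: Y_j >= Y_i}, and the Euclidean
    1-nearest neighbor N(i) of X_i among the other sample points.
    """
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    X = np.asarray(X, dtype=float).reshape(n, -1)
    R = (Y[:, None] >= Y[None, :]).sum(axis=1)   # R_i = #{j: Y_j <= Y_i}
    L = (Y[:, None] <= Y[None, :]).sum(axis=1)   # L_i = #{j: Y_j >= Y_i}
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)                  # exclude self from NN search
    N = D.argmin(axis=1)                         # nearest-neighbor index N(i)
    num = (n * np.minimum(R, R[N]) - L ** 2).sum()
    den = (L * (n - L)).sum()
    return num / den
```

On simulated data the estimate is close to 1 for a noiseless functional relationship and close to 0 for independent draws, matching the population characterization.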

Multivariate response: For $Y=(Y_1,\dots,Y_q)\in\mathbb{R}^q$, a "chain rule" or copula-based construction is used (Ansari et al., 2022, Huang et al., 8 Dec 2025): the vector-response coefficient aggregates univariate coefficients of the form $\xi\big(Y_k, (X, Y_1,\dots,Y_{k-1})\big)$ across coordinates $k=1,\dots,q$, reduces to $\xi(Y,X)$ for $q=1$, and can be strongly consistently estimated using graph-based estimators for each univariate constituent.

Scale invariance: The standard estimator is not invariant to affine changes of $X$; a fully scale-invariant version applies coordinatewise rank transforms to $X$ before constructing the nearest-neighbor graph (Tran et al., 2024).
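The coordinatewise rank transform underlying the scale-invariant variant can be sketched as follows (assuming no ties within a coordinate):

```python
import numpy as np

def coordinatewise_ranks(X):
    # Map each coordinate of X to its normalized ranks in [0, 1]; a
    # nearest-neighbor graph built on the output is then invariant to
    # strictly increasing transformations of the individual coordinates.
    X = np.asarray(X, dtype=float)
    order = np.argsort(np.argsort(X, axis=0), axis=0)  # ranks 0..n-1 per column
    return order / (len(X) - 1)
```

Strictly increasing marginal maps (rescaling, exponentiation) leave the output unchanged, which is exactly the invariance the variant targets.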

3. Distributional Properties and Limit Theory

Asymptotic Normality and Variance Bounds

The central limit theorem holds under broad conditions. For i.i.d. draws from a continuous law, $\sqrt{n}\,\big(\xi_n - \mathbb{E}[\xi_n]\big)$ is asymptotically normal whenever $Y$ is not almost surely a measurable function of $X$ (Lin et al., 2022). The asymptotic variance $\sigma^2$ satisfies universal upper and lower bounds and, under absolute continuity of $(X,Y)$, a sharper bound involving explicit dimension-dependent constants.

When $X$ and $Y$ are independent, $\sqrt{n}\,\xi_n \to N(0,\sigma_0^2)$ with $\sigma_0^2$ linked to the geometry of the nearest-neighbor graph in $\mathbb{R}^d$ (Lin et al., 2022, Han et al., 2022). Under manifold support, the limiting variance depends solely on the intrinsic dimension.

A consistent explicit estimator of the variance is available, allowing for valid inference (Lin et al., 2022).

Symmetric and Conditional Extensions

A symmetrized version, taking $\max\{\xi_n(Y,X), \xi_n(X,Y)\}$, allows construction of two-sided tests; its limit law under independence is skew-normal with explicit variance (Zhang, 2022).

The conditional AC coefficient admits an empirical estimator with parallel asymptotics; under conditional independence, the standardized statistic is asymptotically normal with variance determined by the dimensions of the variables and graph-count statistics (Shi et al., 2021).

Continuity Considerations

Unlike classical measures (Spearman's rho, Kendall's tau), $\xi$ is not weakly continuous under distributional convergence. Instead, it is continuous with respect to convergence of Markov products (pairs of conditionally i.i.d. copies), under additional marginal quantile convergence or specific copula convergence. Many practical families (elliptical and Archimedean models, additive-noise models) satisfy the required continuity, so stable large-sample inference is possible within these classes (Ansari et al., 14 Mar 2025).

4. Algorithmic and Computational Aspects

  • Nearest-neighbor graph construction costs $O(n^2 d)$ by brute force, which suffices for small $n$; kd-trees or approximate methods bring this to near $O(n\log n)$ for larger $n$ in moderate dimension.
  • Rank computations for $Y$ (and optionally for the coordinates of $X$ in the scale-invariant version) cost $O(n\log n)$ per coordinate.
  • Multivariate response: Efficient merge-sort or divide-and-conquer algorithms exist for blockwise rank counts, running in near-linear time up to polylogarithmic factors (Huang et al., 8 Dec 2025).
  • Overall, nearest-neighbor search and rank calculations admit nearly linear scaling in $n$, enabling use on large datasets.
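For instance, the nearest-neighbor indices needed by the estimator can be computed with a kd-tree instead of brute force; a sketch using SciPy's `cKDTree` (assuming distinct points, so the only zero-distance match is the point itself):

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_neighbor_indices(X):
    # kd-tree construction is O(n log n) and each query is roughly
    # O(log n) in low dimension, versus O(n^2 d) for brute force.
    pts = np.asarray(X, dtype=float)
    tree = cKDTree(pts)
    _, idx = tree.query(pts, k=2)  # k=2: each point itself plus its NN
    return idx[:, 1]               # drop the self-match
```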

5. Connections to Broader Dependence Measures

The AC coefficient is a specific instance within the family of graph–RKHS–OT dependency measures (Deb et al., 2020, Deb et al., 2024):

  • Population level: For sufficiently rich kernels (e.g., the min kernel $k(s,t)=\min\{s,t\}$, or the indicator-integral kernel), the corresponding normalized conditional MMD directly recovers $\xi(Y,X)$.
  • Sample level: The estimator is a geometric graph functional over empirical OT ranks.
  • Distribution-free: Under the null of independence, the law of the AC coefficient (when computed using empirical OT ranks and graph structure) is exactly permutation invariant, enabling finite-sample calibration for independence tests.

Multivariate extensions (both in predictors and responses) and conditional variants fit naturally into this graph–kernel framework, relating directly to kernel partial correlation (Huang et al., 2020), distance multivariance, and more general measures indexed by RKHS (Deb et al., 2024).
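The permutation invariance noted above supports exact finite-sample calibration. A generic sketch (the helper `permutation_pvalue` and the toy correlation statistic are our illustrations, not an API from the cited works):

```python
import numpy as np

def permutation_pvalue(stat, X, Y, n_perm=199, seed=0):
    # Under the null of independence, permuting Y against X leaves the
    # null law of any rank/graph statistic unchanged, so this p-value
    # is exactly valid at any finite sample size.
    rng = np.random.default_rng(seed)
    observed = stat(X, Y)
    exceed = sum(stat(X, rng.permutation(Y)) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)

def abs_corr(X, Y):
    # Toy statistic standing in for the AC coefficient in this sketch.
    return abs(np.corrcoef(X[:, 0], Y)[0, 1])
```

Any rank- or graph-based statistic, including the AC estimator, can be plugged in as `stat`; for strongly dependent data the observed value exceeds every permuted value, giving the smallest attainable p-value $1/(1+n_{\text{perm}})$.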

6. Practical Application Domains

Independence and Conditional Independence Testing

The AC coefficient and its conditional extension are used for:

  • Testing independence in arbitrary dimensions (direct, distribution-free under the null, with consistent critical values).
  • Conditional independence testing, e.g., through graph-based statistics evaluated with (conditional) randomization tests (Shi et al., 2021). However, these are known to exhibit low local power against contiguous local alternatives unless the nearest-neighbor graph is appropriately generalized or replaced with $k$-NN approaches.

Graphical Model Structure Learning

Pairwise conditional AC coefficients are used as entries in adjacency matrices for learning undirected graphs representing conditional independence relationships in high dimensions, outperforming standard penalized Gaussian graphical model approaches in various regimes (Furmańczyk, 2023).
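As a sketch of this pipeline, given a matrix of pairwise (conditional) dependence scores, an undirected graph estimate can be read off by symmetrizing and thresholding (the function name and the threshold `tau` are our illustrative choices):

```python
import numpy as np

def adjacency_from_scores(S, tau):
    # Symmetrize the pairwise dependence scores (the coefficient is
    # directional) and keep an edge wherever either direction exceeds tau.
    S = np.asarray(S, dtype=float)
    A = (np.maximum(S, S.T) >= tau).astype(int)
    np.fill_diagonal(A, 0)  # no self-loops
    return A
```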

Model-Free Feature Selection and Network Analysis

The multivariate $T$ extension and its estimator enable model-free forward variable selection (in the spirit of the FOCI procedure of Azadkia and Chatterjee) and the screening of dependence structure in network data, without requiring a specified regression model.

7. Theoretical Limitations and Open Problems

  • Under local parametric or minimax-detection boundary alternatives, the standard 1-NN estimator is asymptotically powerless unless graph construction is strengthened (e.g., increasing the number of neighbors $k$ with $n$) (Shi et al., 2021).
  • Weak continuity of $\xi$ fails under convergence in law, but holds under stricter Markov-product and copula-derivative types of convergence, implying care is needed in statistical inference (Ansari et al., 14 Mar 2025).
  • In practical high-dimensional settings, the curse of dimensionality in nearest-neighbor search may be partially circumvented due to intrinsic dimension adaptivity, but further analysis on computational–statistical tradeoffs remains ongoing (Han et al., 2022).

