On the epsilon-delta Structure Underlying Chatterjee's Rank Correlation

Published 13 Dec 2025 in math.ST, math.PR, and stat.ME | (2512.12363v1)

Abstract: We provide an epsilon-delta interpretation of Chatterjee's rank correlation by tracing its origin to a notion of local dependence between random variables. Starting from a primitive epsilon-delta construction, we show that rank-based dependence measures arise naturally as epsilon to zero limits of local averaging procedures. Within this framework, Chatterjee's rank correlation admits a transparent interpretation as an empirical realization of a local L1 residual. We emphasize that the probability integral transform plays no structural role in the underlying epsilon-delta mechanism, and is introduced only as a normalization step that renders the final expression distribution-free. We further consider a moment-based analogue obtained by replacing the absolute deviation with a squared residual. This L2 formulation is independent of rank transformations and, under a Gaussian assumption, recovers Pearson's coefficient of determination.


Summary

The paper establishes a rigorous epsilon–delta foundation for Chatterjee’s rank correlation, elucidating its intrinsic local dependence through limit operations.
It demonstrates that local residuals, captured via L1 averaging, precisely distinguish functional dependence, independence, and Gaussian relations.
The study bridges nonparametric local explainability with global R² measures, offering insights for advanced high-dimensional dependence analysis.

Epsilon–Delta Foundations of Chatterjee’s Rank Correlation

Introduction and Motivation

The paper establishes a structural understanding of Chatterjee's rank correlation by grounding it in a rigorous $\varepsilon$–$\delta$ framework for statistical dependence. Departing from the traditional emphasis on distribution-free and robustness properties, the study clarifies that rank-based dependence statistics arise not as artifacts of normalization, but as intrinsic limits of localized averaging procedures in the metric geometry of $(X, Y)$ pairs. This local perspective yields a transparent empirical realization: Chatterjee’s rank correlation quantifies the $L^1$ local residual in how $Y$ varies with infinitesimal perturbations of $X$.

Primitive $\varepsilon$–$\delta$ Construction of Local Dependence

The construction begins by considering, for paired random variables $(X, Y)$ and a second pair $(X', Y')$, the distribution of $|Y - Y'|$ when $X'$ is restricted to lie within an $\varepsilon$-neighborhood of $X$. This primitive set-valued object encodes the spread of $Y$ under fine local fluctuations of $X$. For a sample, this yields a symmetric matrix of localized $|Y_i - Y_j|$ differences, indexed by those pairs $(i, j)$ where $|X_i - X_j| < \varepsilon$. Aggregation across neighborhoods and final scalarization yield empirical estimators of local dependency strength at resolution $\varepsilon$.
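As a rough illustration of the sample-level construction, the sketch below averages $|Y_i - Y_j|$ over all pairs whose $X$ values lie within `eps` of each other. The function name `local_abs_diff`, the choice of test functions, and the sample size are illustrative assumptions, not code or notation from the paper.

```python
import math
import random


def local_abs_diff(xs, ys, eps):
    """Mean of |y_i - y_j| over pairs i < j with |x_i - x_j| < eps.

    Returns None when no pair falls inside the neighborhood.
    """
    total, count = 0.0, 0
    n = len(xs)
    for i in range(n):
        for j in range(i + 1, n):
            if abs(xs[i] - xs[j]) < eps:
                total += abs(ys[i] - ys[j])
                count += 1
    return total / count if count else None


# Illustrative data (assumed, not from the paper).
rng = random.Random(0)
xs = [rng.random() for _ in range(400)]
ys_func = [math.sin(2 * math.pi * x) for x in xs]  # Y is a function of X
ys_indep = [rng.random() for _ in xs]              # Y independent of X

for eps in (0.5, 0.1, 0.02):
    print(eps, local_abs_diff(xs, ys_func, eps), local_abs_diff(xs, ys_indep, eps))
```

In this setup the first column of results shrinks with `eps`, since nearby $X$ values force nearby $Y$ values, while the second stays near the overall spread of $Y$.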

As $\varepsilon \to 0$, the neighborhoods shrink to nearest neighbors, and in finite samples the mechanism amounts to averaging adjacent differences in $Y$ as ordered by $X$. Thus, the essential skeleton of rank-based dependence and nonparametric monotonicity detection arises fundamentally from local limit operations, not from subsequent rank normalization.

Role of the Probability Integral Transform and Norm-Invariant Normalization

The probability integral transform (PIT) and rank statistics are shown to be secondary to the $\varepsilon$–$\delta$ mechanism. Although PIT enables the computation of distribution-free coefficients by mapping arbitrary $X$ and $Y$ marginal distributions to uniform variates $(U, V) = (F_X(X), F_Y(Y))$, it does not alter the local residual structure. The underlying local neighborhood process is invariant under strictly increasing transformations, and normalization to unit interval marginals serves only to place dependence coefficients on a comparable scale across data sets.
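The invariance under strictly increasing transformations can be checked on a sample. The sketch below uses the empirical CDF (rank divided by sample size) as a stand-in for the PIT. The helper names `ranks` and `empirical_pit` and the choice of `math.exp` as the increasing map are assumptions made for illustration.

```python
import math
import random


def ranks(values):
    """Rank of each value (1 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for r, i in enumerate(order, start=1):
        out[i] = r
    return out


def empirical_pit(values):
    """Map a sample into (0, 1] through its empirical CDF: rank / n."""
    n = len(values)
    return [r / n for r in ranks(values)]


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(200)]
x_transformed = [math.exp(v) for v in x]  # strictly increasing map of x

# The ordering, and hence the empirical PIT, is unchanged by the map.
print(empirical_pit(x) == empirical_pit(x_transformed))
```

Because only the ordering of the $X$ values enters, any statistic built from nearest neighbors in $X$ is unaffected by such a transformation.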

Chatterjee’s Rank Correlation as a Local $L^1$ Residual

Chatterjee’s correlation is reformulated as a limit object: for samples $(U_i, V_i)$ with uniform marginals, one evaluates the deviation of $V_i$ from its average within an $\varepsilon$-neighborhood in $U$, then averages the absolute deviation over all points. This $L^1$ local residual converges, as $\varepsilon \to 0$, to its population counterpart, the population-level local mean deviation. Chatterjee’s statistic is then a simple affine transformation of this quantity, taking the form

$$\xi = 1 - \frac{\text{local } L^1 \text{ residual}}{\text{local } L^1 \text{ residual under independence}}$$

where the normalization renders the coefficient in $[0, 1]$ under the uniform setting. Complete determinism ($Y$ a function of $X$) and independence are perfectly distinguished: $\xi = 1$ and $\xi = 0$, respectively.
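For a concrete finite-sample counterpart, the sketch below computes the standard no-ties form of Chatterjee's coefficient, $\xi_n = 1 - 3\sum_{i=1}^{n-1} |r_{i+1} - r_i| / (n^2 - 1)$, where $r_i$ is the rank of the $Y$ value attached to the $i$-th smallest $X$. This formula comes from Chatterjee's original definition rather than from the paper summarized here. The function name `chatterjee_xi` and the test data are illustrative assumptions.

```python
import random


def chatterjee_xi(xs, ys):
    """Chatterjee's xi_n, assuming no ties in xs or ys."""
    n = len(xs)
    # Rank the y-values (1 = smallest).
    y_order = sorted(range(n), key=lambda i: ys[i])
    rank = [0] * n
    for r, i in enumerate(y_order, start=1):
        rank[i] = r
    # Read off the y-ranks in increasing order of x.
    x_order = sorted(range(n), key=lambda i: xs[i])
    r_sorted = [rank[i] for i in x_order]
    # Sum of absolute differences between ranks adjacent in x-order.
    adjacent = sum(abs(r_sorted[k + 1] - r_sorted[k]) for k in range(n - 1))
    return 1.0 - 3.0 * adjacent / (n * n - 1)


# Illustrative data (assumed, not from the paper).
rng = random.Random(2)
xs = [rng.random() for _ in range(2000)]
ys_func = [(x - 0.5) ** 2 for x in xs]  # non-monotone function of X
ys_indep = [rng.random() for _ in xs]   # independent of X

print(chatterjee_xi(xs, ys_func), chatterjee_xi(xs, ys_indep))
```

For a perfectly monotone sample of size $n$ the formula gives $1 - 3/(n + 1)$, so the value 1 is reached only in the limit of large $n$.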

In contrast to Spearman’s $\rho$ and Kendall’s $\tau$, which summarize global monotonic alignment, Chatterjee's statistic is strictly tied to local explainability. It quantifies the degree to which $Y$ can be locally regressed on $X$ with arbitrarily small error, rather than aggregating rankings across all pairs.

Moment-Based ($L^2$) Analogue and Link to Pearson’s $R^2$

Replacing the absolute deviation with a squared residual yields a moment-based (global) measure: $\mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big]$, which quantifies explained versus unexplained variance in $Y$ conditional on $X$. With normalization, this produces the familiar breakdown

$$1 - \frac{\mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big]}{\operatorname{Var}(Y)} = \frac{\operatorname{Var}(\mathbb{E}[Y \mid X])}{\operatorname{Var}(Y)}$$

which, in the Gaussian case, coincides precisely with the classical coefficient of determination $R^2 = \rho^2$. Here, the local $L^1$ and the global $L^2$ constructions are shown to be fundamentally parallel, both representing notions of explainability, but diverging in sensitivity: the former to local, potentially nonlinear behavior; the latter to global, linear structure.
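The Gaussian reduction can be verified in a few lines. The derivation below is a sketch under the assumption that $(X, Y)$ is bivariate Gaussian with means $\mu_X, \mu_Y$, standard deviations $\sigma_X, \sigma_Y$, and correlation $\rho$; this notation is introduced here for illustration and is not taken from the paper. The conditional mean is linear in $X$ and the conditional variance is constant:

$$\mathbb{E}[Y \mid X] = \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(X - \mu_X), \qquad \mathbb{E}\big[(Y - \mathbb{E}[Y \mid X])^2\big] = \sigma_Y^2\,(1 - \rho^2).$$

Dividing the squared residual by $\operatorname{Var}(Y) = \sigma_Y^2$ and subtracting from one gives

$$1 - \frac{\sigma_Y^2\,(1 - \rho^2)}{\sigma_Y^2} = \rho^2,$$

which is the squared Pearson correlation.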

Numerical Properties and Consistency

The paper explicitly verifies that the $\varepsilon$–$\delta$ framework yields exact results in three canonical scenarios:

Functional Dependence: If $Y = f(X)$ for measurable $f$, the local residual exactly vanishes, so $\xi = 1$.
Independence: If $X$ and $Y$ are independent, the full dispersion of $Y$ remains locally unexplained (the local residual equals the marginal deviation, so $\xi = 0$).
Gaussian or Linear Dependence: For $(X, Y)$ following a joint Gaussian law with correlation $\rho$, both the local $L^1$ coefficient and the normalized $L^2$ coefficient reduce to explicit functions of $\rho$, with the $L^2$ coefficient recovering Pearson's coefficient of determination.

Thus, the constructions are compatible with classical probabilistic intuition and recover standard results under parametric assumptions.
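The three scenarios can be explored numerically for the squared-residual version. The sketch below estimates one minus the ratio of the within-neighborhood squared deviation to the total variance, using equal-count bins of the $X$-sorted sample as a crude stand-in for $\varepsilon$-neighborhoods. The binning scheme, the function name `local_l2_coefficient`, and the simulated data are assumptions for illustration, not the paper's estimator.

```python
import math
import random
import statistics


def local_l2_coefficient(xs, ys, n_bins=50):
    """1 - (mean squared deviation of y from its within-bin mean) / Var(y).

    Bins are equal-count slices of the sample sorted by x.
    """
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_bins
    resid, used = 0.0, 0
    for b in range(n_bins):
        chunk = [y for _, y in pairs[b * size:(b + 1) * size]]
        m = statistics.fmean(chunk)
        resid += sum((y - m) ** 2 for y in chunk)
        used += len(chunk)
    # Total variance over the same points that entered the bins.
    total_var = statistics.pvariance([y for _, y in pairs[:used]])
    return 1.0 - (resid / used) / total_var


# Illustrative data (assumed, not from the paper).
rng = random.Random(3)
n = 5000
x_u = [rng.random() for _ in range(n)]
y_func = [math.sin(2 * math.pi * x) for x in x_u]  # functional dependence
y_ind = [rng.random() for _ in range(n)]           # independence

rho = 0.6                                          # Gaussian dependence
x_g = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_g = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0.0, 1.0) for x in x_g]

print(local_l2_coefficient(x_u, y_func))  # expected to be close to 1
print(local_l2_coefficient(x_u, y_ind))   # expected to be close to 0
print(local_l2_coefficient(x_g, y_g))     # expected to be close to rho**2
```

Finite bins introduce a small bias in both directions: fitting a mean per bin slightly overstates the explained share, while the nonzero bin width slightly understates it.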

Implications, Theoretical Insights, and Future Directions

The articulation of Chatterjee’s rank correlation via an $\varepsilon$–$\delta$ analytic lens clarifies its nonparametric and local character. It offers a foundation for understanding why such statistics excel at detecting a wide spectrum of dependence structures: they operationalize local explainability rather than enforcing structural (e.g., linear) global models.

The parallel development of $L^2$ (moment-based) analogues and identification of links to Pearson's $R^2$ suggest a roadmap for the systematic comparison of dependence measures: those based on robust local neighborhoods versus those rooted in parametric functional forms. The framework also naturally extends to investigating local explainability in higher dimensions and could motivate the design of novel statistics for multivariate or functional data.

Going forward, this structural perspective may drive theoretical developments in the characterization of dependence beyond monotonicity and linearity, as well as guide practical methodology for interpretable dependence analysis in high-dimensional learning problems.

Conclusion

By grounding Chatterjee’s rank correlation in a rigorous $\varepsilon$–$\delta$ analysis, the study reveals its emergence as a local empirical residual, decoupled from marginal normalization and independent of explicit rank transformations. The approach unifies local, distribution-free statistics and traditional moment-based measures, situating both as limiting cases of generalized explainability. This perspective not only clarifies the theoretical basis for nonparametric dependence metrics, but also underscores the importance of locality in modern statistical and machine learning dependence analysis.