Distance-Weighted Correlation Metrics

Updated 10 January 2026
  • Distance-weighted correlation metrics are dependence measures that integrate explicit metric weighting to capture nonlinear and heterogeneous relationships beyond traditional Pearson correlation.
  • They employ pairwise distance matrices, kernel-based transforms, and double-centering techniques to extend independence testing and clustering in non-Euclidean and graph-based data.
  • Applications include astrophysical classification, network analysis, and topological data analysis, offering robust tools for comparing complex structures in various domains.

Distance-weighted correlation metrics are a class of dependence measures that incorporate explicit weighting by the underlying metric structure of the data, so that the resulting correlations reflect non-Euclidean geometry, non-uniform importance, and complex data domains. The class includes distance correlation, earth mover's correlation, distance-weighted Pearson correlations, and weighted rank-based metrics. These methods generalize classical correlation measures, such as Pearson's $\rho$, to more flexible frameworks that are sensitive to nonlinear, nonmonotonic, or heterogeneous dependencies. They find applications in independence testing, clustering, graphical model comparison, topological data analysis, and network science, among other areas.

1. Foundational Constructions: Distance Correlation and Covariance

The canonical distance correlation, due to Székely, Rizzo, and Bakirov, is built on a weighted $L_2$-distance between the joint characteristic function and the product of the marginal characteristic functions, with a weight function that is singular at the origin. For random vectors $X \in \mathbb{R}^p$, $Y \in \mathbb{R}^q$ with finite first moments, define
\begin{align*}
\phi_{X,Y}(t,s) &= \mathbb{E}\left[ e^{i \langle t, X \rangle + i \langle s, Y \rangle} \right], \\
V^2(X,Y) &= \frac{1}{c_p c_q} \int_{\mathbb{R}^p \times \mathbb{R}^q} \frac{| \phi_{X,Y}(t, s) - \phi_X(t)\phi_Y(s)|^2}{|t|^{p+1}\, |s|^{q+1}} \, dt\,ds,
\end{align*}
with normalizing constants $c_p, c_q$ and marginal characteristic functions $\phi_X, \phi_Y$. The distance correlation is then

$$R(X,Y) = \frac{V(X,Y)}{\sqrt{V(X,X)\, V(Y,Y)}}, \qquad 0 \leq R(X,Y) \leq 1.$$

A key property is that $R(X,Y) = 0$ if and only if $X$ and $Y$ are independent. This is in stark contrast to the Pearson coefficient, which vanishes whenever the covariance is zero, even when the variables are nonlinearly dependent, as for $Y = X^2$ with $X$ symmetric about zero (Richards, 2017, Castro-Prado et al., 2020, Lyons, 2011).

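An equivalent representation using pairwise distances is

$$V^2(X,Y) = \mathbb{E}\big[|X - X'|\,|Y - Y'|\big] + \mathbb{E}|X - X'|\;\mathbb{E}|Y - Y'| - 2\,\mathbb{E}\big[|X - X'|\,|Y - Y''|\big],$$

where $(X', Y')$ and $(X'', Y'')$ are independent copies of $(X, Y)$.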
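
The pairwise-distance form lends itself to a direct plug-in estimate: replace each expectation by an average over observed pairs (or triples) of sample points. The following is a minimal sketch under that assumption, using only the Python standard library; the function names `pairwise` and `dcov2_moments` are illustrative and do not come from any cited work.

```python
import math

def pairwise(points):
    """Euclidean distance matrix for a list of scalars or coordinate tuples."""
    pts = [p if isinstance(p, (tuple, list)) else (p,) for p in points]
    return [[math.dist(p, q) for q in pts] for p in pts]

def dcov2_moments(x, y):
    """Plug-in estimate of V^2(X, Y) built from the three expectation terms."""
    a, b = pairwise(x), pairwise(y)
    n = len(a)
    # E[|X - X'| |Y - Y'|]: both distances use the same pair of observations.
    t1 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    # E|X - X'| * E|Y - Y'|: the two distances are averaged separately.
    t2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    # E[|X - X'| |Y - Y''|]: the two distances share only the first observation.
    t3 = sum(sum(a[k]) * sum(b[k]) for k in range(n)) / n**3
    return t1 + t2 - 2.0 * t3

# Two-point toy sample: the three terms are 0.5, 0.25 and 0.25, giving 0.25.
print(dcov2_moments([0.0, 1.0], [0.0, 1.0]))
```

A constant second sample makes every distance in `b` zero, so the estimate is exactly zero, in line with the fact that a constant is independent of everything.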

These constructions admit population and sample analogues in both Euclidean and metric-space domains, providing a broad framework for dependence quantification with metric weights.

2. Extension to Metric Spaces and Negative Type

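To accommodate data in general metric spaces, Lyons and subsequent works (Lyons, 2011, Castro-Prado et al., 2020) define the double-centered kernel

$$d_\mu(x, x') = d(x, x') - a_\mu(x) - a_\mu(x') + D(\mu),$$

where $a_\mu(x) = \int d(x, x')\, d\mu(x')$, $D(\mu) = \int a_\mu(x)\, d\mu(x)$, for probability measure $\mu$ with finite first moment.

The distance covariance in metric spaces is then

$$\mathrm{dcov}(\theta) = \int d_\mu(x, x')\, d_\nu(y, y')\, d\theta^2\big((x, y), (x', y')\big),$$

for a joint distribution $\theta$ of $(X, Y)$ with marginals $\mu$ and $\nu$, with sample versions obtained via doubly-centered distance matrices.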

A crucial requirement is that the metric spaces be of strong negative type. A space $(\mathcal{X}, d)$ has negative type if, for any finite signed measure $\mu$ with finite first moment and $\mu(\mathcal{X}) = 0$,

$$\int\!\!\int d(x, x')\, d\mu(x)\, d\mu(x') \leq 0.$$

Strong negative type further demands that equality hold only when $\mu = 0$. This property ensures that distance covariance vanishes only under independence (Lyons, 2011, Castro-Prado et al., 2020). Euclidean spaces, separable Hilbert spaces, and $L^p$ spaces with $1 < p \leq 2$ all possess strong negative type.
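
For a finite point set, the negative-type condition reduces to a quadratic form in zero-sum weights, which is easy to evaluate by hand or in code. The sketch below is illustrative only: the helper name `negative_type_form` and both example metrics are chosen here and are not taken from the cited papers. It contrasts three points on the real line with the shortest-path metric of the complete bipartite graph $K_{2,3}$, for which one zero-sum weight vector already gives a positive value.

```python
def negative_type_form(dist, alpha):
    """Quadratic form sum_{i,j} alpha_i * alpha_j * d(x_i, x_j) for zero-sum alpha."""
    assert abs(sum(alpha)) < 1e-12, "weights must sum to zero"
    n = len(alpha)
    return sum(alpha[i] * alpha[j] * dist[i][j] for i in range(n) for j in range(n))

# Shortest-path metric of K_{2,3}: nodes 0, 1 on one side, nodes 2, 3, 4 on the other.
# Same-side nodes are two hops apart, opposite-side nodes are adjacent.
side = [0, 0, 1, 1, 1]
k23 = [[0 if i == j else (2 if side[i] == side[j] else 1) for j in range(5)]
       for i in range(5)]

# Three points 0, 1, 3 on the real line with the usual distance.
coords = (0.0, 1.0, 3.0)
line = [[abs(p - q) for q in coords] for p in coords]

print(negative_type_form(k23, [3, 3, -2, -2, -2]))  # 12: positive, so not of negative type
print(negative_type_form(line, [1.0, -2.0, 1.0]))   # -6.0: consistent with negative type
```

A single positive value is enough to rule out negative type, whereas a non-positive value for one weight vector is only consistent with it; the Euclidean case is guaranteed by theory, not by this spot check.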

3. Weighted, Graph-Based, and Non-Euclidean Correlation Metrics

Beyond canonical distance correlation, recent research addresses weighted and graph-based variants tailored to specialized data structures:

Distance-weighted Pearson correlation on networks (Coscia et al., 2024): Let $x, y \in \mathbb{R}^n$ be node attributes on a graph $G = (V, E)$, with edge-dependent distances $d_{ij}$ and a kernel $f$ (typically a decaying function such as $f(d) = e^{-d}$). Form weights $W_{ij} = f(d_{ij})$. The distance-weighted Pearson correlation is

$$\rho_W(x, y) = \frac{\tilde{x}^\top W \tilde{y}}{\sqrt{(\tilde{x}^\top W \tilde{x})\,(\tilde{y}^\top W \tilde{y})}},$$

with centered vectors $\tilde{x}, \tilde{y}$. Negative type of $d$ is necessary for a well-defined correlation: only then does $W$ yield positive-definite quadratic forms and correlations bounded in $[-1, 1]$.
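
To make the weighted quadratic-form construction concrete, here is a minimal sketch that computes a distance-weighted Pearson coefficient on a small path graph. Several choices are assumptions made for illustration and are not confirmed by the source: the exponential kernel $e^{-d}$, plain mean-centering of the attribute vectors, and the function name `weighted_pearson`. The path-graph distance $|i - j|$ is of negative type, so the resulting weight matrix is positive definite.

```python
import math

def weighted_pearson(x, y, dist, kernel=lambda d: math.exp(-d)):
    """x~' W y~ / sqrt((x~' W x~)(y~' W y~)) with W[i][j] = kernel(dist[i][j])."""
    n = len(x)
    w = [[kernel(dist[i][j]) for j in range(n)] for i in range(n)]
    xc = [v - sum(x) / n for v in x]  # assumption: plain mean-centering
    yc = [v - sum(y) / n for v in y]

    def form(u, v):
        # Bilinear form u' W v over all node pairs.
        return sum(u[i] * w[i][j] * v[j] for i in range(n) for j in range(n))

    return form(xc, yc) / math.sqrt(form(xc, xc) * form(yc, yc))

# Path graph 0-1-2-3-4: the shortest-path distance between nodes i and j is |i - j|.
n = 5
path = [[abs(i - j) for j in range(n)] for i in range(n)]
x = [1.0, 2.0, 0.5, 3.0, 2.5]
y = [0.8, 1.9, 1.0, 2.2, 3.1]

rho = weighted_pearson(x, y, path)
print(rho)
```

Passing a kernel that is 1 at distance zero and 0 elsewhere turns $W$ into the identity matrix and recovers the ordinary Pearson coefficient, which is a convenient sanity check.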

Earth Mover's Correlation (EMC) (Móri et al., 2020): For $(\mathcal{X}, d_{\mathcal{X}})$, $(\mathcal{Y}, d_{\mathcal{Y}})$ metric spaces and random variables $X \in \mathcal{X}$, $Y \in \mathcal{Y}$, let $W_1$ be the first-order Wasserstein (earth mover) distance. EMC defines a nonparametric correlation via

$$e(X, Y) = W_1\big(P_{X,Y},\; P_X \otimes P_Y\big)$$

and

$$\mathrm{EMC}(X, Y) = \frac{e(X, Y)}{\sqrt{e(X, X)\, e(Y, Y)}},$$

with $0 \leq \mathrm{EMC}(X, Y) \leq 1$. EMC is applicable to arbitrary metric spaces, requiring only finite first moments. For independence, $\mathrm{EMC}(X, Y) = 0$; for perfect dependence, $\mathrm{EMC}(X, Y) = 1$ (axiomatically in Banach spaces).

Weighted Kendall's Tau and Rank Distances (Piek et al., 2024): Weighted generalizations of Kendall's tau handle positional importance in rankings. For $\sigma, \pi \in S_n$, with weights $w_{ij} > 0$, the weighted tau distance is

$$d_w(\sigma, \pi) = \sum_{i < j} w_{ij}\, \mathbf{1}\big[(\sigma(i) - \sigma(j))(\pi(i) - \pi(j)) < 0\big],$$

forming a genuine metric under positive weights. These metrics are relevant for correlation analysis in rank aggregation and preference modeling.
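
As an illustration of how a weighted tau distance behaves, the sketch below implements one simple pair-weighted form: the sum of the weights of all item pairs that two rankings order differently. This particular form, the weight table `w`, and the function name `weighted_tau` are assumptions made here for illustration; the cited paper's exact definition may differ.

```python
from itertools import combinations, permutations

def weighted_tau(sigma, pi, w):
    """Sum of w[i][j] over item pairs i < j that sigma and pi order differently.

    sigma[i] and pi[i] are the ranks assigned to item i; w[i][j] > 0 is a
    hypothetical per-pair weight (an assumption of this sketch).
    """
    return sum(w[i][j] for i, j in combinations(range(len(sigma)), 2)
               if (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0)

items = 4
# Hypothetical weights: disagreements involving low-index items cost more.
w = [[1.0 / (1 + min(i, j)) for j in range(items)] for i in range(items)]

identity = tuple(range(items))
swap_top = (1, 0, 2, 3)      # items 0 and 1 exchange ranks
swap_bottom = (0, 1, 3, 2)   # items 2 and 3 exchange ranks

print(weighted_tau(identity, swap_top, w), weighted_tau(identity, swap_bottom, w))

# With all weights positive, the full reversal is the unique farthest ranking.
farthest = max(permutations(range(items)), key=lambda p: weighted_tau(identity, p, w))
print(farthest)
```

With unit weights the function counts discordant pairs, which is the classical Kendall tau distance. Under the weights above, a swap among the top items costs more than the same swap near the bottom.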

4. Metric-Preserving Transformations of Similarity Measures

Metric distances derived from similarities (e.g., cosine, Pearson, Spearman) are synthesized through metric-preserving functions (Dongen et al., 2012, Solo, 2019):

  • Let $s \in [-1, 1]$ be a similarity. Choose $f$ increasing and concave such that $f(0) = 0$:
    • $d = \arccos(s)$ (angular distance),
    • $d = \sqrt{1 - s}$,
    • $d = \sqrt{1 - s^2}$,
    • $d = \sqrt{1 - |s|}$ (absolute-correlation distance).
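
Whether a similarity-derived quantity is a metric can be spot-checked numerically on a few points. The sketch below, with helper names chosen here for illustration, uses cosine similarity on three unit vectors at 0, 45 and 90 degrees. It shows that the naive "one minus similarity" violates the triangle inequality on this triple, while the angular and square-root transforms do not. A passing check on one triple is only a sanity test, not a proof.

```python
import math
from itertools import permutations

def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def violates_triangle(dist, points, tol=1e-12):
    """True if some ordered triple (a, b, c) has d(a, c) > d(a, b) + d(b, c)."""
    return any(dist(a, c) > dist(a, b) + dist(b, c) + tol
               for a, b, c in permutations(points, 3))

def clamp(s):
    # Guard against rounding that pushes a similarity slightly outside [-1, 1].
    return max(-1.0, min(1.0, s))

def raw(u, v):        # 1 - s: a dissimilarity, but not a metric
    return 1.0 - cosine(u, v)

def angular(u, v):    # arccos(s): the angle between the vectors
    return math.acos(clamp(cosine(u, v)))

def chord(u, v):      # sqrt(1 - s): proportional to the Euclidean chord length
    return math.sqrt(max(0.0, 1.0 - cosine(u, v)))

# Unit vectors at angles 0, 45 and 90 degrees.
pts = [(math.cos(t), math.sin(t)) for t in (0.0, math.pi / 4, math.pi / 2)]

for name, d in (("1 - s", raw), ("arccos(s)", angular), ("sqrt(1 - s)", chord)):
    print(name, "violates triangle inequality:", violates_triangle(d, pts))
```

On this triple, $1 - s$ gives a direct distance of about 1 between the outer vectors but only about 0.29 for each of the two intermediate hops, so the detour is shorter than the direct path.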

Negative-type metrics, such as those arising on trees, resistance distances in graphs, or from suitably constructed kernels, ensure metric validity, preserving the triangle inequality and the identity of indiscernibles (Coscia et al., 2024).

5. Computational Techniques and Complexity

Empirical evaluation of distance-weighted correlation metrics is generally quadratic in sample size:

  • Compute pairwise distances to form matrices $a = (a_{ij})$ (data) and $b = (b_{ij})$ (associated metric/weight).
  • Double-center both matrices: $A_{ij} = a_{ij} - \bar{a}_{i\cdot} - \bar{a}_{\cdot j} + \bar{a}_{\cdot\cdot}$, and likewise $B_{ij}$ from $b$.
  • Distance covariance: $V_n^2(X,Y) = \frac{1}{n^2} \sum_{i,j=1}^{n} A_{ij} B_{ij}$.
  • Distance correlation: $R_n(X,Y) = V_n(X,Y) / \sqrt{V_n(X,X)\, V_n(Y,Y)}$ (Richards, 2017, Chaudhuri et al., 2018).
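
The steps above can be sketched in a few lines of standard-library Python for univariate samples. This is a plain quadratic-time illustration, not the fast algorithm of the cited work, and the function names are chosen here. The example pairs a symmetric grid with its square, a case where the Pearson coefficient is zero but the sample distance correlation is clearly positive.

```python
import math

def centered_distances(values):
    """Double-centered matrix of absolute differences for a 1-D sample."""
    n = len(values)
    d = [[abs(u - v) for v in values] for u in values]
    row = [sum(r) / n for r in d]  # row means (equal to column means by symmetry)
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def dcov2(a, b):
    """Squared sample distance covariance from two double-centered matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2

def dcor(x, y):
    """Sample distance correlation; returns 0.0 if either sample is constant."""
    a, b = centered_distances(x), centered_distances(y)
    denom = math.sqrt(dcov2(a, a) * dcov2(b, b))
    return math.sqrt(max(dcov2(a, b), 0.0) / denom) if denom > 0 else 0.0

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((u - mx) * (v - my) for u, v in zip(x, y))
    den = math.sqrt(sum((u - mx) ** 2 for u in x) * sum((v - my) ** 2 for v in y))
    return num / den

x = [i / 50 - 1 for i in range(101)]  # symmetric grid on [-1, 1]
y = [v * v for v in x]                # purely nonlinear dependence

print(pearson(x, y))  # zero up to rounding error
print(dcor(x, y))     # clearly positive
```

Building and centering the two $n \times n$ matrices dominates the cost, which is where the quadratic dependence on sample size comes from.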

Fast algorithms exist for univariate cases, achieving $O(n \log n)$ complexity via sorting and cumulative sums, making the methods feasible for large-scale applications (Chaudhuri et al., 2018).

6. Practical Applications and Domain-Specific Metrics

Distance-weighted correlation metrics are pivotal in domains where classical linear correlations are insufficient:

  • Astrophysical classification: Nonlinear associations in high-dimensional galaxy surveys are detected by distance correlation but missed by Pearson's $\rho$, which it also outperforms in discriminating galaxy types (Richards, 2017).
  • Network analysis: Distance-weighted Pearson and resistance-based metrics provide well-behaved correlation measures on graphs, critical in gene expression, brain connectivity, and cyber-security clustering (Coscia et al., 2024, Solo, 2019).
  • Topological data analysis: Distance correlation enables direct comparison of topological summaries (e.g., persistence diagrams, landscapes) residing in distinct metric spaces, supporting independence testing and parameter association (Turner et al., 2019).
  • Graphical model comparison: Distance-weighted metrics, e.g., uncertainty-normalized Hellinger affinity (Wang et al., 2017), quantify similarity across learned graphical models with proper uncertainty adjustment.

7. Theoretical Guarantees, Limitations, and Extensions

Distance-weighted correlation metrics possess well-established properties:

  • Characterize independence exactly in strong negative-type spaces (Lyons, 2011, Castro-Prado et al., 2020).
  • Are scale- and location-invariant and sensitive to non-linear, non-monotonic dependence (Richards, 2017).
  • Require only first-moment finiteness (Brownian variants: second-moment).
  • Allow permutation-based null inference and bootstrap resampling for nonparametric testing.
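
Permutation-based inference can be sketched as follows: shuffle one sample relative to the other, recompute the statistic, and report the share of shuffles that reach the observed value. The code below is a minimal illustration for univariate data using the squared sample distance covariance; the function names, the default of 199 permutations, and the fixed seed are choices made here, not prescriptions from the cited works.

```python
import random

def centered(values):
    """Double-centered matrix of absolute differences for a 1-D sample."""
    n = len(values)
    d = [[abs(u - v) for v in values] for u in values]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]

def dcov2_perm(a, b, idx):
    """Squared distance covariance after relabelling the second sample by idx."""
    n = len(a)
    return sum(a[i][j] * b[idx[i]][idx[j]] for i in range(n) for j in range(n)) / n**2

def permutation_pvalue(x, y, n_perm=199, seed=0):
    rng = random.Random(seed)
    a, b = centered(x), centered(y)
    idx = list(range(len(x)))
    observed = dcov2_perm(a, b, idx)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(idx)  # break the pairing between x and y
        exceed += dcov2_perm(a, b, idx) >= observed
    # The add-one correction keeps the p-value strictly positive.
    return (1 + exceed) / (1 + n_perm)

x = [i / 20 - 1 for i in range(41)]
y = [v * v for v in x]
print(permutation_pvalue(x, y))
```

Permuting the indices of the already centered matrix is equivalent to permuting the sample and centering again, so the distance matrices are built only once.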

Limitations include the need for negative-type metrics, the computational burden of $O(n^2)$ evaluations (alleviated by fast algorithms in special cases), and the careful selection of metric-preserving transforms needed to avoid loss of discriminatory power.

Extensions include kernelized independence criteria (HSIC), fractional-exponent covariances for heavy-tailed settings, and multiway generalizations for higher-order association structures.


References:

  • "Distance Correlation: A New Tool for Detecting Association and Measuring Correlation Between Data Sets" (Richards, 2017)
  • "Distance covariance in metric spaces" (Lyons, 2011)
  • "Nonparametric independence tests in metric spaces: What is known and what is not" (Castro-Prado et al., 2020)
  • "Pearson Distance is not a Distance" (Solo, 2019)
  • "Metric distances derived from cosine similarity and Pearson and Spearman correlations" (Dongen et al., 2012)
  • "Pearson Correlations on Networks: Corrigendum" (Coscia et al., 2024)
  • "The Earth Mover's Correlation" (Móri et al., 2020)
  • "On a weighted generalization of Kendall's tau distance" (Piek et al., 2024)
  • "Correlation between Multivariate Datasets, from Inter-Graph Distance computed using Graphical Models Learnt With Uncertainties" (Wang et al., 2017)
  • "A fast algorithm for computing distance correlation" (Chaudhuri et al., 2018)
  • "Same But Different: Distance Correlations Between Topological Summaries" (Turner et al., 2019)
  • "Detection of Periodicity Based on Independence Tests - III. Phase Distance Correlation Periodogram" (Zucker, 2017)
