A Weighted Correlation Index for Rankings with Ties

Published 12 Apr 2014 in cs.SI and cs.IR | (1404.3325v3)

Abstract: Understanding the correlation between two different scores for the same set of items is a common problem in information retrieval, and the most commonly used statistic that quantifies this correlation is Kendall's $\tau$. However, the standard definition fails to capture that discordances between items with high rank are more important than those between items with low rank. Recently, a new measure of correlation based on average precision has been proposed to solve this problem, but like many alternative proposals in the literature it assumes that there are no ties in the scores. This is a major deficiency in a number of contexts, and in particular while comparing centrality scores on large graphs, as the obvious baseline, indegree, has a very large number of ties in web and social graphs. We propose to extend Kendall's definition in a natural way to take into account weights in the presence of ties. We prove a number of interesting mathematical properties of our generalization and describe an $O(n\log n)$ algorithm for its computation. We also validate the usefulness of our weighted measure of correlation using experimental data.

Summary

The paper introduces a weighted extension of Kendall’s τ, effectively addressing the limitations of traditional measures in datasets with many ties.
It presents a mathematical framework and adapts Knight’s algorithm to compute ranking discordances in O(n log n) time.
Empirical results on social networks and web graphs demonstrate that hyperbolic weightings better capture ranking similarities among top items.

Introduction

The paper "A Weighted Correlation Index for Rankings with Ties" addresses a common problem in graph analysis and information retrieval: measuring the correlation between two rankings of the same set of items, particularly when the rankings contain ties. The widely used Kendall's $\tau$ treats all discordances alike, so an exchange between two top-ranked items counts no more than an exchange between two items at the bottom. The proposed method extends Kendall's $\tau$ with weights that reflect the relative importance of items while handling ties, which matters especially in large networks, where ties are prevalent.

Mathematical Framework

The paper builds on a mathematical reformulation of Kendall’s $\tau$ that incorporates weights into rankings. The key step is a weighted inner product between two score vectors $\bm{r}$ and $\bm{s}$:

$\langle \bm{r}, \bm{s} \rangle := \sum_{i<j} \operatorname{sgn}(r_i-r_j)\,\operatorname{sgn}(s_i-s_j)\,w(i,j),$

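where $w(i,j)$ is a nonnegative weight assigned to the pair of items $i$ and $j$. Because $\langle \bm{r}, \bm{r} \rangle$ sums the weights of the pairs that are not tied in $\bm{r}$, the induced norm $\|\bm{r}\| := \sqrt{\langle \bm{r}, \bm{r} \rangle}$ measures how "untied" a ranking is, and the weighted index is obtained by normalizing the inner product: $\tau_w(\bm{r}, \bm{s}) := \langle \bm{r}, \bm{s} \rangle / (\|\bm{r}\|\,\|\bm{s}\|)$. With $w \equiv 1$ this is exactly Kendall's $\tau_b$, the standard tie-aware variant. Like the traditional statistic, the index is symmetric, and by the Cauchy–Schwarz inequality it lies in $[-1, 1]$, which makes it a well-behaved correlation measure.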
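The definitions above translate directly into a quadratic-time reference implementation. The sketch below is illustrative, not the authors' code: the weight function `w(i, j)` is a caller-supplied callable (a hypothetical interface), and a fully tied ranking, whose norm is zero, yields NaN.

```python
import math
from typing import Callable, Sequence

Weight = Callable[[int, int], float]  # w(i, j): weight of the pair of items i < j


def sgn(v: float) -> int:
    """Sign function: -1, 0 or 1."""
    return (v > 0) - (v < 0)


def inner(r: Sequence[float], s: Sequence[float], w: Weight) -> float:
    """<r, s> = sum over i < j of sgn(r_i - r_j) * sgn(s_i - s_j) * w(i, j)."""
    n = len(r)
    return sum(
        sgn(r[i] - r[j]) * sgn(s[i] - s[j]) * w(i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )


def norm(r: Sequence[float], w: Weight) -> float:
    """||r|| = sqrt(<r, r>): square root of the total weight of pairs not tied in r."""
    return math.sqrt(inner(r, r, w))


def tau_w(r: Sequence[float], s: Sequence[float], w: Weight) -> float:
    """Weighted index <r, s> / (||r|| ||s||); NaN if either ranking is fully tied."""
    denom = norm(r, w) * norm(s, w)
    return inner(r, s, w) / denom if denom > 0 else math.nan


unit = lambda i, j: 1.0  # constant weights: the index reduces to Kendall's tau_b
print(tau_w([1, 1, 2, 3], [1, 2, 2, 3], unit))  # approximately 0.8
```

With unit weights the example has four concordant pairs, no discordant pair, and one tie in each ranking, so the result is $4/\sqrt{5 \cdot 5} = 0.8$, matching $\tau_b$.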

Algorithmic Implementation

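To compute the weighted index efficiently, the authors adapt Knight’s algorithm, which computes Kendall's $\tau$ by sorting the items on one score (breaking ties with the other) and then counting, during a merge sort on the second score, the inversions that sort has to fix. In the weighted version each inversion contributes the weight $w(i,j)$ of its pair instead of 1. When the weight of a pair is obtained additively or multiplicatively from per-item weights, the merge step can maintain a residual weight (the total weight of the elements still waiting in the left run) and account for all inversions caused by one element in constant time, so the whole computation still runs in $O(n \log n)$ time and scales to large graphs.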
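The merge-sort idea can be sketched as follows, assuming, as a simplification, that pair weights come from hypothetical per-item weights `f` as $w(i,j) = f_i + f_j$ (additive) or $f_i f_j$ (multiplicative). The sketch relies on the identity $\langle \bm{x}, \bm{y} \rangle = W - T_x - T_y + T_{xy} - 2D$, where $W$ is the weight of all pairs, $T_x$, $T_y$ and $T_{xy}$ are the weights of pairs tied in $x$, in $y$, and in both, and $D$ is the weight of discordant pairs; likewise $\|\bm{x}\|^2 = W - T_x$. Only $D$ needs the merge sort. Tie groups are found here with a dictionary rather than by scanning sorted runs, and `weighted_tau_naive` is included only as a cross-check. This is an illustrative reimplementation, not the authors' code.

```python
import math
from typing import Dict, List, Sequence


def _group_weight(fs: Sequence[float], multiplicative: bool) -> float:
    """Total weight of all pairs inside a group whose items carry weights fs."""
    total = sum(fs)
    if multiplicative:  # sum over pairs of f_i * f_j
        return (total * total - sum(v * v for v in fs)) / 2
    return (len(fs) - 1) * total  # sum over pairs of f_i + f_j


def _tied_weight(keys: Sequence, fs: Sequence[float], multiplicative: bool) -> float:
    """Weight of the pairs whose keys are equal (hash grouping, O(n) expected)."""
    groups: Dict[object, List[float]] = {}
    for key, v in zip(keys, fs):
        groups.setdefault(key, []).append(v)
    return sum(_group_weight(g, multiplicative) for g in groups.values())


def _merge_sort(items, multiplicative):
    """Sort (y, f) pairs by y; return them with the total weight of strict inversions."""
    if len(items) <= 1:
        return items, 0.0
    mid = len(items) // 2
    left, d_left = _merge_sort(items[:mid], multiplicative)
    right, d_right = _merge_sort(items[mid:], multiplicative)
    merged, disc = [], d_left + d_right
    residual = sum(v for _, v in left)  # weight of left items not yet merged
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] <= right[j][0]:  # no inversion (equal y values are not counted)
            residual -= left[i][1]
            merged.append(left[i])
            i += 1
        else:  # right[j] is inverted with every remaining left item
            f_b = right[j][1]
            if multiplicative:
                disc += f_b * residual
            else:
                disc += residual + (len(left) - i) * f_b
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, disc


def weighted_tau_fast(x, y, f, multiplicative=False):
    """O(n log n) weighted tau for w(i, j) = f_i + f_j (or f_i * f_j)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: (x[i], y[i]))  # by x, ties broken by y
    _, disc = _merge_sort([(y[i], f[i]) for i in order], multiplicative)
    total = _group_weight(list(f), multiplicative)  # weight of every pair of items
    t_x = _tied_weight(x, f, multiplicative)
    t_y = _tied_weight(y, f, multiplicative)
    t_xy = _tied_weight(list(zip(x, y)), f, multiplicative)
    norm_x2, norm_y2 = total - t_x, total - t_y  # <x, x> and <y, y>
    if norm_x2 <= 0 or norm_y2 <= 0:
        return math.nan  # a fully tied ranking has zero norm
    return (total - t_x - t_y + t_xy - 2 * disc) / math.sqrt(norm_x2 * norm_y2)


def weighted_tau_naive(x, y, f, multiplicative=False):
    """O(n^2) reference straight from the definition, for cross-checking."""
    def w(i, j):
        return f[i] * f[j] if multiplicative else f[i] + f[j]

    def sgn(v):
        return (v > 0) - (v < 0)

    def inner(a, b):
        return sum(sgn(a[i] - a[j]) * sgn(b[i] - b[j]) * w(i, j)
                   for i in range(len(a)) for j in range(i + 1, len(a)))

    d = math.sqrt(inner(x, x) * inner(y, y))
    return inner(x, y) / d if d > 0 else math.nan
```

Sorting by $x$ with ties broken by $y$ guarantees that pairs tied in $x$ never appear as strict inversions in $y$, so the merge sort counts exactly the discordant pairs.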

Practical Application and Performance

The study validates its theoretical proposals empirically on data from social networks and web graphs. The results show that the weighted $\tau$, particularly with hyperbolic weighting, agrees with intuitive judgments of ranking similarity: it reports strong correlation where it is expected and tolerates discrepancies that involve insignificant ties. In datasets with pervasive ties, it therefore reflects the substantial agreement among top-ranked items better than unweighted statistics such as the traditional Kendall's $\tau$.
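To illustrate why hyperbolic weighting emphasizes the top of a ranking, the sketch below assumes additive weights $w(i,j) = f_i + f_j$ with the hypothetical choice $f_i = 1/(\mathrm{rank}_i + 1)$, where ranks come from the reference scores `x` (rank 0 is the highest score). It compares an exchange of the two top items with an exchange of the two bottom items. Unweighted, both exchanges cost the same; with hyperbolic weights, the top exchange lowers the index much more. The toy scores are made up for illustration and are not data from the paper.

```python
import math
from typing import List, Sequence


def hyperbolic_weights(x: Sequence[float]) -> List[float]:
    """f_i = 1 / (rank_i + 1); rank 0 is the item with the largest x (ties by index)."""
    order = sorted(range(len(x)), key=lambda i: (-x[i], i))
    f = [0.0] * len(x)
    for rank, i in enumerate(order):
        f[i] = 1.0 / (rank + 1)
    return f


def tau_additive(x: Sequence[float], y: Sequence[float], f: Sequence[float]) -> float:
    """O(n^2) weighted tau with additive pair weights w(i, j) = f_i + f_j."""
    def sgn(v):
        return (v > 0) - (v < 0)

    def inner(a, b):
        n = len(a)
        return sum(sgn(a[i] - a[j]) * sgn(b[i] - b[j]) * (f[i] + f[j])
                   for i in range(n) for j in range(i + 1, n))

    denom = math.sqrt(inner(x, x) * inner(y, y))
    return inner(x, y) / denom if denom > 0 else math.nan


x = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]            # reference scores, item 0 on top
top_swap = [9, 10, 8, 7, 6, 5, 4, 3, 2, 1]     # items 0 and 1 exchanged
bottom_swap = [10, 9, 8, 7, 6, 5, 4, 3, 1, 2]  # items 8 and 9 exchanged

f = hyperbolic_weights(x)
unit = [0.5] * len(x)  # f_i + f_j = 1: every pair counts the same (unweighted tau)

print(round(tau_additive(x, top_swap, unit), 3), round(tau_additive(x, bottom_swap, unit), 3))
# 0.956 0.956
print(round(tau_additive(x, top_swap, f), 3), round(tau_additive(x, bottom_swap, f), 3))
# 0.886 0.984
```

Each swap creates a single discordant pair, so the index equals $1 - 2w/W$, where $w$ is the weight of that pair and $W$ the weight of all pairs; the hyperbolic weighting makes $w$ large only when the pair sits near the top.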

The proposed approach extends previous attempts to incorporate weighting into rank correlation, such as those by Shieh and others, by treating ties naturally and by providing a scalable algorithm to compute the resulting measures. The paper’s experiments also indicate that logarithmic weightings stay too close to the standard measures, while quadratic weightings can be insensitive to important fluctuations.

Conclusion

By introducing a weighted variant of Kendall's $\tau$ that accounts for ties, this research provides a statistical tool that captures the correlation between scores and rankings in complex datasets. This is particularly valuable when analyzing centrality scores on large graphs, where the traditional assumption of tie-free rankings is unrealistic. Future work may extend these concepts to partial rankings and further study how weighted correlation indices behave under different conditions. Thanks to its $O(n \log n)$ running time, the algorithm is practical for real-world data at scale.