Percentile-Based Rating Transformation
- Percentile-based rating transformation is a quantitative method that converts raw scores into standardized percentiles for clearer comparisons.
- It utilizes empirical cumulative distribution functions together with assignment rules such as the Hazen formula and the P100 family of indicators to mitigate outlier effects and normalize skewed data.
- The approach is crucial in bibliometrics, research evaluations, and recommender systems, providing robust tie handling and bias-free fractional scoring.
Percentile-based rating transformation is a quantitative technique for mapping raw scores—such as citation counts or user ratings—onto a standardized percentile scale (typically 0–100), enabling more interpretable and robust comparisons within and across skewed distributions. The approach linearizes heavy-tailed data, mitigates the influence of outliers, and supports cross-domain normalization, with extensive application in bibliometrics, research evaluation, and recommender systems.
1. Motivation and Conceptual Foundations
Percentile transformations are motivated by the inherent skewness in data such as citations, where most items are concentrated at low values and a minority realize very high counts (Schreiber, 2014). Traditional measures (mean, median) are confounded by this asymmetry and by susceptibility to outliers. By reexpressing raw values as percentiles—representing the percentage of the reference set at or below a score—one establishes an ordinal ranking less sensitive to distributional shape and more conducive to comparative analysis. Percentile thresholds (e.g., top-10%, top-1%) provide discrete performance classifications useful in policy and assessment contexts.
In recommender systems, raw user ratings are often biased by user-specific behaviors and scale usage. Percentile pre-processing flattens such biases, yielding distributions closer to uniform and empirically improving ranking metrics (Mansoury et al., 2019).
2. Mathematical Formalization and Core Algorithms
Given a reference set of size $n$ with scores $x_1, \dots, x_n$, the empirical cumulative distribution function (CDF) is $F(x) = \frac{1}{n}\,|\{i : x_i \le x\}|$. The $q$-th sample quantile is the smallest $x$ for which $F(x) \ge q$ (Rousseau, 2011).
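These two definitions can be sketched directly in code; the example below is a minimal illustration (function names `ecdf` and `sample_quantile` are chosen here for clarity, not taken from any library):

```python
from bisect import bisect_right

def ecdf(scores, x):
    """Empirical CDF: fraction of reference scores at or below x."""
    xs = sorted(scores)
    return bisect_right(xs, x) / len(xs)

def sample_quantile(scores, q):
    """Smallest observed score x with F(x) >= q."""
    xs = sorted(scores)
    for x in xs:
        if bisect_right(xs, x) / len(xs) >= q:
            return x
    return xs[-1]

scores = [0, 1, 1, 2, 5, 9]
# F(1) = 3/6 = 0.5, so the q = 0.5 quantile is the smallest x with F(x) >= 0.5
```

In practice one would precompute the sorted array once per reference set rather than re-sorting on every call.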
Percentiles are implemented by mapping each score to its corresponding quantile rank. Various formulas exist for percentile assignment and tie handling, including:
- Hazen (1914): $P_i = 100\,(i - 0.5)/n$ for ascending rank $i$
- "i–1" method: $P_i = 100\,(i - 1)/n$
- InCites (descending): items are ranked from highest to lowest and the percentile reflects the share of items cited at least as often, so lower percentiles indicate higher impact
- Average-rank for ties: tied items receive the mean rank over their block of positions
- Fractional scoring: interval-based partitioning where each item at ascending rank $i$ occupies the quantile interval $\left(\frac{i-1}{n}, \frac{i}{n}\right]$; overlap with percentile-class intervals is computed exactly (Schreiber, 2013; Waltman et al., 2012)
A general algorithmic procedure involves sorting the reference scores, computing the empirical CDF, partitioning into percentile classes, and aggregating fractional overlaps for both percentile and class score assignment.
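The procedure above, restricted to the Hazen and "i–1" formulas with average-rank tie handling, can be sketched as follows (a simplified illustration; the helper names are ours, not from any reference implementation):

```python
def average_ranks(scores):
    """Ascending 1-based ranks; tied scores share the mean rank of their block."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j to the end of the tie block starting at position i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def percentiles(scores, method="hazen"):
    """Map raw scores to 0-100 percentiles using Hazen or the 'i-1' formula."""
    n = len(scores)
    out = []
    for r in average_ranks(scores):
        if method == "hazen":
            out.append(100 * (r - 0.5) / n)   # Hazen: 100 (i - 0.5) / n
        else:
            out.append(100 * (r - 1) / n)     # "i-1": 100 (i - 1) / n
    return out

cites = [0, 1, 1, 4, 10]
# average ranks: [1, 2.5, 2.5, 4, 5]
# Hazen percentiles: [10.0, 40.0, 40.0, 70.0, 90.0]
```

Note that neither formula pins the endpoints to exactly 0 and 100, consistent with the table of approaches below.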
3. Advanced Approaches: P100, P100′, P100″
Bornmann, Leydesdorff & Wang introduced the P100 scale, which ranks papers purely on the set of unique citation counts: for $m$ distinct values $c_{(1)} < \dots < c_{(m)}$, the value $c_{(j)}$ receives $P = 100\,(j-1)/(m-1)$, with all papers at $c_{(j)}$ assigned the same score. This delivers fixed boundary values (0 for the minimum, 100 for the maximum) and tie-unambiguous ranks (Bornmann et al., 2013).
The P100′ indicator augments P100 by incorporating frequency: for papers with count $c_{(j)}$, the cumulative number of papers with lower counts, $\sum_{k<j} f_k$, is normalized to span $0$–$100$, i.e., $P'_j = 100 \sum_{k<j} f_k / (n-1)$. P100′ closely tracks standard percentile methods (e.g., inverted InCites percentiles) except for edge cases with top-score ties (Schreiber, 2014).
To resolve such paradoxes (e.g., jumps when tail ties change), P100″ interpolates between the lower and upper percentile bounds of each tie block; the resulting value always lies within the uncertainty interval, preserves the extremal values 0 and 100, and eliminates discontinuities when top ties shift.
Comparison of approaches shows that P100 is conceptually elegant but empirically less predictive than Hazen or SCImago centile methods, which better maintain top-10% and top-1% performance stability across time (Bornmann et al., 2013).
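The P100 and P100′ assignments can be sketched as below. This is an illustrative reading of the definitions above (assuming at least two distinct counts for P100; the normalization of P100′ by $n-1$ follows the formalization given here and is not the only one in the literature):

```python
def p100(counts):
    """P100: rank on the set of unique citation counts, scaled to 0-100.
    Assumes at least two distinct counts."""
    uniq = sorted(set(counts))
    m = len(uniq)
    pos = {c: j for j, c in enumerate(uniq)}  # j = 0 .. m-1
    return [100 * pos[c] / (m - 1) for c in counts]

def p100_prime(counts):
    """P100': cumulative count of papers with lower citation counts,
    normalized by n - 1; top-score ties are the known edge case."""
    n = len(counts)
    below = {c: sum(1 for x in counts if x < c) for c in set(counts)}
    return [100 * below[c] / (n - 1) for c in counts]

counts = [0, 1, 1, 4, 10]
# p100:       the tied papers at count 1 share one value (~33.3)
# p100_prime: [0.0, 25.0, 25.0, 75.0, 100.0]
```

Both functions fix the minimum at 0; P100′ reaches 100 only when the top count is unique, which is exactly the edge case noted above.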
4. Tie Handling and Fractional Attribution
Discrete data and frequent ties complicate percentile mapping. Fractional scoring, as formalized by Schreiber, divides each item’s quantile interval across the classes whose boundaries it straddles, guaranteeing aggregate scores that exactly reproduce theoretical expectations, which is essential for unbiased field and class comparisons. For item $i$ (ascending rank) and class $k$ with interval $I_k$, compute the fractional weight $w_{ik} = n \cdot \lambda\!\left(\left(\frac{i-1}{n}, \frac{i}{n}\right] \cap I_k\right)$, where $\lambda$ denotes interval length, and assign weighted class scores. This approach is robust to massive tie blocks, avoids lumpiness, and is computationally efficient for large datasets (Schreiber, 2013; Leydesdorff, 2012).
5. Extensions, Variants, and Practical Implementation
Beyond simple percentile ranks, advanced percentile-based transformations support finer analytical objectives:
- CP-IN/CP-EX: Bornmann & Williams define cumulative-inclusive (CP-IN) and cumulative-exclusive (CP-EX) percentages for a score $x$: $\mathrm{CP\text{-}IN}(x) = 100\,|\{i : x_i \le x\}|/n$ and $\mathrm{CP\text{-}EX}(x) = 100\,|\{i : x_i < x\}|/n$. CP-EX yields a minimum of 0 for the smallest score, and CP-IN yields a maximum of 100 for the largest. Aggregation across multiple fields or units is handled via weighted means, facilitating complex field- or time-normalized evaluation pipelines (Bornmann et al., 2020).
- Percentile-based double-rank analysis: In bibliometrics, the double-rank methodology graphs the rank of a local unit’s publications versus global ranks, fitting the resulting distribution to a power law for predictive likelihood assessment at very low percentiles (breakthrough analyses) (Brito et al., 2017).
- Recommender systems: User-centric percentile transformation maps each rating to the share of the user's own ratings at or below it, with configurable tie strategies, improving distributional flatness and empirical ranking performance across algorithms. Smoothing procedures add artificial ratings to spread single-valued profiles (Mansoury et al., 2019).
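Two of the variants above admit compact sketches: the CP-IN/CP-EX cumulative percentages, and a user-centric rating transformation (a simplified reading of the approach in Mansoury et al., 2019, with inclusive tie counting; not their exact formulation):

```python
from bisect import bisect_right

def cp_in(scores, x):
    """Cumulative-inclusive percentage: share of scores <= x (CP-IN)."""
    return 100 * sum(1 for s in scores if s <= x) / len(scores)

def cp_ex(scores, x):
    """Cumulative-exclusive percentage: share of scores < x (CP-EX)."""
    return 100 * sum(1 for s in scores if s < x) / len(scores)

def user_percentiles(ratings):
    """Map one user's raw ratings to percentiles within that user's own
    profile, counting ties inclusively (an illustrative sketch)."""
    srt = sorted(ratings)
    n = len(srt)
    return [100 * bisect_right(srt, r) / n for r in ratings]

cites = [0, 1, 1, 4, 10]
# cp_ex(cites, 0) == 0.0 and cp_in(cites, 10) == 100.0, as noted above.
# A user who only uses the top of the rating scale is flattened:
# user_percentiles([4, 5, 5, 4]) == [50.0, 100.0, 100.0, 50.0]
```

The user-level mapping removes individual scale-usage bias: two users with different raw rating habits end up on the same comparable 0–100 scale.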
6. Indicator Properties, Limitations, and Evaluation Guidance
Percentile rank scores are strictly congruous indicators of relative performance: the ordering between two sets is preserved when the same document is added to both (Rousseau, 2011). The I3 indicator is congruous for absolute performance, i.e., total impact. Bias-free indicators are realized through fractional scoring and interval-based assignment: when applied to an entire field, the expected score matches theory independently of the underlying frequency distribution (Schreiber, 2013; Waltman et al., 2012).
Limitations arise from discrete jumps in small samples, which inflate percentile intervals and uncertainty; edge paradoxes in tail assignments for some approaches (e.g., P100′); and means that can depart from intuitive expectations in non-uniform distributions. Analysts are advised to visually inspect distributions, report the mean and dispersion of percentiles, and, where relevant, use graphical summaries such as bar graphs and beamplots to display percentiles over time or across units (Bornmann et al., 2020).
7. Applications, Predictability, and Domain Generalization
Percentile-based rating transformation is widely applied in scientific impact evaluation, institutional benchmarking, and recommender systems. In scientific impact forecasting, rank percentiles are highly predictable and stable for publications, with modest model improvement beyond simple autoregressive or linear predictors (Tian et al., 2021). These transformation schemes generalize to any domain requiring ordinal normalization—education test scores, Elo ratings in gaming, sales figures—where segmenting reference sets and mapping performance into relative percentiles mitigate structural bias and facilitate comparative analyses.
The following table summarizes the characteristics of the main approaches:
| Approach | Tie Handling | Endpoints Fixed | Use Case |
|---|---|---|---|
| Hazen | Avg rank | No | General bibliometrics |
| P100 | Unique values | Yes | Fixed-scale ranking |
| P100′ | Frequencies | Yes | Robust percentiles |
| Fractional | Interval splits | Yes (aggregate) | Bias-free classification |
Percentile-based rating transformation is an indispensable tool for equitable and interpretable evaluation in domains characterized by non-normal, right-skewed distributions. Continued refinement of mathematical and computational schemes for percentile assignment, tie handling, and aggregation ensures robust, bias-minimized assessment across scientific and applied settings.