I3: Comprehensive Impact Metrics
- The paper introduces I3 by summing article-level citation percentiles to overcome skewed citation distributions and combine quality with quantity.
- Comprehensive Impact Metrics I3 is a framework that normalizes citation data across fields, allowing direct benchmarking of journals, institutions, and countries.
- Its decomposable design supports detailed portfolio diagnostics and statistical testing, ensuring a robust, non-parametric evaluation of research impact.
Comprehensive Impact Metrics I3
Comprehensive impact metrics under the umbrella of "I3" (Integrated Impact Indicator) constitute a family of non-parametric, percentile-based indicators for journal, institutional, and country-level research evaluation. Developed to address the inadequacies of central-tendency statistics, I3 and its size-normalized variants systematically integrate the complete distributional shape of citations, robustly handle skewed data, and enable field-normalized, decomposition-compatible impact assessment at all levels of aggregation.
1. Motivation and Rationale
Citation distributions in scientific publishing are known to be highly right-skewed: a few papers receive the majority of citations while most receive few or none. Traditional mean-based metrics (e.g., Journal Impact Factor—JIF; citations per publication) do not accommodate such skewness, often penalize productivity because additional publications with moderate citation counts lower the average, and are vulnerable to distortion by outlier articles. Percentile-based methods like I3 replace central tendency with summation over article-level percentile ranks, thereby honoring both the "mass" (number of publications) and the "quality" (relative citation impact) of research outputs (Leydesdorff et al., 2011, Leydesdorff, 2011, Wagner et al., 2012).
I3 is designed to be decomposable: impact is first calculated at the level of the individual article and then aggregated, allowing direct comparison and benchmarking of journals, institutions, countries, or any arbitrary set of publications (Leydesdorff et al., 2011). Its construction makes no distributional assumptions, allowing straightforward non-parametric statistical testing and natural cross-field normalization.
2. Mathematical Formulation and Computational Workflow
The core I3 construct is defined as the sum of article-level citation percentile ranks within a well-defined reference set:

$$ \mathrm{I3} = \sum_{i=1}^{N} P_i $$

where $N$ is the number of papers in the set and $P_i$ is the article-level percentile (0–100) of paper $i$, computed relative to an appropriate reference set (by field, year, document type). Discrete class-based variants, such as PR6 (the U.S. National Science Board's six-class scheme), or four-class log-weighted operationalizations (I3*), are used for improved interpretability and significance testing (Leydesdorff et al., 2018, Wagner et al., 2012).
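The continuous definition above can be sketched in a few lines. This is a minimal illustration, not code from the cited papers; it assumes integer citation counts and the "share of papers cited less" percentile convention (other conventions, e.g. Hazen, differ in how ties are handled):

```python
def percentile_rank(citations, reference_counts):
    """Percentile (0-100) of a paper within its field-year-document-type
    reference set, as the share of reference papers cited strictly less."""
    below = sum(1 for r in reference_counts if r < citations)
    return 100.0 * below / len(reference_counts)

def i3(citation_counts, reference_counts):
    """Continuous I3: sum of article-level percentile ranks."""
    return sum(percentile_rank(c, reference_counts) for c in citation_counts)
```

Because the sum runs over articles, the result can be aggregated for any publication set (journal, institution, country) sharing the same reference sets.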
Example: In the widely used I3* scheme (Leydesdorff et al., 2018, Dong et al., 5 Jan 2026):
- Top 1%: weight = 100
- Top 10%: weight = 10
- Top 50%: weight = 2
- Bottom 50%: weight = 1
Let $n_c$ be the count of articles in class $c$, and $w_c$ its weight. Then

$$ \mathrm{I3^{*}} = \sum_{c} w_c \, n_c $$
Size normalization is achieved by dividing by the number of articles:

$$ \mathrm{I3}/N = \frac{\mathrm{I3}}{N} $$
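The four-class scheme and its size normalization can be sketched directly from article percentiles. The threshold placement here (e.g., top 1% as percentile ≥ 99) is an illustrative assumption; the cited operationalizations define class boundaries within specific reference sets:

```python
# (minimum percentile, weight) pairs for the I3* classes, checked top-down
I3_STAR_WEIGHTS = [(99.0, 100), (90.0, 10), (50.0, 2), (0.0, 1)]

def i3_star(percentiles):
    """I3*: each article contributes the weight of the highest class it reaches."""
    total = 0
    for p in percentiles:
        for threshold, weight in I3_STAR_WEIGHTS:
            if p >= threshold:
                total += weight
                break
    return total

def i3_per_n(percentiles):
    """Size-normalized variant I3/N."""
    return i3_star(percentiles) / len(percentiles)
```

A set of four papers at percentiles 99.5, 95, 60, and 10 would score 100 + 10 + 2 + 1 = 113, and 113/4 = 28.25 after size normalization.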
Fractional citation counting is frequently applied: each received citation is weighted as the inverse of the number of references in the citing article, correcting for field-specific differences in citation behavior (Leydesdorff, 2012, Dong et al., 5 Jan 2026). Percentile thresholds are computed separately for each field-year-document type combination for rigorous normalization (Wagner et al., 2012, Bornmann et al., 2019).
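Fractional counting as described above reduces to a one-line aggregation; this sketch assumes each citing article's total reference count is available:

```python
def fractional_citations(citing_reference_counts):
    """Fractional citation count of one paper: each received citation is
    weighted by the inverse of the citing article's number of references,
    damping field differences in citation density."""
    return sum(1.0 / n_refs for n_refs in citing_reference_counts if n_refs > 0)
```

For example, citations from articles with 10, 20, and 4 references contribute 0.1 + 0.05 + 0.25 = 0.4 fractional citations instead of 3 whole ones.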
3. Impact, Field Normalization, and Statistical Properties
I3 metrics offer several critical properties:
- Monotonic Integration: Every additional paper or citation increases I3; there is no penalty for productivity as is common in mean-based averages (Leydesdorff et al., 2011, Wagner et al., 2012).
- Field-Neutralization: Article-level percentile ranking within field-year strata enables robust cross-field comparison; a top-1% paper has the same I3 weight across biology and mathematics (Leydesdorff et al., 2018, Bornmann et al., 2019).
- Decomposability: Because percentiles are assigned to articles, impact can be flexibly aggregated by journal, institution, country, or author, and supports fine-grained portfolio diagnostics (Leydesdorff et al., 2011, Leydesdorff et al., 2018).
- Statistical Testability: Differences in I3 (or the proportions of top-X% papers) among units can be statistically assessed (e.g., by z-tests for independent proportions or chi-square for goodness-of-fit) (Wagner et al., 2012, Leydesdorff et al., 2018).
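The z-test for independent proportions mentioned in the last bullet can be written out explicitly. This is the standard pooled two-proportion test, not a formulation taken from the I3 literature, applied here to, say, counts of top-10% papers in two portfolios:

```python
from math import sqrt, erf

def z_test_proportions(k1, n1, k2, n2):
    """Two-sided pooled z-test for the difference of two independent
    proportions k1/n1 and k2/n2 (e.g., shares of top-10% papers)."""
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                      # pooled proportion
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))     # standard error under H0
    z = (p1 - p2) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return z, p_value
```

With 30 top-10% papers out of 100 versus 10 out of 100, the difference is significant well below the 0.1% level.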
In comparative studies, I3 and its normalized forms correlate strongly with both publication volume and total citation count (for example, Spearman ρ = 0.92 for I3* vs. number of publications, and ρ = 0.816 for I3* vs. citations in a large sample of journals (Leydesdorff et al., 2018)), but diverge from the JIF by simultaneously reflecting size and quality.
4. Variants and Extensions
Multiple operationalizations exist under the I3 paradigm:
| Variant | Aggregation Level | Weighting/Class Strategy |
|---|---|---|
| I3 (continuous) | Any | Sum over all percentile ranks |
| I3/PR6 | Any | Six broad percentile bins |
| I3* (log-weighted, 4-class) | Any | Top 1%: 100; top 10%: 10; top 50%: 2; bottom 50%: 1 |
| I3/N | Any | I3 normalized by set size |
The Scilit I3/N framework applies these constructs at scale to over 61,000 journals, using fractional citation counting and four-class percentile weights (100, 10, 2, 0), and has demonstrated improved coverage, fairness, and interpretability relative to JIF and CiteScore (Dong et al., 5 Jan 2026).
Field-normalized multivariate extensions, such as the academic trace, synthesize h-index and I3-class logic (core, tail, excess citations, uncited works) for multidimensional benchmarking (Ye et al., 2013, Xue et al., 2017). Empirical work supports I3's sensitivity to both research excellence (top 1%, 10%) and volume, and its ability to distinguish portfolio types in diagnostic plots (Leydesdorff et al., 2018, Dong et al., 5 Jan 2026).
5. Empirical Performance and Comparisons
Benchmarks show that I3 and I3/N robustly discriminate quality categories as validated against peer review scores (e.g., F1000Prime), performing comparably to or better than traditional field-normalized metrics such as mean-normalized citation score (MNCS), relative citation ratio (RCR), source-normalized citation scores, and proportions of top 10% or top 1% papers (Bornmann et al., 2019). PP_top 1% may slightly outperform I3/N for early citation lifecycle discrimination in fields where peer review focuses on high-end excellence, but I3/N best combines quantity and quality in broader evaluations.
I3 accurately reflects institutional or journal shifts that are not captured by JIF or h-index. For instance, journals with high productivity but moderate citation averages (e.g., PNAS) can outrank more selective but smaller outlets (e.g., Nature) in I3-based rankings (Leydesdorff et al., 2011, Wagner et al., 2012).
6. Limitations, Critiques, and Alternatives
Percentile-based systems such as I3 and I3* can exhibit limitations in small datasets due to ties and discontinuities; fractional scoring and alternative definitions (e.g., Hazen, Rousseau, Leydesdorff-Bornmann corrections) can mitigate, but not eliminate, these issues (Leydesdorff et al., 2011, Zhou et al., 2012). Critics have noted that I3 can overvalue the publication count ("mass") compared with raw citation-based indicators, and does not preserve precise citation-count differences (Zhou et al., 2012). The weighting scheme for percentile classes is a normative choice and introduces modest subjectivity (Leydesdorff et al., 2018, Dong et al., 5 Jan 2026).
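One of the tie-mitigating percentile definitions named above, the Hazen convention, can be illustrated as follows; the exact tie treatment (mean rank among tied papers) is an assumption of this sketch, and other corrections in the cited literature differ in detail:

```python
def hazen_percentile(citations, reference_counts):
    """Hazen-style percentile 100 * (rank - 0.5) / n, using the mean rank of
    tied citation counts, which softens (but does not remove) discontinuities
    caused by ties in small reference sets."""
    n = len(reference_counts)
    below = sum(1 for r in reference_counts if r < citations)
    ties = sum(1 for r in reference_counts if r == citations)
    mean_rank = below + (ties + 1) / 2.0   # 1-based mean rank of the tied block
    return 100.0 * (mean_rank - 0.5) / n
```

In a five-paper set with counts [0, 0, 0, 1, 2], an uncited paper receives percentile 30 rather than 0, spreading the tied bottom block around its mean rank.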
Alternatives such as the Combined Impact Indicator (CII) aim to capture both citation and publication effects by blending raw counts and publication fractions, but in large datasets CII and I3 become highly correlated, differing mainly in philosophical emphasis (Zhou et al., 2012).
7. Future Directions and Applications
I3 metrics are deployed across journals (Scilit, WoS, Scopus), institutions, and research fields for collection management, editorial diagnostics, and benchmarking. Empirical studies advocate their use as a replacement or complement to JIF, especially in interdisciplinary or regional journal contexts with heterogeneous citation densities (Leydesdorff et al., 2018, Dong et al., 5 Jan 2026).
Further development focuses on:
- Refinement of class-weighting schemes for context-specific evaluation.
- Field boundary sensitivity analyses (journal-based vs. citation-based clustering).
- Advanced visual diagnostic tools (quadrant plots, Google Map overlays) (Dong et al., 5 Jan 2026, Leydesdorff, 2011).
- Combined multi-dimensional assessment with altmetrics, peer review, or narrative indicators.
- Transparent documentation of counting conventions and statistical significance assessments.
The I3 framework, and in particular its comprehensive, field-normalized implementations (e.g., I3/N), is now considered a methodological standard in bibliometrics for robust, additive, and policy-relevant impact evaluation.