Fine-Grained Top-K PR Curve
- Fine-Grained Top-K precision-recall curve is a metric that evaluates classifier performance at every rank K by measuring precision and recall on top-ranked predictions.
- It leverages posterior probability thresholding to select high-confidence outputs, ensuring optimal precision and recall in imbalanced and multiclass settings.
- The approach, including efficient computation and partial AUTKC, provides detailed insights into trade-offs in real-world applications like medical imaging and information retrieval.
A fine-grained Top-K precision-recall curve provides a high-resolution performance profile for a classifier by evaluating precision and recall restricted to the top-ranked predictions (ranked by probability, certainty, or confidence) across all possible values of $K$, the number of top predictions retained. This curve is central to model assessment when only the highest-confidence outputs are of interest, such as in information retrieval, imbalanced classification problems, or large-scale multiclass contexts. Its construction, optimality properties, and use as a learning objective have been formalized in recent literature (Tasche, 2018, Fischer et al., 2023, Wang et al., 2022).
1. Formal Definition and Parametrizations
Let $(X, Y)$ be a random pair on $\mathcal{X} \times \{0, 1\}$ with joint distribution $P$, and define $\eta(x) = P(Y = 1 \mid X = x)$ as the posterior positive-class probability. The fine-grained Top-K precision-recall curve is constructed by sorting instances by $\eta$ (or any certainty score $s$) in decreasing order and, for each integer $K = 1, \dots, n$, evaluating empirical precision and recall:

$$\mathrm{Prec}@K = \frac{1}{K}\sum_{i=1}^{K} y_{(i)}, \qquad \mathrm{Rec}@K = \frac{\sum_{i=1}^{K} y_{(i)}}{\sum_{i=1}^{n} y_i},$$

where $y_{(i)}$ is the label of the instance ranked $i$th highest by $\eta$ or $s$ (Tasche, 2018, Fischer et al., 2023).
Alternatively, one may parametrize by the acceptance rate $\alpha = K/n$ (or reject fraction $1 - \alpha$), or by a threshold $t$ on $\eta$: the accepted set is then $\{x : \eta(x) \ge t\}$, with $t$ chosen so that a fraction $\alpha$ of instances is accepted.
A continuous precision-recall curve can be achieved by linear interpolation between adjacent points (Tasche, 2018, Wang et al., 2022).
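The definition above translates directly into code. A minimal, illustrative NumPy sketch (function name and data are hypothetical) of $\mathrm{Prec}@K$ and $\mathrm{Rec}@K$ at a single cut-off $K$:

```python
import numpy as np

def prec_rec_at_k(scores, labels, k):
    """Precision@K and Recall@K for binary labels ranked by certainty score."""
    order = np.argsort(-scores)      # rank instances by decreasing score
    top = labels[order][:k]          # labels y_(1), ..., y_(K) of the top K
    tp = top.sum()                   # true positives among the top K
    return float(tp / k), float(tp / labels.sum())

scores = np.array([0.9, 0.8, 0.6, 0.4, 0.2])
labels = np.array([1, 0, 1, 1, 0])
print(prec_rec_at_k(scores, labels, 2))  # (0.5, 0.3333333333333333)
```

At $K = 2$ only one of the two retained instances is positive (precision $1/2$), and one of the three positives in the sample is recovered (recall $1/3$).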
2. Theoretical Optimality of Posterior Thresholding
Precision@K and Recall@K are maximized by thresholding the posterior probability at the appropriate quantile. For a fixed acceptance rate $\alpha$, the optimal threshold $t^*$ is the $(1 - \alpha)$-quantile of $\eta(X)$:

$$t^* = \inf\{t : P(\eta(X) > t) \le \alpha\}.$$

The classifier that accepts exactly the instances with $\eta(x) \ge t^*$ achieves the maximal precision and recall among all classifiers with acceptance rate $\alpha$.
This optimality holds in both population and empirical regimes, contingent on the continuity of the distribution of $\eta(X)$ (Tasche, 2018).
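A small numerical sketch of the quantile rule (synthetic scores; all names hypothetical): accepting every instance whose score exceeds the empirical $(1-\alpha)$-quantile retains approximately the top fraction $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(0)
eta = rng.uniform(size=10_000)        # stand-in for posterior scores eta(X)

alpha = 0.1                           # accept the top 10% of instances
t_star = np.quantile(eta, 1 - alpha)  # empirical (1 - alpha)-quantile of eta

accepted = eta >= t_star
print(accepted.mean())                # close to alpha, up to quantile interpolation
```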
In the multiclass setting with a score vector $s(x) \in \mathbb{R}^C$, let $r_y(s(x))$ denote the rank of the true class $y$ among the $C$ scores. For each $k = 1, \dots, K$, define the Top-$k$ accuracy

$$\mathrm{Acc}_k(s) = P\big(r_Y(s(X)) \le k\big).$$

The Bayes-optimal scoring function for the partial Area Under the Top-K Curve (AUTKC) must place the top $K$ classes (by posterior probability $P(Y = c \mid X = x)$) strictly above all others; this eliminates the possibility of irrelevant labels attaining high ranks (Wang et al., 2022).
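Empirical Top-$k$ accuracy can be evaluated as follows (a minimal sketch; the score matrix and function name are illustrative):

```python
import numpy as np

def topk_accuracy(score_matrix, y_true, k):
    """Empirical Top-k accuracy: fraction of instances whose true class
    appears among the k highest-scored classes."""
    topk = np.argsort(-score_matrix, axis=1)[:, :k]   # indices of top-k classes
    hits = (topk == y_true[:, None]).any(axis=1)
    return float(hits.mean())

S = np.array([[0.6, 0.3, 0.1],    # true class 1 is ranked 2nd
              [0.2, 0.5, 0.3],    # true class 1 is ranked 1st
              [0.1, 0.2, 0.7]])   # true class 0 is ranked 3rd
y = np.array([1, 1, 0])
print(topk_accuracy(S, y, 1))     # 0.3333333333333333
print(topk_accuracy(S, y, 2))     # 0.6666666666666666
```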
3. Construction Algorithms and Complexity
The canonical algorithm for constructing the fine-grained Top-K precision-recall curve operates as follows (Fischer et al., 2023):
- Sort the $n$ test instances in descending order of certainty score $s$ or posterior estimate $\hat\eta$.
- Sequentially, for $K = 1, \dots, n$:
- Compute $TP_K = \sum_{i=1}^{K} y_{(i)}$, the number of true positives among the top $K$.
- Compute $\mathrm{Prec}@K = TP_K / K$.
- Compute $\mathrm{Rec}@K = TP_K / P$, where $P = \sum_{i=1}^{n} y_i$ is the total number of positives.
- Collect the points $(\mathrm{Rec}@K, \mathrm{Prec}@K)$ or plot them as a fine-grained, piecewise-constant curve.
For intermediate values of $K$ (non-integer reject rates), linear interpolation between the nearest integer values of $K$ is standard. Smoothing by binning or interpolation across pseudo-thresholds is also feasible, but the essence of fine granularity is preserved by evaluating at all integer values of $K$ (Fischer et al., 2023, Tasche, 2018).
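The steps above can be vectorized with a single sort and a cumulative sum, yielding the full curve in $O(n \log n)$. A sketch (names hypothetical):

```python
import numpy as np

def topk_pr_curve(scores, labels):
    """Fine-grained Top-K PR curve: (Prec@K, Rec@K) for every K = 1..n."""
    order = np.argsort(-scores)    # single sort; thresholding at all K is then free
    y = labels[order]
    tp = np.cumsum(y)              # TP_K for all K in one pass
    K = np.arange(1, len(y) + 1)
    return tp / K, tp / y.sum()    # precision and recall at every K

scores = np.array([0.95, 0.9, 0.7, 0.5, 0.3, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
prec, rec = topk_pr_curve(scores, labels)
# precision: 1, 1, 2/3, 3/4, 3/5, 1/2;  recall: 1/3, 2/3, 2/3, 1, 1, 1
```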
4. Partial Area Under the Top-K Curve (AUTKC)
The partial AUTKC operationalizes the fine-grained Top-K precision-recall curve as a scalar metric. For a maximal cut-off $K$ in the multiclass setting, define the error (risk) form

$$\mathrm{AUTKC}_{\mathrm{err}}(s; K) = \frac{1}{K} \sum_{k=1}^{K} P\big(r_Y(s(X)) > k\big),$$

or, in accuracy form,

$$\mathrm{AUTKC}(s; K) = \frac{1}{K} \sum_{k=1}^{K} P\big(r_Y(s(X)) \le k\big) = \frac{1}{K} \sum_{k=1}^{K} \mathrm{Acc}_k(s).$$
This metric aggregates performance across the top $K$ ranks, yielding more discriminating information than any single fixed-$k$ measure. The partial AUTKC is strictly finer than a fixed Top-$k$ error, and models optimized for partial AUTKC provide superior trade-offs across all cut-offs (Wang et al., 2022).
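In accuracy form, the empirical partial AUTKC is simply the mean of the Top-$k$ accuracies for $k = 1, \dots, K$. A minimal sketch (names hypothetical):

```python
import numpy as np

def autkc(score_matrix, y_true, K):
    """Empirical partial AUTKC (accuracy form): mean Top-k accuracy, k = 1..K."""
    n = len(y_true)
    true_scores = score_matrix[np.arange(n), y_true]
    # 1-based rank of the true class: competitors scored strictly higher, plus 1
    ranks = (score_matrix > true_scores[:, None]).sum(axis=1) + 1
    accs = [(ranks <= k).mean() for k in range(1, K + 1)]
    return float(np.mean(accs))

S = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.2, 0.7]])
y = np.array([1, 1, 0])
print(autkc(S, y, 2))   # 0.5: Top-1 accuracy 1/3 and Top-2 accuracy 2/3 averaged
```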
The surrogate-risk minimization framework for AUTKC replaces the indicator with a smooth, strictly decreasing loss (e.g., logistic, exponential, or squared) to ensure Fisher consistency with the Bayes-optimal solution; hinge-type surrogates, by contrast, are not consistent in this setting (Wang et al., 2022).
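As an illustration only (a simplified stand-in for the surrogates studied by Wang et al., not their exact formulation), the rank indicator can be relaxed with a logistic loss on the score margins between the true class and its strongest competitors:

```python
import numpy as np

def autkc_logistic_surrogate(score_matrix, y_true, K):
    """Smooth stand-in for the AUTKC risk: logistic loss on the K smallest
    margins between the true-class score and each competitor's score."""
    n, C = score_matrix.shape
    true_scores = score_matrix[np.arange(n), y_true]
    margins = true_scores[:, None] - score_matrix    # > 0 where the true class wins
    mask = np.ones_like(margins, dtype=bool)
    mask[np.arange(n), y_true] = False               # drop the true class itself
    comp = margins[mask].reshape(n, C - 1)
    worst = np.sort(comp, axis=1)[:, :K]             # K most threatening competitors
    return float(np.mean(np.log1p(np.exp(-worst))))  # smooth, strictly decreasing

# Confident, correct scores incur a lower surrogate risk than confused ones
S_good = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
S_bad  = np.array([[0.0, 5.0, 0.0], [5.0, 0.0, 0.0]])
y = np.array([0, 1])
print(autkc_logistic_surrogate(S_good, y, 2) < autkc_logistic_surrogate(S_bad, y, 2))
```

Because the loss is differentiable in the scores, it can serve as a training objective, unlike the 0/1 rank indicator.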
5. Practical Use Cases and Empirical Observations
Fine-grained Top-K precision-recall curves are particularly valuable in:
- Domains with severe class imbalance, where accuracy metrics can be misleading; precision-reject (PRC) and recall-reject (RRC) curves provide clear insight into the trade-off as low-confidence instances are withheld (Fischer et al., 2023).
- Medical settings (e.g., tumor classification), where PRC/RRC accurately reflect trade-offs between type I and type II errors under selective instance acceptance.
- Large-scale multiclass benchmarks, where semantic ambiguity makes ranking-oriented metrics (Top-$K$ curves or AUTKC) more appropriate than conventional PR-AUC (Wang et al., 2022).
Empirically, in prototype-based classifiers using ground-truth Bayes scores, the PRC and RRC can closely match the Bayes-optimal curves at high acceptance rates. On class-imbalanced and real-world data, PRC/RRC expose non-monotonicities and realistic drop-offs in performance that are obscured by accuracy-based reject curves. The recommendation is to always assess the PRC/RRC for imbalanced data and to select the acceptance (or rejection) rate so as to control the relevant type of error (Fischer et al., 2023).
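The same sorted-score computation yields the PRC and RRC directly: a sketch (names hypothetical) that re-expresses the Top-$K$ curve in reject-rate coordinates.

```python
import numpy as np

def reject_curves(scores, labels):
    """Precision-reject (PRC) and recall-reject (RRC) curves: Prec@K and
    Rec@K plotted against the reject rate 1 - K/n."""
    order = np.argsort(-scores)
    y = labels[order]
    n = len(y)
    K = np.arange(1, n + 1)
    tp = np.cumsum(y)
    reject_rate = 1.0 - K / n        # fraction of low-confidence instances withheld
    return reject_rate, tp / K, tp / y.sum()

scores = np.array([0.95, 0.9, 0.7, 0.5, 0.3, 0.1])
labels = np.array([1, 1, 0, 1, 0, 0])
rho, prc, rrc = reject_curves(scores, labels)
# at reject rate 0 (nothing withheld): precision 1/2, recall 1
```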
6. Implementation Considerations and Limitations
- Resolution: The curve's granularity is dictated by the sample size $n$ (or the class count $C$ in the multiclass case). For continuous or interpolated thresholding, piecewise interpolation yields visually smooth curves but does not alter the core statistics.
- Assumption: For theoretical uniqueness and optimality, one often requires the distribution of the scoring function (e.g., $\eta(X)$) to be continuous; ties may necessitate randomization on flat regions (Tasche, 2018).
- Statistical Guarantees: Generalization bounds for partial AUTKC under Lipschitz-continuous surrogates are insensitive to the number of classes, provided the model class is suitably regularized (e.g., a spectral-norm constraint in deep networks) (Wang et al., 2022).
- "Train once, threshold many times": Once posteriors or certainty scores are estimated, recomputing the curve for all $K$ avoids retraining or repeated model evaluation (Tasche, 2018).
- Applicability: The methodology generalizes beyond precision and recall to any confusion-matrix-based measure evaluated at a fixed positive rate, i.e., on the top fraction of instances (Tasche, 2018).
7. Relation to Other Metrics and Conceptual Distinctions
- The fine-grained Top-K precision-recall curve is distinct from the standard PR-AUC in that it assesses ranking with respect to binary or multiclass classification at varying acceptances, rather than computing global confusion rates.
- In Top-$K$ evaluation, each instance is associated with a single relevant item (its true class in multiclass), so per-instance recall at any $K$ is either 0 or 1. This yields a stepwise curve that is naturally more granular and instance-specific than the classical PR curves used in information retrieval, where each query may have multiple positives (Wang et al., 2022).
- AUTKC complements single Top-$k$ accuracy metrics by aggregating across $k = 1, \dots, K$, thus mitigating the risk of optimizing away from true ranking fidelity.
- Reject curves (PRC/RRC) formalize the trade-off between coverage and performance, corresponding directly to Top-$K$ curves with acceptance rate $\alpha = K/n$ (equivalently, reject rate $1 - K/n$) (Fischer et al., 2023).
The fine-grained Top-K precision-recall curve and its associated metrics (including partial AUTKC) have become essential for rigorous model evaluation and optimization in scenarios where only the highest-confidence predictions are actionable. Their foundations in posterior thresholding, statistical optimality, and flexible computation enable both fine-scale analysis and principled algorithmic design (Tasche, 2018, Fischer et al., 2023, Wang et al., 2022).