
Multi-Class Informedness & Markedness Metrics

Updated 21 December 2025
  • Multi-Class Informedness and Markedness are evaluation metrics that generalize binary measures to robustly assess classifier performance in imbalanced multi-class settings.
  • They compute per-class informedness (TPR minus FPR) and markedness (Precision plus InversePrecision minus 1) by decomposing the confusion matrix with micro- and macro-averaging.
  • Their design ensures symmetric treatment of all classes and offers a chance-corrected, probabilistic interpretation essential for reliable model evaluation.

Multi-class Informedness (ΔP′) and Markedness (ΔP) are evaluation metrics designed to generalize their binary counterparts to the multi-class classification setting. Originating from the need for metrics that are unbiased with respect to base rates and label distributions, these measures quantify the probability that a classifier’s prediction is informed or marked versus chance, overcoming the known biases in accuracy, Recall, Precision, and F-measure when applied naïvely to imbalanced or multi-class tasks. Informedness assesses the probability that the prediction rule is informed about the true label; Markedness assesses the probability that the ground-truth class is marked by the classifier’s output. Both range in [−1, +1], where 0 indicates classification performance at chance level, positive values imply performance better than chance, and negative values indicate performance systematically worse than chance. These measures treat all K classes symmetrically and correct for both prevalence (true class distribution) and prediction bias (label distribution) (Powers, 2020).

1. Formal Definition and Multi-class Generalization

Let C be the K×K confusion matrix, with C_{ij} denoting the count of true class i predicted as class j, and N = Σ_{i=1}^{K} Σ_{j=1}^{K} C_{ij}. For each class i:

  • r_i = Σ_j C_{ij}, the number of true-class-i instances (row sum)
  • p_i = Σ_j C_{ji}, the number of predicted-class-i instances (column sum)
  • TP_i = C_{ii}, FP_i = p_i − TP_i, FN_i = r_i − TP_i, TN_i = N − r_i − p_i + TP_i

Define:

  • TPR_i (Recall_i) = TP_i / (TP_i + FN_i)
  • FPR_i = FP_i / (FP_i + TN_i)
  • PPV_i (Precision_i) = TP_i / (TP_i + FP_i)
  • NPV_i (InversePrecision_i) = TN_i / (TN_i + FN_i)

Then:

  • Informedness_i = TPR_i − FPR_i (equivalently TPR_i + TNR_i − 1)
  • Markedness_i = PPV_i + NPV_i − 1

Aggregate measures:

  • Micro-average (prevalence-weighted): Informedness_micro = Σ_i (r_i/N) · Informedness_i, Markedness_micro = Σ_i (p_i/N) · Markedness_i
  • Macro-average (uniform-weighted): Informedness_macro = (1/K) Σ_i Informedness_i, Markedness_macro = (1/K) Σ_i Markedness_i

Prevalence-weighting is advocated for Informedness and bias-weighting for Markedness to ensure a probabilistic interpretation (Powers, 2020).
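The one-vs-rest decomposition above can be sketched directly from the confusion matrix; the counts below are hypothetical, and NumPy is assumed:

```python
import numpy as np

# Hypothetical 3x3 confusion matrix; C[i, j] counts true class i predicted as j.
C = np.array([[50, 3, 2],
              [4, 60, 6],
              [1, 5, 40]])
N = C.sum()
r = C.sum(axis=1)      # r_i: true-class (row) counts
p = C.sum(axis=0)      # p_i: predicted-class (column) counts
tp = np.diag(C)        # TP_i = C_ii
fn = r - tp            # FN_i = r_i - TP_i
fp = p - tp            # FP_i = p_i - TP_i
tn = N - r - p + tp    # TN_i = N - r_i - p_i + TP_i
# Each one-vs-rest dichotomy partitions all N instances:
assert np.all(tp + fn + fp + tn == N)
```

From these four per-class counts, the binary formulas for TPR_i, FPR_i, PPV_i, and NPV_i apply unchanged.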

2. Derivation and Intuitive Interpretation

In the binary setting, Informedness reduces to TPR + TNR − 1 (equivalently, TPR − FPR) and Markedness to PPV + NPV − 1. For the multi-class case, each class i is dichotomized (one-vs-rest); Informedness_i and Markedness_i are computed using the binary formulas. The aggregate reflects either the expected per-example (micro) or per-class (macro) performance.

  • Informedness captures the probability a classifier’s guess is informed beyond chance, correcting for class imbalance and chance-level guesses.
  • Markedness reflects the probability that the true label is marked by the prediction, correcting for labeling bias.

Zero corresponds to chance-level behavior; positive or negative values indicate better or worse than chance, respectively. This property is not shared by measures such as raw Recall or Precision, which can produce misleading values under class or label imbalance.
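For instance (hypothetical counts), a classifier that always predicts the majority class of a 90/10 binary task scores 0.9 accuracy and perfect majority-class Recall, yet its Informedness is exactly zero:

```python
import numpy as np

# Degenerate predictor: always outputs class 0; rows = true class, cols = predicted.
C = np.array([[90., 0.],
              [10., 0.]])
accuracy = np.trace(C) / C.sum()        # 0.9 -- looks strong
recall_majority = C[0, 0] / C[0].sum()  # 1.0 -- also looks strong
tpr = C[0, 0] / C[0].sum()              # TPR for class 0: 1.0
fpr = C[1, 0] / C[1].sum()              # FPR: every class-1 item is labeled 0, so 1.0
informedness = tpr - fpr                # 0.0: exposed as chance-level guessing
```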

3. Relationships with Other Metrics

In the binary scenario:

  • Matthews Correlation / Pearson’s φ: φ = ±√(Informedness × Markedness)
  • Area under the ROC curve (AUC) at a single operating point: AUC = (Informedness + 1)/2
  • F1 Score: F1 = 2 · Precision · Recall / (Precision + Recall), which can be related to Informedness, bias, and prevalence.

For the multi-class case, a correlation measure can be defined analogously as ±√(Informedness × Markedness). Other contingency-matrix based measures (e.g., Cramér’s V) are also possible. However, Informedness and Markedness are sufficient to provide a two-dimensional summary of classification behavior (Powers, 2020).
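The geometric-mean identity for the binary case can be verified numerically; the 2×2 counts below are illustrative:

```python
import math

# Hypothetical binary confusion counts.
tp, fn, fp, tn = 40, 10, 5, 45
informedness = tp / (tp + fn) - fp / (fp + tn)    # TPR - FPR
markedness = tp / (tp + fp) + tn / (tn + fn) - 1  # PPV + NPV - 1
# Matthews correlation from the same counts:
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
# MCC equals the (signed) geometric mean of Informedness and Markedness:
assert abs(mcc - math.sqrt(informedness * markedness)) < 1e-12
```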

4. Worked Example

Given a 3×3 confusion matrix C with total count N, row sums r_i, and column sums p_i as defined above, the following per-class Informedness and Markedness are obtained:

Class i   Informedness_i   Markedness_i
1         0.661            0.631
2         0.550            0.542
3         0.615            0.657

Micro-averages (prevalence- and bias-weighted):

  • Informedness_micro = Σ_i (r_i/N) · Informedness_i
  • Markedness_micro = Σ_i (p_i/N) · Markedness_i

Macro-averages (uniform means of the per-class values):

  • Informedness_macro = (0.661 + 0.550 + 0.615)/3 ≈ 0.609
  • Markedness_macro = (0.631 + 0.542 + 0.657)/3 ≈ 0.610

These values quantify, on the [−1, +1] scale, how much better the classifier performs compared to chance, accounting for both row and column distributions (Powers, 2020).

5. Algorithmic Computation

Efficient computation of multi-class Informedness and Markedness is based directly on the confusion matrix. The routine operates in O(K²) time for K classes, the cost of a single pass over the matrix to form the row sums, column sums, and diagonal.

This algorithm operates by class, dichotomizing each class to compute per-class Informedness and Markedness, and then aggregating using either micro- or macro-averaging.
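A routine of this shape might look as follows in Python/NumPy; this is an illustrative sketch, not the paper's reference implementation, and it assumes every class appears at least once as a true label and as a prediction (so no denominator is zero):

```python
import numpy as np

def informedness_markedness(C):
    """Per-class and aggregate Informedness/Markedness from a KxK confusion
    matrix C, where C[i, j] counts true class i predicted as class j.
    Illustrative sketch; assumes no empty rows or columns."""
    C = np.asarray(C, dtype=float)
    N = C.sum()
    r = C.sum(axis=1)                    # row sums: true-class counts
    p = C.sum(axis=0)                    # column sums: predicted-class counts
    tp = np.diag(C)
    fn, fp = r - tp, p - tp
    tn = N - r - p + tp
    inf_i = tp / (tp + fn) - fp / (fp + tn)          # TPR_i - FPR_i
    mark_i = tp / (tp + fp) + tn / (tn + fn) - 1.0   # PPV_i + NPV_i - 1
    return {
        "per_class_informedness": inf_i,
        "per_class_markedness": mark_i,
        "micro_informedness": float((r / N) @ inf_i),  # prevalence-weighted
        "micro_markedness": float((p / N) @ mark_i),   # bias-weighted
        "macro_informedness": float(inf_i.mean()),     # uniform-weighted
        "macro_markedness": float(mark_i.mean()),
    }
```

On a perfect (diagonal) matrix every aggregate equals 1; chance-level prediction yields 0 in expectation.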

6. Interpretation, Implications, and Use Cases

Informedness_micro provides the expected informedness for a randomly drawn true sample, while Markedness_micro gives the expected markedness for a randomly drawn predicted sample. Both measures are interpretable on the [−1, +1] interval, enabling meaningful direct comparison across datasets of varying imbalance.

These statistics correct for chance-level behavior induced by class imbalance and prediction bias, treating all K classes symmetrically. Macro-averaging may be preferred when each class is to be weighted equally, regardless of size. Both decompositions are useful for performance auditing and class-level error analysis in multi-class tasks.
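The practical difference between the two weightings can be seen on a hypothetical imbalanced 3-class matrix whose rare third class is badly predicted: micro-averaging (dominated by the two large classes) stays high, while macro-averaging exposes the failure:

```python
import numpy as np

# Hypothetical counts: classes 0 and 1 have 100 instances each, class 2 only 10.
C = np.array([[90.,  5.,  5.],
              [ 4., 90.,  6.],
              [ 6.,  2.,  2.]])
N = C.sum()
r, p, tp = C.sum(axis=1), C.sum(axis=0), np.diag(C)
inf_i = tp / r - (p - tp) / (N - r)   # per-class Informedness: TPR_i - FPR_i
micro = float((r / N) @ inf_i)        # prevalence-weighted: ~0.79
macro = float(inf_i.mean())           # uniform-weighted:    ~0.60
```

With macro-averaging each class contributes 1/K regardless of size, which is why the collapsed third class drags the score down.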

Their closed-form relationship to ROC-AUC, F-measure, and correlation in the dichotomous case extends analytical insights, while their multi-class generalization preserves probabilistic interpretation—a distinguishing property among multi-class evaluation measures (Powers, 2020).

7. Context within Broader Evaluation Frameworks

Multi-class Informedness and Markedness are designed to address the limitation that standard metrics such as accuracy, Recall, Precision, and F-measure can be misleading under class or label imbalance, often inflating apparent performance in uninformative models. Unlike these conventional statistics, Informedness and Markedness have zero-value baselines at random guessing and robustly penalize both Type I and Type II errors, regardless of class prevalence.

A plausible implication is that their adoption may improve the robustness of model selection protocols and fairness in quantitative model benchmarking across datasets with heterogeneous class distributions. Their explicit correction for base-rate effects makes them particularly suitable for research requiring reliable inter-dataset or inter-model comparisons, especially in multiclass settings common in real-world machine learning applications.

References (1)

1. Powers, D. M. W. (2020). Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness and Correlation. arXiv:2010.16061.
