
Adjusted Mutual Information (AMI)

Updated 24 January 2026
  • Adjusted Mutual Information (AMI) is a validation metric that quantifies clustering similarity by correcting mutual information for chance using permutation-based models.
  • It addresses computational challenges with approximations like PAMI and FastAMI, enabling scalable analysis on large or granular datasets.
  • AMI provides statistical significance through normalization and standardization, making it integral for evaluating clustering performance in diverse applications.

Adjusted Mutual Information (AMI) quantifies the similarity between two clusterings, correcting the raw mutual information between partitions for chance similarities under a permutation-based null model. AMI is ubiquitous as an external validation metric in clustering analysis due to its normalization and correction for cluster-size marginal effects. The metric is theoretically grounded in the permutation model for cluster assignments and is embedded within the broader family of information-theoretic and pair-counting indices. Computational challenges associated with the exact calculation of AMI on large or highly granular datasets have motivated the development of efficient approximations, notably the pairwise permutation AMI (PAMI) and Monte Carlo-based FastAMI.

1. Formal Definition and Permutation Null Model

Given partitions $U = \{u_1, \dots, u_r\}$ and $V = \{v_1, \dots, v_c\}$ of a set of $N$ objects, let $n_{ij} = |u_i \cap v_j|$, $a_i = \sum_j n_{ij}$, and $b_j = \sum_i n_{ij}$. The empirical marginal probabilities are $P(i) = a_i/N$ and $Q(j) = b_j/N$, and the joint probabilities are $P(i, j) = n_{ij}/N$. The mutual information (MI) is

$$\mathrm{MI}(U, V) = \sum_{i=1}^r \sum_{j=1}^c P(i, j) \log \frac{P(i, j)}{P(i)\, Q(j)} = H(U) + H(V) - H(U, V)$$

where $H(U) = -\sum_{i=1}^r P(i)\log P(i)$, $H(V) = -\sum_{j=1}^c Q(j)\log Q(j)$, and $H(U, V) = -\sum_{i,j} P(i, j)\log P(i, j)$.
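As a concrete check, the contingency-table form of MI translates directly into code. A minimal sketch, assuming labelings are given as plain sequences (`mutual_information` is an illustrative helper using natural logarithms, matching the formulas above):

```python
import numpy as np
from collections import Counter

def mutual_information(u, v):
    """MI(U, V) = sum_ij P(i, j) * log(P(i, j) / (P(i) * Q(j))), natural log."""
    n = len(u)
    joint, pu, pv = Counter(zip(u, v)), Counter(u), Counter(v)
    # P(i, j) = n_ij / n, P(i) = a_i / n, Q(j) = b_j / n, so the log argument
    # simplifies to n_ij * n / (a_i * b_j); empty cells contribute nothing.
    return sum((nij / n) * np.log(nij * n / (pu[i] * pv[j]))
               for (i, j), nij in joint.items())
```

For identical partitions this reduces to the entropy $H(U)$, consistent with the identity $\mathrm{MI}(U, U) = H(U)$.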

AMI corrects the MI for chance by subtracting its expectation under the fixed-marginals permutation model, in which the cluster sizes $a_i, b_j$ are preserved and item labels are permuted. The expectation $\mathbb{E}[\mathrm{MI}(U, V)]$ is obtained by summing over the hypergeometric distribution of each $(i, j)$ cell:

$$P\{n_{ij} = n \mid a_i, b_j, N\} = \frac{\binom{b_j}{n}\binom{N-b_j}{a_i-n}}{\binom{N}{a_i}}$$

for $n$ between $\max(0, a_i + b_j - N)$ and $\min(a_i, b_j)$. The expected MI is thus

$$\mathbb{E}[\mathrm{MI}(U, V)] = \sum_{i=1}^r \sum_{j=1}^c \sum_n \frac{n}{N} \log\!\left( \frac{N n}{a_i b_j} \right) P\{ n \mid a_i, b_j, N \}$$
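The triple sum above maps directly onto code. A minimal sketch, assuming cluster sizes are passed as plain lists (`expected_mi` is an illustrative helper, not a library function):

```python
from math import comb, log

def expected_mi(a, b, N):
    """Exact E[MI] under the fixed-marginals permutation null.

    a, b: lists of cluster sizes for U and V; N: number of objects.
    """
    emi = 0.0
    for ai in a:
        for bj in b:
            # n_ij ranges over the hypergeometric support; n = 0 terms vanish.
            for n in range(max(1, ai + bj - N), min(ai, bj) + 1):
                p = comb(bj, n) * comb(N - bj, ai - n) / comb(N, ai)
                emi += (n / N) * log(N * n / (ai * bj)) * p
    return emi
```

A hand-checkable case: for $a = b = [2, 2]$ and $N = 4$, only the $n = 2$ term of each cell contributes, and the sum evaluates to $\log(2)/3 \approx 0.231$.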

The standard symmetric normalization takes $\frac{1}{2}(H(U) + H(V))$ as the maximum attainable MI. The AMI is then

$$\mathrm{AMI}(U, V) = \frac{\mathrm{MI}(U, V) - \mathbb{E}[\mathrm{MI}(U, V)]}{\frac{1}{2}(H(U) + H(V)) - \mathbb{E}[\mathrm{MI}(U, V)]}$$

This normalization ensures $\mathrm{AMI} = 1$ for identical clusterings and a value of zero, in expectation, for independent clusterings under the null.
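In practice, scikit-learn ships this quantity as `adjusted_mutual_info_score`, whose default `average_method='arithmetic'` corresponds to the mean-entropy normalization used here. A usage sketch, assuming scikit-learn is installed:

```python
from sklearn.metrics import adjusted_mutual_info_score

u = [0, 0, 1, 1]
print(adjusted_mutual_info_score(u, u))             # identical clusterings: 1.0
print(adjusted_mutual_info_score(u, [0, 1, 0, 1]))  # below-chance overlap: negative
```

Note that, unlike plain normalized MI, AMI can be negative when the observed MI falls below its null expectation.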

2. Exact Expectation, Variance, and Standardization

Under the permutation model, both the exact expectation and the variance of $\mathrm{MI}$ can be computed analytically. The variance is required for the Standardized Mutual Information (SMI), defined as

$$\mathrm{SMI}(U, V) = \frac{\mathrm{MI}(U, V) - \mathbb{E}[\mathrm{MI}(U, V)]}{\sqrt{\mathrm{Var}(\mathrm{MI}(U, V))}}$$

which quantifies statistical significance by expressing the deviation of the observed MI in units of its null standard deviation. A small variance under the null implies that even a moderate observed MI yields a large SMI, indicating stable and statistically significant similarity.
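Because the null is a permutation model, SMI can also be approximated by direct simulation rather than the analytic variance. A hedged sketch (`smi_permutation` and its sample count are illustrative, not the exact analytic computation described above):

```python
import numpy as np
from collections import Counter

def mi(u, v):
    # Shannon MI of two label sequences via their contingency table.
    n = len(u)
    joint, pu, pv = Counter(zip(u, v)), Counter(u), Counter(v)
    return sum((c / n) * np.log(c * n / (pu[i] * pv[j]))
               for (i, j), c in joint.items())

def smi_permutation(u, v, samples=2000, seed=0):
    # Sample the null by permuting v's labels, then standardize the observed MI
    # by the null mean and standard deviation.
    rng = np.random.default_rng(seed)
    null = [mi(u, rng.permutation(v)) for _ in range(samples)]
    return (mi(u, v) - np.mean(null)) / np.std(null)
```

For large datasets this simulation becomes expensive, which is precisely the gap that the exact formulas and the Monte Carlo schemes discussed later are designed to fill.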

The exact computation of $\mathbb{E}[\mathrm{MI}(U, V)]$ and $\mathrm{Var}[\mathrm{MI}(U, V)]$ involves multi-level sums over hypergeometric probabilities, leading to computational costs of $\mathcal{O}(RCN)$ and $\mathcal{O}(RCN^3)$, respectively, for $R$ and $C$ clusters in $U$ and $V$ (Romano et al., 2015; Klede et al., 2023).

3. Generalizations: Tsallis Entropy and Relation to Other Indices

Romano et al. (2015) unify AMI with pair-counting indices via the Tsallis $q$-entropy framework. The Tsallis entropy of partition $V$,

$$H_q(V) = \frac{1}{q-1}\left( 1 - \sum_j (b_j / N)^q \right)$$

and its analogs for $U$ and the joint distribution interpolate between Shannon entropy ($q \rightarrow 1$, recovering Shannon-based AMI) and quadratic entropy ($q = 2$, coinciding with the Adjusted Rand Index). Within this framework,

$$\mathrm{AMI}_q(U, V) = \frac{\sum_{i,j} n_{ij}^q - \sum_{i,j}\mathbb{E}[n_{ij}^q]}{\frac{1}{2}\left( \sum_i a_i^q + \sum_j b_j^q \right) - \sum_{i,j}\mathbb{E}[n_{ij}^q]}$$

with AMI recovered at $q \rightarrow 1$ and ARI at $q = 2$. This formalism connects information-theoretic and pair-counting measures, making explicit the spectrum of biases and sensitivities that the choice of $q$ entails.
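The $q = 2$ identity can be checked numerically. A sketch using the exact hypergeometric second moment $\mathbb{E}[n_{ij}^2] = a_i b_j/N + a_i(a_i - 1)b_j(b_j - 1)/(N(N - 1))$; both helper names are illustrative:

```python
from collections import Counter
from math import comb

def ami_q2(u, v):
    # Tsallis q = 2 AMI, computed directly from the contingency table.
    N = len(u)
    nij, a, b = Counter(zip(u, v)), Counter(u), Counter(v)
    s_n = sum(c * c for c in nij.values())
    s_a = sum(x * x for x in a.values())
    s_b = sum(x * x for x in b.values())
    # E[n_ij^2] under the hypergeometric null, summed over all cells.
    e = sum(ai * bj / N + ai * (ai - 1) * bj * (bj - 1) / (N * (N - 1))
            for ai in a.values() for bj in b.values())
    return (s_n - e) / (0.5 * (s_a + s_b) - e)

def ari(u, v):
    # Pair-counting Adjusted Rand Index, for comparison.
    N = len(u)
    nij, a, b = Counter(zip(u, v)), Counter(u), Counter(v)
    sn = sum(comb(c, 2) for c in nij.values())
    sa = sum(comb(x, 2) for x in a.values())
    sb = sum(comb(x, 2) for x in b.values())
    exp = sa * sb / comb(N, 2)
    return (sn - exp) / (0.5 * (sa + sb) - exp)
```

On any pair of labelings the two functions agree to floating-point precision, which is the Romano et al. correspondence in miniature.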

4. Computationally Tractable Approximations: Pairwise-AM and FastAMI

The exact computation of AMI is demanding for large or highly granular datasets. Lazarenko and Bonald (2021) introduce the pairwise permutation AMI (PAMI), which replaces full permutation averaging with pairwise swaps. Given $U$ and $V$ as before, each swap changes at most four contingency cells, leading to an explicit formula:

$$\begin{split} \mathrm{PAMI}(U, V) = {} & 2\sum_{i, j} \frac{n_{ij} (N - a_i - b_j + n_{ij})}{N^2} \left[ \frac{n_{ij}}{N} \log \frac{n_{ij}}{N} - \frac{n_{ij} - 1}{N} \log \frac{n_{ij} - 1}{N} \right] \\ & + 2\sum_{i, j} \frac{(a_i - n_{ij})(b_j - n_{ij})}{N^2} \left[ \frac{n_{ij} + 1}{N}\log\frac{n_{ij} + 1}{N} - \frac{n_{ij}}{N}\log\frac{n_{ij}}{N} \right] \end{split}$$

PAMI retains the theoretical properties of AMI while reducing the computational complexity to $\mathcal{O}(RC)$, enabling application to large-scale clustering tasks. On both synthetic and real benchmarks, PAMI matches the AMI cluster ranking in 93%–98% of cases, with near-perfect Spearman correlations ($\approx 0.96$) on most real datasets, and achieves an order-of-magnitude runtime improvement for large $N$ (Lazarenko et al., 2021).

FastAMI (Klede et al., 2023) further addresses scalability by employing Monte Carlo estimation of the expectation (and variance) under the permutation null. The method generates samples of cluster sizes and overlaps via fast sampling schemes, such as Walker's alias method and hypergeometric generators, incorporating an error-tunable stopping criterion:

$$i \geq i_{\min} \quad \text{and} \quad s_i \leq \begin{cases} p & \mu_i < 1 \\ p\,\mu_i & \mu_i \geq 1 \end{cases}$$

With $\mathcal{O}(RC/p^2)$ samples required for relative precision $p$, FastAMI provides unbiased, precision-adjustable AMI (and SMI) estimates at scales where exact and pairwise methods become computationally infeasible, achieving sub-millisecond to few-millisecond runtimes for large datasets with median absolute errors below $10^{-3}$ and perfect or near-perfect rank recovery (Klede et al., 2023).
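The core Monte Carlo idea can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it samples each cell independently from its hypergeometric null with a fixed sample budget instead of FastAMI's adaptive stopping rule:

```python
import numpy as np
from math import log

def mc_expected_mi(a, b, N, samples=20_000, seed=0):
    # Each cell count n_ij is hypergeometric: b_j "good" items, N - b_j "bad",
    # a_i draws. Average the MI contribution of each cell over the samples.
    rng = np.random.default_rng(seed)
    emi = 0.0
    for ai in a:
        for bj in b:
            n = rng.hypergeometric(bj, N - bj, ai, size=samples)
            terms = np.where(n > 0,
                             (n / N) * np.log(np.maximum(n, 1) * N / (ai * bj)),
                             0.0)
            emi += terms.mean()
    return emi
```

For the toy case $a = b = [2, 2]$, $N = 4$, the estimate converges to the exact value $\log(2)/3$, and the sampling cost scales with the number of cells rather than with $N$.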

5. Empirical Evaluation and Guidelines for Use

Empirical studies comparing exact AMI, PAMI, and FastAMI on synthetic chains, triplet-ordering tests, and a wide suite of real datasets establish several findings:

  • PAMI achieves $>0.95$ Spearman rank correlation (mean $\approx 0.96$) with AMI and provides speed-ups of $10\times$ or more on large datasets ($N \gtrsim 10^5$).
  • FastAMI achieves perfect rank correlation ($1.000$) with exact AMI at modest computational cost, outperforming PAMI in accuracy, especially for highly granular partitions.
  • For SMI, direct Monte Carlo sampling of contingency tables yields $>0.99$ correlation in settings where the exact variant times out or becomes computationally prohibitive.

Practical guidelines synthesized from the Tsallis-$q$ analysis (Romano et al., 2015):

  • Shannon-based AMI ($q \approx 1$) favors pure clusters and is most appropriate when the reference clustering $V$ is unbalanced and contains small clusters requiring precise recovery.
  • ARI ($q = 2$) is preferred for balanced clusterings with large, equal-sized clusters.
  • For model selection or situations sensitive to chance overlap (e.g., multiple candidate clusterings $U$), SMI should be employed for uniform null selection probability.

6. Theoretical Properties and Limitations

AMI is symmetric, invariant to permutations of cluster labels, and normalized so that $\mathrm{AMI} = 1$ for perfectly matching clusterings and $0$ for chance-level overlap under the null. However, the correction for bias is meaningful only under the fixed-marginals permutation null; interpretation under different clustering-generation mechanisms is not straightforward.

A notable limitation is the computational bottleneck of the exact calculation, especially with unbalanced clusters or when $R$, $C$, and $N$ are all large. Pairwise permutation adjustments and Monte Carlo approaches are thus essential for scalability, but they may diverge slightly from exact AMI in rare or adversarial cases: PAMI can disagree with AMI in 2%–7% of triplet-ordering scenarios, while FastAMI achieves arbitrarily small error at correspondingly higher computational cost.

7. Implementation Considerations and Best Practices

Computing AMI and its approximations entails attention to clustering encoding, nontrivial cluster-size marginals, and null sampling. Notable implementation findings:

| Method | Complexity | Empirical Accuracy |
| --- | --- | --- |
| Exact AMI | $\mathcal{O}(N \max(R, C))$ | Baseline |
| PAMI | $\mathcal{O}(RC)$ | Spearman $>0.95$ with AMI |
| FastAMI | $\mathcal{O}(RC/p^2)$ | Spearman $1.00$ with AMI at $p = 0.01$ |

Use cluster-size distributions drawn from uniform random integer partitions, not uniform label assignments, to avoid empty-cluster artifacts in random-partition baselines. FastAMI's parallelizable Monte Carlo sampling and error-based stopping criteria enable robust large-scale use. Relative-error stopping criteria are recommended when MI estimates exceed unity; absolute-error criteria are preferred in low-MI regimes. Chan's two-pass variance-update algorithm extends SMI estimation to distributed settings. For most AMI use cases, $p = 0.01$ delivers sufficient accuracy; for SMI, $p = 0.1$ keeps absolute errors at acceptable levels (Klede et al., 2023; Lazarenko et al., 2021; Romano et al., 2015).
