
Ewens–Pitman Partitions

Updated 30 November 2025
  • Ewens–Pitman partitions are a two-parameter family of exchangeable random partitions that generalize the classical Ewens sampling formula and include the partitions induced by the Pitman–Yor process.
  • They can be constructed sequentially via the Chinese Restaurant Process or from stick-breaking representations of the underlying random measure.
  • Their asymptotic and large deviation behavior is well characterized, which supports applications in population genetics, Bayesian nonparametrics, and machine learning.

Ewens–Pitman partitions constitute a two-parameter family of exchangeable random partitions of $[n] = \{1,\dots,n\}$, determined by $(\alpha,\theta)$ with either $0 \leq \alpha < 1$ and $\theta > -\alpha$, or $\alpha < 0$ and $\theta = -m\alpha$ for some $m \in \mathbb{N}$. They interpolate between the classical Ewens sampling formula ($\alpha=0$) and the partition law of the two-parameter Poisson–Dirichlet (Pitman–Yor) distribution ($0 < \alpha < 1$), and are connected with Gibbs partitions, stable subordinators, generalized Stirling numbers, compound Poisson representations, and the combinatorics of symmetric groups. These probabilistic, asymptotic, and algebraic structures yield both practical statistical tools and theoretical insight into fragmentation, random trees, and Bayesian nonparametrics.

1. Formal Definition and Exchangeable Partition Probability Function

A particular partition of $[n]$ into $k$ blocks with sizes $n_1, \ldots, n_k$ (where $\sum_{j=1}^k n_j = n$) is assigned the Ewens–Pitman probability

$$p_{\alpha,\theta}(n_1, \ldots, n_k) = \frac{\prod_{i=1}^{k-1}(\theta + i\alpha)}{(\theta+1)_{n-1}} \prod_{j=1}^k (1-\alpha)_{n_j-1}$$

where $(a)_m = a(a+1)\cdots(a+m-1)$ is the Pochhammer symbol (rising factorial), and $(1-\alpha)_{n_j-1}$ encodes multiplicative block-size weights (Greve, 6 Mar 2025, Dolera et al., 2021).

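In combinatorial terms, summing over the $\binom{n}{n_1,\ldots,n_k}$ ways of assigning the elements of $[n]$ to $k$ labeled blocks of sizes $(n_1, \ldots, n_k)$ gives the weight

$$M_n(n_1,\ldots,n_k) = \binom{n}{n_1,\ldots,n_k} \; p_{\alpha,\theta}(n_1, \ldots, n_k)$$

Because the blocks of a set partition are unlabeled, dividing $M_n$ by $\prod_i m_i!$, where $m_i$ is the number of blocks of size $i$, gives the probability that the random partition has this multiset of block sizes.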

The parameters must satisfy either $0 \leq \alpha < 1$ and $\theta > -\alpha$, or $\alpha < 0$ and $\theta = -m\alpha$ for some $m \in \mathbb{N}$.

This generalizes the Ewens sampling formula ($\alpha=0$), for which

$$p_{0,\theta}(n_1,\dots,n_k) = \frac{\theta^{k-1}}{(\theta+1)_{n-1}} \prod_{j=1}^k (n_j-1)! = \frac{\theta^{k}}{(\theta)_{n}} \prod_{j=1}^k (n_j-1)!$$

and gives the partition law induced by the two-parameter Poisson–Dirichlet (Pitman–Yor) process for $0 < \alpha < 1$.
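
As a sanity check on the definition, the following minimal Python sketch evaluates the partition probability formula and confirms numerically that it sums to one over all set partitions of a small set. The function names (`rising`, `eppf`, `set_partitions`, `total_mass`) are illustrative choices, not taken from the cited papers, and the brute-force enumeration is only practical for small $n$.

```python
from math import prod


def rising(a, m):
    """Pochhammer symbol (a)_m = a (a+1) ... (a+m-1), with (a)_0 = 1."""
    return prod(a + i for i in range(m))


def eppf(sizes, alpha, theta):
    """Ewens-Pitman probability of one particular partition with these block sizes."""
    n, k = sum(sizes), len(sizes)
    new_blocks = prod(theta + i * alpha for i in range(1, k))
    within_blocks = prod(rising(1 - alpha, nj - 1) for nj in sizes)
    return new_blocks / rising(theta + 1, n - 1) * within_blocks


def set_partitions(items):
    """Yield every set partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or let it form a new singleton block.
        yield [[first]] + part


def total_mass(n, alpha, theta):
    """Sum of the partition probabilities over all set partitions of {0, ..., n-1}."""
    return sum(
        eppf([len(block) for block in part], alpha, theta)
        for part in set_partitions(list(range(n)))
    )


if __name__ == "__main__":
    print(total_mass(5, 0.5, 1.0))   # should be 1 up to rounding
    print(total_mass(4, -1.0, 2.0))  # finite case alpha < 0, theta = -m*alpha with m = 2
```

In the finite case $\alpha < 0$, $\theta = -m\alpha$, the factor $\theta + m\alpha = 0$ makes every partition with more than $m$ blocks have probability zero, which the sketch reproduces.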

2. Probabilistic Constructions and Predictive Structure

Ewens–Pitman partitions are equivalently described via the Chinese Restaurant Process (CRP). Given a partition of $[n]$ into $K_n$ blocks with sizes $n_1, \ldots, n_{K_n}$, element $n+1$ joins block $j$ with probability

$$\frac{n_j - \alpha}{\theta + n}$$

or initiates a new block with probability

$$\frac{\theta + \alpha K_n}{\theta + n}$$

This sequential construction yields exchangeable distributions on partitions and underpins their representation as de Finetti mixtures over random discrete measures, notably the Pitman–Yor process (Greve, 6 Mar 2025, Dolera et al., 2021, Favaro et al., 2014).
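
The sequential rule above translates directly into a sampler. The sketch below is a minimal illustration using only the Python standard library; the function name `sample_crp` and the optional `rng` argument are assumptions made for this example, and the code is written with the range $0 \leq \alpha < 1$, $\theta > -\alpha$ in mind.

```python
import random


def sample_crp(n, alpha, theta, rng=None):
    """Sample block sizes of a partition of n elements from the two-parameter CRP."""
    rng = rng or random.Random()
    sizes = []  # sizes[j] is the current size of block j
    for i in range(n):  # i elements have already been placed
        # Unnormalized weights: (n_j - alpha) for block j, (theta + alpha * K) for a new block.
        # They sum to theta + i, so draw a uniform point on [0, theta + i).
        u = rng.random() * (theta + i)
        acc = 0.0
        for j, nj in enumerate(sizes):
            acc += nj - alpha
            if u < acc:
                sizes[j] += 1
                break
        else:
            # No existing block was selected: open a new one.
            sizes.append(1)
    return sizes


if __name__ == "__main__":
    sizes = sample_crp(1000, alpha=0.5, theta=1.0, rng=random.Random(0))
    print(len(sizes), "blocks; largest has", max(sizes), "elements")
```

Only the block sizes are tracked here; recording which element joined which block would give the full partition at the cost of a little extra bookkeeping.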

In the stick-breaking representation, for $\alpha \in [0,1)$ and $\theta > -\alpha$, the mass partition has weights

$$V_j = U_j \prod_{i<j}(1 - U_i), \quad U_j \sim \mathrm{Beta}(1-\alpha, \theta + j\alpha) \text{ independently for } j = 1, 2, \ldots$$

and drawing i.i.d. samples from the resulting random measure induces the Ewens–Pitman random partition (Favaro et al., 2016, Ho et al., 2018).
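
A truncated version of the stick-breaking construction can be sketched as follows. The truncation level `num_sticks` and the function name are assumptions of this example; the exact representation uses infinitely many sticks, so the returned leftover mass indicates how much of the unit interval the truncation has not yet assigned.

```python
import random


def stick_breaking_weights(alpha, theta, num_sticks, rng=None):
    """Return the first `num_sticks` weights V_j and the unassigned leftover mass."""
    rng = rng or random.Random()
    weights = []
    remaining = 1.0  # product of (1 - U_i) over the sticks broken so far
    for j in range(1, num_sticks + 1):
        # U_j ~ Beta(1 - alpha, theta + j * alpha); both parameters are positive
        # when 0 <= alpha < 1 and theta > -alpha.
        u = rng.betavariate(1 - alpha, theta + j * alpha)
        weights.append(u * remaining)
        remaining *= 1 - u
    return weights, remaining


if __name__ == "__main__":
    w, rest = stick_breaking_weights(0.5, 1.0, 50, random.Random(0))
    print("assigned mass:", sum(w), "leftover:", rest)
```

For larger $\alpha$ the leftover mass decays slowly in the truncation level, so a truncation that is adequate near $\alpha = 0$ may be too coarse near $\alpha = 1$.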

3. Compound Poisson Interpretations

Ewens–Pitman partitions admit an interpretation as mixtures of compound Poisson sampling models (Dolera et al., 2021):

  • For $\alpha = 0$ (Ewens), block counts correspond to conditioning on the total size of a log-series compound Poisson sample (LS-CPSM).
  • For general $\alpha \in (0,1)$, block counts arise as mixtures (in $z$) over negative-Binomial compound Poisson samples (NB-CPSM), with the mixing variable $z$ a product of a Gamma and a scaled Mittag–Leffler (generalized stable) variable.

Specifically, setting $z = G_{\theta+\alpha n,1} S_{\alpha,\theta}$, with $G$ an independent Gamma variable and $S_{\alpha,\theta}$ a random variable with density $f_{S_{\alpha,\theta}}(s) \propto s^{\theta/\alpha} f_\alpha(s)$ (where $f_\alpha$ is the positive $\alpha$-stable density), the EP$(\alpha,\theta)$ partition law coincides with the NB-CPSM$(\alpha,z)$ marginal, and the number of blocks $K_n$ satisfies $K_n / n^\alpha \to S_{\alpha,\theta}$ almost surely (Dolera et al., 2021).

This compound Poisson approach yields asymptotic results and closed-form formulas, and it extends to Poisson–Kingman partitions.

4. Asymptotics: Laws of Large Numbers, Fluctuations, and Limit Theorems

For fixed (α,θ)(\alpha,\theta), the key scaling regimes are as follows (Contardi et al., 2024, Bercu et al., 2024, Tsukuda, 2020):

  • For $\alpha = 0$ (Ewens): $K_n \sim \theta \log n$ with Gaussian central limit fluctuations.
  • For $\alpha \in (0,1)$: $K_n / n^\alpha \to S_{\alpha,\theta}$ almost surely, where $S_{\alpha,\theta}$ follows a (generalized) Mittag–Leffler law. Because the limit is random and non-degenerate, $\mathrm{Var}(K_n) \sim n^{2\alpha}\,\mathrm{Var}(S_{\alpha,\theta})$.
  • CLT: for $\alpha = 0$, $(K_n - \mathbb{E}[K_n]) / \sqrt{\mathrm{Var}(K_n)} \to N(0,1)$ in distribution. For $\alpha \in (0,1)$ this normalization does not give a Gaussian limit; Gaussian fluctuations are instead stated for $K_n$ around the random centering $n^\alpha S_{\alpha,\theta}$ (Bercu et al., 2024).
  • LIL: A law of the iterated logarithm applies to the centered, scaled block counts (Bercu et al., 2024).
  • Higher moments: $\mathbb{E}[K_n^r] = n^{\alpha r}\, \mathbb{E}[S_{\alpha,\theta}^r] - \frac{r(r-1)\alpha}{2\theta}\, n^{\alpha(r-1)}\, \mathbb{E}[S_{\alpha,\theta}^{r-1}] + O(n^{\alpha(r-2)})$ (Tsukuda, 2020).

Scaling $\theta$ linearly with $n$ (i.e., $\theta = \lambda n$) yields a "microclustering" regime in which the number of blocks and the counts of blocks of any fixed size both grow linearly with $n$, while the maximal cluster size remains $o(n)$ (Beraha et al., 24 Jul 2025, Contardi et al., 2024).
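
The growth rates above can be explored numerically without simulation. Taking expectations in the predictive rule of Section 2 gives the recursion $\mathbb{E}[K_{i+1}] = \mathbb{E}[K_i] + (\theta + \alpha\,\mathbb{E}[K_i])/(\theta + i)$ with $\mathbb{E}[K_1] = 1$. The sketch below iterates it; the function name `expected_blocks` is an illustrative choice, and the printed ratios are only a rough numerical check of the logarithmic and power-law regimes, not a proof.

```python
from math import log


def expected_blocks(n, alpha, theta):
    """Return [E[K_1], ..., E[K_n]] from the one-step recursion of the predictive rule."""
    ek = [1.0]  # the first element always opens a block
    for i in range(1, n):
        # Given i elements and K_i blocks, a new block opens w.p. (theta + alpha*K_i)/(theta + i).
        ek.append(ek[-1] + (theta + alpha * ek[-1]) / (theta + i))
    return ek


if __name__ == "__main__":
    n = 100_000
    ewens = expected_blocks(n, 0.0, 2.0)
    power = expected_blocks(n, 0.5, 1.0)
    print("alpha = 0:   E[K_n] / log n   =", ewens[-1] / log(n))
    print("alpha = 0.5: E[K_n] / n^alpha =", power[-1] / n ** 0.5)
```

For $\alpha = 0$ the ratio to $\log n$ approaches $\theta$ only slowly, since the correction term is of constant order.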

5. Large Deviations, Moderate Deviations, and Concentration

The block count $K_n$ satisfies a large deviation principle (LDP) whose rate function is the Legendre–Fenchel transform

$$I_\alpha(x) = \sup_{t \in \mathbb{R}} \{xt - \Lambda_\alpha(t)\}$$

where $\Lambda_\alpha(t)$ is a logarithmic transform involving the Mittag–Leffler function (Bercu et al., 9 Mar 2025, Favaro et al., 2014). An explicit sharp concentration inequality bounds the probability that $K_n$ deviates from its mean.

  • Moderate deviations: For sequences $B_n$ with $(\log n)^{1-\alpha} \ll B_n \ll n^{1-\alpha}$, intermediate scaling regimes yield corresponding rate functions $I_\alpha(x)$ that describe the transition between the CLT and LDP scales (Favaro et al., 2016).
  • Block frequencies: Analogous large and moderate deviation principles hold for the counts $M_{r,n}$ of blocks of fixed size $r$ (Favaro et al., 2014, Favaro et al., 2016).
  • Conditional LDP/MDP: When conditioning on a partially observed partition, the deviation rate functions remain unchanged; the impact of the initial sample is negligible for large $n$ and in sample-augmentation settings (Favaro et al., 2014, Favaro et al., 2016).

6. Representation Theory and Algebraic Structures

Ewens–Pitman partitions are characterized as non-extreme harmonic functions on the Kingman branching graph (infinite Young lattice) and are tightly linked to the combinatorics of symmetric group characters and interpolation polynomials (Greve, 6 Mar 2025). The partition probabilities admit explicit expansions in terms of Sheffer polynomial sequences and Riordan array sums, yielding effective computational methods for summary statistics, moments, and marginals. For example, the marginal probability of $K_n = k$ and the joint factorial moments of block counts can be written as closed-form coefficients in generalized Stirling number expansions, obtainable via generating functions and Riordan arrays.

This algebraic approach both encapsulates the full system of sampling-consistent marginals and facilitates symbolic computations (Greve, 6 Mar 2025).
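
As a numerical counterpart to the closed-form expansions, the marginal law of $K_n$ can also be obtained by dynamic programming on the predictive rule of Section 2: with $i$ elements in $k$ blocks, the next element joins some existing block with total probability $(i - \alpha k)/(\theta + i)$ and opens a new one with probability $(\theta + \alpha k)/(\theta + i)$. The sketch below implements that recursion; it is not the Sheffer or Riordan method of the cited work, and the function name `block_count_pmf` is an assumption of this example.

```python
def block_count_pmf(n, alpha, theta):
    """Return a list pmf with pmf[k] = P(K_n = k) for k = 0, ..., n (pmf[0] is 0)."""
    pmf = [0.0, 1.0]  # one element always forms exactly one block
    for i in range(1, n):  # extend from i elements to i + 1
        nxt = [0.0] * (i + 2)
        for k in range(1, i + 1):
            p = pmf[k]
            nxt[k] += p * (i - alpha * k) / (theta + i)          # join an existing block
            nxt[k + 1] += p * (theta + alpha * k) / (theta + i)  # open a new block
        pmf = nxt
    return pmf


if __name__ == "__main__":
    pmf = block_count_pmf(50, alpha=0.5, theta=1.0)
    mean = sum(k * p for k, p in enumerate(pmf))
    print("total mass:", sum(pmf), "mean number of blocks:", mean)
```

The recursion costs $O(n^2)$ operations and works in floating point, so it complements rather than replaces the symbolic expansions when exact coefficients are needed.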

7. Applications, Biological and Statistical Significance

  • Population genetics: Ewens–Pitman partitions generalize the Ewens sampling formula (ESF) for modeling allelic diversity and mutation structures in finite populations (Giordano et al., 2019).
  • Bayesian nonparametrics: The partitions induced by the Pitman–Yor process serve as priors for clustering in Dirichlet and stable process mixture models, which are central in Bayesian statistics and machine learning.
  • Entity resolution: Microclustering variants (scaling $\theta$ with $n$) underpin scalable clustering and de-duplication/identity resolution with provable guarantees on block size and count growth rates (Beraha et al., 24 Jul 2025).
  • Species sampling and discovery probabilities: Tail asymptotics and conditional LDPs enable calculation of discovery probabilities, facilitating design and inference in ecological, genomic, and risk-assessment contexts (Favaro et al., 2014).
  • Random trees and fragmentation: Fragmentation and coagulation operations on Ewens–Pitman partitions generate Markov chains and random trees (e.g., continuum random trees), with the scaled block-size limits governed by Mittag–Leffler and stable laws (Ho et al., 2018, Mano, 2013).

8. Summary Table: Core Properties of Ewens–Pitman Partitions

| Property | $\alpha=0$ (Ewens) | $0<\alpha<1$ (Pitman–Yor) |
| --- | --- | --- |
| Block count growth | $K_n \sim \theta \log n$ | $K_n / n^\alpha \to S_{\alpha,\theta}$ a.s. |
| Block size distribution | Weak Dirichlet/multinomial | Power law; Sibuya law |
| Compound Poisson repr. | Log-series mixing (LS-CPSM) | NB-CPSM, mixed by ML law |
| Large deviation rate | Explicit, convex analytic | Given by $\Lambda_\alpha(t)$ |
| Integrable structure | Stirling/Riordan (binomial) | Generalized Stirling, Riordan |
| Microclustering regime | $K_n \propto n$ iff $\theta \propto n$ | $K_n \propto n$ iff $\theta \propto n$ |

These properties summarize both the classical and non-standard regimes and their implications for stochastic modeling and asymptotic analysis.

