Persistence Silhouettes in TDA

Updated 19 January 2026

Persistence Silhouettes are functional summaries of persistence diagrams, constructed as weighted averages of piecewise-linear tent functions that capture topological feature lifetimes.
They offer a highly regularized summary with provable stability guarantees and Lipschitz continuity, enabling robust statistical inference and hypothesis testing.
They are efficiently computed and seamlessly integrated into machine learning pipelines, supporting applications in classification, graph analysis, and functional data analysis.

A persistence silhouette is a functional summary of a persistence diagram, the central construct in topological data analysis (TDA) that encodes the birth and death of topological features as a multiset of points in the plane. The silhouette transforms this diagram into a single real-valued, piecewise-linear function: a weighted average of the tent functions associated with the persistence pairs, with the weighting controlled by an explicit parameter. This construction offers a highly regularized summary of topological information, with provable statistical properties and stability guarantees, and serves as a foundational tool for statistical inference and machine learning pipelines in TDA (Chazal et al., 2013, Berry et al., 2018, Segovia-Dominguez et al., 2024).

1. Formal Definition and Construction

Given a persistence diagram $D = \{(b_i, d_i)\}_{i=1}^N$ comprising $N$ points $(b_i, d_i)$ with birth and death times, the tent (triangle) function associated to $(b_i, d_i)$ is

$A_i(t) = \begin{cases} t - b_i, & t \in [b_i, \frac{b_i + d_i}{2}], \\ d_i - t, & t \in [\frac{b_i + d_i}{2}, d_i], \\ 0, & \text{otherwise}. \end{cases}$

Each $A_i$ is continuous, piecewise-linear, and $1$-Lipschitz with respect to $t$, and it peaks at the midpoint $\frac{b_i + d_i}{2}$ with height $\frac{d_i - b_i}{2}$.
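The three-case definition collapses to a single expression, $A_i(t) = \max\{0, \min(t - b_i, d_i - t)\}$. The following is a minimal NumPy sketch of that expression; the function name `tent` and the example pair $(1, 5)$ are illustrative choices, not taken from the cited papers.

```python
import numpy as np

def tent(t, b, d):
    """Tent function A(t) for one persistence pair (b, d), evaluated on array t."""
    t = np.asarray(t, dtype=float)
    # min(t - b, d - t) gives the rising and falling ramps;
    # clipping at 0 reproduces the "otherwise" branch.
    return np.maximum(0.0, np.minimum(t - b, d - t))

# Hypothetical pair (b, d) = (1, 5): the peak sits at the midpoint t = 3
# with height (d - b) / 2 = 2.
values = tent([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], b=1.0, d=5.0)
```

Because the expression is vectorized, the same function evaluates a tent on an entire grid in one call.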

A nonnegative weight function $w: \mathbb{R}^2 \to [0, \infty)$, commonly of the form $w_p(b, d) = (d - b)^p$ for $p > 0$, is assigned to each persistence pair. Provided at least one weight is positive, the general weighted silhouette of $D$ is the continuous function $\mathrm{Sil}_w \in C(\mathbb{R})$ defined as

$\mathrm{Sil}_w(t) = \frac{\sum_{i=1}^N w(b_i, d_i) A_i(t)}{\sum_{i=1}^N w(b_i, d_i)}.$

The most frequent choice is the power-weighted $p$-silhouette: $\mathrm{Sil}_p(t) = \frac{\sum_{i=1}^N (d_i - b_i)^p A_i(t)}{\sum_{i=1}^N (d_i - b_i)^p}.$ This function is always $1$-Lipschitz in $t$, and the exponent $p$ interpolates continuously between weighting all features nearly equally ($p$ small) and concentrating on the most persistent features ($p$ large) (Chazal et al., 2013, Berry et al., 2018, Segovia-Dominguez et al., 2024).
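As a small worked example (the diagram is hypothetical, chosen only for arithmetic convenience), take $D = \{(0, 4), (1, 3)\}$ and $p = 1$. The weights are the lifetimes $4$ and $2$, and both tents peak at $t = 2$ with $A_1(2) = 2$ and $A_2(2) = 1$, so

$\mathrm{Sil}_1(2) = \frac{4 \cdot 2 + 2 \cdot 1}{4 + 2} = \frac{5}{3}.$

With $p = 2$ the weights become $16$ and $4$, giving

$\mathrm{Sil}_2(2) = \frac{16 \cdot 2 + 4 \cdot 1}{16 + 4} = \frac{9}{5},$

which is closer to the peak value $2$ of the longer-lived feature.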

2. The Silhouette in the Landscape-Unification Framework

The silhouette belongs to the class of functional summaries $\mathcal{F}: \mathcal{D} \to \mathcal{B}_F$, mapping diagrams $\mathcal{D}$ into a Banach space $\mathcal{B}_F$ of real functions (Berry et al., 2018). Unlike the persistence landscape, which forms a sequence $\lambda_D(k, t)$ of the $k$th-largest tent values at $t$, the silhouette is a single function, making it a more compact functional summary than the full landscape sequence. This property enables direct application of functional data analysis and machine learning techniques, including averaging, hypothesis testing, and classification in the space of functions.

The parameter $p$ gives fine control over the information retained:

As $p \to 0$, $\mathrm{Sil}_p$ approaches the unweighted mean of the $A_i$.
As $p \to \infty$, $\mathrm{Sil}_p$ converges to the tent associated to the persistence pair with the longest lifetime; if several pairs tie for the longest lifetime, the limit is the unweighted mean of their tents.
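The two limits can be checked numerically. The sketch below uses a hypothetical three-point diagram with a unique longest lifetime and compares the $p$-silhouette against each limit on a grid; the helper names `tents` and `p_silhouette` and the particular values $p = 10^{-6}$ and $p = 60$ are assumptions made for illustration.

```python
import numpy as np

def tents(diagram, t):
    """Rows are the tent functions A_i evaluated on grid t; shape (n, len(t))."""
    diagram = np.asarray(diagram, dtype=float)
    b, d = diagram[:, :1], diagram[:, 1:2]        # column vectors, shape (n, 1)
    return np.maximum(0.0, np.minimum(t - b, d - t))

def p_silhouette(diagram, t, p):
    """Power-weighted silhouette with weights (d - b)^p."""
    diagram = np.asarray(diagram, dtype=float)
    w = (diagram[:, 1] - diagram[:, 0]) ** p      # lifetime weights, shape (n,)
    return w @ tents(diagram, t) / w.sum()

# Hypothetical diagram: one dominant feature and two short-lived ones.
D = [(0.0, 4.0), (1.0, 2.0), (2.5, 3.0)]
t = np.linspace(0.0, 4.0, 401)
A = tents(D, t)

# Sup-norm gap to the unweighted mean (small p) and to the longest tent (large p).
gap_small_p = np.max(np.abs(p_silhouette(D, t, 1e-6) - A.mean(axis=0)))
gap_large_p = np.max(np.abs(p_silhouette(D, t, 60.0) - A[0]))
```

Both gaps shrink toward zero as $p$ moves further toward the respective limit; very large $p$ can overflow floating-point weights for long lifetimes, so rescaling lifetimes before exponentiating may be advisable in practice.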

This construction provides a consistent interface for statistical learning, clustering, and permutation testing on diagrams via their functional image (Berry et al., 2018).

3. Statistical Properties and Stochastic Convergence

The silhouette enjoys rigorous stochastic-process theory under mild conditions:

Uniform boundedness: $\mathrm{Sil}_p(t)$ is uniformly bounded over $t$, since each tent is bounded by half the corresponding lifetime and a weighted average cannot exceed the largest tent (Berry et al., 2018).
Lipschitz equicontinuity: Each $A_i$ is $1$-Lipschitz, and a weighted average with nonnegative weights summing to one is again $1$-Lipschitz, so the family $\{\mathrm{Sil}_p(D; t)\}_{D}$ is equicontinuous. Together with boundedness, this ensures a strong uniform law of large numbers and consistency:

$\sup_{t \in [0, T]} |\overline{\mathrm{Sil}}_n(t) - \Psi_p(t)| \to 0 \quad \text{a.s.},$

where $\overline{\mathrm{Sil}}_n(t)$ is the sample mean silhouette and $\Psi_p(t) = \mathbb{E}[\mathrm{Sil}_p(t)]$ is the population mean.

Central limit theorem (CLT): The empirical process

$G_n(t) = \sqrt{n} \left( \overline{\mathrm{Sil}}_n(t) - \Psi_p(t) \right)$

converges weakly to a mean-zero Gaussian process indexed by $t$ , with explicit rates of convergence (Chazal et al., 2013).

Bootstrap consistency: Bootstrap samples of $\{\mathrm{Sil}_{p, j}(t)\}$ yield valid $L_\infty$-confidence bands for $\Psi_p(t)$ with asymptotic coverage $1 - \alpha$ up to order $(\log n)^{1/2} n^{-1/8}$ (Chazal et al., 2013, Berry et al., 2018). Both uniform and studentized (variable-width) confidence bands may be constructed.
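A uniform band of this kind can be sketched as follows. This is an illustrative implementation using plain resampling with replacement, which may differ from the bootstrap scheme in the cited papers; the function name `bootstrap_uniform_band`, its parameters, and the synthetic input are all assumptions.

```python
import numpy as np

def bootstrap_uniform_band(sil, alpha=0.05, n_boot=1000, seed=0):
    """Uniform (L-infinity) confidence band for the mean of the rows of `sil`.

    sil: array of shape (n, K), one silhouette per row on a shared grid.
    Returns (mean, lower, upper), each of shape (K,).
    """
    rng = np.random.default_rng(seed)
    n = sil.shape[0]
    mean = sil.mean(axis=0)
    sup_dev = np.empty(n_boot)
    for r in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        sup_dev[r] = np.sqrt(n) * np.max(np.abs(sil[idx].mean(axis=0) - mean))
    q = np.quantile(sup_dev, 1.0 - alpha)         # bootstrap quantile of the sup statistic
    half_width = q / np.sqrt(n)                   # same width at every grid point
    return mean, mean - half_width, mean + half_width

# Synthetic stand-in for n = 50 silhouettes on a 100-point grid (not real data).
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 100)
base = np.maximum(0.0, np.minimum(grid, 1.0 - grid))
sil = base * (1.0 + 0.05 * rng.standard_normal((50, 1)))
mean, lower, upper = bootstrap_uniform_band(sil)
```

A studentized band would instead divide the deviation at each grid point by a pointwise standard-deviation estimate before taking the supremum, which yields a variable-width band.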

These results allow direct application of permutation tests and prediction regions in the function space, using classical metrics such as $L_2$ and $L_\infty$ (Berry et al., 2018).
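A two-sample permutation test on mean silhouettes might look like the sketch below. The $L_\infty$ statistic, the add-one $p$-value correction, the function name `permutation_test`, and the synthetic groups are illustrative assumptions rather than the exact procedure of Berry et al.

```python
import numpy as np

def permutation_test(sil_a, sil_b, n_perm=2000, seed=0):
    """Permutation test using the L-infinity distance between mean silhouettes.

    sil_a, sil_b: arrays of shape (n_a, K) and (n_b, K) on a shared grid.
    Returns (observed statistic, permutation p-value).
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([sil_a, sil_b])
    n_a = sil_a.shape[0]
    observed = np.max(np.abs(sil_a.mean(axis=0) - sil_b.mean(axis=0)))
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled.shape[0])   # random relabeling of the pooled sample
        mean_a = pooled[perm[:n_a]].mean(axis=0)
        mean_b = pooled[perm[n_a:]].mean(axis=0)
        count += int(np.max(np.abs(mean_a - mean_b)) >= observed)
    # Add-one correction keeps the p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)

# Synthetic groups whose silhouettes differ in height (not real data).
rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 100)
base = np.maximum(0.0, np.minimum(grid, 1.0 - grid))
sil_a = base * (1.0 + 0.05 * rng.standard_normal((20, 1)))
sil_b = base * (1.5 + 0.05 * rng.standard_normal((20, 1)))
observed, p_value = permutation_test(sil_a, sil_b)
```

Replacing the maximum with a discretized $L_2$ norm over the grid gives the $L_2$ variant of the same test.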

4. Stability and Robustness

The silhouette inherits strong stability properties from persistence landscapes:

Lipschitz stability: For diagrams $D, D'$ , the uniform distance between their silhouettes is bounded by the bottleneck distance:

$\|\mathrm{Sil}_p(D) - \mathrm{Sil}_p(D')\|_\infty \le d_B(D, D'),$

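where $d_B$ denotes the standard bottleneck distance (Chazal et al., 2013, Segovia-Dominguez et al., 2024). The bound follows most directly when matched points carry the same weights, because each tent moves by at most the sup-norm displacement of its pair; lifetime-dependent weights such as $w_p$ also change with the diagram, so the normalization can shift under perturbation and the constant should be verified for the weighting in use. Subject to that caveat, this property implies robustness to noise and small perturbations in the data.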

All stability results proved for landscapes (in particular stability with respect to Wasserstein distances, whose order is a separate parameter from the silhouette power $p$) carry over directly to silhouettes (Chazal et al., 2013, Segovia-Dominguez et al., 2024).
EMP framework extension: In the Effective Multidimensional Persistence (EMP) extension, one computes families of silhouettes across slices of multidimensional parameter grids. The EMP silhouette inherits the single-parameter silhouette’s stability: the sum of uniform deviations across slices is bounded above by the corresponding sum of Wasserstein distances between diagrams (Segovia-Dominguez et al., 2024).
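The Lipschitz stability bound can be illustrated numerically in a simplified setting. The sketch below assumes that the two diagrams are matched point by point and that the weights are computed once and held fixed for both; it compares the sup-norm gap with the perturbation size `eps`, which upper-bounds the bottleneck distance here, and does not compute $d_B$ itself. All names and values are hypothetical.

```python
import numpy as np

def weighted_silhouette(diagram, weights, t):
    """Weighted average of tents with explicitly supplied weights."""
    diagram = np.asarray(diagram, dtype=float)
    b, d = diagram[:, :1], diagram[:, 1:2]
    tents = np.maximum(0.0, np.minimum(t - b, d - t))
    w = np.asarray(weights, dtype=float)
    return w @ tents / w.sum()

rng = np.random.default_rng(0)
births = rng.uniform(0.0, 1.0, size=30)
lifetimes = rng.uniform(0.1, 1.0, size=30)
D = np.column_stack([births, births + lifetimes])

eps = 0.02
# Each coordinate of each point moves by at most eps.
D_perturbed = D + rng.uniform(-eps, eps, size=D.shape)

weights = lifetimes            # assumption: weights from D are reused unchanged
t = np.linspace(-0.5, 2.5, 3001)
sup_diff = np.max(np.abs(
    weighted_silhouette(D, weights, t) - weighted_silhouette(D_perturbed, weights, t)
))
```

With fixed weights the gap cannot exceed `eps`, since each tent shifts by at most `eps` in sup norm and the silhouette is a convex combination of tents. Recomputing lifetime-based weights from the perturbed diagram changes the normalization as well, so that case is not covered by this check.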

5. Algorithmic Implementation

The silhouette can be computed as follows:

Input: Persistence diagram $D$ with $m$ points, weight function $\omega$ acting on lifetimes, evaluation grid $\{t_1, \ldots, t_K\}$.
Feature computation: For each $j = 1, \ldots, m$, compute lifetime $\ell_j = d_j - b_j$ and set $w_j = \omega(\ell_j)$; for the $p$-silhouette, $\omega(\ell) = \ell^p$.
Tent evaluation: For each $t_k$ and each $j$, compute $A_j(t_k) = \max\{0, \min(t_k - b_j, d_j - t_k)\}$.
Weighted sum: For each $t_k$, form the numerator $n_k = \sum_j w_j A_j(t_k)$ and the grid-independent denominator $W = \sum_j w_j$, then output $s_k = n_k / W$.
Complexity: The algorithm requires $O(m K)$ flops for $m$ features and $K$ grid points (Berry et al., 2018).
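The steps above can be sketched in a few lines of NumPy. This is an illustrative implementation, not code from the cited papers; the function name `silhouette_on_grid`, the callable `omega`, and the example diagram are assumptions.

```python
import numpy as np

def silhouette_on_grid(diagram, omega, grid):
    """Evaluate a weighted silhouette on `grid`.

    diagram: array-like of (birth, death) rows, shape (m, 2)
    omega:   callable mapping an array of lifetimes to nonnegative weights
    grid:    1-D array of evaluation points t_k
    """
    diagram = np.asarray(diagram, dtype=float)
    grid = np.asarray(grid, dtype=float)
    b, d = diagram[:, 0], diagram[:, 1]

    w = omega(d - b)                               # feature computation: one weight per pair
    # Tent evaluation: an (m, len(grid)) table of tent values, built by broadcasting.
    tents = np.maximum(
        0.0, np.minimum(grid[None, :] - b[:, None], d[:, None] - grid[None, :])
    )
    numerator = w @ tents                          # weighted sum over features at each t_k
    return numerator / w.sum()                     # divide by the total weight

# Hypothetical two-point diagram with lifetime weights (p = 1).
D = [(0.0, 4.0), (1.0, 3.0)]
grid = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
s = silhouette_on_grid(D, omega=lambda life: life ** 1.0, grid=grid)
```

The tent table is the only object whose size grows with both the number of features and the number of grid points, which matches the stated cost of one operation per feature and grid point.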

In the EMP framework, this computation is repeated for every slice, and the resulting silhouette vectors are assembled into a matrix or higher-dimensional array (Segovia-Dominguez et al., 2024).
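The assembly step might be sketched as below, assuming the per-slice diagrams have already been computed and are supplied as a list. How slices are generated is outside this sketch, and the zero-function convention for an empty diagram, along with every name used here, is an assumption rather than part of the EMP definition.

```python
import numpy as np

def p_silhouette(diagram, grid, p=1.0):
    """p-silhouette of one diagram on a shared grid."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    if diagram.shape[0] == 0:
        return np.zeros_like(grid)     # assumed convention: empty diagram -> zero function
    b, d = diagram[:, :1], diagram[:, 1:2]
    tents = np.maximum(0.0, np.minimum(grid - b, d - grid))
    w = (diagram[:, 1] - diagram[:, 0]) ** p
    return w @ tents / w.sum()

def stack_slice_silhouettes(slice_diagrams, grid, p=1.0):
    """Return an array of shape (n_slices, len(grid)), one silhouette row per slice."""
    return np.stack([p_silhouette(D, grid, p) for D in slice_diagrams])

# Hypothetical diagrams for three slices, the last one empty.
slice_diagrams = [
    [(0.0, 2.0)],
    [(0.0, 2.0), (0.5, 1.5)],
    [],
]
grid = np.linspace(0.0, 2.0, 5)
emp_matrix = stack_slice_silhouettes(slice_diagrams, grid)
```

Using one shared grid for all slices keeps the rows comparable, so the matrix can be passed to a classifier as a fixed-length feature array.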

6. Applied Usage and Empirical Results

Functional inference and classification: Silhouettes have been used as features in $k$-nearest neighbor classification, for example in the analysis of simulated Gleason histology, yielding a test error of 11.75% on a four-class task with 400 held-out regions of interest (Berry et al., 2018).
Two-sample testing: The silhouette fits directly into permutation-test frameworks for comparing two populations via $L_2$ or $L_\infty$ distances between sample mean silhouettes (Berry et al., 2018).
Machine learning on graphs: EMP silhouettes have been evaluated as input features for standard classifiers (Random Forest, SVM, CNN) on benchmark graph classification datasets (e.g., BZR_MD, COX2_MD, DHFR_MD, MUTAG, REDDIT-B), achieving competitive or state-of-the-art accuracy (Segovia-Dominguez et al., 2024). For instance, combined $H_0$- and $H_1$-EMP silhouettes gave 88.1% accuracy on MUTAG and 88.6% on REDDIT-B.
Statistical rigour: All such applications benefit from the silhouette’s stability and the availability of functional CLTs, uniform confidence bands, and valid asymptotic inference (Chazal et al., 2013, Berry et al., 2018).

7. Limitations and Interpretive Considerations

Information compression: By averaging over all tent functions, the silhouette summarizes a persistence diagram as a single function, potentially losing multimodal information captured in higher levels ($k \geq 2$) of the persistence landscape. Thus, secondary and tertiary modes of feature persistence may be obscured (Chazal et al., 2013).
Weight sensitivity: Selection of the weight parameter $p$ (or general weight function $w$) directly impacts the prominence given to features of varying persistence. Empirical tuning or application-specific guidance may be required (Chazal et al., 2013, Berry et al., 2018).
Implementation: For diagrams with only short-lived features (nearly zero lifetime), the denominator $\sum_i w(b_i, d_i)$ approaches zero and normalization may be numerically unstable; thresholding lifetimes or adding a small $\epsilon$ to the denominator may be necessary (Berry et al., 2018).
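One way to guard the normalization is sketched below, combining a lifetime threshold with a small constant in the denominator. The function name `guarded_p_silhouette`, the parameters `min_lifetime` and `eps`, and their default values are illustrative assumptions, not recommendations from the cited papers.

```python
import numpy as np

def guarded_p_silhouette(diagram, grid, p=1.0, min_lifetime=0.0, eps=1e-12):
    """p-silhouette with a lifetime threshold and an eps-regularized denominator."""
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    life = diagram[:, 1] - diagram[:, 0]
    keep = life > min_lifetime                     # drop features at or below the threshold
    b, d, life = diagram[keep, :1], diagram[keep, 1:2], life[keep]
    tents = np.maximum(0.0, np.minimum(grid - b, d - grid))
    w = life ** p
    # eps keeps the division finite when no feature survives or all weights are tiny.
    return (w @ tents) / (w.sum() + eps)

grid = np.linspace(0.0, 1.0, 11)

# Degenerate diagram: every point lies on the diagonal, so all weights are zero.
degenerate = guarded_p_silhouette([(0.3, 0.3), (0.5, 0.5)], grid)

# Ordinary diagram: the guard leaves the result essentially unchanged.
ordinary = guarded_p_silhouette([(0.0, 1.0)], grid)
```

The added `eps` slightly shrinks every silhouette value, so it should be kept several orders of magnitude below the typical total weight.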

In summary, the persistence silhouette provides a $1$-Lipschitz, single-function summary of a persistence diagram, interpolating between uniform averaging of topological features and maximal emphasis on the longest bars. It is theoretically underpinned by stability, stochastic process convergence, and direct applicability to hypothesis testing and machine learning tasks. The silhouette fits into frameworks for both single- and multi-parameter persistent homology, supporting a broad spectrum of statistical and computational pipelines (Chazal et al., 2013, Berry et al., 2018, Segovia-Dominguez et al., 2024).