
Conditional Similarity-Sensitive Entropy

Updated 13 January 2026
  • Conditional similarity-sensitive entropy is an extension of classical entropy that integrates a user-specified similarity kernel to capture fuzzy similarities in the state space.
  • The framework recovers standard Shannon entropy for trivial kernels while unveiling new behaviors, such as the failure of conventional conditional monotonicity with fuzzy kernels.
  • It underpins rigorous coarse-graining and data-processing inequalities, offering practical insights into the impact of kernel choices on entropy measures.

Conditional similarity-sensitive entropy generalizes classical conditional entropy to situations where a user-specified similarity structure is imposed on the state space. Formally, it is defined within the framework of a kernelled probability space, which equips a base space with a symmetric, measurable similarity kernel $K$, yielding an entropy functional $H_K$ that encodes not only uncertainty but also the similarity geometry prescribed by $K$. The conditional form, $H_K(X\mid Y)$, measures the average unpredictability of $X$ given $Y$, as perceived through $K$. This framework recovers standard Shannon entropy and mutual information for trivial (identity or partition) kernels, while enabling new behaviors when $K$ admits partial similarities (“fuzzy” kernels), including possible failure of classical inequalities such as conditional monotonicity. The construction supports rigorous data-processing inequalities and coarse-graining rules via the law-induced kernel formalism, as developed in the measure-theoretic setting by Leinster, Roff, and Miller (Miller, 6 Jan 2026).

1. Kernelled Probability Spaces and Similarity-Sensitive Entropy

Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. A similarity kernel is a measurable function $K: \Omega \times \Omega \to [0,1]$ satisfying $K(\omega, \omega) = 1$ for all $\omega$ and the symmetry condition $K(\omega, \omega') = K(\omega', \omega)$. The “typicality” function is

$$\tau(\omega) := \int_\Omega K(\omega, \omega')\, d\mu(\omega'),$$

which must satisfy $\tau(\omega) > 0$ for $\mu$-almost every $\omega$. The quadruple $(\Omega, \mathcal{F}, \mu, K)$ is called a kernelled probability space.

Similarity-sensitive entropy is then defined as

$$H_K(\mu) := -\int_\Omega \log[\tau(\omega)]\, d\mu(\omega),$$

which, in the finite-state case (with $p$ a probability mass function), reduces to

$$H_K(p) = -\sum_{x\in\mathcal{X}} p_x \log\left[(Kp)_x\right],$$

where $(Kp)_x = \sum_{x'} K_{x x'} p_{x'}$.
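
As a concrete illustration of the finite-state formula, the following minimal Python sketch (the function name and example kernel are our own choices, not from the source) computes $H_K(p)$ and shows that the identity kernel reproduces Shannon entropy while a fuzzy kernel lowers it:

```python
import numpy as np

def similarity_entropy(p, K):
    """H_K(p) = -sum_x p_x * log((K p)_x), in nats."""
    p = np.asarray(p, dtype=float)
    Kp = K @ p                    # typicality: (Kp)_x = sum_{x'} K[x, x'] * p[x']
    mask = p > 0                  # states with p_x = 0 contribute nothing
    return -np.sum(p[mask] * np.log(Kp[mask]))

p = np.array([0.5, 0.3, 0.2])
K_fuzzy = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])
print(similarity_entropy(p, np.eye(3)))   # identity kernel: Shannon entropy, ~1.030 nats
print(similarity_entropy(p, K_fuzzy))     # ~0.555 nats: similarity raises typicality, lowering entropy
```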

If $K$ dominates another kernel $K'$ pointwise ($K \geq K'$ $\mu\otimes\mu$-almost everywhere), then $H_K(\mu) \leq H_{K'}(\mu)$: enriching the similarity structure can only lower the entropy. This kernel monotonicity reflects the effect of similarity softening.
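
This follows in one line from the definitions above (a check of our own): pointwise domination of kernels lifts to typicalities, and $-\log$ is decreasing, so

$$\tau_K(\omega) \geq \tau_{K'}(\omega) \quad\Longrightarrow\quad -\log \tau_K(\omega) \leq -\log \tau_{K'}(\omega) \quad\Longrightarrow\quad H_K(\mu) \leq H_{K'}(\mu).$$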

2. Conditional Similarity-Sensitive Entropy and Mutual Information

Given a joint law $\mathbb{P}$ on $(X, Y)$, define the conditional law of $X$ given $Y=y$ as $\mu_{X\mid Y = y}$. For each $y$, the conditional typicality is

$$\tau_y(\omega) := \int_{\Omega_X} K(\omega, \omega')\, d\mu_{X\mid Y=y}(\omega').$$

The pointwise conditional entropy is

$$H_K(X\mid Y = y) := -\int_{\Omega_X} \log[\tau_y(\omega)]\, d\mu_{X\mid Y=y}(\omega).$$

The averaged conditional entropy is

$$H_K(X\mid Y) := \int_{\mathcal{Y}} H_K(X\mid Y=y)\, d\mathbb{P}_Y(y),$$

while the associated mutual information is

$$I_K(X; Y) := H_K(X) - H_K(X\mid Y).$$

For finite $\mathcal{X}$, $\mathcal{Y}$ and kernel $K$, these definitions reduce to sums, as in standard discrete information theory.
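
In that finite setting the definitions transcribe directly into code; a minimal sketch (helper names are our own, reusing `similarity_entropy` from the example in Section 1):

```python
import numpy as np

def conditional_similarity_entropy(P_xy, K):
    """H_K(X|Y) = sum_y P(Y=y) * H_K(X | Y=y), for a joint pmf P_xy[x, y]."""
    P_y = P_xy.sum(axis=0)
    total = 0.0
    for y in range(P_xy.shape[1]):
        if P_y[y] > 0:
            total += P_y[y] * similarity_entropy(P_xy[:, y] / P_y[y], K)
    return total

def similarity_mutual_information(P_xy, K):
    """I_K(X;Y) = H_K(X) - H_K(X|Y)."""
    return similarity_entropy(P_xy.sum(axis=1), K) - conditional_similarity_entropy(P_xy, K)

# With the identity kernel these are the classical H(X|Y) and I(X;Y).
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(similarity_mutual_information(P_xy, np.eye(2)))  # classical I(X;Y), ~0.193 nats
```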

3. Coarse-Graining, Induced Kernels, and Data-Processing

Given a measurable map $f: \Omega \rightarrow \mathcal{Y}$ with pushforward law $\nu = f_\# \mu$, define the law-induced kernel $K^{\mathcal{Y}, \mu}$ on $\mathcal{Y}$ by

$$K^{\mathcal{Y}, \mu}(y, y') = \operatorname*{ess\,sup}_{(\omega, \omega') \sim \mu_y \otimes \mu_{y'}} K(\omega, \omega'), \qquad K^{\mathcal{Y}, \mu}(y, y) = 1,$$

where $\{\mu_y\}$ is a disintegration of $\mu$ along $f$. The pullback $K^{f, \mu}(\omega, \omega') := K^{\mathcal{Y}, \mu}(f(\omega), f(\omega'))$ dominates $K$.

The similarity-sensitive coarse-graining inequality is

$$H_K(\mu) \geq H_{K^{\mathcal{Y},\mu}}(\nu),$$

and, more generally, a data-processing inequality holds for channels realized via Markov kernels: mapping $X$ to $Y$, entropy is monotone under the induced law, $H_{K^{\mathcal{Y},\mu}}(\nu) \leq H_K(\mu)$. For the conditional case, if $W = f(X)$, then

$$H_K(X\mid Y) \geq H_{K^W}(W\mid Y),$$

with $K^W$ the induced kernel on $W$.
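
For finite spaces with a deterministic coarse-graining map, the essential supremum becomes a maximum over supported pairs in each fiber, so the induced kernel and the coarse-graining inequality can be checked numerically. A sketch under that assumption (names are our own, reusing `similarity_entropy` from above):

```python
import numpy as np

def induced_kernel(K, p, f, n_out):
    """Finite-state law-induced kernel: K_Y[y, y'] is the max of K[x, x']
    over supported pairs with f(x) = y, f(x') = y'; the diagonal is set to 1."""
    K_Y = np.zeros((n_out, n_out))
    support = np.flatnonzero(p)
    for x in support:
        for x2 in support:
            y, y2 = f[x], f[x2]
            K_Y[y, y2] = max(K_Y[y, y2], K[x, x2])
    np.fill_diagonal(K_Y, 1.0)
    return K_Y

# Coarse-grain X = {0, 1, 2} to Y = {0, 1} via f(0) = f(1) = 0, f(2) = 1.
p = np.array([0.5, 0.3, 0.2])
K = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
f = [0, 0, 1]
nu = np.array([p[0] + p[1], p[2]])          # pushforward law
K_Y = induced_kernel(K, p, f, 2)            # here [[1, 0.5], [0.5, 1]]
assert similarity_entropy(p, K) >= similarity_entropy(nu, K_Y)  # coarse-graining inequality
```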

4. Reduction to Classical Entropy for Trivial Kernels

For the identity (delta) kernel, $K(\omega,\omega') = \mathbf{1}\{\omega = \omega'\}$, one recovers Shannon entropy: $(Kp)_x = p_x$, and $H_K(p) = H(X)$. For partition kernels, where $K$ encodes block structure via an equivalence relation, the entropy and mutual information coincide with those of the induced coarse variable $Z$. Formally,

$$H_K(X) = H(Z), \qquad H_K(X \mid Y) = H(Z \mid Y), \qquad I_K(X; Y) = I(Z; Y).$$

Monotonicity is recovered: $H_K(X \mid Y) \leq H_K(X)$ and $I_K(X; Y) \geq 0$.
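
A quick numerical check of the partition-kernel reduction, continuing the running Python example (the block structure chosen here is our own):

```python
# Partition kernel: states 0 and 1 form one block, state 2 its own block.
K_part = np.array([[1.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
p = np.array([0.5, 0.3, 0.2])
z = np.array([p[0] + p[1], p[2]])             # law of the block variable Z
H_Z = -np.sum(z * np.log(z))                  # Shannon entropy H(Z)
print(similarity_entropy(p, K_part), H_Z)     # both ~0.5004 nats
```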

5. Phenomena Unique to Fuzzy Kernels

When $K$ is fuzzy (e.g., non-block-diagonal with entries in $(0,1)$), classical monotonicity properties can fail. For suitable $\mathcal{X} = \{0,1,2\}$ and $\mathcal{Y} = \{0,1\}$, with similarity values $K_{0,2} = 1/2$ and $K_{1,2} = 1$, and some joint distribution $p_{XY}$, it is possible that

$$H_K(X \mid Y) > H_K(X).$$

This marks a key departure from the classical framework and demonstrates that, for genuinely fuzzy kernels, conditional similarity-sensitive entropy is not necessarily monotone under conditioning.
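
The source asserts existence without fixing a distribution; the following numerical sketch (our own choice of joint law, reusing the helpers above) exhibits one instance, with $Y$ uniform and the stated similarity values:

```python
# Similarity values from the text: K_{0,2} = 1/2, K_{1,2} = 1 (K_{0,1} = 0 is our assumption).
K = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 1.0],
              [0.5, 1.0, 1.0]])
# Our choice of joint law: Y uniform, P(X|Y=0) = (1/2, 1/2, 0), P(X|Y=1) = (1/2, 0, 1/2).
P_xy = 0.5 * np.array([[0.5, 0.5],
                       [0.5, 0.0],
                       [0.0, 0.5]])
H_X  = similarity_entropy(P_xy.sum(axis=1), K)   # ~0.4802 nats
H_XY = conditional_similarity_entropy(P_xy, K)   # ~0.4904 nats
assert H_XY > H_X  # conditioning increases similarity-sensitive entropy
```

Here $H_K(X) \approx 0.480$ nats while $H_K(X \mid Y) \approx 0.490$ nats, so conditioning on $Y$ strictly increases the similarity-sensitive entropy.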

By contrast, in the strictly binary-state case with $K = \begin{pmatrix} 1 & k \\ k & 1 \end{pmatrix}$ and $k \in [0,1]$, the entropy functional $H_K(p)$ is concave (strictly so for $k < 1$), ensuring $H_K(X \mid Y) \leq H_K(X)$ for all joint laws.
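
In this two-state case the entropy has the closed form (a direct computation of our own from the definitions, with $p$ the probability of the first state):

$$H_K(p) = -p \log\bigl(k + (1-k)p\bigr) - (1-p) \log\bigl(1 - (1-k)p\bigr),$$

and differentiating twice shows $\partial^2 H_K / \partial p^2 < 0$ on $(0,1)$ whenever $k < 1$, which is the concavity invoked above; at $k = 1$ the functional is identically zero.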

6. Structural Invariants and Isomorphism Properties

The law of the typicality random variable $\tau(X)$ under $\mu$ is an isomorphism invariant of the kernelled probability space $(\Omega, \mu, K)$. For example, if $\tau$ is non-atomic or takes infinitely many values, $K$ cannot be a partition kernel with finitely many equivalence classes. This invariant helps distinguish the structural complexity of different kernelled spaces.

7. Summary and Connections

Conditional similarity-sensitive entropy is defined by disintegrating the law of $X$ given $Y$, computing a pointwise entropy relative to the original similarity kernel, and averaging over $Y$. The framework supports universal coarse-graining and data-processing inequalities via the law-induced kernel. For identity or partition kernels, classical Shannon theory is recovered exactly. For general $K$, new phenomena arise, such as the potential failure of conditional monotonicity, underscoring the subtleties introduced by non-trivial similarity structures (Miller, 6 Jan 2026).
