
Conditional Similarity-Sensitive Entropy

Updated 13 January 2026
  • Conditional similarity-sensitive entropy is an extension of classical entropy that integrates a user-specified similarity kernel to capture fuzzy similarities in the state space.
  • The framework recovers standard Shannon entropy for trivial kernels while unveiling new behaviors, such as the failure of conventional conditional monotonicity with fuzzy kernels.
  • It underpins rigorous coarse-graining and data-processing inequalities, offering practical insights into the impact of kernel choices on entropy measures.

Conditional similarity-sensitive entropy generalizes classical conditional entropy to situations where a user-specified similarity structure is imposed on the state space. Formally, it is defined within the framework of a kernelled probability space, which equips a base space with a symmetric, measurable similarity kernel $K$, yielding an entropy functional $H_K$ that encodes not only uncertainty but also the similarity geometry prescribed by $K$. The conditional form, $H_K(X\mid Y)$, measures the average unpredictability of $X$ given $Y$, as perceived through $K$. This framework recovers standard Shannon entropy and mutual information for trivial (identity or partition) kernels, while enabling new behaviors when $K$ admits partial similarities (“fuzzy” kernels), including possible failure of classical inequalities such as conditional monotonicity. The construction supports rigorous data-processing inequalities and coarse-graining rules via the law-induced kernel formalism, as developed in the measure-theoretic setting by Leinster, Roff, and Miller (Miller, 6 Jan 2026).

1. Kernelled Probability Spaces and Similarity-Sensitive Entropy

Let $(\Omega, \mathcal{F}, \mu)$ be a probability space. A similarity kernel is a measurable function $K: \Omega \times \Omega \to [0,1]$ satisfying $K(\omega, \omega) = 1$ for all $\omega$ and the symmetry condition $K(\omega, \omega') = K(\omega', \omega)$. The “typicality” function is

$$\tau(\omega) := \int_\Omega K(\omega, \omega')\, d\mu(\omega'),$$

which must satisfy $\tau(\omega) > 0$ for $\mu$-almost every $\omega$. The quadruple $(\Omega, \mathcal{F}, \mu, K)$ is called a kernelled probability space.

Similarity-sensitive entropy is then defined as

$$H_K(\mu) := -\int_\Omega \log[\tau(\omega)]\, d\mu(\omega),$$

which, in the finite-state case (with $p$ a probability mass function), reduces to

$$H_K(p) = -\sum_{x\in\mathcal{X}} p_x \log\left[(Kp)_x\right],$$

where $(Kp)_x = \sum_{x'} K_{x x'} p_{x'}$.
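
As a concrete illustration of the finite-state formula, the following minimal Python sketch (the function name and example kernel are our own choices, not from the source) computes $H_K(p)$ and shows that the identity kernel reproduces Shannon entropy while a fuzzy kernel lowers it:

```python
import numpy as np

def similarity_entropy(p, K):
    """H_K(p) = -sum_x p_x * log((K p)_x), in nats."""
    p = np.asarray(p, dtype=float)
    Kp = K @ p                    # typicality: (Kp)_x = sum_{x'} K[x, x'] * p[x']
    mask = p > 0                  # states with p_x = 0 contribute nothing
    return -np.sum(p[mask] * np.log(Kp[mask]))

p = np.array([0.5, 0.3, 0.2])
K_fuzzy = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])
print(similarity_entropy(p, np.eye(3)))   # identity kernel: Shannon entropy, ~1.030 nats
print(similarity_entropy(p, K_fuzzy))     # ~0.555 nats: similarity raises typicality, lowering entropy
```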

If $K$ dominates another kernel $K'$ pointwise ($K \geq K'$ $\mu\otimes\mu$-almost everywhere), then $H_K(\mu) \leq H_{K'}(\mu)$: enriching the similarity structure can only lower the entropy. This kernel monotonicity reflects the effect of similarity softening.
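
This follows in one line from the definitions above (a check of our own): pointwise domination of kernels lifts to typicalities, and $-\log$ is decreasing, so

$$\tau_K(\omega) \geq \tau_{K'}(\omega) \quad\Longrightarrow\quad -\log \tau_K(\omega) \leq -\log \tau_{K'}(\omega) \quad\Longrightarrow\quad H_K(\mu) \leq H_{K'}(\mu).$$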

2. Conditional Similarity-Sensitive Entropy and Mutual Information

Given a joint law $\mathbb{P}$ on $(X, Y)$, define the conditional law of $X$ given $Y=y$ as $\mu_{X\mid Y = y}$. For each $y$, the conditional typicality is

$$\tau_y(\omega) := \int_{\Omega_X} K(\omega, \omega')\, d\mu_{X\mid Y=y}(\omega').$$

The pointwise conditional entropy is

$$H_K(X\mid Y = y) := -\int_{\Omega_X} \log[\tau_y(\omega)]\, d\mu_{X\mid Y=y}(\omega).$$

The averaged conditional entropy is

$$H_K(X\mid Y) := \int_{\mathcal{Y}} H_K(X\mid Y=y)\, d\mathbb{P}_Y(y),$$

while the associated mutual information is

$$I_K(X; Y) := H_K(X) - H_K(X\mid Y).$$

For finite $\mathcal{X}$, $\mathcal{Y}$ and kernel $K$, these definitions reduce to sums, as in standard discrete information theory.
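
In that finite setting the definitions transcribe directly into code; a minimal sketch (helper names are our own, reusing `similarity_entropy` from the example in Section 1):

```python
import numpy as np

def conditional_similarity_entropy(P_xy, K):
    """H_K(X|Y) = sum_y P(Y=y) * H_K(X | Y=y), for a joint pmf P_xy[x, y]."""
    P_y = P_xy.sum(axis=0)
    total = 0.0
    for y in range(P_xy.shape[1]):
        if P_y[y] > 0:
            total += P_y[y] * similarity_entropy(P_xy[:, y] / P_y[y], K)
    return total

def similarity_mutual_information(P_xy, K):
    """I_K(X;Y) = H_K(X) - H_K(X|Y)."""
    return similarity_entropy(P_xy.sum(axis=1), K) - conditional_similarity_entropy(P_xy, K)

# With the identity kernel these are the classical H(X|Y) and I(X;Y).
P_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(similarity_mutual_information(P_xy, np.eye(2)))  # classical I(X;Y), ~0.193 nats
```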

3. Coarse-Graining, Induced Kernels, and Data-Processing

Given a measurable map $f: \Omega \rightarrow \mathcal{Y}$ with pushforward law $\nu = f_\# \mu$, define the law-induced kernel $K^{\mathcal{Y}, \mu}$ on $\mathcal{Y}$ by

$$K^{\mathcal{Y}, \mu}(y, y') = \operatorname*{ess\,sup}_{(\omega, \omega') \sim \mu_y \otimes \mu_{y'}} K(\omega, \omega'), \qquad K^{\mathcal{Y}, \mu}(y, y) = 1,$$

where $\{\mu_y\}$ is a disintegration of $\mu$ along $f$. The pullback $K^{f, \mu}(\omega, \omega') := K^{\mathcal{Y}, \mu}(f(\omega), f(\omega'))$ dominates $K$.

The similarity-sensitive coarse-graining inequality is

$$H_K(\mu) \geq H_{K^{\mathcal{Y},\mu}}(\nu),$$

and, more generally, a data-processing inequality holds for channels realized via Markov kernels: mapping $X$ to $Y$, entropy is monotone under the induced law, $H_{K^{\mathcal{Y},\mu}}(\nu) \leq H_K(\mu)$. For the conditional case, if $W = f(X)$, then

$$H_K(X\mid Y) \geq H_{K^W}(W\mid Y),$$

with $K^W$ the induced kernel on $W$.
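
For finite spaces with a deterministic coarse-graining map, the essential supremum becomes a maximum over supported pairs in each fiber, so the induced kernel and the coarse-graining inequality can be checked numerically. A sketch under that assumption (names are our own, reusing `similarity_entropy` from above):

```python
import numpy as np

def induced_kernel(K, p, f, n_out):
    """Finite-state law-induced kernel: K_Y[y, y'] is the max of K[x, x']
    over supported pairs with f(x) = y, f(x') = y'; the diagonal is set to 1."""
    K_Y = np.zeros((n_out, n_out))
    support = np.flatnonzero(p)
    for x in support:
        for x2 in support:
            y, y2 = f[x], f[x2]
            K_Y[y, y2] = max(K_Y[y, y2], K[x, x2])
    np.fill_diagonal(K_Y, 1.0)
    return K_Y

# Coarse-grain X = {0, 1, 2} to Y = {0, 1} via f(0) = f(1) = 0, f(2) = 1.
p = np.array([0.5, 0.3, 0.2])
K = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
f = [0, 0, 1]
nu = np.array([p[0] + p[1], p[2]])          # pushforward law
K_Y = induced_kernel(K, p, f, 2)            # here [[1, 0.5], [0.5, 1]]
assert similarity_entropy(p, K) >= similarity_entropy(nu, K_Y)  # coarse-graining inequality
```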

4. Reduction to Classical Entropy for Trivial Kernels

For the identity (delta) kernel, $K(\omega,\omega') = \mathbf{1}\{\omega = \omega'\}$, one recovers Shannon entropy: $(Kp)_x = p_x$, and $H_K(p) = H(X)$. For partition kernels, where $K$ encodes block structure via an equivalence relation, the entropy and mutual information coincide with those of the induced coarse variable $Z$. Formally,

$$H_K(X) = H(Z), \qquad H_K(X \mid Y) = H(Z \mid Y), \qquad I_K(X; Y) = I(Z; Y).$$

Monotonicity is recovered: $H_K(X \mid Y) \leq H_K(X)$ and $I_K(X; Y) \geq 0$.
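
A quick numerical check of the partition-kernel reduction, continuing the running Python example (the block structure chosen here is our own):

```python
# Partition kernel: states 0 and 1 form one block, state 2 its own block.
K_part = np.array([[1.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
p = np.array([0.5, 0.3, 0.2])
z = np.array([p[0] + p[1], p[2]])             # law of the block variable Z
H_Z = -np.sum(z * np.log(z))                  # Shannon entropy H(Z)
print(similarity_entropy(p, K_part), H_Z)     # both ~0.5004 nats
```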

5. Phenomena Unique to Fuzzy Kernels

When $K$ is fuzzy (e.g., non-block-diagonal with entries in $(0,1)$), classical monotonicity properties can fail. For suitable $\mathcal{X} = \{0,1,2\}$ and $\mathcal{Y} = \{0,1\}$, with similarity values $K_{0,2} = 1/2$ and $K_{1,2} = 1$, and some joint distribution $p_{XY}$, it is possible that

$$H_K(X \mid Y) > H_K(X).$$

This marks a key departure from the classical framework and demonstrates that, for genuinely fuzzy kernels, conditional similarity-sensitive entropy is not necessarily monotone under conditioning.
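
The source asserts existence without fixing a distribution; the following numerical sketch (our own choice of joint law, reusing the helpers above) exhibits one instance, with $Y$ uniform and the stated similarity values:

```python
# Similarity values from the text: K_{0,2} = 1/2, K_{1,2} = 1 (K_{0,1} = 0 is our assumption).
K = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 1.0],
              [0.5, 1.0, 1.0]])
# Our choice of joint law: Y uniform, P(X|Y=0) = (1/2, 1/2, 0), P(X|Y=1) = (1/2, 0, 1/2).
P_xy = 0.5 * np.array([[0.5, 0.5],
                       [0.5, 0.0],
                       [0.0, 0.5]])
H_X  = similarity_entropy(P_xy.sum(axis=1), K)   # ~0.4802 nats
H_XY = conditional_similarity_entropy(P_xy, K)   # ~0.4904 nats
assert H_XY > H_X  # conditioning increases similarity-sensitive entropy
```

Here $H_K(X) \approx 0.480$ nats while $H_K(X \mid Y) \approx 0.490$ nats, so conditioning on $Y$ strictly increases the similarity-sensitive entropy.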

By contrast, in the strictly binary-state case with $K = \begin{pmatrix} 1 & k \\ k & 1 \end{pmatrix}$ and $k \in [0,1]$, the entropy functional $H_K(p)$ is concave (strictly so for $k < 1$), ensuring $H_K(X \mid Y) \leq H_K(X)$ for all joint laws.
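
In this two-state case the entropy has the closed form (a direct computation of our own from the definitions, with $p$ the probability of the first state):

$$H_K(p) = -p \log\bigl(k + (1-k)p\bigr) - (1-p) \log\bigl(1 - (1-k)p\bigr),$$

and differentiating twice shows $\partial^2 H_K / \partial p^2 < 0$ on $(0,1)$ whenever $k < 1$, which is the concavity invoked above; at $k = 1$ the functional is identically zero.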

6. Structural Invariants and Isomorphism Properties

The law of the typicality random variable $\tau(X)$ under $\mu$ is an isomorphism invariant of the kernelled probability space $(\Omega, \mu, K)$. For example, if $\tau$ is non-atomic or takes infinitely many values, $K$ cannot be a partition kernel with finitely many equivalence classes. This invariant helps distinguish the structural complexity of different kernelled spaces.

7. Summary and Connections

Conditional similarity-sensitive entropy is defined by disintegrating the law of $X$ given $Y$, computing a pointwise entropy relative to the original similarity kernel, and averaging over $Y$. The framework supports universal coarse-graining and data-processing inequalities via the law-induced kernel. For identity or partition kernels, classical Shannon theory is recovered exactly. For general $K$, new phenomena arise, such as the potential failure of conditional monotonicity, underscoring the subtleties introduced by non-trivial similarity structures (Miller, 6 Jan 2026).
