
Distributed Information Bottleneck

Updated 30 January 2026
  • Distributed Information Bottleneck is a framework for multi-terminal representation learning that maximizes target relevance under complexity constraints.
  • It formalizes multi-encoder compression via stochastic mappings and offers exact single-letter characterizations for discrete and Gaussian sources.
  • Practical optimization using Blahut–Arimoto and variational methods achieves near-optimal performance in applications like C-RAN and federated learning.

The Distributed Information Bottleneck (Dist-IB) is a multi-terminal extension of the classical Information Bottleneck methodology. Dist-IB provides a rigorous framework for distributed representation learning and multi-terminal source coding under log-loss fidelity constraints, enabling multiple encoders to separately compress their observations while collectively preserving maximal relevant information about a target variable. This framework yields fundamental relevance–complexity regions for both discrete and vector Gaussian sources, admits practical optimization via variational and Blahut–Arimoto algorithms, and connects directly to problems in C-RAN, federated inference, and interpretable deep learning (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020, Song et al., 2023, Xu et al., 2022, Murphy et al., 2022, Vera et al., 2016).

1. Formal Problem Definition

The Dist-IB model considers K encoders, each observing a random variable X_k statistically related to a latent target Y. The observations are typically assumed conditionally independent given Y: P_{Y,X_1,\ldots,X_K}(y,x_1,\ldots,x_K) = P_Y(y)\prod_{k=1}^K P_{X_k|Y}(x_k|y) (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020). Each encoder applies a stochastic map P_{U_k|X_k} producing a representation U_k, communicated over a link of rate R_k = I(X_k;U_k). The decoder aggregates (U_1,\ldots,U_K) for inference or prediction about Y, with performance measured by the relevance I(U_1,\ldots,U_K;Y).
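A minimal sketch of such a source, with two encoders and binary alphabets; the probabilities are illustrative, not taken from the cited papers:

```python
import numpy as np

# Toy source with the conditional-independence structure assumed by Dist-IB:
# P(y, x1, x2) = P(y) P(x1|y) P(x2|y).  All values are illustrative.
p_y = np.array([0.5, 0.5])                      # P(Y)
p_x1_given_y = np.array([[0.9, 0.1],            # rows: y, cols: x1
                         [0.2, 0.8]])
p_x2_given_y = np.array([[0.7, 0.3],
                         [0.4, 0.6]])

# Joint tensor indexed [y, x1, x2]
p_joint = p_y[:, None, None] * p_x1_given_y[:, :, None] * p_x2_given_y[:, None, :]
assert np.isclose(p_joint.sum(), 1.0)

# Verify X1 and X2 are independent given Y: P(x1, x2 | y) factorizes for every y
for y in range(2):
    cond = p_joint[y] / p_joint[y].sum()
    outer = cond.sum(axis=1)[:, None] * cond.sum(axis=0)[None, :]
    assert np.allclose(cond, outer)
```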

The core optimization is:

  • Objective: Maximize I(U_1,\ldots,U_K;Y) (information preserved about Y)
  • Constraints: I(X_k;U_k) \leq R_k for all k (per-encoder rate/complexity constraints)

For vector Gaussian models, the framework employs linear-Gaussian test channels U_k = A_k X_k + Z_k with Z_k \sim \mathcal{N}(0,\Sigma_{z_k}) (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020).
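In the scalar case this test channel gives the encoder's rate in closed form, I(X;U) = \frac{1}{2}\ln(1 + a^2 \sigma_x^2 / \sigma_z^2); the numbers below are illustrative:

```python
import numpy as np

# Scalar linear-Gaussian test channel U = a X + Z with X ~ N(0, var_x)
# and Z ~ N(0, var_z).  Parameter values are illustrative.
a, var_x, var_z = 0.7, 2.0, 0.5
rate = 0.5 * np.log(1.0 + a ** 2 * var_x / var_z)   # I(X;U) in nats
print(f"I(X;U) = {rate:.4f} nats")
```

Shrinking a or growing var_z lowers the rate, i.e. compresses the representation harder.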

The central trade-off is encapsulated in the Lagrangian (sum-rate version):

L_s\left(\{P_{U_k|X_k}\}\right) = I(Y;U_1,\ldots,U_K) - s \sum_{k=1}^K I(X_k;U_k)

Optimizing L_s while varying s traces the Pareto frontier between total relevance and total complexity.
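For fixed encoders on a small discrete source, the Lagrangian can be evaluated directly; the source probabilities and encoder maps below are illustrative, with mutual informations in nats:

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in nats from a joint probability matrix p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

# Illustrative binary source P(y)P(x1|y)P(x2|y) and two stochastic encoders
p_y = np.array([0.5, 0.5])
p_x1_y = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows: y, cols: x1
p_x2_y = np.array([[0.7, 0.3], [0.4, 0.6]])
enc1 = np.array([[0.95, 0.05], [0.1, 0.9]])   # P(u1|x1), rows: x1
enc2 = np.array([[0.85, 0.15], [0.2, 0.8]])   # P(u2|x2), rows: x2

# Relevance I(Y; U1, U2): by conditional independence,
# P(y, u1, u2) = P(y) P(u1|y) P(u2|y)
p_u1_given_y = p_x1_y @ enc1
p_u2_given_y = p_x2_y @ enc2
p_yu = p_y[:, None, None] * p_u1_given_y[:, :, None] * p_u2_given_y[:, None, :]
relevance = mutual_information(p_yu.reshape(2, 4))

# Complexity sum_k I(X_k; U_k)
p_x1 = p_y @ p_x1_y
p_x2 = p_y @ p_x2_y
complexity = (mutual_information(p_x1[:, None] * enc1)
              + mutual_information(p_x2[:, None] * enc2))

s = 0.1
L_s = relevance - s * complexity
print(f"relevance={relevance:.4f}  complexity={complexity:.4f}  L_s={L_s:.4f}")
```

Sweeping s and re-optimizing the encoders at each value traces out the relevance-complexity frontier.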

2. Single-Letter Characterizations and Region Boundaries

Dist-IB admits exact single-letter region characterizations in both discrete and vector Gaussian settings (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020):

  • Discrete Memoryless (CEO under log-loss):

For every subset S \subseteq \{1,\ldots,K\},

\Delta \leq \sum_{k \in S}\left[R_k - I(X_k;U_k|Y,T)\right] + I\left(Y;U_{S^c}|T\right)

where T is a time-sharing variable.

  • Vector Gaussian:

When X_k = H_k Y + N_k with N_k \sim \mathcal{N}(0,\Sigma_k),

\Delta \leq \sum_{k \in S}\left[R_k + \ln\left|I - \Sigma_k^{1/2}\Omega_k\Sigma_k^{1/2}\right|\right] + \ln\left|I + \Sigma_Y^{1/2}\left(\sum_{k \in S^c} H_k^\dagger \Omega_k H_k\right)\Sigma_Y^{1/2}\right|

for 0 \preceq \Omega_k \preceq \Sigma_k^{-1}.

This region generalizes the classical IB curve and demonstrates that distributed encoding can approach joint encoding performance in near-optimal regimes, provided the sources are conditionally independent given Y.
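A scalar two-encoder instance of the Gaussian bound can be evaluated numerically; every parameter below is an illustrative choice satisfying 0 \preceq \Omega_k \preceq \Sigma_k^{-1}:

```python
import numpy as np

# Scalar instance of the vector-Gaussian region: X_k = h_k Y + N_k with
# N_k ~ N(0, sigma_k).  All parameter values are illustrative.
h = np.array([1.0, 0.8])           # channel gains H_k
sigma = np.array([0.5, 1.0])       # noise variances Sigma_k
sigma_y = 1.0                      # Sigma_Y
R = np.array([1.5, 1.0])           # per-encoder rates (nats)
omega = np.array([1.0, 0.6])       # must satisfy 0 <= omega_k <= 1/sigma_k
assert np.all(omega >= 0) and np.all(omega <= 1.0 / sigma)

def delta_bound(S):
    """Right-hand side of the region inequality for subset S of {0, 1}."""
    Sc = [k for k in range(2) if k not in S]
    term1 = sum(R[k] + np.log(1.0 - sigma[k] * omega[k]) for k in S)
    term2 = np.log(1.0 + sigma_y * sum(h[k] ** 2 * omega[k] for k in Sc))
    return term1 + term2

# Achievable relevance is limited by the tightest of the 2^K inequalities
subsets = [(), (0,), (1,), (0, 1)]
delta = min(delta_bound(S) for S in subsets)
print(f"max relevance Delta <= {delta:.4f} nats")
```

In the scalar case the log-determinants reduce to scalar logarithms, which makes the trade-off between the rate terms (S) and the estimation terms (S^c) easy to inspect.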

3. Algorithmic Solutions: Blahut–Arimoto and Variational Methods

For both discrete and Gaussian models, the boundary of the Dist-IB region can be computed via:

  • Blahut–Arimoto Iterative Algorithms: Coordinate descent over the encoder maps P_{U_k|X_k} and auxiliary backward decoders Q_{Y|U_k} and Q_{Y|U_1,\ldots,U_K} (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020). Each step alternates between posterior updates and encoder reparameterization, converging to a local optimum.
  • Variational Lower Bound (Distributed ELBO): Introduces neural parametrizations p_{\theta_k}(u_k|x_k), variational decoders Q_{Y|U_1,\ldots,U_K} and Q_{Y|U_k}, and priors Q_{U_k}, optimizing

L_s^{\text{VB}} = \mathbb{E}_{X,Y}\,\mathbb{E}_{U_1,\ldots,U_K|X}\left[\log Q_{Y|U_1,\ldots,U_K}(Y|U_1,\ldots,U_K)\right] + s \sum_{k=1}^K \Big( \mathbb{E}[\log Q_{Y|U_k}(Y|U_k)] - D_{\text{KL}}(p_{U_k|X_k} \| Q_{U_k}) \Big)

(Aguerri et al., 2018, Murphy et al., 2022, Zaidi et al., 2020). Neural encoders and decoders are jointly trained to maximize this bound using stochastic gradient optimization.

  • Gaussian Water-Filling: For MIMO and fading channel instances, optimal achievable rates are computed via water-filling solutions over the eigenvalues of the channel matrix (Song et al., 2023, Xu et al., 2022).
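As a sketch of the Blahut-Arimoto-style updates, the loop below runs the per-encoder coordinate-descent step on a small discrete source; the joint distribution, alphabet sizes, and trade-off parameter are illustrative:

```python
import numpy as np

# One encoder's Blahut-Arimoto-style coordinate descent, as applied per
# encoder in the distributed scheme: alternate (i) a Bayes-rule decoder
# update Q(y|u) and (ii) an encoder re-weighting toward representations u
# whose induced posterior over Y matches P(y|x).
rng = np.random.default_rng(0)
p_xy = rng.random((4, 3))
p_xy /= p_xy.sum()                               # joint P(x, y), |X|=4, |Y|=3
p_x = p_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]

s = 0.3                                          # rate-relevance trade-off
p_u_given_x = rng.random((4, 2))                 # initial stochastic encoder, |U|=2
p_u_given_x /= p_u_given_x.sum(axis=1, keepdims=True)

for _ in range(50):
    # (i) decoder update: Q(y|u) = sum_x P(y|x) P(x|u)
    p_u = np.maximum(p_x @ p_u_given_x, 1e-30)   # floor guards divisions/logs
    p_x_given_u = (p_x[:, None] * p_u_given_x) / p_u[None, :]
    q_y_given_u = np.maximum(p_x_given_u.T @ p_y_given_x, 1e-30)
    # (ii) encoder update: P(u|x) proportional to
    #      P(u) * exp(-D_KL(P(y|x) || Q(y|u)) / s)
    kl = np.array([[np.sum(p_y_given_x[x] *
                           np.log(p_y_given_x[x] / q_y_given_u[u]))
                    for u in range(2)] for x in range(4)])
    p_u_given_x = p_u[None, :] * np.exp(-kl / s)
    p_u_given_x /= p_u_given_x.sum(axis=1, keepdims=True)
```

The distributed algorithm cycles this pair of updates across the K encoders while also refitting the joint decoder Q_{Y|U_1,\ldots,U_K}.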

4. Extensions: Collaborative, Streaming, and Oblivious Relaying Models

Dist-IB generalizes to several important multi-terminal settings:

  • Collaborative Distributed IB (CDIB): Encoders interact to cooperatively describe X_1 and X_2 through auxiliary exchanges; relevance is measured at a third decoder (Vera et al., 2016). Inner and outer bounds quantify achievable (R_1, R_2, \mu) regions via elaborate Markov constructions.
  • Sequential/Streaming Dist-IB: Online algorithms process sequential samples X_1, X_2, \ldots to construct representations T_1, T_2, \ldots under cumulative rate constraints, with the global optimum approximated via forward and backward multi-pass updates based on Gaussian conditional IB eigen-analysis (Farajiparvar et al., 2018).
  • Oblivious Relaying (CRAN): Distributed radio heads compress locally observed signals for central processing, without knowledge of codebooks. Relay mappings are designed to maximize bottleneck rates subject to compression capacity constraints, as in "Distributed Information Bottleneck for a Primitive Gaussian Diamond MIMO Channel" (Song et al., 2023, Xu et al., 2022).

5. Interpretability and Scientific Applications

Dist-IB also features prominently in interpretable deep learning and scientific explanation (Murphy et al., 2022):

  • Feature Subset Selection: By monitoring the KL term D_{\mathrm{KL}}(p_\theta(u_i|x_i)\,\|\,r(u_i)) for each input component, Dist-IB automatically identifies and sequentially prunes less informative features, achieving principled subset selection without combinatorial search.
  • Explanatory Structure Discovery: Distributed bottleneck sweeps reveal the information architecture of complex systems, separating contributions of distinct input components and highlighting their respective relevance to predicting designated outputs. Applications include deconstruction of Boolean circuits and localization of plasticity in disordered materials.
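A sketch of the per-feature KL-monitoring idea with Gaussian encoders and a standard-normal prior; the encoder outputs below are illustrative stand-ins, not learned parameters:

```python
import numpy as np

# Each input component x_i is encoded as u_i ~ N(mu_i(x_i), sigma_i^2); its
# KL term against the prior N(0, 1) measures how many nats the model spends
# on that feature.  A feature with near-zero KL carries no information
# downstream and is a candidate for pruning.
mu = np.array([[1.2, 0.01, -0.8],      # rows: samples, cols: features
               [-0.9, -0.02, 0.7]])
log_sigma = np.array([[-1.0, 0.0, -0.5],
                      [-1.2, 0.0, -0.6]])

# Closed-form KL between N(mu, sigma^2) and the standard normal prior
sigma2 = np.exp(2.0 * log_sigma)
kl = 0.5 * (sigma2 + mu ** 2 - 1.0 - 2.0 * log_sigma)

per_feature = kl.mean(axis=0)          # average nats spent per input feature
prune_order = np.argsort(per_feature)  # least informative features first
print("KL per feature:", np.round(per_feature, 3))
print("prune first:", prune_order[0])  # the middle feature, whose encoder
                                       # output is essentially the prior
```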

6. Practical Implementations and Numerical Insights

Multiple practical schemes are reported for distributed relay and representation learning scenarios (Song et al., 2023, Xu et al., 2022):

  • Quantized Channel Inversion (QCI): Relays perform per-symbol zero-forcing and quantization of inverted noise levels, subsequently compressing quantized outputs under local rate constraints.
  • MMSE-Based Schemes: Linear MMSE estimation followed by additive Gaussian compression is provably near-optimal for finite bottleneck rates.
  • Empirical Performance: Numerical analyses confirm that simple relay mappings and symbol-by-symbol processing achieve rates closely tracking the theoretical Dist-IB upper bounds across a wide SNR and complexity spectrum, with QCI frequently saturating the upper bound and MMSE schemes approaching optimality at lower rates.
| Model/Scenario | Algorithmic Approach | Empirical Behavior |
| --- | --- | --- |
| MIMO diamond channel | QCI, MMSE, water-filling | QCI saturates the upper bound at ≈5+ bits |
| Discrete memoryless sources | BA iteration, NN-VIB | BA and VIB curves nearly coincide |
| Multi-view deep learning | Neural variational VIB | KL trajectories select relevant inputs |
| Streaming Gaussian data | Greedy + two-pass IB | Global rate scales as O(\log k) |
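A simplified scalar stand-in for the MMSE-based scheme: with a single relay observing the signal at a given SNR and a fronthaul of C bits, MMSE estimation followed by Gaussian compression achieves the scalar Gaussian bottleneck curve below (the SNR value is illustrative):

```python
import numpy as np

# Scalar Gaussian bottleneck curve: relevance achievable when the relay's
# observation has signal-to-noise ratio `snr` and the fronthaul carries C
# bits per symbol.  As C grows the rate approaches the unconstrained
# capacity 0.5 * log2(1 + snr).
def bottleneck_rate(snr, C):
    return 0.5 * np.log2((1.0 + snr) / (1.0 + snr * 2.0 ** (-2.0 * C)))

snr = 10.0
cap = 0.5 * np.log2(1.0 + snr)
for C in (0.5, 1.0, 2.0, 5.0):
    r = bottleneck_rate(snr, C)
    print(f"C = {C:>4} bits  ->  rate {r:.3f} bits  (capacity {cap:.3f})")
```

The curve flattens quickly, consistent with the observation above that a few fronthaul bits per symbol suffice to approach the upper bound.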

7. Connections to Multiterminal Source Coding and Learning Theory

Dist-IB is fundamentally connected to multiterminal CEO coding under log-loss, Wyner–Ahlswede–Körner problems, common reconstruction constraints, and distributed multi-view representation learning architectures. The rate–relevance tradeoffs derived from Dist-IB inform practical systems for C-RAN, federated learning, multi-sensor inference, and scientific feature attribution (Zaidi et al., 2020, Aguerri et al., 2017). The framework encapsulates both the foundational Shannon-theoretic limits and principled neural algorithmic solutions for distributed data analysis.
