
Distributed Information Bottleneck

Updated 30 January 2026
  • Distributed Information Bottleneck is a framework for multi-terminal representation learning that maximizes target relevance under complexity constraints.
  • It formalizes multi-encoder compression via stochastic mappings and offers exact single-letter characterizations for discrete and Gaussian sources.
  • Practical optimization using Blahut–Arimoto and variational methods achieves near-optimal performance in applications like C-RAN and federated learning.

The Distributed Information Bottleneck (Dist-IB) is a multi-terminal extension of the classical Information Bottleneck methodology. Dist-IB provides a rigorous framework for distributed representation learning and multi-terminal source coding under log-loss fidelity constraints, enabling multiple encoders to separately compress their observations while collectively preserving maximal relevant information about a target variable. This framework yields fundamental relevance–complexity regions for both discrete and vector Gaussian sources, admits practical optimization via variational and Blahut–Arimoto algorithms, and connects directly to problems in C-RAN, federated inference, and interpretable deep learning (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020, Song et al., 2023, Xu et al., 2022, Murphy et al., 2022, Vera et al., 2016).

1. Formal Problem Definition

The Dist-IB model considers K encoders, each observing a random variable X_k statistically related to a latent target Y. The observations are typically assumed conditionally independent given Y: P_{Y,X_1,\ldots,X_K}(y,x_1,\ldots,x_K) = P_Y(y)\prod_{k=1}^K P_{X_k|Y}(x_k|y) (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020). Each encoder applies a stochastic map P_{U_k|X_k} producing a representation U_k, communicated over a link of rate R_k = I(X_k;U_k). The decoder aggregates (U_1,\ldots,U_K) for inference or prediction about Y, with performance measured by the relevance I(U_1,\ldots,U_K;Y).
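A minimal sketch of such a source, with two encoders and binary alphabets; the probabilities are illustrative, not taken from the cited papers:

```python
import numpy as np

# Toy source with the conditional-independence structure assumed by Dist-IB:
# P(y, x1, x2) = P(y) P(x1|y) P(x2|y).  All values are illustrative.
p_y = np.array([0.5, 0.5])                      # P(Y)
p_x1_given_y = np.array([[0.9, 0.1],            # rows: y, cols: x1
                         [0.2, 0.8]])
p_x2_given_y = np.array([[0.7, 0.3],
                         [0.4, 0.6]])

# Joint tensor indexed [y, x1, x2]
p_joint = p_y[:, None, None] * p_x1_given_y[:, :, None] * p_x2_given_y[:, None, :]
assert np.isclose(p_joint.sum(), 1.0)

# Verify X1 and X2 are independent given Y: P(x1, x2 | y) factorizes for every y
for y in range(2):
    cond = p_joint[y] / p_joint[y].sum()
    outer = cond.sum(axis=1)[:, None] * cond.sum(axis=0)[None, :]
    assert np.allclose(cond, outer)
```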

The core optimization is:

  • Objective: Maximize I(U_1,\ldots,U_K;Y) (information preserved about Y)
  • Constraints: I(X_k;U_k) \leq R_k for all k (per-encoder rate/complexity constraints)

For vector Gaussian models, the framework employs linear-Gaussian test channels U_k = A_k X_k + Z_k with Z_k \sim \mathcal{N}(0,\Sigma_{z_k}) (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020).
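In the scalar case this test channel gives the encoder's rate in closed form, I(X;U) = \frac{1}{2}\ln(1 + a^2 \sigma_x^2 / \sigma_z^2); the numbers below are illustrative:

```python
import numpy as np

# Scalar linear-Gaussian test channel U = a X + Z with X ~ N(0, var_x)
# and Z ~ N(0, var_z).  Parameter values are illustrative.
a, var_x, var_z = 0.7, 2.0, 0.5
rate = 0.5 * np.log(1.0 + a ** 2 * var_x / var_z)   # I(X;U) in nats
print(f"I(X;U) = {rate:.4f} nats")
```

Shrinking a or growing var_z lowers the rate, i.e. compresses the representation harder.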

The central trade-off is encapsulated in the Lagrangian (sum-rate version):

L_s\left(\{P_{U_k|X_k}\}\right) = I(Y;U_1,\ldots,U_K) - s \sum_{k=1}^K I(X_k;U_k)

Optimizing L_s while varying s traces the Pareto frontier between total relevance and total complexity.
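For fixed encoders on a small discrete source, the Lagrangian can be evaluated directly; the source probabilities and encoder maps below are illustrative, with mutual informations in nats:

```python
import numpy as np

def mutual_information(p_ab):
    """I(A;B) in nats from a joint probability matrix p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))

# Illustrative binary source P(y)P(x1|y)P(x2|y) and two stochastic encoders
p_y = np.array([0.5, 0.5])
p_x1_y = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows: y, cols: x1
p_x2_y = np.array([[0.7, 0.3], [0.4, 0.6]])
enc1 = np.array([[0.95, 0.05], [0.1, 0.9]])   # P(u1|x1), rows: x1
enc2 = np.array([[0.85, 0.15], [0.2, 0.8]])   # P(u2|x2), rows: x2

# Relevance I(Y; U1, U2): by conditional independence,
# P(y, u1, u2) = P(y) P(u1|y) P(u2|y)
p_u1_given_y = p_x1_y @ enc1
p_u2_given_y = p_x2_y @ enc2
p_yu = p_y[:, None, None] * p_u1_given_y[:, :, None] * p_u2_given_y[:, None, :]
relevance = mutual_information(p_yu.reshape(2, 4))

# Complexity sum_k I(X_k; U_k)
p_x1 = p_y @ p_x1_y
p_x2 = p_y @ p_x2_y
complexity = (mutual_information(p_x1[:, None] * enc1)
              + mutual_information(p_x2[:, None] * enc2))

s = 0.1
L_s = relevance - s * complexity
print(f"relevance={relevance:.4f}  complexity={complexity:.4f}  L_s={L_s:.4f}")
```

Sweeping s and re-optimizing the encoders at each value traces out the relevance-complexity frontier.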

2. Single-Letter Characterizations and Region Boundaries

Dist-IB admits exact single-letter region characterizations in both discrete and vector Gaussian settings (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020):

  • Discrete Memoryless (CEO under log-loss):

For every subset S \subseteq \{1,\ldots,K\},

\Delta \leq \sum_{k \in S}\left[R_k - I(X_k;U_k|Y,T)\right] + I\left(Y;U_{S^c}|T\right)

where T is a time-sharing variable.

  • Vector Gaussian:

When X_k = H_k Y + N_k with N_k \sim \mathcal{N}(0,\Sigma_k),

\Delta \leq \sum_{k \in S}\left[R_k + \ln\left|I - \Sigma_k^{1/2}\Omega_k\Sigma_k^{1/2}\right|\right] + \ln\left|I + \Sigma_Y^{1/2}\left(\sum_{k \in S^c} H_k^\dagger \Omega_k H_k\right)\Sigma_Y^{1/2}\right|

for 0 \preceq \Omega_k \preceq \Sigma_k^{-1}.

This region generalizes the classical IB curve and demonstrates that distributed encoding can approach joint encoding performance in near-optimal regimes, provided the sources are conditionally independent given Y.
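A scalar two-encoder instance of the Gaussian bound can be evaluated numerically; every parameter below is an illustrative choice satisfying 0 \preceq \Omega_k \preceq \Sigma_k^{-1}:

```python
import numpy as np

# Scalar instance of the vector-Gaussian region: X_k = h_k Y + N_k with
# N_k ~ N(0, sigma_k).  All parameter values are illustrative.
h = np.array([1.0, 0.8])           # channel gains H_k
sigma = np.array([0.5, 1.0])       # noise variances Sigma_k
sigma_y = 1.0                      # Sigma_Y
R = np.array([1.5, 1.0])           # per-encoder rates (nats)
omega = np.array([1.0, 0.6])       # must satisfy 0 <= omega_k <= 1/sigma_k
assert np.all(omega >= 0) and np.all(omega <= 1.0 / sigma)

def delta_bound(S):
    """Right-hand side of the region inequality for subset S of {0, 1}."""
    Sc = [k for k in range(2) if k not in S]
    term1 = sum(R[k] + np.log(1.0 - sigma[k] * omega[k]) for k in S)
    term2 = np.log(1.0 + sigma_y * sum(h[k] ** 2 * omega[k] for k in Sc))
    return term1 + term2

# Achievable relevance is limited by the tightest of the 2^K inequalities
subsets = [(), (0,), (1,), (0, 1)]
delta = min(delta_bound(S) for S in subsets)
print(f"max relevance Delta <= {delta:.4f} nats")
```

In the scalar case the log-determinants reduce to scalar logarithms, which makes the trade-off between the rate terms (S) and the estimation terms (S^c) easy to inspect.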

3. Algorithmic Solutions: Blahut–Arimoto and Variational Methods

For both discrete and Gaussian models, the boundary of the Dist-IB region can be computed via:

  • Blahut–Arimoto Iterative Algorithms: Coordinate descent over the encoder maps P_{U_k|X_k} and auxiliary backward decoders Q_{Y|U_k} and Q_{Y|U_1,\ldots,U_K} (Aguerri et al., 2017, Aguerri et al., 2018, Zaidi et al., 2020). Each step alternates between posterior updates and encoder reparameterization, converging to a local optimum.
  • Variational Lower Bound (Distributed ELBO): Introduces neural parametrizations p_{\theta_k}(u_k|x_k), variational decoders Q_{Y|U_1,\ldots,U_K} and Q_{Y|U_k}, and priors Q_{U_k}, optimizing

L_s^{\text{VB}} = \mathbb{E}_{X,Y}\,\mathbb{E}_{U_1,\ldots,U_K|X}\left[\log Q_{Y|U_1,\ldots,U_K}(Y|U_1,\ldots,U_K)\right] + s \sum_{k=1}^K \Big( \mathbb{E}[\log Q_{Y|U_k}(Y|U_k)] - D_{\text{KL}}(p_{U_k|X_k} \| Q_{U_k}) \Big)

(Aguerri et al., 2018, Murphy et al., 2022, Zaidi et al., 2020). Neural encoders and decoders are jointly trained to maximize this bound using stochastic gradient optimization.

  • Gaussian Water-Filling: For MIMO and fading channel instances, optimal achievable rates are computed via water-filling solutions over the eigenvalues of the channel matrix (Song et al., 2023, Xu et al., 2022).
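As a sketch of the Blahut-Arimoto-style updates, the loop below runs the per-encoder coordinate-descent step on a small discrete source; the joint distribution, alphabet sizes, and trade-off parameter are illustrative:

```python
import numpy as np

# One encoder's Blahut-Arimoto-style coordinate descent, as applied per
# encoder in the distributed scheme: alternate (i) a Bayes-rule decoder
# update Q(y|u) and (ii) an encoder re-weighting toward representations u
# whose induced posterior over Y matches P(y|x).
rng = np.random.default_rng(0)
p_xy = rng.random((4, 3))
p_xy /= p_xy.sum()                               # joint P(x, y), |X|=4, |Y|=3
p_x = p_xy.sum(axis=1)
p_y_given_x = p_xy / p_x[:, None]

s = 0.3                                          # rate-relevance trade-off
p_u_given_x = rng.random((4, 2))                 # initial stochastic encoder, |U|=2
p_u_given_x /= p_u_given_x.sum(axis=1, keepdims=True)

for _ in range(50):
    # (i) decoder update: Q(y|u) = sum_x P(y|x) P(x|u)
    p_u = np.maximum(p_x @ p_u_given_x, 1e-30)   # floor guards divisions/logs
    p_x_given_u = (p_x[:, None] * p_u_given_x) / p_u[None, :]
    q_y_given_u = np.maximum(p_x_given_u.T @ p_y_given_x, 1e-30)
    # (ii) encoder update: P(u|x) proportional to
    #      P(u) * exp(-D_KL(P(y|x) || Q(y|u)) / s)
    kl = np.array([[np.sum(p_y_given_x[x] *
                           np.log(p_y_given_x[x] / q_y_given_u[u]))
                    for u in range(2)] for x in range(4)])
    p_u_given_x = p_u[None, :] * np.exp(-kl / s)
    p_u_given_x /= p_u_given_x.sum(axis=1, keepdims=True)
```

The distributed algorithm cycles this pair of updates across the K encoders while also refitting the joint decoder Q_{Y|U_1,\ldots,U_K}.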

4. Extensions: Collaborative, Streaming, and Oblivious Relaying Models

Dist-IB generalizes to several important multi-terminal settings:

  • Collaborative Distributed IB (CDIB): Encoders interact to cooperatively describe X_1 and X_2 through auxiliary exchanges; relevance is measured at a third decoder (Vera et al., 2016). Inner and outer bounds quantify achievable (R_1, R_2, \mu) regions via elaborate Markov constructions.
  • Sequential/Streaming Dist-IB: Online algorithms process sequential samples X_1, X_2, \ldots to construct representations T_1, T_2, \ldots under cumulative rate constraints, with the global optimum approximated via forward and backward multi-pass updates based on Gaussian conditional IB eigen-analysis (Farajiparvar et al., 2018).
  • Oblivious Relaying (CRAN): Distributed radio heads compress locally observed signals for central processing, without knowledge of codebooks. Relay mappings are designed to maximize bottleneck rates subject to compression capacity constraints, as in "Distributed Information Bottleneck for a Primitive Gaussian Diamond MIMO Channel" (Song et al., 2023, Xu et al., 2022).

5. Interpretability and Scientific Applications

Dist-IB also features prominently in interpretable deep learning and scientific explanation (Murphy et al., 2022):

  • Feature Subset Selection: By monitoring the KL term D_{\mathrm{KL}}(p_\theta(u_i|x_i)\,\|\,r(u_i)) for each input component, Dist-IB automatically identifies and sequentially prunes less informative features, achieving principled subset selection without combinatorial search.
  • Explanatory Structure Discovery: Distributed bottleneck sweeps reveal the information architecture of complex systems, separating contributions of distinct input components and highlighting their respective relevance to predicting designated outputs. Applications include deconstruction of Boolean circuits and localization of plasticity in disordered materials.
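A sketch of the per-feature KL-monitoring idea with Gaussian encoders and a standard-normal prior; the encoder outputs below are illustrative stand-ins, not learned parameters:

```python
import numpy as np

# Each input component x_i is encoded as u_i ~ N(mu_i(x_i), sigma_i^2); its
# KL term against the prior N(0, 1) measures how many nats the model spends
# on that feature.  A feature with near-zero KL carries no information
# downstream and is a candidate for pruning.
mu = np.array([[1.2, 0.01, -0.8],      # rows: samples, cols: features
               [-0.9, -0.02, 0.7]])
log_sigma = np.array([[-1.0, 0.0, -0.5],
                      [-1.2, 0.0, -0.6]])

# Closed-form KL between N(mu, sigma^2) and the standard normal prior
sigma2 = np.exp(2.0 * log_sigma)
kl = 0.5 * (sigma2 + mu ** 2 - 1.0 - 2.0 * log_sigma)

per_feature = kl.mean(axis=0)          # average nats spent per input feature
prune_order = np.argsort(per_feature)  # least informative features first
print("KL per feature:", np.round(per_feature, 3))
print("prune first:", prune_order[0])  # the middle feature, whose encoder
                                       # output is essentially the prior
```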

6. Practical Implementations and Numerical Insights

Multiple practical schemes are reported for distributed relay and representation learning scenarios (Song et al., 2023, Xu et al., 2022):

  • Quantized Channel Inversion (QCI): Relays perform per-symbol zero-forcing and quantization of inverted noise levels, subsequently compressing quantized outputs under local rate constraints.
  • MMSE-Based Schemes: Linear MMSE estimation followed by additive Gaussian compression is provably near-optimal for finite bottleneck rates.
  • Empirical Performance: Numerical analyses confirm that simple relay mappings and symbol-by-symbol processing achieve rates closely tracking the theoretical Dist-IB upper bounds across a wide SNR and complexity spectrum, with QCI frequently saturating the upper bound and MMSE schemes approaching optimality at lower rates.
| Model/Scenario | Algorithmic Approach | Empirical Behavior |
| --- | --- | --- |
| MIMO diamond channel | QCI, MMSE, water-filling | QCI saturates the upper bound at ≈5+ bits |
| Discrete memoryless sources | BA iteration, NN-VIB | BA and VIB curves nearly coincide |
| Multi-view deep learning | Neural variational VIB | KL trajectories select relevant inputs |
| Streaming Gaussian data | Greedy + two-pass IB | Global rate scales as O(\log k) |
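A simplified scalar stand-in for the MMSE-based scheme: with a single relay observing the signal at a given SNR and a fronthaul of C bits, MMSE estimation followed by Gaussian compression achieves the scalar Gaussian bottleneck curve below (the SNR value is illustrative):

```python
import numpy as np

# Scalar Gaussian bottleneck curve: relevance achievable when the relay's
# observation has signal-to-noise ratio `snr` and the fronthaul carries C
# bits per symbol.  As C grows the rate approaches the unconstrained
# capacity 0.5 * log2(1 + snr).
def bottleneck_rate(snr, C):
    return 0.5 * np.log2((1.0 + snr) / (1.0 + snr * 2.0 ** (-2.0 * C)))

snr = 10.0
cap = 0.5 * np.log2(1.0 + snr)
for C in (0.5, 1.0, 2.0, 5.0):
    r = bottleneck_rate(snr, C)
    print(f"C = {C:>4} bits  ->  rate {r:.3f} bits  (capacity {cap:.3f})")
```

The curve flattens quickly, consistent with the observation above that a few fronthaul bits per symbol suffice to approach the upper bound.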

7. Connections to Multiterminal Source Coding and Learning Theory

Dist-IB is fundamentally connected to multiterminal CEO coding under log-loss, Wyner–Ahlswede–Körner problems, common reconstruction constraints, and distributed multi-view representation learning architectures. The rate–relevance tradeoffs derived from Dist-IB inform practical systems for C-RAN, federated learning, multi-sensor inference, and scientific feature attribution (Zaidi et al., 2020, Aguerri et al., 2017). The framework encapsulates both the foundational Shannon-theoretic limits and principled neural algorithmic solutions for distributed data analysis.
