Petz–Augustin Capacity Overview

Updated 17 January 2026
  • Petz–Augustin capacity is a generalization of Shannon capacity defined for classical, classical–quantum, and quantum channels using Rényi divergences.
  • It enables a minimax formulation that facilitates optimal error exponent analysis and sphere-packing bounds through unique Augustin centers.
  • Efficient computation is achieved via algorithms such as alternating optimization, fixed-point iterations, and Riemannian gradient descent with proven convergence rates.

The Petz–Augustin capacity, also denoted the Augustin–Csiszár $\alpha$-capacity in classical settings, is a generalization of Shannon (mutual-information) capacity, defined for classical, classical–quantum (CQ), and fully quantum channels and parameterized by an order $\alpha\in(0,1)\cup(1,\infty)$. It extends the operational scope of channel coding by characterizing optimal error exponents and enabling a minimax formulation over input distributions and auxiliary output laws via Rényi divergences. The concept is central to modern error-exponent analysis, sphere-packing bounds, and various duality results in both classical and quantum information theory.

1. Formal Definitions

Let $\mathcal{X}$ be a finite (or measurable) input alphabet and $\mathcal{Y}$ an output alphabet, $W:\mathcal{X}\to\mathcal{P}(\mathcal{Y})$ a channel, and $P_X$ an input law. For a CQ channel $W: x\mapsto W_x\in\mathcal{D}(\mathcal{H})$, the Petz–Rényi divergence is

$$D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1} \log \operatorname{Tr}\!\left[\rho^\alpha \sigma^{1-\alpha}\right], \qquad \rho,\sigma\in\mathcal{D}(\mathcal{H})$$

for $\alpha\in(0,1)\cup(1,\infty)$.

The order-$\alpha$ Augustin information is

$$I_\alpha(P_X;W) := \inf_{Q\in\mathcal{P}(\mathcal{Y})} \mathbb{E}_{x\sim P_X}\!\left[ D_\alpha(W(\cdot|x) \,\|\, Q) \right]$$

for classical channels, or

$$A_\alpha^{\rm Petz}(P, W) := \inf_{\sigma\in\mathcal{D}(\mathcal{H})} \sum_x P(x)\, D_\alpha(W_x \| \sigma)$$

for CQ channels (Cheng et al., 2018).
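
The Petz–Rényi divergence above can be checked numerically on small density matrices. The sketch below is illustrative only (it is not one of the algorithms cited later), assumes finite dimension, and assumes $\sigma$ is full rank whenever $1-\alpha<0$; the helper names are ours.

```python
import numpy as np

def mat_power(A, p):
    """Matrix power of a positive semidefinite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(A)
    w = np.clip(w, 0.0, None)  # discard tiny negative eigenvalues from round-off
    return (v * w**p) @ v.conj().T

def petz_renyi(rho, sigma, alpha):
    """D_alpha(rho || sigma) = (1/(alpha-1)) log Tr[rho^alpha sigma^(1-alpha)], in nats."""
    t = np.trace(mat_power(rho, alpha) @ mat_power(sigma, 1.0 - alpha)).real
    return float(np.log(t) / (alpha - 1.0))
```

For commuting (diagonal) states this reduces to the classical Rényi divergence, and $D_\alpha(\rho\|\rho)=0$ since $\operatorname{Tr}[\rho^\alpha\rho^{1-\alpha}]=\operatorname{Tr}\rho=1$.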

The Petz–Augustin capacity is

$$C_\alpha(W) := \sup_{P_X\in\mathcal{P}(\mathcal{X})} I_\alpha(P_X;W)$$

or, equivalently,

$$C_\alpha(W) = \sup_P \inf_Q D_\alpha(W\|Q|P)$$

with $D_\alpha(W\|Q|P)$ denoting the conditional Rényi divergence under $P$ (Nakiboğlu, 2017, Cheng et al., 2021).

2. Augustin Mean, Center, and Existence

For each $P_X$ and fixed order $\alpha$, there exists a unique Augustin mean (center) $Q_{\alpha,P}$ (resp. $\sigma_{\alpha,P}$ in the CQ case) solving

$$I_\alpha(P_X;W) = D_\alpha(W\|Q_{\alpha,P}|P_X)$$

and satisfying the fixed-point property

$$T_{\alpha,P}[Q_{\alpha,P}] = Q_{\alpha,P};$$

the Augustin operator and its functional-analytic properties guarantee existence and uniqueness under mild dominance and integrability conditions (Cheng et al., 2021, Nakiboglu, 2018). Iterative algorithms based on contraction in the Thompson metric converge to $Q_{\alpha,P}$ in total variation or in norm.
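
For a finite classical channel the fixed-point property suggests a simple iteration: normalize each $\alpha$-tilted row $W(y|x)^\alpha Q(y)^{1-\alpha}$ and average under $P$. The sketch below is a minimal illustration of that scheme (the iteration is known to converge for $\alpha\in(0,1)$), not a transcription of any cited algorithm; the function names are ours.

```python
import numpy as np

def renyi_div(p, q, alpha):
    """Classical Renyi divergence D_alpha(p || q) in nats."""
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

def augustin_mean(W, P, alpha, iters=500):
    """Iterate the Augustin operator T_{alpha,P}, starting from the output law P W.
    W[x, y] = W(y|x); P is the input distribution."""
    Q = P @ W
    for _ in range(iters):
        tilted = W**alpha * Q**(1.0 - alpha)         # unnormalized tilted channels
        tilted /= tilted.sum(axis=1, keepdims=True)  # normalize each row
        Q = P @ tilted                               # Q <- T_{alpha,P}[Q]
    return Q

def augustin_info(W, P, alpha):
    Q = augustin_mean(W, P, alpha)
    return sum(P[x] * renyi_div(W[x], Q, alpha) for x in range(len(P)))
```

For a binary symmetric channel with uniform input, symmetry forces the Augustin mean to be uniform, which gives the closed form $I_\alpha = \log 2 - \frac{1}{1-\alpha}\log\!\left(p^\alpha + (1-p)^\alpha\right)$ in nats and provides a useful sanity check.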

3. Variational and Minimax Representations

The Augustin information admits several variational forms:

  • Minimax equivalence: $\sup_{P}\inf_{Q} D_\alpha(W\|Q|P) = \inf_{Q}\sup_{P} D_\alpha(W\|Q|P)$, with the minimizer in $Q$ being the Augustin center (Nakiboğlu, 2017, Nakiboglu, 2018).
  • KL–Shannon-type characterization: $I^C_\alpha(P_X,W) = \begin{cases} \min_{\widetilde{Y|X}} \left\{ I(P_X,\widetilde{Y|X}) + \frac{\alpha}{1-\alpha} D(P_X \widetilde{Y|X} \,\|\, P_X W) \right\} & \alpha<1 \\[1ex] \max_{\widetilde{Y|X}} \left\{ I(P_X,\widetilde{Y|X}) + \frac{\alpha}{1-\alpha} D(P_X \widetilde{Y|X} \,\|\, P_X W) \right\} & \alpha>1 \end{cases}$ with ordinary Shannon mutual information $I(P_X, \widetilde{Y|X})$ (Kamatsuka et al., 2024).
  • “Sibson-style” form: $I^C_\alpha(P_X,W) = H(P_X) + \frac{\alpha}{\alpha-1} \max_{R_{X|Y}} \sum_x P_X(x) \log \sum_y W(y|x)\, R(x|y)^{1-1/\alpha}$ for all $\alpha\ne 1$.
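
The minimax equivalence can be verified by brute force on a toy channel: evaluate $D_\alpha(W\|Q|P)$ on dense grids of input and output laws and compare $\sup_P\inf_Q$ with $\inf_Q\sup_P$. This is a numerical check under our own toy parameters, not a computational method from the cited works.

```python
import numpy as np

# Toy binary-input, binary-output channel (illustrative values)
W = np.array([[0.8, 0.2],
              [0.3, 0.7]])
alpha = 0.5
t = np.linspace(0.001, 0.999, 999)
Ps = np.stack([t, 1 - t], axis=1)   # candidate input laws P
Qs = np.stack([t, 1 - t], axis=1)   # candidate output laws Q

# M[x, j] = D_alpha(W(.|x) || Q_j);  vals[i, j] = D_alpha(W || Q_j | P_i)
M = np.log((W**alpha) @ (Qs**(1.0 - alpha)).T) / (alpha - 1.0)
vals = Ps @ M
sup_inf = vals.min(axis=1).max()    # sup_P inf_Q
inf_sup = vals.max(axis=0).min()    # inf_Q sup_P
```

Weak duality ($\sup\inf \le \inf\sup$) holds on any grid; the two values coincide up to grid resolution, reflecting the saddle point at the Augustin center.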

4. Algorithmic Computation

4.1 Classical Channels

  • Alternating Optimization (AO): Iterative min–max schemes for $\alpha\in(0,1)$ and $\alpha>1$ alternate between optimizing the output mean and the reverse channel; global convergence is established in both regimes (Kamatsuka et al., 2024).
  • Hybrid Geodesically Convex RGD: Riemannian gradient descent using both the Euclidean and Poincaré metrics achieves $O(1/T)$ non-asymptotic optimization error for all positive orders, outperforming fixed-point methods for $\alpha>1$ (Wang et al., 2024).

4.2 Classical–Quantum Channels

  • Fixed-point Iteration (Thompson metric): For $\alpha\in(1/2,1)\cup(1,\infty)$, a contraction mapping on the positive-definite cone converges at rate $O(|1-1/\alpha|^T)$, yielding the optimal output mean and the capacity (Chu et al., 10 Feb 2025, Chu et al., 10 Jan 2026).
  • Entropic Mirror Descent Outer Loop: The maximization over input distributions is handled by a mirror-descent algorithm that is smooth relative to negative Shannon entropy, converging in $O(1/T)$ (Chu et al., 10 Jan 2026).
  • Nesterov Universal Fast Gradient: For Petz–Rényi information, Hölder smoothness enables rates $O(\varepsilon^{-2\alpha/(3\alpha-1)})$ (Chu et al., 10 Jan 2026).
  • The complexity of these schemes is polynomial in the alphabet size and model dimension, and logarithmic or polynomial in the tolerance.
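
As a rough classical analogue of the entropic mirror-descent outer loop (the cited algorithm targets CQ channels), one can use the envelope-theorem fact that a supergradient of $I_\alpha(P;W)$ at $P$ is $x \mapsto D_\alpha(W_x\|Q_{\alpha,P})$, so the negative-entropy mirror step becomes a multiplicative-weights update. The sketch below is our own simplification, restricted to $\alpha\in(0,1)$; the step size $\eta$, iteration counts, and function names are illustrative choices, not the cited method.

```python
import numpy as np

def augustin_mean(W, P, alpha, iters=300):
    """Inner loop: fixed-point iteration for the Augustin mean (alpha in (0,1))."""
    Q = P @ W
    for _ in range(iters):
        tilted = W**alpha * Q**(1.0 - alpha)
        tilted /= tilted.sum(axis=1, keepdims=True)
        Q = P @ tilted
    return Q

def capacity_mirror_ascent(W, alpha, P0, eta=1.0, steps=400):
    """Outer loop: entropic mirror ascent on the input law P.
    The mirror step P(x) <- P(x) exp(eta * D_alpha(W_x || Q_{alpha,P})) / Z
    uses the envelope-theorem supergradient of I_alpha(P; W)."""
    P = np.array(P0, dtype=float)
    for _ in range(steps):
        Q = augustin_mean(W, P, alpha)
        g = np.log(np.sum(W**alpha * Q**(1.0 - alpha), axis=1)) / (alpha - 1.0)
        P = P * np.exp(eta * g)   # multiplicative-weights / mirror step
        P /= P.sum()
    Q = augustin_mean(W, P, alpha)
    g = np.log(np.sum(W**alpha * Q**(1.0 - alpha), axis=1)) / (alpha - 1.0)
    return float(P @ g), P
```

On a binary symmetric channel the iterates recover the uniform optimal input and the closed-form value $C_\alpha = \log 2 - \frac{1}{1-\alpha}\log(p^\alpha+(1-p)^\alpha)$ even when started from a skewed input law.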

4.3 Unconstrained and Cost-constrained Cases

For convex constraint sets, all of the above formulations generalize. Legendre–Gallager duality and minimax theorems ensure unique centers and well-posedness even under resource constraints (Nakiboğlu, 2017, Nakiboglu, 2018).

5. Analytical Properties

  • Continuity: $A^{\rm Petz}_\alpha(P,W)$ is jointly continuous in $(\alpha, P)$ and uniformly equicontinuous in $P$ over compact $\alpha$ intervals (Cheng et al., 2018).
  • Concavity: The scaled auxiliary functions $s\mapsto E_0^a(s,P)=s\,A^{\rm Petz}_{1/(1+s)}(P,W)$ are concave on $s\in(-1,0)$ (strong-converse regime) and on $s\geq 0$ (reliability regime). This establishes robust minimax identities and constant-composition achievability for error exponents (Cheng et al., 2018).
  • Monotonicity and Differentiability: $I_\alpha(P;W)$ is nondecreasing and analytic in $\alpha$, with explicit expressions for its derivatives (Nakiboglu, 2018).
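
The monotonicity in $\alpha$ is easy to confirm numerically: since $D_\alpha$ is pointwise nondecreasing in $\alpha$, so is the infimum over $Q$. The snippet below checks this on a toy binary-output channel with a brute-force grid over $Q$; it is a numerical check of the stated property, not one of the cited algorithms, and the channel values are ours.

```python
import numpy as np

def augustin_info_grid(W, P, alpha, n=2001):
    """Approximate inf over Q by a dense grid (binary-output channels only)."""
    q = np.linspace(1e-4, 1.0 - 1e-4, n)
    Qs = np.stack([q, 1 - q], axis=1)
    # D[x, j] = D_alpha(W(.|x) || Q_j)
    D = np.log((W**alpha) @ (Qs**(1.0 - alpha)).T) / (alpha - 1.0)
    return float((P @ D).min())

W = np.array([[0.8, 0.2], [0.3, 0.7]])
P = np.array([0.6, 0.4])
vals = [augustin_info_grid(W, P, a) for a in (0.3, 0.6, 0.9, 1.2, 2.0)]
```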

6. Operational Implications

  • Error Exponents, Sphere-Packing: For both classical and CQ channels, $C_\alpha(W)$ gives the best possible random-coding exponent and sphere-packing bound. For product channels, the exact exponent matches

$$E_{\rm sp}(R, W, \Gamma) = \sup_{0<\alpha<1} \frac{1-\alpha}{\alpha} \left[ C_\alpha(W;\Gamma) - R \right]$$

with polynomial prefactors derived from concentrated large deviations (Nakiboğlu, 2017, Cheng et al., 2018).

  • Strong Converse: The optimal strong-converse exponent is characterized by the concave auxiliary functions; constant-composition codes achieve the optimal rate by the minimax equality (Cheng et al., 2018).
  • Entropic Dualities: Fenchel duality connects data compression with quantum side-information to channel coding; the Petz–Augustin capacity unifies source and channel coding exponents in CQ scenarios (Cheng et al., 2018).
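
The sphere-packing expression is directly computable when $C_\alpha$ has a closed form. For a binary symmetric channel, symmetry makes the uniform input optimal and the Augustin center uniform, so $C_\alpha = \log 2 - \frac{1}{1-\alpha}\log(p^\alpha+(1-p)^\alpha)$ in nats. The sketch below (our own helper names, unconstrained case, grid scan over $\alpha$) evaluates $E_{\rm sp}(R)$:

```python
import numpy as np

def bsc_capacity_alpha(p, alpha):
    """C_alpha of a BSC(p) in nats: log 2 minus the Renyi entropy of (p, 1-p).
    Relies on the symmetry argument that input law and Augustin center are uniform."""
    return np.log(2.0) - np.log(p**alpha + (1 - p)**alpha) / (1.0 - alpha)

def sphere_packing_exponent(R, p, n_grid=4000):
    """Scan alpha in (0, 1) for sup (1-alpha)/alpha * (C_alpha - R)."""
    a = np.linspace(1e-3, 1.0 - 1e-3, n_grid)
    vals = (1.0 - a) / a * (bsc_capacity_alpha(p, a) - R)
    return float(max(vals.max(), 0.0))   # the sup is 0 at or above capacity
```

As expected, the exponent is positive and decreasing in $R$ below capacity, and vanishes once $R$ exceeds the Shannon capacity (since $C_\alpha \le C_1$ for $\alpha<1$).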

7. Numerical Illustrations and Algorithms

  • Implementations for small finite channels show AO and RGD methods converge in tens to hundreds of iterations, with robust geometric or polynomial decay of objective error (Kamatsuka et al., 2024, Wang et al., 2024).
  • In instance comparisons (e.g., a two-input, three-output channel), the orderings $I^{LP}_\alpha \leq I^S_\alpha \leq I^C_\alpha$ for $\alpha<1$ and $I^C_\alpha \leq I^{LP}_\alpha \leq I^S_\alpha$ for $\alpha>1$ are confirmed (Kamatsuka et al., 2024).
| Algorithm | Applicable regime | Convergence rate |
|---|---|---|
| AO (classical) | $\alpha\in(0,1)\cup(1,\infty)$ | geometric-like |
| Fixed-point (CQ) | $\alpha\in(1/2,1)\cup(1,\infty)$ | $O(\lvert 1-1/\alpha\rvert^T)$ |
| RGD | $\alpha>0$ | $O(1/T)$ |
| Entropic mirror descent | any | $O(1/T)$ |

Principal mathematical developments and algorithmic breakthroughs are published in (Kamatsuka et al., 2024) (classical AO, numerical evaluation, and variational forms), (Chu et al., 10 Feb 2025) (CQ fixed-point, linear convergence), (Wang et al., 2024) (hybrid RGD), (Chu et al., 10 Jan 2026) (non-asymptotic CQ algorithms and entropic mirror descent), (Cheng et al., 2018) (analytic properties and strong converse), (Cheng et al., 2021, Nakiboglu, 2018), and (Nakiboğlu, 2017) (sphere-packing bounds, duality, and center existence).

The Petz–Augustin capacity is thus fundamental for nonasymptotic channel coding, error analysis, and dualities in quantum information theory. Its computation, analytic properties, and operational roles continue to be active fields of optimization, coding theory, and quantum communication research.
