
Rate-Distortion with Perfect Perception

Updated 30 January 2026
  • The paper establishes that perfect perception—ensuring the reconstruction’s marginal matches the source—incurs a higher minimum bitrate than classical rate-distortion coding.
  • Analytical solutions across discrete, Bernoulli, and Gaussian models illustrate how optimization techniques like BA-type algorithms and adaptive water-filling are adapted under the perception constraint.
  • Practical implications include advances in neural compression and multi-terminal networks, where enforced statistical matching reshapes encoder-decoder design and bitrate allocation.

Rate-distortion with perfect perception refers to the fundamental limit of lossy source coding when enforcing exact distributional matching between the original source and its reconstruction. This regime augments classical rate-distortion theory by introducing a hard perception constraint—typically zero divergence in a suitable statistical metric (e.g., Kullback-Leibler, Wasserstein, $f$-divergence)—forcing the output to be statistically indistinguishable from the input. The resulting tradeoff involves an elevated minimal bitrate (mutual information), changed optimal encoder/decoder structure, and distinct code design principles. This regime is pertinent across discrete, continuous, multivariate, and process sources, including neural compression and multi-terminal networks.

1. Formal Definition and Information-theoretic Optimality

Let $X \sim P_X$ be a source, $\hat X$ its reconstruction, and $d(X, \hat X)$ a distortion measure. The rate-distortion-perception function is

$$R(D,P) = \min_{P_{\hat X|X}:\ \mathbb{E}[d(X,\hat X)] \le D,\ \delta(P_X,P_{\hat X}) \le P} I(X;\hat X)$$

where $\delta$ is any statistical divergence quantifying perceptual discrepancy. The perfect perception regime requires $P=0$, i.e., $P_{\hat X} = P_X$ exactly. Therefore,

$$R(D,0) = \min_{P_{\hat X|X}:\ \mathbb{E}[d(X,\hat X)] \le D,\ P_{\hat X} = P_X} I(X;\hat X)$$

This is a convex optimization over test channels whose output marginal matches the source law. The mutual-information penalty relative to the unconstrained $R(D)$ is the cost of guaranteeing realism at every rate-distortion pair (Theis et al., 2021, Chen et al., 2022, Lei et al., 21 Mar 2025).

2. Analytical Solutions and Source Models

Discrete (Memoryless) Sources and $f$-divergence Constraints

For finite alphabets and arbitrary $f$-divergences,

$$R(D,0) = \min_{P_{\hat X|X}:\ \mathbb{E}[d(X,\hat X)] \le D,\ P_{\hat X} = P_X} I(X;\hat X)$$

Solving via Lagrange multipliers and KKT conditions yields Boltzmann-type test channels,

$$P^*_{\hat X|X}(\hat x|x) = \frac{P_X(\hat x)\, e^{-s\,d(x,\hat x)}}{Z(x)}, \qquad Z(x) = \sum_{\hat x} P_X(\hat x)\, e^{-s\,d(x,\hat x)}$$

where root-finding in $s$ ensures the distortion constraint is met with equality. The closed-form mutual information is

$$R(D,0) = \max_{s \ge 0} \left\{ -sD - \mathbb{E}_X[\log Z(X)] \right\}$$

Convergence is guaranteed by strict convexity; efficient BA-type algorithms are available (Serra et al., 2023, Serra et al., 2024, Chen et al., 19 Aug 2025).
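
As a concrete sketch of this computation (a minimal illustration following the Boltzmann channel and dual form stated above, not code from the cited papers): bisect on $s$ until the channel's expected distortion meets $D$, then evaluate the dual expression. For a symmetric source the output marginal matches $P_X$ automatically; general sources would need an additional marginal-projection step.

```python
import numpy as np

def perfect_perception_rate(p_x, d_mat, D, tol=1e-10):
    """Evaluate R(D,0) for a discrete source via the Boltzmann test channel
    P(xh|x) = P_X(xh) exp(-s d(x,xh)) / Z(x), bisecting on s so that the
    expected distortion equals D.  Rate is the dual value -sD - E[log Z]."""
    def avg_distortion(s):
        w = p_x[None, :] * np.exp(-s * d_mat)      # unnormalized channel rows
        chan = w / w.sum(axis=1, keepdims=True)
        return float(p_x @ (chan * d_mat).sum(axis=1))

    lo, hi = 0.0, 1.0
    while avg_distortion(hi) > D:                  # grow until distortion <= D
        hi *= 2.0
    while hi - lo > tol:                           # distortion decreases in s
        mid = 0.5 * (lo + hi)
        if avg_distortion(mid) > D:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    log_z = np.log((p_x[None, :] * np.exp(-s * d_mat)).sum(axis=1))
    rate_nats = -s * D - float(p_x @ log_z)
    return max(rate_nats, 0.0) / np.log(2)         # bits

# Uniform binary source, Hamming distortion: the output marginal is uniform
# by symmetry, so R(D,0) coincides with the classical 1 - h2(D).
p = np.array([0.5, 0.5])
d = 1.0 - np.eye(2)
r = perfect_perception_rate(p, d, 0.1)
```

For the uniform binary source the routine reproduces $1 - h_2(D)$ bits, as expected when the perception constraint is automatically satisfied.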

Vector Bernoulli Sources

For $\mathbf{X}=(X_1,\ldots,X_n)$ with independent Bernoulli$(q_i)$ components, Hamming distortion, and a single-letter perception constraint,

$$R(D,0) = \sum_{i=1}^n \left[2 h_2(q_i) - h_3\!\left(\tfrac{d_i^*}{2},\, q_i\right) - h_3\!\left(\tfrac{d_i^*}{2},\, 1-q_i\right)\right]$$

where each $d_i^*$ is computed from a transcendental equation fixed by the global distortion $D$, and the output marginals are exact (Vippathalla et al., 21 Jan 2025).

Gaussian Vector and Process Sources

Let $X \sim \mathcal{N}(0, \Sigma)$; for squared-error distortion and KL- or Wasserstein-perception, the optimal reconstruction $\hat X$ must be jointly Gaussian with $X$, sharing marginal variances. The RDPF decomposes as

$$R(D,0) = \sum_{i=1}^N \frac{1}{2} \ln\!\left( \frac{(1 - D_i/(2\lambda_i))^2}{D_i/\lambda_i - (D_i/(2\lambda_i))^2} \right)$$

with the $D_i$ allocated so that $\sum_i D_i = D$ and $0 \le D_i \le 2\lambda_i$. This strictly enforces $P_{\hat X} = P_X$ (Serra et al., 2023, Qian et al., 2024).
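
A minimal numeric sketch of the decomposition as printed (hypothetical helper names; it uses a simple proportional allocation $D_i \propto \lambda_i$ for illustration, whereas the cited works derive the optimal water-filling-like allocation):

```python
import numpy as np

def rdpf_gaussian(lams, Ds):
    """Sum the per-coordinate terms of the decomposition above, in nats.
    Requires 0 < D_i < 2*lam_i for every coordinate."""
    lams = np.asarray(lams, dtype=float)
    Ds = np.asarray(Ds, dtype=float)
    a = Ds / (2.0 * lams)
    return float(np.sum(0.5 * np.log((1.0 - a) ** 2 / (Ds / lams - a ** 2))))

# Proportional allocation D_i = D * lam_i / sum(lam): feasible and simple,
# though not the optimal split across eigenvalues.
lams = np.array([2.0, 1.0, 0.5])
D = 0.7
Ds = D * lams / lams.sum()
R = rdpf_gaussian(lams, Ds)
```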

For zero-mean Gaussian processes with covariance operator $K_X$ on $(\Omega,\mu)$ and coefficients aligned via the Karhunen-Loève basis,

$$R(D,0) = \sum_{i=1}^\infty \frac{1}{2} \ln\!\left(\frac{\lambda_i}{\gamma_i^*}\right)$$

with the individual distortion allocations constrained by $\sum_i D_i = D$ and the output process variances matching the input's (Serra et al., 10 Jan 2025).

Stationary GP Case

In the limit $T \to \infty$, for a stationary process with spectral density $S_X(f)$,

$$R(D,0) = \frac{1}{2}\int_{S_X(f)>\gamma} \ln\frac{S_X(f)}{\gamma}\, df$$

with the distortion determined via

$$D = \int \min\{S_X(f), \gamma\}\, df$$

This matches the classical water-filling RDF when the perception constraint is slack (Serra et al., 10 Jan 2025).
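
The water-filling pair above can be evaluated on a frequency grid; the sketch below (an illustrative discretization, not the papers' code) bisects on the water level $\gamma$ until the distortion integral hits $D$:

```python
import numpy as np

def waterfill_rate(S, df, D, iters=100):
    """Given samples S of a spectral density on a uniform frequency grid
    (spacing df), bisect on the water level gamma so that
    integral min{S, gamma} df = D, then return the rate integral in nats."""
    def distortion(gamma):
        return float(np.minimum(S, gamma).sum() * df)

    lo, hi = 0.0, float(S.max())
    for _ in range(iters):                 # distortion increases with gamma
        gamma = 0.5 * (lo + hi)
        if distortion(gamma) < D:
            lo = gamma
        else:
            hi = gamma
    gamma = 0.5 * (lo + hi)
    active = S > gamma                     # only bands above the water level
    rate = 0.5 * np.sum(np.log(S[active] / gamma)) * df
    return float(rate), gamma

# Flat spectrum S = 1 on [0, 1]: gamma = D and R = (1/2) ln(1/D).
f = np.linspace(0.0, 1.0, 10001)
S = np.ones_like(f)
df = f[1] - f[0]
R, gamma = waterfill_rate(S, df, 0.5)
```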

3. Operational Coding Theorems and Achievability

Optimal codes are often stochastic and may require shared randomness (dither, circular shifts, etc.), especially to enforce the output-marginal constraint in high dimensions. Achievability is established via constructions that ensure an exact output distribution—e.g., dithered lattice quantization (SD-LTC), codebook symmetrization with circular shifts, or block soft-covering—and, for point-to-point codes, output-constrained BA-type algorithms (Lei et al., 21 Mar 2025, Zhou et al., 2024, Yang et al., 16 Jan 2026, Wagner, 2022). In process and multi-terminal environments, blockwise marginal matching may leverage randomization over source permutations.
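
As a small illustration of the shared-randomness primitive (one-dimensional subtractive dithering, which the lattice constructions above generalize): with a dither $U$ uniform over a quantization cell and shared by encoder and decoder, the reconstruction error is uniform and independent of the source, for any source distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.25                               # quantizer step

def dithered_quantize(x, u, step):
    """Subtractive dither: encoder quantizes x + u, decoder subtracts u.
    The error xhat - x is uniform on [-step/2, step/2] and independent of x
    (Schuchman's condition)."""
    q = step * np.round((x + u) / step)
    return q - u

x = rng.normal(0.0, 1.0, 200_000)                  # arbitrary source
u = rng.uniform(-delta / 2, delta / 2, x.shape)    # shared dither
xhat = dithered_quantize(x, u, delta)
err = xhat - x
```

Empirically, `err` has mean 0, variance $\Delta^2/12$, and negligible correlation with the source, regardless of the input law.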

4. Rate Penalty and Distortion Effects

For additive MSE distortion, enforcing perfect perception incurs an exact factor-of-two penalty in the minimal achievable distortion: $D_{\mathrm{PP}}^*(R) = 2\,D^*(R)$, i.e., $R(D, 0) = R(D/2, \infty)$. The optimal encoder remains unchanged; the decoder must randomize conditionally on the codeword so that reconstructions are sampled from the empirical posterior (Yan et al., 2021). For finite discrete sources, a nonzero rate is required even at large distortion until all marginals are matched; the zero-rate distortion threshold increases from $\sum_i q_i$ (classical) to $\sum_i 2q_i(1-q_i)$ with perfect perception (Vippathalla et al., 21 Jan 2025).
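
The posterior-sampling mechanism can be seen in a toy Gaussian sketch (an illustrative construction assuming a Gaussian test channel $Y = X + N$, not the paper's code): the MMSE estimate attains distortion $m$, while sampling the reconstruction from the posterior doubles the distortion to $2m$ but restores the $\mathcal{N}(0,1)$ source marginal exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400_000
x = rng.normal(0.0, 1.0, n)            # source X ~ N(0, 1)
y = x + rng.normal(0.0, 0.5, n)        # noisy representation, noise var 0.25

# MMSE estimate: xbar = a*y with a = 1/(1 + 0.25); residual variance m.
a = 1.0 / 1.25
m = 1.0 - a                            # = 0.2, the MMSE
xbar = a * y

# Perfect-perception decoder: sample from the Gaussian posterior N(xbar, m).
xhat = xbar + rng.normal(0.0, np.sqrt(m), n)

mse_mmse = np.mean((x - xbar) ** 2)    # ~ m
mse_pp = np.mean((x - xhat) ** 2)      # ~ 2m: the factor-of-two penalty
var_hat = np.var(xhat)                 # ~ 1: marginal matches the source
```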

In Gaussian vectors, perfect perception eliminates "inactive" coordinates—no subchannel is shut off, and adaptive water-levels replace uniform water-filling (Qian et al., 2024).

5. Algorithms and Computation

Perfect-perception RDPF for discrete sources yields tractable convex programs with strict feasibility and fast convergence. The primal-dual Blahut-Arimoto and alternating minimization (OAM, NAM, RAM) schemes admit explicit dual parameterizations. For continuous sources and processes, copula-based I-projections and gradient methods map the constrained mutual information minimization into convex parameter space (Serra et al., 2024). For GPs, the problem decouples into independent coefficient channels, yielding efficient block-coordinate solvers (Serra et al., 2023, Serra et al., 10 Jan 2025).

Summary table: computational approaches for $R(D,0)$

Source            Optimization    Solution Structure                           Algorithmic Scheme
Discrete          Convex          Boltzmann test channel                       BA / primal-dual (Chen et al., 19 Aug 2025)
Bernoulli vector  Single-letter   Transcendental eqs., per-component split     Closed form (Vippathalla et al., 21 Jan 2025)
Gaussian vector   Convex          Adaptive water-filling, no inactive coords   Gauss-Seidel alternating (Qian et al., 2024; Serra et al., 2023)
Gaussian process  KL/W2           Karhunen-Loève diagonalization               Blockwise, analytical (Serra et al., 10 Jan 2025)

6. Multi-terminal and Network Generalizations

Perfect perception has been generalized to multi-terminal problems, such as the Gray-Wyner setting. With two correlated sources, the achievable region involves the mutual information of the common description $I(X_1, X_2; U)$ and two conditional rate-distortion-perception functions:

$$R_0 \ge I(X_1, X_2; U), \quad R_1 \ge R_{X_1|U}, \quad R_2 \ge R_{X_2|U}$$

where $R_{X_i|U}$ is the minimal rate for branch $i$ ensuring both distortion and perception constraints, and the union over all $Q_{X_1X_2U}$ yields the full RDP region (Yang et al., 16 Jan 2026). Code constructions directly incorporate random circular shifts to enforce output-distribution matching.

7. Practical Compression and Neural Systems

Neural compressors achieving the RDPF-optimal tradeoff incur a penalty in rate or distortion when distributional matching is enforced. Dithered lattice quantization with infinite shared randomness enables exact distribution matching in the limit; with finite randomness, tractable staggered-quantizer schemes are nearly as efficient (Lei et al., 21 Mar 2025, Zhou et al., 2024). Training frameworks leverage two-stage pipelines: rate-distortion encoding followed by adversarially trained decoders, where perfect perception requires only matching conditional distributions, dispensing with additional distortion penalization (Yan et al., 2021). In learned compression and JPEG-like codecs, practical analysis of the RDPF surface guides bitrate allocation under strict perceptual constraints (Kirmemis et al., 2021).


In all settings, rate-distortion with perfect perception encapsulates the tradeoff wherein enforcing output realism fundamentally increases bit requirements and alters optimal coding strategies, with concrete operational and algorithmic consequences across information theory, neural compression, and generative modeling.
