
Finite-Sample Redundancy Laws in Information Theory

Updated 26 September 2025
  • Finite-sample redundancy laws are quantitative relationships that characterize the excess inefficiency in algorithms due to limitations like finite precision, delay, and sample size.
  • They reveal how redundancy scales with specific parameters in contexts such as source coding, universal compression, and deep learning.
  • These laws guide practical design choices by balancing trade-offs in resource allocation, system robustness, and performance optimization across diverse applications.

Finite-sample redundancy laws refer to rigorous quantitative relationships that characterize the excess penalty—in terms of expected code length, risk, or representational inefficiency—incurred in various algorithms, models, and physical systems due to non-asymptotic, resource-constrained, or "imperfect" conditions such as finite precision, finite delay, finite sample size, finite blocklength, or structural limitations. These laws establish how redundancy scales with problem parameters, algorithmic choices, and implementation constraints, and are foundational for understanding optimality, robustness, and resource allocation in information theory, coding, compression, statistical inference, signal processing, and learning theory.

1. Precision–Redundancy Tradeoffs in Source Coding

Finite-precision representation of source probabilities directly leads to excess redundancy in classic source coding algorithms such as Shannon, Gilbert–Moore, Huffman, and arithmetic codes. For a source with alphabet size m, probabilities p_i are approximated by rationals f_i/t stored with W bits, yielding a redundancy R that satisfies the bound

W \lesssim \eta \log_2 \frac{m}{R},

where \eta is an implementation-dependent constant (1/2 for binary sources, m/(m+1) for optimized m-ary codes, 1 for general progressive update designs) (0712.0057). The Kullback–Leibler divergence D(p \| \hat{p}) is bounded via the maximal approximation error \delta^* as D(p \| \hat{p}) \lesssim m\delta^*/P_{\min}, translating the effect of the denominator t (and hence W) into residual redundancy. The binary case admits Diophantine-optimal approximations with redundancy decaying as 1/t^2 (halving the required W), while m-ary cases exhibit redundancy decay as 1/t^{1+1/m}, with practical code design implications for memory, hardware register width, and symbol grouping.
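
The effect of register width W on redundancy can be seen numerically. The sketch below uses simple rounding to denominator t = 2^W (an illustrative scheme, not the Diophantine-optimal approximations discussed above) and measures the resulting KL redundancy:

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def quantize(p, W):
    """Approximate probabilities by rationals f_i / t with t = 2^W, renormalized.
    (Plain rounding for illustration; real designs use sharper approximations.)"""
    t = 2 ** W
    f = [max(1, round(pi * t)) for pi in p]
    s = sum(f)
    return [fi / s for fi in f]

p = [0.6, 0.25, 0.1, 0.05]
for W in (4, 8, 12, 16):
    print(W, kl_divergence(p, quantize(p, W)))  # redundancy shrinks as W grows
```

Even this naive scheme exhibits the qualitative law: each additional bit of precision drives the residual KL redundancy toward zero.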

2. Delay–Redundancy Laws in Lossless Source Coding

Imposing a finite decoding delay d on lossless source codes fundamentally affects the redundancy decay rate. In block/phrase-constrained coding (e.g., Huffman, Tunstall), redundancy decays polynomially with block/phrase length (O(1/d)); in contrast, delay-constrained sequential encoders (e.g., delay-limited arithmetic coding with bit flushing) achieve exponential decay:

R(P, d) \lesssim 2^{-d H_2(P)},

where H_2(P) is the Rényi entropy of order 2 of the source (Shayevitz et al., 2010). The redundancy–delay exponent E(P), defined as E(P) = \liminf_{d \to \infty} -\frac{1}{d}\log R(P,d), is lower-bounded by H_2(P), but for almost all sources it cannot exceed a bound depending on the minimal symbol probability and the alphabet size. This exponential scaling marks a qualitative improvement over classical codes, and optimal code design under delay constraints is inextricably linked to the fine-grained properties of P.
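
The exponential law above is easy to evaluate for a concrete source; this minimal sketch computes H_2(P) and the corresponding decay bound for a few delays (the distribution is illustrative, not from the cited paper):

```python
import math

def renyi2(p):
    """Renyi entropy of order 2, H_2(P) = -log2(sum_i p_i^2), in bits."""
    return -math.log2(sum(pi * pi for pi in p))

P = [0.5, 0.3, 0.2]
H2 = renyi2(P)
for d in (8, 16, 32):
    # exponential redundancy-delay bound R(P, d) <~ 2^{-d H_2(P)}
    print(d, 2.0 ** (-d * H2))
```

Doubling the allowed delay squares the bound, in stark contrast to the O(1/d) decay of block-constrained codes.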

3. Redundancy Laws in Universal Data Compression on Countable Alphabets

For universal coding over a countably infinite alphabet, the redundancy of a class \mathcal{P} depends crucially on tail behavior. Finite single-letter redundancy (i.e., existence of q with \sup_{p \in \mathcal{P}} D(p \| q) < \infty) implies tightness, but not necessarily diminishing per-symbol redundancy with blocklength (Hosseini et al., 2014, Hosseini et al., 2018). The asymptotic per-symbol redundancy R(\mathcal{P}^\infty) equals the tail redundancy

T(\mathcal{P}) = \lim_{m \to \infty} \inf_q \sup_{p \in \mathcal{P}} \sum_{x \geq m} p(x) \log \frac{p(x)}{q(x)},

revealing that the cost of compressing novel, "tail" symbols dominates as n grows: finite single-letter redundancy does not guarantee R_n(\mathcal{P})/n \to 0, and only classes with vanishing tail redundancy are strongly compressible. This formalism captures the essence of finite-sample redundancy in infinite-alphabet compression.
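
The inner tail sum can be illustrated for a single (p, q) pair; the full law takes an infimum over coding distributions q and a supremum over the class, but even one instance (a geometric source against a heavy-tailed q, both chosen here for illustration) shows the tail contribution vanishing with the cutoff m:

```python
import math

def tail_term(p, q, m, N=500):
    """Truncated tail sum  sum_{x >= m} p(x) log2(p(x) / q(x))."""
    return sum(p(x) * math.log2(p(x) / q(x)) for x in range(m, N) if p(x) > 0)

p = lambda x: 0.5 * 0.5 ** x                # geometric(1/2) on x = 0, 1, 2, ...
q = lambda x: 1.0 / ((x + 1) * (x + 2))     # heavy-tailed; sums to 1 over x >= 0

for m in (1, 5, 10, 20):
    print(m, tail_term(p, q, m))
```

For light-tailed sources the tail term decays rapidly; classes whose worst-case tail term does not vanish are exactly those that fail to be strongly compressible.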

4. Minimax Redundancy and Regret in Parametric Models

In smooth parametric families (e.g., exponential families), finite-sample minimax redundancy and regret are determined by the Shtarkov and Jeffreys integrals (0903.5399, Beirami et al., 2011). For a d-parameter family, the worst-case redundancy exhibits the canonical scaling

R_n = \frac{d}{2}\log n + \log J + o(1),

where \log J is the Jeffreys correction term. Sufficient conditions for finite redundancy include restriction to compact parameter sets and tail decay of the base measure (density q(x) = O(1/x^{1+\alpha}) for some \alpha > 0). For universal codes (including two-stage codes), the asymptotic average minimax redundancy serves as an accurate benchmark, while the additional penalty terms for two-stage coding become negligible for large d. In nonstandard settings (e.g., mixtures with heavy tails), the Jeffreys integral may diverge, limiting the applicability of classic finite-sample redundancy laws.
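
The (d/2) log n scaling can be checked directly for the Bernoulli family (d = 1) using the Krichevsky–Trofimov mixture, a standard Jeffreys-prior universal code; the sketch below computes its exact worst-case pointwise regret by enumerating count types:

```python
import math

def log2_kt(a, b):
    """log2 of the Krichevsky-Trofimov (Jeffreys-mixture) probability of a
    binary sequence containing a ones and b zeros."""
    lg = math.lgamma
    return (lg(a + 0.5) + lg(b + 0.5) - math.log(math.pi) - lg(a + b + 1)) / math.log(2)

def worst_case_regret(n):
    """Worst-case regret over length-n sequences: the regret depends only on
    the counts (a, b), so maximize log2(ML prob) - log2(KT prob) over types."""
    best = 0.0
    for a in range(n + 1):
        b = n - a
        ml = (a * math.log2(a / n) if a else 0.0) + (b * math.log2(b / n) if b else 0.0)
        best = max(best, ml - log2_kt(a, b))
    return best

for n in (10, 100, 1000):
    print(n, worst_case_regret(n), 0.5 * math.log2(n))  # regret ~ (1/2) log2 n + const
```

The gap between the exact regret and (1/2) log2 n settles to a constant, which is the finite-sample correction term that the Jeffreys integral captures in general d-parameter families.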

5. Pseudocodeword Redundancy in Linear Codes

Pseudocodeword redundancy measures the minimum number of parity-check rows in a matrix H so that all nonzero pseudocodewords have weight at least d, the code's minimum Hamming distance (Zumbragel et al., 2010, Zumbrägel et al., 2011). For iterative or LP decoding, this represents the finite-sample constraint needed to eliminate low-weight pseudocodewords and match ML decoding performance. Most random codes exhibit infinite pseudocodeword redundancy, but for codes based on designs (e.g., BIBDs) and cyclic codes meeting the Vontobel–Koetter eigenvalue bound,

w_{\min,\text{AWGNC}} \geq n\,\frac{2 w_c - \mu_2}{w_c^2 - \mu_2},

finite redundancy is attainable. This trade-off connects structural code properties and practical decoder design in finite regimes.

6. Redundancy Allocation Laws in Partitioned Codes

For finite-length nested (partitioned) codes in nonvolatile memory applications, redundancy must be allocated between defect masking (l bits) and error correction (r bits) under the constraint l + r = n - k (Kim et al., 2018). The recovery failure probability is bounded as

P(\hat{\mathbf{m}} \neq \mathbf{m}) \leq 2^{-l}(1+\beta)^n + 2^{-r}(1+\alpha)^n,

where \beta is the defect probability and \alpha is the erasure or crossover probability. The optimal allocation is estimated analytically (via KKT conditions) and matches simulation optima, underscoring the non-triviality of finite-sample performance compared to asymptotic theory.
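
The allocation problem is small enough to brute-force, which makes the analytical optimum easy to sanity-check. This sketch (with illustrative parameters n = 256, k = 128, not from the paper) minimizes the failure bound over integer splits:

```python
def failure_bound(l, r, n, alpha, beta):
    """Upper bound 2^{-l} (1+beta)^n + 2^{-r} (1+alpha)^n on recovery failure."""
    return 2.0 ** (-l) * (1 + beta) ** n + 2.0 ** (-r) * (1 + alpha) ** n

def best_split(n, k, alpha, beta):
    """Brute-force the integer allocation l + r = n - k minimizing the bound."""
    return min((failure_bound(l, n - k - l, n, alpha, beta), l)
               for l in range(n - k + 1))

bound, l = best_split(256, 128, alpha=0.01, beta=0.02)
print(l, 256 - 128 - l, bound)
```

Consistent with the KKT analysis, the optimum tilts extra redundancy toward whichever failure mode (defects at rate beta, or errors at rate alpha) is more likely.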

7. Redundancy Laws in Structural Optimization and Function Approximation

Structural redundancy, formalized in robust optimization via information-gap theory, quantifies the maximal degradation \alpha sustainable without exceeding performance thresholds, with worst-case performance h^{\text{worst}}(x; \alpha) (Kanno, 2016). Multiple damage scenarios yield non-differentiable optimization landscapes; algorithmic approaches such as derivative-free SQP leverage finite-difference gradients to navigate these constraints efficiently. In linear function approximation with numerically redundant bases (e.g., frames or overcomplete dictionaries), numerical regularization (e.g., \ell^2 penalties or TSVD) reduces the required sample size, replacing the nominal dimensionality n with an effective dimension n^\varepsilon such that

m \geq C\, n^\varepsilon \log n^\varepsilon

for accurate recovery (Herremans et al., 13 Jan 2025).
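
A minimal sketch of the effective-dimension idea, assuming TSVD-style truncation at threshold eps and an illustrative constant C (neither is a specific value from the cited paper):

```python
import math

def effective_dimension(singular_values, eps):
    """n^eps: the number of singular values surviving TSVD truncation at eps."""
    return sum(1 for s in singular_values if s > eps)

def sample_bound(n_eps, C=2.0):
    """Sufficient sample count m >= C n^eps log(n^eps), C illustrative."""
    return math.ceil(C * n_eps * math.log(n_eps))

# A numerically redundant basis: exponentially decaying singular values.
svals = [2.0 ** (-i) for i in range(50)]
n_eps = effective_dimension(svals, eps=1e-6)
print(n_eps, sample_bound(n_eps))  # far below what nominal n = 50 would suggest
```

Because the frame's spectrum decays, only a fraction of the nominal 50 directions are numerically resolvable, and the sample requirement scales with that smaller effective dimension.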

8. Redundancy Laws in Function-Correcting Codes and Feature Learning

Function-correcting codes over finite fields require redundancy r_f(k,t) \geq 2t (Ly et al., 19 Apr 2025). In large fields (q \geq k + 2t), optimal systematic MDS codes achieve r = 2t, while in binary and moderate-sized fields,

2t \leq r_f(k,t) < \frac{t \log(2k)}{1 - t \log e},

demonstrating logarithmic overhead with dimension. These explicit finite-length laws guide practical code constructions.

In deep learning, finite-sample scaling laws are shown to be redundancy laws (Bi et al., 25 Sep 2025). Kernel regression under a covariance spectrum with \lambda_i \propto i^{-1/\beta} yields excess risk decaying as n^{-\alpha}, where

\alpha = \frac{2s}{2s + 1/\beta},

with s the source condition and 1/\beta the redundancy parameter. Universality is established across invertible transforms, mixture domains, finite-width models, and Transformers, demonstrating that the scaling exponent is not universal but dictated by data redundancy.
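
The exponent formula makes the redundancy dependence concrete; this short sketch evaluates alpha across spectral-decay regimes (parameter values chosen for illustration):

```python
def scaling_exponent(s, beta):
    """Excess-risk exponent alpha = 2s / (2s + 1/beta) under spectrum i^{-1/beta}."""
    return 2 * s / (2 * s + 1 / beta)

# Heavier spectral tails (smaller beta, i.e., more redundancy) slow the n^{-alpha} decay.
for beta in (0.5, 1.0, 2.0):
    print(beta, scaling_exponent(s=1.0, beta=beta))
```

At fixed smoothness s, doubling beta (a faster-decaying, less redundant spectrum) moves alpha toward its ceiling of 1, matching the claim that data redundancy, not architecture, sets the scaling exponent.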

Summary Table: Key Redundancy Laws and Scaling

| Context | Scaling Law / Bound | Governing Parameters |
| --- | --- | --- |
| Precision–redundancy (source coding) | W \lesssim \eta \log_2(m/R) | \eta: code-dependent constant; m; R |
| Delay–redundancy (sequential codes) | R(P,d) \lesssim 2^{-d H_2(P)} | H_2(P): Rényi entropy of order 2; d |
| Universal coding (infinite alphabet) | R(\mathcal{P}^\infty) = T(\mathcal{P}) | tail redundancy T(\mathcal{P}) |
| Minimax redundancy (parametric models) | R_n = \frac{d}{2}\log n + \log J | d: parameter dimension; J: Jeffreys integral |
| Partitioned/nested codes | P(\hat{m} \neq m) \leq 2^{-l}(1+\beta)^n + 2^{-r}(1+\alpha)^n | l, r, \alpha, \beta |
| Function approximation (frames) | m \geq C n^\varepsilon \log n^\varepsilon | n^\varepsilon: effective dimension via regularization |
| Function-correcting codes | 2t \leq r < t\log(2k)/(1 - t\log e) | k: dimension; t: error level |
| Deep learning scaling (redundancy law) | \alpha = 2s/(2s + 1/\beta) | s: smoothness; \beta: spectral tail |

Finite-sample redundancy laws reveal the precise mechanisms by which resource constraints and discrete, non-asymptotic phenomena induce excess risk, inefficiency, or code length, and provide critical guidance for algorithm and system design across multiple disciplines. These laws unify previously disparate observations on scaling, robustness, and regularization, making explicit the fundamental role of redundancy in practical applications.
