
Probability Matching Interval Coding (PMATIC)

Updated 18 January 2026
  • PMATIC is a coding strategy that represents messages as refined intervals in [0,1] to achieve reliable feedback communication and robust lossless compression.
  • It uses randomized posterior matching and quantized probability synchronization to align encoder and decoder decisions under bounded predictor mismatch.
  • The scheme offers theoretical guarantees such as channel capacity achievement, exact decoding with controlled error rates, and efficient constant-time updates per symbol.

Probability Matching Interval Coding (PMATIC) is a family of schemes for reliable communication and lossless data compression that operate by aligning or quantizing interval probabilities in the encoding and decoding process. PMATIC spans two main research lines: (1) randomized feedback schemes that achieve channel capacity for memoryless channels via sequential interval refinement, and (2) robust, model-agnostic coding for lossless compression under bounded predictor mismatch, especially in the context of neural network-driven codecs. Both classes leverage probability synchronization and interval-based representation to ensure exact decoding with strong theoretical guarantees while accommodating practical implementation constraints (Shayevitz et al., 2015, Adler et al., 15 Jan 2026, Mesa et al., 2019).

1. Mathematical Foundations and Core Principles

PMATIC schemes center on expressing message information through a random subinterval of the unit interval $[0,1]$, which is iteratively refined based on channel feedback or model predictions. In canonical feedback communication, the encoder views the message as a point $\Theta_0 \sim \text{Uniform}[0,1]$ and, at each time step, updates a posterior interval based on the channel output or, analogously, the predicted probability distribution in a compression scenario.

For channel coding, the encoder and decoder share common randomness (e.g., a sequence $V_n \sim \text{Uniform}[0,1]$). The encoder transmits $X_n = F_X^{-1}(\Theta_n)$, with posterior update $\Theta_{n+1} = (F_{\Theta|Y}(\Theta_n | Y_n) + V_n) \bmod 1$, where $F_X$ is the CDF of the chosen input distribution $P_X$ and $F_{\Theta|Y}$ is the posterior-matching kernel induced by $P_{XY}$ (Shayevitz et al., 2015). The decoder applies the reversed iterated function system (RIFS) to reconstruct the shrinking interval $J_n$ such that $\Pr(\Theta_0 \notin J_n) = p_e$ for each $n$. The instantaneous decoded rate is $R_n = -(1/n)\log_2|J_n|$.

For model-driven lossless compression, PMATIC quantizes the predicted per-bit probabilities $\{p_i(j)\}$ to robust centers to synchronize encoder and decoder even under bounded model mismatch. The approach ensures that both parties select identical quantized probabilities for each prefix despite discrepancies in the underlying probability vectors, with a helper bit per generated code bit to resolve near-boundary ambiguity (Adler et al., 15 Jan 2026).

2. Encoder and Decoder Algorithms

Randomized Posterior Matching (Feedback Channel)

  • Encoder: Initializes with $\Theta_1 = \Theta_0$; at iteration $n$, computes $X_n = F_X^{-1}(\Theta_n)$; receives $Y_n$ via noiseless feedback; updates state to $\Theta_{n+1} = (F_{\Theta|Y}(\Theta_n | Y_n) + V_n) \bmod 1$.
  • Decoder (RIFS): Sets initial interval $J_0$ of length $1-p_e$; iteratively applies $J_{k+1} = F_{\Theta|Y}^{-1}((J_k - V_{n-k}) \bmod 1 \mid Y_{n-k})$ for $k=0,\dots,n-1$.

These operations require evaluation of the CDF and its inverse for both marginals and posteriors at each step; each update has constant computational complexity assuming fast inversion routines (Shayevitz et al., 2015).
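As a minimal sketch of the encoder recursion above, the following Python code specializes it to a binary symmetric channel with crossover probability `p` and a uniform binary input, where the posterior-matching kernel is piecewise linear. The channel choice, the function names, and the point-wise inversion are assumptions made for illustration; a full RIFS decoder would pull back an interval (which may wrap around modulo 1) rather than a single point.

```python
import random

def f_x_inv(theta):
    """Inverse CDF of a uniform binary input: maps [0, 1) to a channel bit."""
    return 0 if theta < 0.5 else 1

def post_cdf(theta, y, p):
    """F_{Theta|Y}(theta | y) for a BSC(p) with uniform input (assumed channel).

    The posterior density is 2(1-p) on the half of [0, 1] whose input bit
    equals y and 2p on the other half, so the CDF is piecewise linear.
    """
    lo = (1 - p) if y == 0 else p  # posterior mass of [0, 1/2)
    if theta < 0.5:
        return 2 * lo * theta
    return lo + 2 * (1 - lo) * (theta - 0.5)

def post_cdf_inv(u, y, p):
    """Inverse of post_cdf in its first argument."""
    lo = (1 - p) if y == 0 else p
    if u < lo:
        return u / (2 * lo)
    return 0.5 + (u - lo) / (2 * (1 - lo))

def encode(theta0, n, p, rng):
    """Run n steps of the randomized recursion; return final state, outputs, dithers."""
    theta, ys, vs = theta0, [], []
    for _ in range(n):
        x = f_x_inv(theta)
        y = x ^ int(rng.random() < p)   # BSC flips the bit with probability p
        v = rng.random()                # shared dither V_n ~ Uniform[0, 1)
        theta = (post_cdf(theta, y, p) + v) % 1.0
        ys.append(y)
        vs.append(v)
    return theta, ys, vs

def invert_point(theta_final, ys, vs, p):
    """Undo the recursion for a single point, newest step first."""
    theta = theta_final
    for y, v in zip(reversed(ys), reversed(vs)):
        theta = post_cdf_inv((theta - v) % 1.0, y, p)
    return theta
```

Each step costs one kernel evaluation (or inversion) and one modular addition, which is the constant per-symbol work described above. Calling `invert_point` on the output of `encode` should recover the starting point up to floating-point error for short horizons.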

Model-Driven Lossless Compression (Bounded Predictor Mismatch)

  • Encoder: For each token $x_i$ (mapped to bits $b_i$), computes model probabilities $p_i(j)$ for $j=1,\dots,\ell$, where $\ell$ is the token bit width. Each $p_i(j)$ is quantized: if $p_i(j)$ lies safely within a bin, encode helper bit $b'=0$ and use the bin center; otherwise encode $b'=1$ and use the nearest boundary point. Both helper and data bits are arithmetic encoded using the quantized probability (Adler et al., 15 Jan 2026).
  • Decoder: For each position, computes prediction $q_i(j)$, uses the received helper bit $b'$ to select the quantization bin or boundary identical to the encoder's choice, then decodes the corresponding bit.

This guarantees exact token reconstruction when $\|\text{logits}_{\text{Enc}} - \text{logits}_{\text{Dec}}\|_\infty \leq \epsilon$, with helper-bit and quantization overhead controlled by the parameter $r > 2\delta$, where $\delta = \epsilon/2$ (Adler et al., 15 Jan 2026).

3. Theoretical Properties and Performance Guarantees

Channel Feedback Coding

  • Capacity Achievement: For any memoryless channel satisfying mild regularity (absolute continuity, finite moments), and for any target error $p_e>0$, PMATIC achieves

$$\lim_{n \to \infty} \Pr[R_n > I(X;Y) - \delta] = 1$$

for any $\delta>0$, where $I(X;Y)$ is the mutual information for the chosen $P_X$. Choosing $P_X$ to be the capacity-achieving input distribution gives $R \to C$ (Shayevitz et al., 2015).

  • Error Control: The error probability $\Pr(\Theta_0 \notin J_n)$ is exactly $p_e$ for all $n$.
  • Random Walk Interpretation: The shrinkage of $J_n$ is governed by a Markov random walk $\{S_n\}$ with increments $L_k = \log(|J_{k-1}|/|J_k|)$, whose average converges to $I(X;Y)$ in the limit (Shayevitz et al., 2015).
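The random-walk view can be checked by hand in a simple case. The sketch below assumes a binary symmetric channel with crossover probability `p` and uniform input, and the idealization that the decoder's interval lies entirely within one linear branch of the kernel. Then the interval shrinks by a factor $2(1-p)$ with probability $1-p$ and by a factor $2p$ (an expansion when $p < 1/2$) with probability $p$, so the mean increment in bits equals $1 - H(p)$, the mutual information for this input. The channel and the idealization are illustrative assumptions, not part of the cited analysis.

```python
import math

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def expected_log_shrink(p):
    """Mean of L_k = log2(|J_{k-1}| / |J_k|) under the idealized BSC model.

    Requires 0 < p < 1.
    """
    return (1 - p) * math.log2(2 * (1 - p)) + p * math.log2(2 * p)

def bsc_mutual_information(p):
    """I(X;Y) in bits for a BSC(p) with uniform input."""
    return 1.0 - binary_entropy(p)
```

For any `p` strictly between 0 and 1, `expected_log_shrink(p)` and `bsc_mutual_information(p)` agree up to rounding, which is the drift of the random walk in this toy setting.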

Compression under Prediction Mismatch

  • Decoding Correctness: For $d_{\text{cond-TV}}(p(i),q(i)) \leq \delta$ at all $i$, encoder and decoder always agree on quantized per-bit probabilities, guaranteeing exact reconstruction (Adler et al., 15 Jan 2026).
  • Redundancy and Overhead: Overhead per encoded bit is $O(\sqrt{\delta \log(1/\delta)})$, balancing helper-bit entropy and Bernoulli-KL divergence due to quantization.
  • Empirical Performance: For example, with $\delta=10^{-5}$, $r=0.005$, PMATIC achieves $3.52$ bits/token on text, decoding accurately under logit noise, outperforming standard compressors such as gzip (Adler et al., 15 Jan 2026).
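The quantization part of the overhead can be made concrete: coding a bit whose true probability is $p$ with a quantized probability $\hat{p}$ costs an extra $D(\text{Bern}(p)\,\|\,\text{Bern}(\hat{p}))$ bits on average. The sketch below computes this divergence; the function name and the example values are assumptions chosen for illustration.

```python
import math

def bernoulli_kl_bits(p, p_hat):
    """D(Bern(p) || Bern(p_hat)) in bits: expected extra code length per bit
    when a bit with true probability p is coded using p_hat.

    Requires 0 < p_hat < 1.
    """
    def term(a, b):
        return 0.0 if a == 0.0 else a * math.log2(a / b)
    return term(p, p_hat) + term(1 - p, 1 - p_hat)
```

For instance, with $p = 0.3$ and a quantization error of $r/2 = 0.0025$, `bernoulli_kl_bits(0.3, 0.3025)` is on the order of $10^{-5}$ bits, close to the second-order approximation $(\hat{p}-p)^2 / (2\,p(1-p)\ln 2)$. This quadratic dependence on the bin width is what is traded off against the helper-bit entropy.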

4. Higher-dimensional and Optimal Transport Extensions

PMATIC generalizes to higher-dimensional message spaces using optimal transport theory. For parameter estimation/message transmission in $\mathbb{R}^d$, at each step $n$:

  • Construct the optimal transport map $T_{n-1}: \Omega \to \Omega$ that pushes the current posterior density $p_{n-1}$ to uniform, then select $U_n = T_{n-1}(W)$, with $W$ the message point.
  • Transmit $X_n = \Phi(U_n)$, with $\Phi$ the OT map to the optimal input distribution on $\mathcal{X}^d$.
  • The decoder refines an estimate $J_n = T_n^{-1}([\varepsilon/2, 1-\varepsilon/2]^d)$, guaranteeing $P(W \in J_n \mid Y^{1:n}) \geq 1-\varepsilon$ and $\text{Vol}(J_n)\to 0$ (Mesa et al., 2019).
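In one dimension the monotone transport map that pushes a density on $[0,1]$ to the uniform distribution is simply its CDF, so the decoder's set $J_n = T_n^{-1}([\varepsilon/2, 1-\varepsilon/2])$ reduces to a central credible interval. The sketch below illustrates this for $d=1$ with a piecewise-constant posterior on equal-width cells; the discretization, the function names, and the requirement that all cell weights be positive are assumptions for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

def make_transport_map(weights):
    """Build T (the CDF) and its inverse for a piecewise-constant density
    on [0, 1] with len(weights) equal-width cells (all weights > 0)."""
    total = sum(weights)
    cum = [0.0] + list(accumulate(w / total for w in weights))
    m = len(weights)

    def t(w):
        # Linear interpolation of the CDF inside cell i.
        i = min(int(w * m), m - 1)
        return cum[i] + (w * m - i) * (cum[i + 1] - cum[i])

    def t_inv(u):
        # Locate the cell whose cumulative range contains u, then interpolate.
        i = min(bisect_right(cum, u) - 1, m - 1)
        mass = cum[i + 1] - cum[i]
        return (i + (u - cum[i]) / mass) / m

    return t, t_inv

def credible_interval(t_inv, eps):
    """J = T^{-1}([eps/2, 1 - eps/2]), which has posterior mass 1 - eps."""
    return t_inv(eps / 2), t_inv(1 - eps / 2)
```

With `weights = [1, 1, 6, 1, 1]` and `eps = 0.1`, the returned interval carries posterior mass $0.9$ while being shorter than $0.9$, because the mass is concentrated in the middle cell. In higher dimensions no such closed form is available in general, which is the computational difficulty noted later in the article.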

A key result is that reliability and positive rate transmission are equivalent to Birkhoff-ergodicity of the induced Markov process $(U_n)$, resulting in an "all-or-nothing" property: either no rate is possible or all $R<C$ are achievable (Mesa et al., 2019).

5. Practical Implementation, Complexity, and Limitations

Feedback Coding

  • Complexity: Each symbol step involves one evaluation and one inversion for $F_X$ and $F_{\Theta|Y}(\cdot|y)$, i.e., $O(1)$ work per symbol (Shayevitz et al., 2015).
  • Horizon-Free Operation: The receiver may halt decoding at any time $n$, extracting an interval of width $2^{-nR_n}$ that contains the message with the prescribed error probability.

Lossless Compression

  • Deployment Compatibility: PMATIC acts as a drop-in replacement for arithmetic coding in model-driven compressors; no changes to tokenization, dictionary, or predictor are needed (Adler et al., 15 Jan 2026).
  • Assumptions: The bounded-mismatch model presumes strict $\ell_\infty$ bounds on logit differences between encoder and decoder. Extensions to systems with stochastic or unbounded drift are not established.
  • Parameter Selection: Recommended quantization parameters use $r \asymp \sqrt{\delta \log(1/\delta)}$, with most overhead due to helper bits at small $\delta$.
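The scaling rule for $r$ fixes only the order of magnitude, not the constant. As a rough sketch, the helper below evaluates $c\sqrt{\delta \log(1/\delta)}$; the constant `c`, the use of the natural logarithm, and the function name are assumptions, since the text does not specify them.

```python
import math

def suggested_bin_width(delta, c=1.0):
    """Evaluate c * sqrt(delta * ln(1/delta)) for 0 < delta < 1.

    Both c and the logarithm base are illustrative assumptions.
    """
    return c * math.sqrt(delta * math.log(1.0 / delta))
```

With $\delta = 10^{-5}$ and `c = 1.0` this gives roughly $0.0107$, so the setting $r = 0.005$ quoted in the empirical results corresponds to a constant a little under one half under these assumptions.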

Practical Considerations

  • For high-dimensional extension, solving OT maps at each update is computationally nontrivial except in low dimensions or special structures (Mesa et al., 2019).
  • For variable-length token codes in compression, additional bookkeeping is needed to ensure bit alignment in PMATIC without changing the fundamental algorithm (Adler et al., 15 Jan 2026).

6. Summary Table of Key PMATIC Properties

| Research Context | Core Property | Theoretical Guarantee |
|---|---|---|
| Feedback Coding (Shayevitz et al., 2015) | Sequential, horizon-free | Achieves $R \to C$, error $p_e$ exact |
| Model-driven Compression (Adler et al., 15 Jan 2026) | Bounded-mismatch robust | Overhead $O(\sqrt{\delta \log(1/\delta)})$ |
| Multidimensional (Mesa et al., 2019) | OT-based generalization | All-or-nothing rates via ergodicity |

PMATIC builds on the posterior matching concept introduced by Shayevitz & Feder, extending it with randomization steps that avoid fixed-point pathologies and guarantee capacity. The addition of quantized probability synchronization in compression tasks addresses the newly prominent challenge of non-determinism in large, learned prediction models. The theory benefits from strong connections to Markov processes, martingale convergence, ergodic theory (for high-dimensional reliability), and optimal transport.

Extensions to non-memoryless or feedback-degraded channels, as well as further robustification against unmodeled sources of mismatch or drift in predictive models, remain active areas for future research. Practical acceleration of multidimensional OT map computation is also essential for scalable application of PMATIC beyond the univariate or low-dimensional setting.

References: (Shayevitz et al., 2015, Adler et al., 15 Jan 2026, Mesa et al., 2019)
