Adaptive Transform Coding (ATC)
- Adaptive Transform Coding (ATC) is a paradigm that adapts the transform basis to local data characteristics for improved rate–distortion performance in lossy compression.
- It leverages codebook construction, optimization on the Stiefel manifold, and low-dimensional model parameterization to overcome the limitations of fixed-basis transforms like DCT.
- Modern ATC is integrated into video codecs and deep learning models, achieving measurable gains such as up to 41% channel bandwidth savings and 10–15 dB PSNR improvements over conventional engineered pipelines in semantic communication settings.
Adaptive Transform Coding (ATC) is a paradigm in lossy image, video, and semantic communication systems in which the transform stage (responsible for decorrelating input signals and compacting their energy) is dynamically selected or adapted to local data characteristics or the operating environment. ATC extends classical fixed-basis transform coding by introducing adaptation at the level of basis-function selection, parameterization, or even nonlinear model reconfiguration, with the principal aim of improved rate–distortion efficiency, robustness, and generalization across diverse input statistics and operational scenarios.
1. Theoretical Foundations and Classical ATC Approaches
Classical transform coding typically employs fixed, separable orthonormal transforms such as the Discrete Cosine Transform (DCT) or Asymmetric Discrete Sine Transform (ADST) on pre-defined block sizes. However, such static bases are fundamentally mismatched to the non-stationary and spatially heterogeneous nature of real-world data, resulting in suboptimal coding gain.
ATC generalizes this by constructing a codebook of orthonormal transforms and assigning each signal block or macroblock to the transform that minimizes mean-squared reconstruction error after quantization. The formal objective, as articulated in (Boragolla et al., 2022), is

$$\min_{\{\mathbf{T}_k\}_{k=1}^{K}} \ \mathbb{E}_{\mathbf{C}}\Big[\min_{k \in \{1,\dots,K\}} D(\mathbf{T}_k, \mathbf{C})\Big] \quad \text{s.t.} \quad \mathbf{T}_k^{\top}\mathbf{T}_k = \mathbf{I},$$

where $\mathbf{C}$ is the local block covariance and $D(\mathbf{T}, \mathbf{C})$ is the coding distortion for a given transform and covariance.
Optimal codebook construction inherently involves handling the Stiefel manifold structure imposed by the orthonormality constraints, typically via block coordinate descent: alternating between assigning block covariances to codewords (the partition step) and updating each codeword transform (the centroid step, via manifold gradient descent with retraction). This process ensures that frequently occurring covariance patterns drive the design, yielding measurable PSNR and bit-rate gains over fixed DCTs even for small codebooks (e.g., 0.3–0.6 dB, rate savings up to 6%) (Boragolla et al., 2022).
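The alternating partition/centroid-update loop can be sketched as follows. This is a minimal illustration rather than the exact procedure of (Boragolla et al., 2022): the log-variance distortion proxy, step size, and QR-based retraction are assumed simplifications.

```python
import numpy as np

def log_distortion(T, C):
    """High-rate transform-coding distortion proxy: sum of log coefficient variances."""
    d = np.diag(T.T @ C @ T)
    return float(np.sum(np.log(np.maximum(d, 1e-12))))

def euclidean_grad(T, C):
    """Euclidean gradient of log_distortion with respect to T."""
    d = np.diag(T.T @ C @ T)
    return 2.0 * C @ T @ np.diag(1.0 / np.maximum(d, 1e-12))

def stiefel_step(T, covs, lr=1e-3):
    """One Riemannian gradient step plus QR retraction (centroid / transform update)."""
    G = sum(euclidean_grad(T, C) for C in covs)
    sym = 0.5 * (T.T @ G + G.T @ T)
    riem_grad = G - T @ sym                     # project onto the tangent space at T
    Q, R = np.linalg.qr(T - lr * riem_grad)     # retract back onto the Stiefel manifold
    return Q * np.sign(np.diag(R))              # fix column signs for a canonical frame

def atc_codebook(covs, K, n_iters=50, seed=0):
    """Block coordinate descent over a codebook of K orthonormal transforms."""
    rng = np.random.default_rng(seed)
    n = covs[0].shape[0]
    codebook = [np.linalg.qr(rng.standard_normal((n, n)))[0] for _ in range(K)]
    for _ in range(n_iters):
        # Partition step: assign each block covariance to its best transform.
        assign = [min(range(K), key=lambda k: log_distortion(codebook[k], C)) for C in covs]
        # Centroid step: one manifold gradient step per codeword on its assigned covariances.
        for k in range(K):
            members = [C for C, a in zip(covs, assign) if a == k]
            if members:
                codebook[k] = stiefel_step(codebook[k], members)
    return codebook
```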
Forward ATC further constrains the codebook construction problem by modeling block-wise covariances using spatial priors, such as low-dimensional Gauss-Markov random field (GMRF) models. By estimating and quantizing the natural parameters of the GMRF generative model (e.g., a 2D AR process with Neumann boundary conditions), the otherwise intractable orthonormal matrix quantization is reduced to vector quantization in low-dimensional parameter space. The resulting codebook is scalable in block size and allows variable block-size ATC under quadtree partitioning, demonstrating robust coding-gain improvements and signaling efficiency (Boragolla et al., 2024).
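A generic sketch of the quadtree split decision that variable block-size ATC can build on is given below; the `rd_cost` callback (which would account for transform/codebook signaling, coefficient rate, and distortion) and the `split_overhead` term are assumed interfaces, not details from (Boragolla et al., 2024).

```python
def quadtree_atc(block, rd_cost, min_size=8, split_overhead=1.0):
    """Recursively choose between coding a square block whole or splitting it into four
    quadrants, keeping whichever option has the lower total rate-distortion cost."""
    n = block.shape[0]
    whole_cost = rd_cost(block)
    if n <= min_size:
        return whole_cost, [(0, 0, n)]            # leaf: (row, col, size) within the block
    h = n // 2
    split_cost, layout = split_overhead, []       # split flag costs a little side information
    for r, c in [(0, 0), (0, h), (h, 0), (h, h)]:
        cost, sub = quadtree_atc(block[r:r + h, c:c + h], rd_cost, min_size, split_overhead)
        split_cost += cost
        layout += [(r + rr, c + cc, s) for rr, cc, s in sub]
    return (split_cost, layout) if split_cost < whole_cost else (whole_cost, [(0, 0, n)])
```

Here `rd_cost` might, for instance, return the block distortion plus a Lagrange multiplier times an estimate of the bits needed for the best codebook entry at that block size.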
2. ATC in Modern Video Codecs: Standardization and Algorithmic Developments
AV2, the successor to AV1, exemplifies the deployment of ATC at scale in standardized video coding (Nalci et al., 6 Jan 2026). In AV2, ATC is integrated at the entropy coding stage for transform coefficients. Each transform block (TB) is partitioned into “Low-Frequency” and “Default” spatial regions. The most perceptually significant Low-Frequency (LF) coefficients are coded using larger alphabets and rich context models, while the rest use more compact representations.
Unified up-right diagonal scan order, systematic position-aware context derivation (using neighborhood sums of already-decoded coefficients), and region-dependent coding passes (Base Range, Low Range, High Range with escape symbols) yield:
- Consistent bitrate savings (about 0.6% BD-rate in All-Intra for natural content)
- Reduced memory and logic complexity (single scan order, smaller total number of coding contexts)
- Marginal decoder overhead (~0.5 KB CDF storage), with negligible increase in computational cost
Although no per-block dynamic transform selection is performed, coefficient coding adapts dynamically to block content, and ATC seamlessly integrates with advanced tools such as Trellis Coded Quantization, Probability Adaptation Rate Adjustment, and multi-symbol arithmetic coding (Nalci et al., 6 Jan 2026).
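The following sketch conveys the flavor of a unified up-right diagonal scan and a position-aware context derived from sums of already-decoded neighbor levels. The neighbor template, clipping thresholds, and region offset are illustrative placeholders and do not reproduce the normative AV2 derivation.

```python
import numpy as np

def up_right_diagonal_scan(h, w):
    """Visit an h-by-w transform block along anti-diagonals, bottom-left to top-right."""
    order = []
    for s in range(h + w - 1):                      # s = row + col indexes the diagonal
        for r in range(min(s, h - 1), max(0, s - w + 1) - 1, -1):
            order.append((r, s - r))
    return order

def context_index(levels, r, c, in_low_freq_region):
    """Toy context model: clip the sum of already-decoded neighbor magnitudes (coefficients
    are coded in reverse scan order, so neighbors below/right are available) and add an
    offset when the position falls in the Low-Frequency region."""
    template = [(r, c + 1), (r + 1, c), (r + 1, c + 1), (r, c + 2), (r + 2, c)]
    h, w = levels.shape
    s = sum(abs(int(levels[nr, nc])) for nr, nc in template if nr < h and nc < w)
    return min((s + 1) >> 1, 4) + (5 if in_low_freq_region else 0)
```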
3. Data-Driven and Model-Based ATC: GMRF Parameterization and Beyond
Recent advances have extended ATC from codebook selection among linear orthonormal transforms to the quantization of local Karhunen-Loève Transform (KLT) matrices via low-dimensional model parameterization. The approach in (Boragolla et al., 2024) models each image block as a realization from a finite-lattice, non-causal homogeneous GMRF with asymmetric Neumann boundary. The KLT is mapped to a unique GMRF parameter vector $\boldsymbol{\theta}$, and a codebook of such vectors is constructed via high-rate, coding-gain–based centroid optimization.
This methodology provides several key properties:
- Universality of the codebook across block sizes, since the dimension of $\boldsymbol{\theta}$ is block-size invariant
- A coding-gain–based estimator for $\boldsymbol{\theta}$ outperforms maximum-likelihood fitting in transform coding tasks
- Efficient vector quantization in the low-dimensional $\boldsymbol{\theta}$ space replaces costly quantization on the Stiefel manifold
- Empirical coding gains: 0.3–0.6 dB PSNR over DCT for fixed block ATC, with up to 1.5 dB over state-of-the-art codebook approaches in variable block-size modes
This model-based ATC paradigm enables principled and tractable transform adaptation, especially in variable block-size frameworks.
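To make the parameterization concrete, the sketch below builds the precision matrix of a simple first-order homogeneous GMRF from a two-parameter vector and recovers the block KLT as its eigenvector matrix. The first-order neighborhood, two-parameter model, and boundary handling are simplifications relative to the asymmetric-Neumann model of (Boragolla et al., 2024).

```python
import numpy as np

def gmrf_precision(n, theta):
    """Precision matrix of a first-order homogeneous GMRF on an n-by-n lattice.
    theta = (th_h, th_v) are horizontal/vertical interaction parameters; diagonal
    dominance (2 * (|th_h| + |th_v|) < 1) keeps the matrix positive definite."""
    th_h, th_v = theta
    Q = np.eye(n * n)
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if c + 1 < n:                    # horizontal neighbor
                Q[i, i + 1] = Q[i + 1, i] = -th_h
            if r + 1 < n:                    # vertical neighbor
                Q[i, i + n] = Q[i + n, i] = -th_v
    return Q

def klt_from_theta(n, theta):
    """Each parameter vector maps to one orthonormal block transform: the eigenvectors of
    the precision matrix (which coincide with those of the covariance, in reverse order).
    Quantizing theta therefore stands in for quantizing the full orthonormal matrix."""
    _, V = np.linalg.eigh(gmrf_precision(n, theta))
    return V
```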
4. Deep ATC and Rate–Distortion–Optimized Transform Learning
End-to-end learned ATC systems unify nonlinear transforms (analysis/synthesis), quantization, and adaptive entropy modeling under a Lagrangian rate–distortion objective. In (Dai et al., 2022), the entire transmission pipeline—including the transform, quantizer, channel encoder/decoder, and entropy prior—is realized as a parameterized deep neural network (DNN):
- The encoder applies a nonlinear analysis transform $g_a(\cdot)$, a hyperprior branch for entropy modeling, and a channel-adaptive modulation network that injects the instantaneous channel state (SNR, CQI) into the system via gating.
- At the receiver, the corresponding synthesis transform $g_s(\cdot)$ reconstructs the source.
- The rate–distortion objective is the Lagrangian $\mathcal{L} = R + \lambda D$, where $R$ is the transmission rate (channel bandwidth cost), $D$ is the end-to-end distortion, and $\lambda$ controls the trade-off.
- Online adaptation mechanisms close the amortization gap of VAEs by "overfitting to the extreme" at test time, either through per-instance (transmitter-only) gradient-based adaptation of the encoder and latent codes, or through full transceiver adaptation with domain-specific decoder model updates.
Quantitative evaluation demonstrates up to 41% channel bandwidth savings and 10–15 dB PSNR gains over the VVC plus 5G LDPC baseline for AWGN channels (Dai et al., 2022).
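A minimal PyTorch-style sketch of the channel-adaptive gating idea follows, assuming a FiLM/attention-like mechanism that scales latent channels by factors predicted from the instantaneous SNR; layer sizes, the sigmoid gate, and class names are illustrative and not the architecture of (Dai et al., 2022).

```python
import torch
import torch.nn as nn

class ChannelAdaptiveGate(nn.Module):
    """Scale latent channels by factors predicted from the channel state (SNR in dB)."""
    def __init__(self, num_channels, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, num_channels), nn.Sigmoid(),
        )

    def forward(self, latent, snr_db):
        # latent: (B, C, H, W), snr_db: (B, 1) -> per-channel gates broadcast over space
        gate = self.mlp(snr_db).unsqueeze(-1).unsqueeze(-1)
        return latent * gate

class AnalysisTransform(nn.Module):
    """Nonlinear analysis transform g_a followed by channel-adaptive modulation."""
    def __init__(self, in_ch=3, latent_ch=192):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, latent_ch, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(latent_ch, latent_ch, 5, stride=2, padding=2),
        )
        self.gate = ChannelAdaptiveGate(latent_ch)

    def forward(self, x, snr_db):
        return self.gate(self.net(x), snr_db)

# Training (sketch): minimize rate_estimate + lam * distortion(x, x_hat), where the rate
# comes from a hyperprior entropy model and x_hat from the synthesis transform g_s.
```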
5. Learned Adaptive Transform Coding Architectures and Channel-Wise Adaptation
Contemporary learned image and video compression pipelines deploy ATC using both linear and nonlinear transforms. In (Duong et al., 2022), the authors introduce multi-rate ATC with a learned pipeline that generalizes the DCT via element-wise gain matrices and learned quantization parameterized by the rate–distortion trade-off parameter $\lambda$. A forward-adaptive hyperprior entropy model enables full coverage of the R–D curve with a single parameterized model, at up to 40% BD-rate reduction over a fixed DCT and minimal additional computational cost.
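As a hedged illustration of a gain-matrix-generalized DCT, the sketch below applies an element-wise gain to the 2D DCT coefficients of a block and quantizes with a step size derived from $\lambda$; the gain values and the $\lambda$-to-step mapping are placeholders for the learned components of (Duong et al., 2022).

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n)
    D = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    D[0, :] /= np.sqrt(2)
    return D * np.sqrt(2.0 / n)

def q_step_from_lambda(lam, a=0.5, b=1.0):
    """Placeholder monotone mapping from the trade-off parameter lambda to a step size."""
    return b * lam ** a

def gained_dct_quantize(block, gain, lam):
    """Element-wise gain on the DCT coefficients, then uniform quantization; entropy coding
    of the returned symbols would follow."""
    D = dct_matrix(block.shape[0])
    coeffs = D @ block @ D.T            # separable 2D DCT
    return np.round(coeffs * gain / q_step_from_lambda(lam)).astype(np.int32)
```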
LLIC (Jiang et al., 2023) exemplifies spatial and channel-wise adaptive transform coding within a deep CNN-based architecture. Large kernel depth-wise convolutions expand the effective receptive field (ERF), amplifying long-range redundancy removal. Self-conditioned generation of per-channel kernel weights tailors transform sensitivity to local image texture. Between spatial transform blocks, adaptive channel-wise bit allocation is realized via self-conditioned channel scalars, allowing dynamic bit redistribution based on the importance of each channel’s features.
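The spatial and channel adaptation mechanisms can be sketched roughly as below, assuming a depth-wise large-kernel convolution whose output is rescaled by self-conditioned per-channel scalars; the kernel size, pooling-based conditioning, and residual layout are illustrative rather than the exact LLIC blocks.

```python
import torch
import torch.nn as nn

class AdaptiveSpatialChannelBlock(nn.Module):
    """Depth-wise large-kernel spatial mixing plus self-conditioned channel scaling."""
    def __init__(self, channels, kernel_size=11):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size,
                            padding=kernel_size // 2, groups=channels)  # large ERF
        self.pw = nn.Conv2d(channels, channels, 1)
        self.cond = nn.Sequential(          # per-channel scalars generated from the input
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.pw(self.dw(x))
        return x + y * self.cond(x)         # channel-wise adaptive feature/bit weighting
```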
Ablation studies confirm:
- Full self-conditioned spatial and channel transforms yield 6% BD-rate improvement over static counterparts
- Combined with large-patch curriculum training and increased kernel support, total BD-rate improvements reach 7.85%
- LLIC achieves state-of-the-art rate–distortion performance with modest complexity (Jiang et al., 2023)
Transformer-based methods—such as AICT (Ghorbel et al., 2023)—replace all linear transforms and entropy models with Swin Transformer modules and channel-wise auto-regressive (ChARM) priors. Such architectures achieve additional BD-rate reductions (5.1% vs. VVC on Kodak/Tecnick/JPEG-AI/CLIC21) with practical decoder complexity.
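A channel-wise auto-regressive prior of the ChARM type can be sketched as follows: the latent is split into channel slices, and the Gaussian entropy parameters of each slice are predicted from the hyperprior features plus all previously decoded slices. The slice count, layer widths, convolutional predictors, and softplus scale parameterization are illustrative assumptions, not the Swin-based modules of AICT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAutoregressivePrior(nn.Module):
    """Predict (mean, scale) for each latent channel slice from hyperprior features and
    the already-decoded preceding slices."""
    def __init__(self, latent_ch=192, hyper_ch=192, num_slices=4):
        super().__init__()
        self.slice_ch = latent_ch // num_slices
        self.predictors = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(hyper_ch + i * self.slice_ch, 128, 1), nn.GELU(),
                nn.Conv2d(128, 2 * self.slice_ch, 1),
            ) for i in range(num_slices)
        ])

    def forward(self, decoded_slices, hyper_feat):
        # decoded_slices: list of (B, slice_ch, H, W) tensors available so far, in order.
        params = []
        for i, net in enumerate(self.predictors):
            ctx = torch.cat([hyper_feat] + decoded_slices[:i], dim=1)
            mean, scale = net(ctx).chunk(2, dim=1)
            params.append((mean, F.softplus(scale)))   # positive scales for the Gaussian
        return params
```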
6. Optimization Strategies and Practical Considerations
Design of ATC systems, either codebook- or model-based, requires attention to:
- Optimization on manifolds: Stiefel manifold optimization is central to learning orthonormal transform codebooks (Boragolla et al., 2022)
- Coding-gain–driven estimation: Direct maximization of transform coding gain produces empirically superior codebooks compared to ML-based fitting (Boragolla et al., 2024)
- Computational scalability: GMRF-parameterized and gain-matrix–based models reduce the search/computation space dramatically, enabling efficient offline and (potentially) online adaptation
- Signaling overhead: Codebook indices, model deltas, or side-information (e.g., hyperprior latents) must be efficiently transmitted; model-based approaches demonstrate codebook signaling rates as low as 1–3% of total bit rate (Boragolla et al., 2024)
- Integration with quantization and entropy coding: ATC interacts tightly with modules such as Trellis-Coded Quantization, Probability Adaptation Rate Adjustment, and context-adaptive entropy coders (e.g., MS-AC in AV2) (Nalci et al., 6 Jan 2026)
7. Empirical Performance and Outlook
Empirically, ATC demonstrably improves coding gains across domains:
- In AV2, per-block adaptation of coefficient coding achieves about 0.6% BD-rate reduction (All-Intra) on natural content, with low additional complexity (Nalci et al., 6 Jan 2026)
- Fixed/variable block-size model-based ATC outperforms fixed DCT and recent codebook approaches by 0.3–1.5 dB (PSNR), especially in highly textured regions (Boragolla et al., 2024)
- Deep, learned ATC models—including online test-time adaptation—yield 9–41% bandwidth saving or 10–15 dB PSNR improvement versus leading engineered schemes in semantic communication scenarios (Dai et al., 2022)
- State-of-the-art image compression architectures (LLIC, AICT) achieve 7–10% BD-rate improvements and outperform both conventional codecs (VTM, DCT-based) and neural baselines (Jiang et al., 2023, Ghorbel et al., 2023)
ATC now spans a spectrum from codebook-based linear transform selection, through low-dimensional model-driven parameterization, to end-to-end learned nonlinear, data-dependent transform adaptation, with each methodology offering distinct trade-offs in complexity, generalization, and compression efficiency. As evidenced by its ongoing integration into video coding standards and by rapid advances in learned approaches, ATC constitutes a central component in the evolution toward robust, efficient, and context-aware compression and transmission architectures.