
Variable Rate Image Compression

Updated 31 January 2026
  • Variable rate image compression is a neural approach that dynamically adjusts bitrates using techniques like conditional transformations, latent modulation, and parametric quantization.
  • Recent architectures employ deep autoencoders, invertible networks, and transformer-based modules to optimize bit allocation and enhance rate–distortion tradeoffs.
  • Advanced training objectives and loss functions aggregate multiple rate-distortion measures, enabling unified models that outperform traditional codecs and support efficient mobile deployment.

Variable rate image compression refers to learned or neural approaches that enable dynamic control of bitrate within a single end-to-end model, rather than requiring a separate network for each rate-distortion operating point. These frameworks exploit conditional transformations, parametric quantization, latent modulation, or context-based attention to flexibly traverse the rate–distortion curve and efficiently allocate bits, often outperforming traditional codecs and multi-model deep learning approaches.

1. Network Architectures and Transformations

Recent variable-rate compressors deploy deep autoencoders and invertible neural networks augmented with specialized transforms. The architecture in "Learned Variable-Rate Image Compression with Residual Divisive Normalization" consists of a cascade of convolutional layers and generalized divisive normalization (GDN) blocks, with ResGDN and ResIGDN residual connections in the encoder and decoder. This configuration can be schematized as:

  • Five-stage encoder: convolution (stride 2 for downsampling in select stages), GDN, and a ResGDN residual path in all but the first stage
  • Five-stage decoder: transposed convolution, inverse GDN (IGDN), and ResIGDN shortcut blocks

Stochastic rounding quantization and entropy coding (e.g., FLIF) are then applied to discrete latent representations (Akbari et al., 2019).
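
The stochastic rounding step can be sketched in a few lines; this is a minimal numpy illustration of the unbiasedness property (the helper names and the uniform step size are illustrative, not the paper's exact formulation):

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Stochastically round x/step: round up with probability equal to
    the fractional part, so the expected dequantized value equals x."""
    scaled = x / step
    floor = np.floor(scaled)
    frac = scaled - floor
    q = floor + (rng.random(x.shape) < frac)
    return q.astype(np.int64)

def dequantize(q, step):
    return q * step

rng = np.random.default_rng(0)
x = np.full(100_000, 0.37)               # latent values to quantize
q = stochastic_round(x, step=1.0, rng=rng)
mean = dequantize(q, 1.0).mean()         # close to 0.37 on average (unbiased)
```

Deterministic rounding would map every 0.37 to 0, introducing a systematic bias; stochastic rounding trades per-sample error for a zero-mean error, which is what makes it attractive for training quantized latents.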

Invertible compressive models, as in the "High-Fidelity Variable-Rate Image Compression via Invertible Activation Transformation," embed invertible affine maps (IAT) inside each coupling block. These distribute rate control via a content- and QLevel-conditioned scale-and-bias, maintaining full bijectivity and robustness to repeated reencoding (Cai et al., 2022). Multi-scale invertible networks segment latents across spatial scales, supporting adaptive context modeling and gain-based rate control (Tu et al., 27 Mar 2025).
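
The core idea of a quality-conditioned invertible coupling can be sketched as follows; the conditioner here is a toy closed-form function standing in for the learned scale-and-bias network, and `qlevel` plays the role of the QLevel signal (names and forms are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def coupling_forward(x1, x2, qlevel):
    """Affine coupling: x1 passes through untouched; x2 is scaled and
    shifted by functions of x1 and the quality level."""
    s = np.tanh(x1 * 0.1 + qlevel) + 1.5   # strictly positive scale
    b = x1 * 0.05 * qlevel
    return x1, x2 * s + b

def coupling_inverse(y1, y2, qlevel):
    """Exact inverse: recompute the same scale/bias from y1 (== x1)."""
    s = np.tanh(y1 * 0.1 + qlevel) + 1.5
    b = y1 * 0.05 * qlevel
    return y1, (y2 - b) / s

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=(2, 8))
y1, y2 = coupling_forward(x1, x2, qlevel=0.5)
r1, r2 = coupling_inverse(y1, y2, qlevel=0.5)   # recovers (x1, x2) exactly
```

Because the scale and bias depend only on the untouched half and on the conditioning signal, the map is bijective for any `qlevel`, which is the property behind robustness to repeated re-encoding.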

Transformer-based methods such as Swin-Transformers integrate prompted self-attention modules, where visual prompts conditioned on the image and a rate parameter steer attention and bitrate allocation across spatial regions (Qin et al., 2023, Mudgal et al., 28 Sep 2025).
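
The prompt mechanism can be illustrated with a single toy attention layer: a rate-conditioned token is appended to the image tokens so that attention patterns (and hence bit allocation) shift with the requested rate. Random projections stand in for learned weights; this is a schematic sketch, not the Swin-based modules of the cited works:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prompted_attention(tokens, rate, d=16, rng=None):
    """Self-attention over image tokens plus one rate-conditioned
    prompt token; varying the rate changes the attended output."""
    if rng is None:
        rng = np.random.default_rng(0)      # fixed toy projection weights
    prompt = np.full((1, d), rate)          # rate embedded as an extra token
    seq = np.concatenate([tokens, prompt])  # (n + 1, d)
    Wq, Wk, Wv = rng.normal(size=(3, d, d)) / np.sqrt(d)
    q, k, v = seq @ Wq, seq @ Wk, seq @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))
    out = attn @ v
    return out[:-1]                         # keep only the image tokens

tokens = np.random.default_rng(2).normal(size=(4, 16))
out_lo = prompted_attention(tokens, rate=0.1)
out_hi = prompted_attention(tokens, rate=1.0)  # same tokens, different output
```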

2. Quantization Schemes and Rate Control

Variable-rate systems employ advanced quantization strategies for flexible rate adaptation:

  • Uniform scalar quantization: The bit depth B and step size Δ parameterize quantization; stochastic rounding ensures unbiasedness (Akbari et al., 2019).
  • Dead-zone quantizer (RaDOGAGA): The step size Δ sets the width of the quantization bins, with a symmetric zero-centered interval, enabling wide-range rate control in orthonormal latent spaces (Zhou et al., 2020).
  • Quantization regulator (QVRF): A vector of scale parameters {αᵢ} is coupled to discrete λᵢ, scaling latents before quantization and entropy coding; interpolating α yields truly continuous rates (Tong et al., 2023).
  • Quantization-reconstruction offsets: A neural offset δ(σ, Δ) is added post-dequantization to optimize bias across Δ (Kamisli et al., 2024).
  • Gain units: Channel-wise or spatial multipliers condition latent scaling on a user-specified quality map or scalar, as in JPEG AI's 3D quality maps (Jia et al., 20 Mar 2025).
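
The gain/regulator idea behind several of these schemes reduces to scaling latents before rounding and interpolating the scale between trained anchors; a minimal numpy sketch (function names and the geometric interpolation rule are illustrative assumptions):

```python
import numpy as np

def quantize_with_gain(y, alpha):
    """Scale latents by a gain alpha before rounding; a larger alpha
    preserves more precision (higher rate), a smaller one coarsens it."""
    return np.round(y * alpha)

def dequantize_with_gain(q, alpha):
    return q / alpha

def interp_alpha(a_lo, a_hi, t):
    """Geometric interpolation between two trained anchor gains yields
    a continuum of operating points from a few discrete trained rates."""
    return a_lo ** (1 - t) * a_hi ** t

y = np.array([0.12, -0.8, 2.3])
a = interp_alpha(1.0, 16.0, 0.5)                 # midpoint gain: 4.0
rec = dequantize_with_gain(quantize_with_gain(y, a), a)
# reconstruction error is bounded by half a quantization step: 0.5 / a
```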

Fine-grained rate control is often achieved by either direct interpolation of scaling parameters (Sun et al., 2021) or by providing a continuous rate-control signal to the network (e.g., QLevel, λ, spatial bitmaps), resulting in thousands of distinct operating points within a single model (Cai et al., 2022, Sun et al., 2021).

3. Training Objectives and Loss Functions

Multi-rate training requires objectives that aggregate rate-distortion loss over several bit settings:

  • Summed multi-rate loss: For a bit-depth set R, train a single network on L = 2L₂ + L_MS, where L₂ and L_MS sum MSE and multi-scale SSIM across the rates (Akbari et al., 2019).
  • Pareto multi-objective optimization: MGDA combines gradients for several Lagrangian settings into a minimum-norm descent direction, optimizing shared parameters for all rates in parallel (Kamisli et al., 2024).
  • Multi-QLevel per-pixel RD: A spatially-varying Lagrange modifier enables local adaptation and smooth rate curves (Cai et al., 2022).
  • Aggregated rate–distortion objectives with random or interpolated λ or QLevel sampling ensure the model learns to generalize over the entire rate spectrum (Tu et al., 27 Mar 2025, Tong et al., 2023, Sun et al., 2021).
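
For two rate settings, the MGDA minimum-norm direction mentioned above has a closed form: it is the shortest vector in the convex hull of the two gradients, and descending along it improves both objectives simultaneously. A minimal sketch with toy gradients (the two-task special case only):

```python
import numpy as np

def min_norm_direction(g1, g2):
    """Closed-form MGDA solution for two tasks: minimize
    ||gamma*g1 + (1-gamma)*g2||^2 over gamma in [0, 1]."""
    diff = g1 - g2
    denom = diff @ diff
    if denom == 0.0:
        gamma = 0.5                       # identical gradients
    else:
        gamma = np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0)
    return gamma * g1 + (1 - gamma) * g2

g_low = np.array([1.0, 0.0])    # gradient of a low-rate R-D objective (toy)
g_high = np.array([0.0, 1.0])   # gradient of a high-rate R-D objective (toy)
d = min_norm_direction(g_low, g_high)   # [0.5, 0.5]: positive inner product
                                        # with both gradients
```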

4. Enhancement and Context Modeling

To improve fidelity at higher rates, residual enhancement layers encode the pixel-domain difference between the input and reconstruction, using classical codecs (e.g., BPG), appended to the neural bitstream (Akbari et al., 2019, Akbari et al., 2020). Multi-scale spatial-channel context models utilize masked convolution stacks to refine entropy estimation, especially in invertible or transformer-based frameworks (Tu et al., 27 Mar 2025).
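
The causality constraint in such masked-convolution context models can be demonstrated with a toy single-channel version: the kernel is zeroed at and after the current position in raster-scan order, so predictions never depend on pixels that have not yet been decoded (this dense-loop sketch is for illustration, not an efficient implementation):

```python
import numpy as np

def masked_conv2d(x, weight, mask_type="A"):
    """2D masked convolution: each output depends only on raster-scan
    earlier pixels, as required for autoregressive entropy modeling.
    Type A masks the center tap; type B keeps it."""
    kh, kw = weight.shape
    ch, cw = kh // 2, kw // 2
    mask = np.ones_like(weight)
    mask[ch, cw + (mask_type == "B"):] = 0.0  # center (type A) and right
    mask[ch + 1:, :] = 0.0                    # all rows below
    w = weight * mask
    h, wd = x.shape
    pad = np.pad(x, ((ch, ch), (cw, cw)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = (pad[i:i + kh, j:j + kw] * w).sum()
    return out

rng = np.random.default_rng(3)
x = rng.normal(size=(5, 5))
w = rng.normal(size=(3, 3))
out1 = masked_conv2d(x, w)
```

Perturbing the last pixel in raster order leaves every output unchanged (it lies in no pixel's causal past), while perturbing the first pixel propagates forward.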

Selective compression via importance masks (learned via hyperprior networks and exponent curves) enables partial entropy coding of latents, reducing decoder runtime and supporting continuous rate control by geometric interpolation of masking parameters (Lee et al., 2022).

Region-of-interest (ROI) functionality is implemented by spatially manipulating the quality map or prompt tokens, concentrating bits in targeted zones while preserving overall rate (Jia et al., 20 Mar 2025, Mudgal et al., 28 Sep 2025, Kao et al., 2023).
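
The quality-map mechanism can be sketched as spatially varying gain on the latents: positions inside the ROI get a larger gain, so quantization is finer (and the entropy coder spends more bits) there. The helper names and gain values below are illustrative assumptions:

```python
import numpy as np

def roi_quality_map(shape, roi, base=0.5, boost=4.0):
    """Spatial quality map: a higher gain inside the ROI rectangle
    means finer quantization and more bits spent there."""
    qmap = np.full(shape, base)
    (r0, r1), (c0, c1) = roi
    qmap[r0:r1, c0:c1] = boost
    return qmap

y = np.random.default_rng(4).normal(size=(8, 8))   # toy latent plane
qmap = roi_quality_map(y.shape, roi=((2, 6), (2, 6)))
q = np.round(y * qmap)        # coarse outside the ROI, fine inside
rec = q / qmap
err = np.abs(rec - y)         # bounded by 0.5 / gain at each position
```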

5. Experimental Performance and Comparative Analysis

Recent variable-rate frameworks collectively outperform conventional codecs (BPG, JPEG2000, VVC) and prior multi-model learned methods across diverse metrics and datasets:

Models incorporating spatial importance via gating, context modeling, ROI adaptation, or transformer-prompt modules routinely save 80–90% of parameters and storage compared to multi-model approaches while retaining or exceeding top-tier RD performance (Qin et al., 2023, Lee et al., 2022, Jia et al., 20 Mar 2025). Bit allocation can be flexibly tuned on-the-fly with continuous parameter inputs (Cai et al., 2022, Tu et al., 27 Mar 2025).

6. Efficiency, Deployment, and Limitations

The use of joint networks for all rates drastically reduces model storage and training cost (up to 90% savings), with plug-and-play modules (e.g., InterpCA, energy-based gating, BM modulator) incurring negligible computational overhead (Yin et al., 2021, Sun et al., 2021). Efficient multi-model selection and fast bitrate matching (JPEG AI) support mobile hardware deployment and deterministic rate enforcement (Jia et al., 20 Mar 2025).

Remaining challenges include context-model entropy coding time at scale (Cai et al., 2022), optimization for extreme bitrates (Tong et al., 2023), complexity of per-pixel rate modulation (Tu et al., 27 Mar 2025), and the high compute cost of deep invertible flows or transformers (Cai et al., 2022, Mudgal et al., 28 Sep 2025). Future work focuses on accelerating context modeling, meta-learning prompt manifolds for continuous rates, spatially adaptive quantization regulators, and joint video/image compression under general quality maps (Cai et al., 2022, Tu et al., 27 Mar 2025).

7. Innovations, Impact, and Standardization

Modern variable-rate compression advances the state-of-the-art by:

  • Enabling universal coverage of 0.1–3.4 bpp with a single model
  • Providing fine-grained, content- and region-adaptive bit allocation
  • Delivering continuous or near-continuous rate control, well beyond discrete codec quantizer steps
  • Supporting flexible features such as region-of-interest boosting and color-component bit allocation (JPEG AI CCS framework) (Jia et al., 20 Mar 2025)
  • Enhancing decoded image fidelity and robustness under multiple re-encodings (INN/IAT, MSINN) (Cai et al., 2022, Tu et al., 27 Mar 2025)

JPEG AI encapsulates these advances in its upcoming standardization, delivering BD-rate gains of up to 19.2% over VVC intra and enabling hardware-friendly, mobile deployment with bit-precise, ROI-targeted compression (Jia et al., 20 Mar 2025). This represents a critical inflection point for learned compression as an industrial backbone.


Variable-rate learned image compression has matured from channel-wise scaling and bottleneck modulation to contextually aware, invertible, and transformer-based frameworks capable of spanning wide bitrate ranges, precise bit allocation, and deployment in production standards. The unification of efficient, continuous, and content-adaptive rate control substantiates its centrality in next-generation imaging systems.
