Discretized Mixed Gaussian Likelihood
- Discretized mixed Gaussian likelihood is a probabilistic model that integrates Gaussian densities over quantized bins to accurately capture bounded discrete values.
- It enhances maximum-likelihood training by avoiding the computational burdens of categorical softmax and the boundary issues of continuous Gaussian models.
- The method leverages softmax-normalized mixture weights and closed-form integration via the standard normal CDF to achieve state-of-the-art rate-distortion performance in neural image compression.
A discretized mixed Gaussian likelihood is a parameterization widely used in high-fidelity generative modeling of data with bounded, discrete values, typified by 8-bit natural images. This likelihood models the probability of each discrete value as a mixture of Gaussians whose continuous density is integrated over the quantization interval associated with each discrete symbol. By combining the flexibility of mixtures with proper integration over bins, this approach provides better maximum-likelihood training behavior than simple categorical, continuous Gaussian, or logistic distributions.
1. Foundational Definition and Mathematical Formulation
Let $x \in \{0, 1, \dots, 255\}$ denote a single pixel value (or channel value) in an 8-bit image. The discretized mixed Gaussian likelihood represents $P(x)$ as:

$$P(x) = \sum_{k=1}^{K} \pi_k \int_{x - \frac{1}{2}}^{x + \frac{1}{2}} \mathcal{N}(t; \mu_k, \sigma_k^2)\, dt,$$

where:
- $K$ is the number of mixture components,
- $\pi_k$ is the $k$-th mixture weight ($\pi_k \ge 0$, $\sum_k \pi_k = 1$); $\mu_k$, $\sigma_k$ are the mean and (typically diagonal) standard deviation parameters of the $k$-th Gaussian,
- $\mathcal{N}(t; \mu_k, \sigma_k^2)$ is the (typically univariate or per-channel) Gaussian density,
- the integral is taken over a bin of width 1, corresponding to the quantization interval of symbol $x$ (the edge bins at 0 and 255 extend to $-\infty$ and $+\infty$, respectively, so the mass sums to one).
This construction respects the quantized structure of the data, ensuring valid probability mass for each discrete symbol. For $D$-dimensional data (e.g., RGB vectors), the likelihood is a product (or, in practice, a joint mixture via autoregression) over all dimensions.
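To make the formulation concrete, here is a minimal pure-Python sketch of the per-symbol probability mass; the function name `discretized_mixture_pmf`, the univariate components, and the tail-absorbing edge bins are this sketch's assumptions, not a reference implementation.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def discretized_mixture_pmf(x, weights, means, stds):
    """P(X = x) for an 8-bit symbol x under a discretized Gaussian mixture.

    Each bin has width 1; the edge bins (x = 0 and x = 255) absorb the
    tails so the PMF sums to exactly 1 over x in {0, ..., 255}.
    """
    lower = -math.inf if x == 0 else x - 0.5
    upper = math.inf if x == 255 else x + 0.5
    p = 0.0
    for w, mu, s in zip(weights, means, stds):
        cdf_hi = 1.0 if upper == math.inf else std_normal_cdf((upper - mu) / s)
        cdf_lo = 0.0 if lower == -math.inf else std_normal_cdf((lower - mu) / s)
        p += w * (cdf_hi - cdf_lo)
    return p
```

Because the bin integral is a CDF difference, no numerical quadrature is needed, and the tail-absorbing edge bins guarantee a properly normalized mass function even when a component's mean falls outside [0, 255].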
2. Role in Generative Modeling and Compression
Discretized mixture likelihoods arose as essential components of likelihood-based models such as PixelCNN, VAE variants, and neural image compressors. They enable expressive, trainable distributions over high-dimensional discrete data while robustly handling edge cases (e.g., out-of-range predictions). Unlike softmax/categorical outputs, discretized Gaussian mixtures avoid enumerating all 256 possible values per pixel, and unlike unconstrained Gaussians, they are well-defined at the boundaries and for integer-valued observations.
For learned neural image/video compression—where optimized latent codes and reconstructions are quantized—discretized mixed Gaussian likelihoods provide the necessary loss function (negative log-likelihood) for practical rate-distortion optimization, and enable accurate modeling of residual distributions after transform coding (Ballé et al., 2016, Chamain et al., 2020).
3. Parameterization and Implementation
The parameterization adopted in practice specifies, for each pixel (or latent), the mixture weights, means and variances. These may be predicted by an autoregressive or conditional prior (PixelCNN, hyperprior, etc.), or directly by neural nets. The mixture coefficients are produced via softmax normalization; means and variances (or their log-transforms) via unconstrained network outputs.
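A minimal sketch of this parameterization step, assuming a network emits a flat vector of 3K unconstrained values per symbol; the layout convention and the helper name `mixture_params_from_logits` are assumptions of this sketch.

```python
import math

def mixture_params_from_logits(raw, k):
    """Map 3*k unconstrained network outputs to valid mixture parameters.

    Assumed layout:
      raw[:k]    -> mixture logits (softmax-normalized to weights)
      raw[k:2*k] -> means (left unconstrained)
      raw[2*k:]  -> log-scales (exponentiated to positive stds)
    """
    logits, means, log_scales = raw[:k], raw[k:2 * k], raw[2 * k:]
    m = max(logits)                              # shift for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = [e / z for e in exps]              # softmax: non-negative, sums to 1
    stds = [math.exp(ls) for ls in log_scales]   # strictly positive scales
    return weights, list(means), stds
```

Predicting log-scales rather than raw variances is a common stability choice: it keeps standard deviations positive without clipping and gives the optimizer a multiplicative handle on component width.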
Efficient implementation exploits the analytical evaluation of the Gaussian integral over a bin:

$$\int_{x-\frac{1}{2}}^{x+\frac{1}{2}} \mathcal{N}(t; \mu, \sigma^2)\, dt = \Phi\!\left(\frac{x + \frac{1}{2} - \mu}{\sigma}\right) - \Phi\!\left(\frac{x - \frac{1}{2} - \mu}{\sigma}\right),$$

where $\Phi$ is the standard normal cumulative distribution function.
4. Relation to Other Quantized Distributions
Discretized mixture models may use other base densities, such as the logistic (the "discretized logistic mixture" of PixelCNN++) or the Laplacian. The Gaussian formulation is attractive because its bin integral has a closed form in terms of the standard normal CDF and yields tractable gradients. Mixtures provide multimodal flexibility, addressing the inadequacy of a single Gaussian for heavy-tailed or multimodal pixel/latent residuals.
The categorical softmax approach, though well-defined for discrete data, is computationally expensive in high-dimensional settings (e.g., a 256-class softmax per channel) and lacks the inductive bias that nearby intensity values should receive similar probability. Continuous Gaussian likelihoods, if fitted directly, suffer from poor boundary modeling and likelihood misspecification on discrete data.
5. Applications in Learned Image Compression
Neural autoencoder-based compressive models (e.g., variational, transform, or hyperprior architectures) are fitted end-to-end for rate-distortion performance using a negative log-likelihood under a discretized mixed Gaussian model for the quantized latent or reconstruction. This allows differentiable proxies for entropy estimation and distortion computation (Ballé et al., 2016, Chamain et al., 2020). At inference, the trained model provides a probability mass function for each symbol, facilitating optimal entropy coding.
End-to-end image compression methods replace the fixed codec transform and scalar quantization with a nonlinear encoder (analysis transform), a differentiable quantization surrogate (uniform noise relaxation), and a nonlinear decoder (synthesis transform), all optimized with a rate-distortion loss:

$$\mathcal{L} = R + \lambda D = \mathbb{E}\big[-\log_2 p(\tilde{y})\big] + \lambda\, \mathbb{E}\big[d(x, \hat{x})\big],$$

where $R$ is the estimated bit-rate of the noise-relaxed latent $\tilde{y}$ and $D$ the reconstruction distortion,
using discretized mixed Gaussian likelihood parameterizations per symbol (Ballé et al., 2016).
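The training objective above can be sketched as a toy one-sample Monte Carlo estimate; the callables `encoder`, `decoder`, and `log2_pmf` are hypothetical stand-ins (in a real codec they would be neural networks and a learned discretized-mixture entropy model).

```python
import math
import random

def rate_distortion_loss(x, encoder, decoder, log2_pmf, lam):
    """One-sample estimate of L = R + lambda * D.

    During training, hard rounding is replaced by additive uniform noise
    in [-0.5, 0.5) so the objective stays differentiable (the uniform
    noise relaxation of Balle et al., 2016).
    """
    y = encoder(x)
    y_tilde = [yi + random.uniform(-0.5, 0.5) for yi in y]  # quantization surrogate
    rate = -sum(log2_pmf(yi) for yi in y_tilde)             # bits under the model
    x_hat = decoder(y_tilde)
    distortion = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)  # MSE
    return rate + lam * distortion
```

With a uniform `log2_pmf` over 256 bins this reduces to 8 bits per latent dimension, matching the intuition that a better-fitting discretized mixture model lowers the rate term directly.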
6. Empirical Validation and Comparative Performance
Mixture models yield state-of-the-art performance across generative modeling and compression tasks, as shown by higher MS-SSIM/PSNR for the same bit-rate, improved rate-distortion curves, and perceptual fidelity over JPEG/JPEG 2000 baselines (Ballé et al., 2016, Chamain et al., 2020). Their tractability facilitates practical training, stable gradient-based optimization, and robust implementation; ablations confirm that discretized mixed Gaussian likelihoods drastically improve quality versus single-component or continuous distributions.
| Model/Loss | Rate-distortion (bpp vs. PSNR) | MS-SSIM | Perceptual quality |
|---|---|---|---|
| Discrete softmax | Lower coding efficiency | – | Lower |
| Continuous Gaussian | Degraded by boundary misspecification | – | Lower |
| Discretized mixed Gaussian | Best | Highest | Highest |
Improvements are particularly pronounced in low/medium bit-rate regimes, and for high-dimensional image/video data (Chamain et al., 2020).
7. Limitations and Extensions
While discretized mixtures scale well for pixel-level tasks, very high-dimensional data or non-image signals may require more scalable mixture formulations. Tail probabilities and rare symbol modeling remain challenging when mixture components underfit. Extensions employing hierarchical or dynamic mixtures (e.g., hyperprior models (Chamain et al., 2020)) further enhance expressivity.
Discretized mixed Gaussian likelihoods have thus become the standard in learned quantized generative systems, compression autoencoders, and robust neural transform codecs for high-dimensional discrete data.