Visual Information Fidelity (VIF) Measure
- Visual Information Fidelity (VIF) is an information-theoretic index that measures the proportion of visual information retained after image distortion.
- It employs wavelet decompositions and probabilistic models, including GSM and MGGD, to accurately capture both standard and heavy-tailed distortions.
- The GGSM-VIF extension adaptively estimates local parameters, enhancing sensitivity and performance for assessing user-generated content distortions.
Visual Information Fidelity (VIF) is an information-theoretic full-reference image quality assessment (IQA) index designed to quantify the visual similarity between a reference image and its distortion, grounded in probabilistic modeling of natural scene statistics (NSS) and incorporating models of the human visual system’s (HVS) information processing. The VIF index originally relies on a Gaussian Scale Mixture (GSM) model of natural image wavelet subband coefficients and has recently been generalized to employ Multivariate Generalized Gaussian Distributions (MGGD), enabling improved robustness to atypical or severe image distortions and better modeling of empirical coefficient distributions, as seen in user-generated content (Venkataramanan et al., 2023).
1. Foundational Principles of the VIF Measure
The VIF metric is founded on the premise that visual quality can be quantified via information fidelity: the proportion of visual information preserved between a reference image and its distortion, as measured by mutual information rates in the domain of natural image statistics. The computation involves the following key components:
- Wavelet or Steerable-Pyramid Decomposition: The image is partitioned into subbands via a multi-scale, multi-orientation transform.
- Local Coefficient Modeling: Within each subband $k$, a $d$-dimensional vector of coefficients $C_{k,i}$ (reference) is extracted from the $i$th spatial neighborhood. Subband and neighborhood indices are dropped where no ambiguity arises.
- Distortion Model: The corresponding distorted coefficients are assumed to follow $D = gC + V$, where $g$ is a deterministic gain and $V \sim \mathcal{N}(0, \sigma_v^2 I)$ is additive Gaussian noise.
- Observer Model: To approximate perceptual mechanisms, the observed coefficients include “neural noise” $N, N' \sim \mathcal{N}(0, \sigma_n^2 I)$, yielding $E = C + N$ (reference) and $F = D + N'$ (distorted).
The latent coefficients $C$ are modeled using a scale mixture, which captures the heavy-tailed marginal statistics empirically observed in natural images.
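The channel structure above can be made concrete with a small simulation. The following is a minimal sketch, not the reference implementation: the function name `simulate_vif_channels`, the gamma mixing law for the scale variable, the identity covariance of the Gaussian component, and all default parameter values are assumptions chosen for illustration.

```python
import numpy as np

def simulate_vif_channels(n_blocks=1000, d=9, g=0.8, sigma_v=0.5,
                          sigma_n=0.1, seed=0):
    """Draw synthetic coefficient blocks and pass them through the
    distortion and observer channels (illustrative parameters only)."""
    rng = np.random.default_rng(seed)
    # Latent scale mixture: C = sqrt(z) * U, one positive scale z per block.
    # The gamma law for z is an assumption; the model only requires z > 0.
    z = rng.gamma(shape=2.0, scale=1.0, size=(n_blocks, 1))
    U = rng.standard_normal((n_blocks, d))
    C = np.sqrt(z) * U
    # Distortion channel: D = g * C + V, with V ~ N(0, sigma_v^2 I).
    D = g * C + sigma_v * rng.standard_normal((n_blocks, d))
    # Observer channel: independent neural noise on both paths.
    E = C + sigma_n * rng.standard_normal((n_blocks, d))
    F = D + sigma_n * rng.standard_normal((n_blocks, d))
    return C, D, E, F
```

Each row is one neighborhood vector, so the per-coefficient variance of `F` should be close to `g**2 * E[z] + sigma_v**2 + sigma_n**2` under these assumptions.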
2. Gaussian Scale Mixture (GSM) Model and Original VIF Definition
The original VIF assumes a GSM model for wavelet coefficients:
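- $C = \sqrt{z}\,U$, where $U \sim \mathcal{N}(0, \Sigma_U)$ and the mixing variable $z$ is independent of $U$ and positive.
- This captures local variance and heavy tails using the underlying Gaussian vector $U$ and spatially-varying scale $z$.
Mutual information rates, conditioned on a fixed realization $z$, are derived as:
- Reference: $I(C; E \mid z) = \frac{1}{2} \log \frac{\left| z \Sigma_U + \sigma_n^2 I \right|}{\left| \sigma_n^2 I \right|}$
- Distorted: $I(C; F \mid z) = \frac{1}{2} \log \frac{\left| g^2 z \Sigma_U + (\sigma_v^2 + \sigma_n^2) I \right|}{\left| (\sigma_v^2 + \sigma_n^2) I \right|}$
Here $E = C + N$ and $F = gC + V + N'$ are the observed reference and distorted coefficients, $\sigma_n^2$ is the neural noise variance, and $\sigma_v^2$ is the distortion noise variance.
Summing over all subbands $k$ and spatial neighborhoods $i$ yields total “source” and “distorted” information:
$$I_{\mathrm{ref}} = \sum_{k} \sum_{i} I(C_{k,i}; E_{k,i} \mid z_{k,i}), \qquad I_{\mathrm{dist}} = \sum_{k} \sum_{i} I(C_{k,i}; F_{k,i} \mid z_{k,i})$$
The VIF index is defined as:
$$\mathrm{VIF} = \frac{I_{\mathrm{dist}}}{I_{\mathrm{ref}}}$$
This ratio reflects the relative amount of visual information that survives distortion.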
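As an illustration of the ratio, the Gaussian-case mutual information can be evaluated through the eigenvalues of the Gaussian covariance, since the log-determinant ratio reduces to a sum of `log(1 + SNR)` terms. This is a sketch under stated assumptions: the function names `gsm_information` and `gsm_vif`, the tuple layout of `blocks`, and the default noise variance are hypothetical and not taken from the source.

```python
import numpy as np

def gsm_information(z, Sigma_U, g=1.0, sigma_v2=0.0, sigma_n2=0.1):
    """Conditional mutual information (in nats) for one neighborhood.

    With g=1 and sigma_v2=0 this is the reference term I(C;E|z);
    otherwise it is the distorted term I(C;F|z).
    """
    noise = sigma_v2 + sigma_n2
    eigvals = np.linalg.eigvalsh(Sigma_U)
    # 0.5 * log(|g^2 z Sigma_U + noise I| / |noise I|)
    #   = 0.5 * sum_j log(1 + g^2 z lambda_j / noise)
    return 0.5 * float(np.sum(np.log1p(g**2 * z * eigvals / noise)))

def gsm_vif(blocks, Sigma_U, sigma_n2=0.1):
    """blocks: iterable of (z, g, sigma_v2) tuples, one per neighborhood."""
    blocks = list(blocks)
    dist = sum(gsm_information(z, Sigma_U, g, sv2, sigma_n2)
               for z, g, sv2 in blocks)
    ref = sum(gsm_information(z, Sigma_U, 1.0, 0.0, sigma_n2)
              for z, _, _ in blocks)
    return dist / ref
```

With unit gain and no added distortion noise the two sums coincide and the ratio is exactly 1; attenuation (`g < 1`) or added noise lowers it.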
3. Generalized GSM (GGSM) and the Multivariate Generalized Gaussian Distribution (MGGD)
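Empirical image data, particularly user-generated content, often deviate from the Gaussian assumption. The MGGD provides a more flexible modeling framework, with probability density
$$f(x; \Sigma, \beta) = \frac{\beta\,\Gamma\!\left(\frac{d}{2}\right)}{\pi^{d/2}\,\Gamma\!\left(\frac{d}{2\beta}\right) 2^{d/(2\beta)}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}\left(x^{\top}\Sigma^{-1}x\right)^{\beta}\right)$$
where $d$ is the coefficient vector dimension, $\Sigma$ is the scatter matrix, and $\beta > 0$ is a shape parameter controlling tail-heaviness:
- $\beta = 1$: Gaussian distribution (reference case)
- $\beta < 1$: Leptokurtic (heavier tails)
- $\beta > 1$: Sub-Gaussian (lighter tails).
Statistical properties relevant for information-theoretic computations include:
- Covariance: $\operatorname{Cov}(x) = \kappa(\beta, d)\,\Sigma$, with $\kappa(\beta, d) = \frac{2^{1/\beta}\,\Gamma\!\left(\frac{d+2}{2\beta}\right)}{d\,\Gamma\!\left(\frac{d}{2\beta}\right)}$.
- Differential entropy: $h(x) = \frac{1}{2}\log|\Sigma| + \frac{d}{2\beta} + \log\frac{\pi^{d/2}\,\Gamma\!\left(\frac{d}{2\beta}\right) 2^{d/(2\beta)}}{\beta\,\Gamma\!\left(\frac{d}{2}\right)}$.
- Sample kurtosis, used for estimation, relates to fourth-order moments.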
This model underpins the GGSM-VIF extension, wherein the shape parameter and scatter matrix are adaptively estimated for each block.
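The MGGD quantities used in this section can be evaluated numerically with log-gamma functions. The sketch below assumes the common parameterization in which a shape parameter of 1 recovers the Gaussian; the function names are hypothetical, and the source may use a different but equivalent parameterization.

```python
import math
import numpy as np

def _mggd_log_norm(beta, d):
    # Log of the normalizing constant, excluding the 0.5*log|Sigma| term.
    return (0.5 * d * math.log(math.pi) + math.lgamma(d / (2 * beta))
            + d / (2 * beta) * math.log(2.0)
            - math.log(beta) - math.lgamma(d / 2))

def mggd_logpdf(x, Sigma, beta):
    """Log-density of a zero-mean MGGD at a single point x."""
    d = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    q = float(x @ np.linalg.solve(Sigma, x))  # x^T Sigma^{-1} x
    return -_mggd_log_norm(beta, d) - 0.5 * logdet - 0.5 * q**beta

def mggd_cov_factor(beta, d):
    """Factor kappa such that Cov(x) = kappa * Sigma."""
    log_k = (math.log(2.0) / beta + math.lgamma((d + 2) / (2 * beta))
             - math.lgamma(d / (2 * beta)) - math.log(d))
    return math.exp(log_k)

def mggd_entropy(Sigma, beta):
    """Differential entropy in nats."""
    d = Sigma.shape[0]
    _, logdet = np.linalg.slogdet(Sigma)
    return 0.5 * logdet + d / (2 * beta) + _mggd_log_norm(beta, d)
```

Two sanity checks follow from the formulas: at shape 1 the entropy equals the Gaussian value `0.5 * log((2*pi*e)**d * det(Sigma))`, and in one dimension with shape 0.5 the density is a Laplace density, for which the covariance factor is 8.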
4. Derivation and Mathematical Formulation of VIF Under GGSM
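In the GGSM-VIF framework, $C = \sqrt{z}\,U$ for $U$ as a zero-mean MGGD with shape parameter $\beta$ and scatter $\Sigma_U$. As before, $E = C + N$ and $F = gC + V + N'$ denote the observed reference and distorted coefficients.
The mutual information for each neighborhood, conditioned on $z$, becomes:
- Reference: $I(C; E \mid z) = h(E \mid z) - h(N)$, with $h(E \mid z)$ evaluated from the MGGD differential entropy, and the shape and scatter of $E$ determined from the empirical distribution of $E$.
- Distorted: $I(C; F \mid z) = h(F \mid z) - h(V + N')$, with the shape and scatter of $F$ determined analogously.
The auxiliary function defined in (Venkataramanan et al., 2023) parameterizes the contribution per block and subband. Summing these across all neighborhoods $i$ and subbands $k$, the generalized VIF reads:
$$\mathrm{VIF}_{\mathrm{GGSM}} = \frac{\sum_{k}\sum_{i} I(C_{k,i}; F_{k,i} \mid z_{k,i})}{\sum_{k}\sum_{i} I(C_{k,i}; E_{k,i} \mid z_{k,i})}$$
A plausible implication is that this flexible adaptation to local tail behavior enables more accurate reflection of perceptually impactful distortions, especially in non-Gaussian or heavy-tailed regimes (Venkataramanan et al., 2023).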
5. Estimation of MGGD Parameters in Practice
For each subband and spatial block, the estimation of MGGD parameters is performed as follows:
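- Compute the sample covariance $\hat{S}$ and sample Mardia's kurtosis $\hat{b}_{2,d} = \frac{1}{n}\sum_{j=1}^{n}\left((x_j - \bar{x})^{\top}\hat{S}^{-1}(x_j - \bar{x})\right)^2$.
- Solve, via root-finding, for $\hat{\beta}$ from the theoretical MGGD kurtosis formula $b_{2,d}(\beta) = \frac{d^2\,\Gamma\!\left(\frac{d}{2\beta}\right)\Gamma\!\left(\frac{d+4}{2\beta}\right)}{\Gamma\!\left(\frac{d+2}{2\beta}\right)^2}$ by setting $b_{2,d}(\hat{\beta}) = \hat{b}_{2,d}$. For $\beta = 1$ this expression equals the Gaussian value $d(d+2)$.
- Set $\hat{\Sigma} = \frac{d\,\Gamma\!\left(\frac{d}{2\hat{\beta}}\right)}{2^{1/\hat{\beta}}\,\Gamma\!\left(\frac{d+2}{2\hat{\beta}}\right)}\,\hat{S}$, which inverts the MGGD covariance relation with $\hat{\beta}$ as above.
- Repeat for “noisy” observed blocks to obtain the corresponding shape and scatter estimates, as required by the model.
Empirically, this estimation scheme allows $\beta$ to vary by subband and block, providing enhanced modeling capacity over the fixed-Gaussian assumption.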
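The moment-matching steps above can be sketched as follows. This is an illustrative implementation under assumptions, not the authors' routine: the function names, the bisection bracket `[0.1, 5.0]`, and the use of plain bisection (rather than a library root-finder) are choices made here. The sketch relies on the model kurtosis decreasing as the shape parameter grows.

```python
import math
import numpy as np

def mggd_kurtosis(beta, d):
    """Model value of E[(x^T Cov^{-1} x)^2]; equals d*(d+2) at beta=1."""
    return d**2 * math.exp(math.lgamma(d / (2 * beta))
                           + math.lgamma((d + 4) / (2 * beta))
                           - 2 * math.lgamma((d + 2) / (2 * beta)))

def mardia_kurtosis(X):
    """Sample covariance and Mardia's kurtosis for rows of X (n x d)."""
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]
    m2 = np.einsum("ij,ij->i", Xc @ np.linalg.inv(S), Xc)  # squared Mahalanobis
    return S, float(np.mean(m2**2))

def estimate_mggd(X, beta_lo=0.1, beta_hi=5.0, iters=60):
    """Estimate (beta, Sigma) by matching Mardia's kurtosis."""
    d = X.shape[1]
    S, k_hat = mardia_kurtosis(X)
    # Clip the target so a root exists inside the assumed bracket.
    k_hat = min(max(k_hat, mggd_kurtosis(beta_hi, d)),
                mggd_kurtosis(beta_lo, d))
    lo, hi = beta_lo, beta_hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mggd_kurtosis(mid, d) > k_hat:
            lo = mid  # model tails too heavy: increase beta
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    # Invert Cov = kappa(beta, d) * Sigma to recover the scatter matrix.
    kappa = math.exp(math.log(2.0) / beta + math.lgamma((d + 2) / (2 * beta))
                     - math.lgamma(d / (2 * beta)) - math.log(d))
    return beta, S / kappa
```

On Gaussian samples the estimated shape should be close to 1, and on one-dimensional Laplace samples close to 0.5. Small blocks give noisy kurtosis estimates, so the clipping step matters in practice.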
6. Comparison: GSM-VIF Versus GGSM-VIF
| Aspect | GSM-VIF | GGSM-VIF |
|---|---|---|
| Tail Modeling | $\beta = 1$ (pure Gaussian, fixed) | $\beta$ adaptively estimated, subband- and noise-specific |
| Distortion Handling | Sensitive mainly to Gaussian-like noise | Responsive to complex, heavy-tailed distortions |
| Empirical Performance | Noted limitations on UGC | Gains of 2–5 points in Spearman rank on UGC (prelim.) |
The GGSM-VIF generalization enhances sensitivity to local distributional changes, particularly in challenging user-generated content. This increased fidelity is attributed to its adaptive modeling of local kurtosis and tail behavior that are not captured under the original GSM model (Venkataramanan et al., 2023). Theoretically, GGSM-VIF is expected to yield improved discrimination of distortion-induced structure changes.
7. Implementation and Application Considerations
To compute the VIF (in its original or generalized form), the following workflow is performed:
- Decompose both reference and distorted images using a wavelet or steerable-pyramid transform, organizing subband data.
- For each neighborhood, estimate parameters $\beta$, $\Sigma$ for both reference and observed/distorted sets.
- Form the relevant auxiliary function for each block and aggregate information across blocks and subbands as dictated by the formal equations.
- Compute the VIF (or GGSM-VIF) index as the ratio of summed information rates.
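To show how the aggregation step fits together, here is a deliberately simplified sketch: a scalar, single-band variant of the Gaussian case that skips the multi-scale decomposition and treats non-overlapping pixel blocks as neighborhoods. The gain and distortion-noise variance are estimated per block by least squares. The function names, the block size, and the neural noise variance `sigma_n2=2.0` are assumptions for illustration. This is not the full GSM-VIF or GGSM-VIF pipeline.

```python
import numpy as np

def block_view(img, b):
    """Split a 2-D array into non-overlapping b x b blocks (rows of b*b)."""
    h, w = (img.shape[0] // b) * b, (img.shape[1] // b) * b
    return (img[:h, :w].reshape(h // b, b, w // b, b)
            .swapaxes(1, 2).reshape(-1, b * b))

def scalar_vif_single_band(ref, dist, block=8, sigma_n2=2.0, eps=1e-10):
    R = block_view(np.asarray(ref, dtype=float), block)
    D = block_view(np.asarray(dist, dtype=float), block)
    Rc = R - R.mean(axis=1, keepdims=True)
    Dc = D - D.mean(axis=1, keepdims=True)
    var_r = (Rc**2).mean(axis=1)   # local reference variance
    var_d = (Dc**2).mean(axis=1)
    cov = (Rc * Dc).mean(axis=1)
    # Least-squares fit of D = g * C + V within each block.
    g = cov / (var_r + eps)
    sigma_v2 = var_d - g * cov
    # Negative gain: treat the block as carrying no reference information.
    neg = g < 0
    g = np.where(neg, 0.0, g)
    sigma_v2 = np.maximum(np.where(neg, var_d, sigma_v2), eps)
    # Ratio of summed information rates over all blocks.
    num = np.sum(np.log1p(g**2 * var_r / (sigma_v2 + sigma_n2)))
    den = np.sum(np.log1p(var_r / sigma_n2))
    return float(num / den)
```

For identical inputs the score is 1 up to numerical precision, and additive noise lowers it. A completely flat reference makes the denominator zero, which this sketch does not guard against.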
The comprehensive mathematical derivation, parameter estimation routines, and auxiliary formulas provide a self-contained framework for implementing both GSM-VIF and GGSM-VIF approaches to full-reference image quality assessment (Venkataramanan et al., 2023).