
Visual Information Fidelity (VIF) Measure

Updated 3 February 2026
  • Visual Information Fidelity (VIF) is an information-theoretic index that measures the proportion of visual information retained after image distortion.
  • It employs wavelet decompositions and probabilistic models, including GSM and MGGD, to accurately capture both standard and heavy-tailed distortions.
  • The GGSM-VIF extension adaptively estimates local parameters, enhancing sensitivity and performance for assessing user-generated content distortions.

Visual Information Fidelity (VIF) is an information-theoretic full-reference image quality assessment (IQA) index designed to quantify the visual similarity between a reference image and its distortion, grounded in probabilistic modeling of natural scene statistics (NSS) and incorporating models of the human visual system’s (HVS) information processing. The VIF index originally relies on a Gaussian Scale Mixture (GSM) model of natural image wavelet subband coefficients and has recently been generalized to employ Multivariate Generalized Gaussian Distributions (MGGD), enabling improved robustness to atypical or severe image distortions and better modeling of empirical coefficient distributions, as seen in user-generated content (Venkataramanan et al., 2023).

1. Foundational Principles of the VIF Measure

The VIF metric is founded on the premise that visual quality can be quantified via information fidelity: the proportion of visual information preserved between a reference image and its distortion, as measured by mutual information rates in the domain of natural image statistics. The computation involves the following key components:

  • Wavelet or Steerable-Pyramid Decomposition: The image is decomposed into $K$ subbands via a multi-scale, multi-orientation transform.
  • Local Coefficient Modeling: Within each subband $k$, an $M$-dimensional vector of reference coefficients $C_i^k$ is extracted from the $i$-th spatial neighborhood.
  • Distortion Model: The corresponding distorted coefficients $D_i^k$ are assumed to follow $D_i^k = g_i^k C_i^k + V_i^k$, where $g_i^k$ is a deterministic gain and $V_i^k \sim N(0, \sigma_v^2 I)$ is additive Gaussian noise.
  • Observer Model: To approximate perceptual mechanisms, the observed coefficients include “neural noise” $N_i^k, N_i^{\prime k} \sim N(0, \sigma_n^2 I)$, yielding $E_i^k = C_i^k + N_i^k$ (reference) and $F_i^k = D_i^k + N_i^{\prime k} = g_i^k C_i^k + V_i^k + N_i^{\prime k}$ (distorted).

The latent coefficients $C_i^k$ are modeled using a scale mixture, which captures the heavy-tailed marginal statistics observed empirically in natural images.
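The distortion and observer channels can be illustrated with a small simulation. This is a minimal sketch for scalar coefficients ($M = 1$), assuming the distorted observation carries both the distortion noise and the neural noise, i.e. $F = gC + V + N'$. The function name `simulate_channels` and all parameter values are hypothetical choices made for illustration.

```python
import random

def simulate_channels(c, g, sigma_v, sigma_n, rng):
    """Return (e, f): observer outputs for a reference coefficient c and its distorted version."""
    d = g * c + rng.gauss(0.0, sigma_v)   # distortion channel: D = g C + V
    e = c + rng.gauss(0.0, sigma_n)       # reference path:     E = C + N
    f = d + rng.gauss(0.0, sigma_n)       # distorted path:     F = D + N'
    return e, f

def variance(xs):
    """Population variance of a list of floats."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(0)
g, sigma_c, sigma_v, sigma_n = 0.8, 2.0, 0.5, 0.3
pairs = [simulate_channels(rng.gauss(0.0, sigma_c), g, sigma_v, sigma_n, rng)
         for _ in range(50000)]

var_e = variance([e for e, _ in pairs])
var_f = variance([f for _, f in pairs])

# Variances implied by the model for independent noise sources.
expected_var_e = sigma_c ** 2 + sigma_n ** 2
expected_var_f = g ** 2 * sigma_c ** 2 + sigma_v ** 2 + sigma_n ** 2
```

The sample variances of `e` and `f` should land close to the model values, which is the same additive structure that the covariance of the distorted observation inherits in the vector case.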

2. Gaussian Scale Mixture (GSM) Model and Original VIF Definition

The original VIF assumes a GSM model for wavelet coefficients:

  • $C = Z U$, where $U \sim N(0, \Sigma_u)$ and the mixing variable $Z \geq 0$ is a non-negative scalar independent of $U$.
  • This captures local variance and heavy tails using the underlying Gaussian vector $U$ and the spatially varying scale $Z$.
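A short experiment can make the heavy-tail claim concrete. The sketch below draws scalar GSM samples and compares their kurtosis with that of plain Gaussian samples. The log-normal law for $Z$ is an assumption made only for this illustration (the text does not specify a mixing distribution), and `sample_gsm` and `kurtosis` are hypothetical helper names.

```python
import math
import random

def sample_gsm(rng, sigma_u=1.0, log_z_std=0.5):
    """Draw one scalar GSM sample C = Z * U with Z independent of U."""
    z = math.exp(rng.gauss(0.0, log_z_std))  # assumed log-normal mixing variable, Z > 0
    u = rng.gauss(0.0, sigma_u)              # underlying Gaussian component
    return z * u

def kurtosis(xs):
    """Non-excess sample kurtosis m4 / m2^2 (equals 3 for a Gaussian)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2)

rng = random.Random(1)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(50000)]
gsm = [sample_gsm(rng) for _ in range(50000)]

k_gauss = kurtosis(gaussian)  # close to 3
k_gsm = kurtosis(gsm)         # noticeably larger than 3: heavier tails
```

Conditioned on a fixed $Z = z$, each sample is still Gaussian, which is what keeps the conditional mutual information tractable.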

Mutual information rates, conditioned on a fixed realization $Z = z$, are derived as:

  • Reference: $I(C_i^k; E_i^k \mid Z_i^k = z) = h(C_i^k + N_i^k \mid Z_i^k = z) - h(N_i^k)$
  • Distorted: $I(C_i^k; F_i^k \mid Z_i^k = z) = h(g_i^k C_i^k + V_i^k + N_i^{\prime k} \mid Z_i^k = z) - h(V_i^k + N_i^{\prime k})$
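Under the GSM model these entropy differences have a closed form. Given $Z = z$, the coefficient vector is Gaussian with covariance $z^2 \Sigma_u$, and the entropy of an $M$-dimensional Gaussian with covariance $\Sigma$ is $\tfrac{1}{2}\log\left((2\pi e)^M |\Sigma|\right)$. Assuming the distorted observation is $F = gC + V + N'$ with independent noise terms, the two quantities reduce to log-determinant ratios; here $\lambda_j$ denotes the eigenvalues of $\Sigma_u$, a symbol introduced only for this worked step.

```latex
\begin{aligned}
I(C_i^k; E_i^k \mid Z_i^k = z)
  &= \frac{1}{2}\log\frac{\left| z^2 \Sigma_u + \sigma_n^2 I \right|}{\left| \sigma_n^2 I \right|}
   = \frac{1}{2}\sum_{j=1}^{M}\log\left(1 + \frac{z^2 \lambda_j}{\sigma_n^2}\right), \\
I(C_i^k; F_i^k \mid Z_i^k = z)
  &= \frac{1}{2}\log\frac{\left| (g_i^k)^2 z^2 \Sigma_u + (\sigma_v^2 + \sigma_n^2) I \right|}{\left| (\sigma_v^2 + \sigma_n^2) I \right|}
   = \frac{1}{2}\sum_{j=1}^{M}\log\left(1 + \frac{(g_i^k)^2 z^2 \lambda_j}{\sigma_v^2 + \sigma_n^2}\right).
\end{aligned}
```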

Summing over all subbands and spatial neighborhoods yields total “source” and “distorted” information:

  • $I_\text{source} = \sum_{k=1}^{K} \sum_{i=1}^{N} I(C_i^k; E_i^k \mid Z_i^k)$
  • $I_\text{dist} = \sum_{k=1}^{K} \sum_{i=1}^{N} I(C_i^k; F_i^k \mid Z_i^k)$

The VIF index is defined as:

$$\mathrm{VIF} = \frac{I_\text{dist}}{I_\text{source}}$$

This ratio reflects the relative amount of visual information that survives distortion.
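The ratio can be sketched in a few lines for a single subband. This is an illustrative sketch, not a reference implementation: it assumes the Gaussian closed form $\tfrac{1}{2}\sum_j \log_2(1 + \text{signal}/\text{noise})$ over the eigenvalues of $\Sigma_u$, that the distorted path carries noise variance $\sigma_v^2 + \sigma_n^2$, and that the per-block quantities $z^2$, $g$, and $\sigma_v^2$ have already been estimated. The function name `gsm_vif` and the example values are hypothetical.

```python
import math

def gsm_vif(blocks, eigvals, sigma_n2):
    """GSM-VIF for one subband.

    blocks   -- iterable of (z2, g, sigma_v2) per spatial neighborhood
    eigvals  -- eigenvalues of Sigma_u for this subband
    sigma_n2 -- neural noise variance
    """
    info_source = 0.0
    info_dist = 0.0
    for z2, g, sigma_v2 in blocks:
        for lam in eigvals:
            # Information the observer extracts from the reference coefficients.
            info_source += 0.5 * math.log2(1.0 + z2 * lam / sigma_n2)
            # Information that survives the gain-plus-noise distortion channel.
            info_dist += 0.5 * math.log2(1.0 + g * g * z2 * lam / (sigma_v2 + sigma_n2))
    return info_dist / info_source

eigvals = [2.0, 0.5]
undistorted = [(1.0, 1.0, 0.0), (4.0, 1.0, 0.0)]   # g = 1, no added noise
degraded = [(1.0, 0.7, 0.2), (4.0, 0.7, 0.2)]      # attenuation plus noise
erased = [(1.0, 0.0, 0.3), (4.0, 0.0, 0.3)]        # g = 0: no signal left

vif_same = gsm_vif(undistorted, eigvals, sigma_n2=0.1)
vif_degraded = gsm_vif(degraded, eigvals, sigma_n2=0.1)
vif_erased = gsm_vif(erased, eigvals, sigma_n2=0.1)
```

With unit gain and no added noise the two sums coincide, so the ratio is 1; with zero gain the numerator vanishes; attenuation plus noise falls in between.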

3. Generalized GSM (GGSM) and the Multivariate Generalized Gaussian Distribution (MGGD)

Empirical image data—particularly user-generated content—often exhibit deviations from the Gaussian assumption. The MGGD provides a more flexible modeling framework, with probability density

$$f_U(u) = \frac{\beta\,\Gamma(n/2)}{2^{n/2}\,\pi^{n/2}\,\Gamma(n/(2\beta))\,|\Sigma|^{1/2}} \exp\left[ -\left( \tfrac{1}{2}\, u^{T}\Sigma^{-1}u \right)^{\beta} \right]$$

where $n$ is the coefficient vector dimension, $\Sigma$ is the scatter matrix, and $\beta > 0$ is a shape parameter controlling tail-heaviness:

  • $\beta = 1$: Gaussian distribution
  • $\beta < 1$: Leptokurtic (heavier tails than the Gaussian)
  • $\beta > 1$: Platykurtic (lighter tails than the Gaussian)

Statistical properties relevant for information-theoretic computations include:

  • Covariance: $\mathrm{Cov}(U) = m_2(\beta, n)\, \Sigma$, with $m_2(\beta, n) = \frac{2\,\Gamma((n+2)/(2\beta))}{n\,\Gamma(n/(2\beta))}$, so that $m_2(1, n) = 1$ in the Gaussian case.
  • Differential entropy: $h(U) = \frac{n}{2\beta} - \log \left[ \frac{\beta\,\Gamma(n/2)}{2^{n/2}\, \pi^{n/2}\, \Gamma(n/(2\beta))} \right] + \frac{1}{2} \log|\Sigma|$, which reduces to the Gaussian entropy $\frac{n}{2}\log(2\pi e) + \frac{1}{2}\log|\Sigma|$ at $\beta = 1$.
  • Kurtosis: Mardia's multivariate kurtosis, a fourth-order moment statistic, depends only on $\beta$ and $n$ (not on $\Sigma$), which is why the sample kurtosis can be used to estimate $\beta$.

This model underpins the GGSM-VIF extension, wherein the shape parameter $\beta$ and scatter matrix $\Sigma$ are adaptively estimated for each block.
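The covariance factor and the entropy are easy to evaluate with log-gamma functions. The sketch below assumes the parameterization in which the density is proportional to $\exp[-(\tfrac{1}{2}u^{T}\Sigma^{-1}u)^{\beta}]$, so that $\beta = 1$ recovers the Gaussian; other MGGD conventions rescale $\Sigma$ and would change the constants. The function names are hypothetical.

```python
import math

def mggd_m2(beta, n):
    """Covariance scale factor: Cov(U) = m2 * Sigma.

    Assumes density proportional to exp(-(0.5 * u' Sigma^-1 u) ** beta).
    """
    a = n / (2.0 * beta)
    # Gamma((n + 2) / (2 beta)) / Gamma(n / (2 beta)), computed in log space.
    gamma_ratio = math.exp(math.lgamma(a + 1.0 / beta) - math.lgamma(a))
    return 2.0 * gamma_ratio / n

def mggd_entropy(beta, n, log_det_sigma):
    """Differential entropy in nats, given log|Sigma| of the scatter matrix."""
    a = n / (2.0 * beta)
    # Log of the normalizing constant, excluding the |Sigma|^(-1/2) factor.
    log_norm = (math.log(beta) + math.lgamma(n / 2.0)
                - (n / 2.0) * math.log(2.0 * math.pi) - math.lgamma(a))
    return a - log_norm + 0.5 * log_det_sigma

# Sanity checks against two known special cases.
gauss_entropy_3d = 1.5 * math.log(2.0 * math.pi * math.e) + 0.5 * 0.7
laplace_entropy_1d = 1.0 + 1.5 * math.log(2.0)   # n = 1, beta = 0.5, Sigma = 1
```

For $n = 1$ and $\beta = 0.5$ this parameterization is a Laplace density with scale $\sqrt{2\Sigma}$, whose variance $4\Sigma$ and entropy give an independent check on both functions.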

4. Derivation and Mathematical Formulation of VIF Under GGSM

In the GGSM-VIF framework, $C \mid Z = z \sim zU$, where $U$ is a zero-mean MGGD vector with shape parameter $\beta$ and scatter $\Sigma_u$, so that $C$ given $Z = z$ has scatter $z^2 \Sigma_u$.

The mutual information for each neighborhood, conditioned on $Z = z$, becomes:

  • Reference: $I_\text{GGSM}(C; E \mid Z=z) = h_\text{MGGD}(0, \beta_s, \Sigma_E) - h_\text{Gauss}(\sigma_n^2)$, with $\Sigma_E = z^2 \Sigma_u + \sigma_n^2 I$, and $\beta_s$ determined from the empirical distribution of $E$.
  • Distorted: $I_\text{GGSM}(C; F \mid Z=z) = h_\text{MGGD}(0, \beta_d, \Sigma_F) - h_\text{Gauss}(\sigma_n^2 + \sigma_v^2)$, with $\Sigma_F = g^2 z^2 \Sigma_u + (\sigma_n^2 + \sigma_v^2) I$.

The auxiliary function

$$\psi(\Sigma, \beta, \sigma^2) \equiv h_\text{MGGD}(0, \beta, \Sigma + \sigma^2 I) - h_\text{Gauss}(\sigma^2)$$

parameterizes the contribution per block and subband. Summing these across all neighborhoods and subbands, the generalized VIF reads:

$$\mathrm{VIF}_\text{GGSM} = \frac{\sum_{k, i} \psi\left( (g_i^k)^2 (z_i^k)^2 \Sigma_u^k,\; \beta_d^k,\; \sigma_n^2 + \sigma_v^2 \right)}{\sum_{k, i} \psi\left( (z_i^k)^2 \Sigma_u^k,\; \beta_s^k,\; \sigma_n^2 \right)}$$

A plausible implication is that this flexible adaptation to local tail behavior enables more accurate reflection of perceptually impactful distortions, especially in non-Gaussian or heavy-tailed regimes (Venkataramanan et al., 2023).
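The $\psi$ function and the resulting ratio can be sketched as follows. Several assumptions are made for illustration: the MGGD entropy uses the parameterization in which $\beta = 1$ is Gaussian, $h_\text{Gauss}(\sigma^2)$ is taken to be the entropy of $N(0, \sigma^2 I)$ in $n$ dimensions, and each scatter matrix is represented by its eigenvalues so that $\log|\Sigma + \sigma^2 I| = \sum_j \log(\lambda_j + \sigma^2)$. The names `psi` and `ggsm_vif` are hypothetical.

```python
import math

def mggd_entropy(beta, n, log_det_sigma):
    """MGGD entropy in nats; beta = 1 gives the Gaussian entropy."""
    a = n / (2.0 * beta)
    log_norm = (math.log(beta) + math.lgamma(n / 2.0)
                - (n / 2.0) * math.log(2.0 * math.pi) - math.lgamma(a))
    return a - log_norm + 0.5 * log_det_sigma

def psi(eigvals, beta, sigma2):
    """h_MGGD(0, beta, Sigma + sigma2 * I) - h_Gauss(sigma2), Sigma given by its eigenvalues."""
    n = len(eigvals)
    log_det = sum(math.log(lam + sigma2) for lam in eigvals)
    h_gauss = 0.5 * n * math.log(2.0 * math.pi * math.e * sigma2)
    return mggd_entropy(beta, n, log_det) - h_gauss

def ggsm_vif(blocks, sigma_n2):
    """blocks: (eigvals of Sigma_u, z2, g, sigma_v2, beta_s, beta_d) per neighborhood."""
    numerator = 0.0
    denominator = 0.0
    for eigvals, z2, g, sigma_v2, beta_s, beta_d in blocks:
        numerator += psi([g * g * z2 * lam for lam in eigvals], beta_d, sigma_n2 + sigma_v2)
        denominator += psi([z2 * lam for lam in eigvals], beta_s, sigma_n2)
    return numerator / denominator

# With beta = 1, psi reduces to the Gaussian mutual information.
psi_gauss = psi([2.0, 0.5], 1.0, 0.1)
gauss_reference = 0.5 * (math.log(1.0 + 2.0 / 0.1) + math.log(1.0 + 0.5 / 0.1))

vif_identity = ggsm_vif([([2.0, 0.5], 1.5, 1.0, 0.0, 1.0, 1.0)], sigma_n2=0.1)
vif_heavy_tail = ggsm_vif([([2.0, 0.5], 1.5, 0.8, 0.2, 0.8, 0.6)], sigma_n2=0.1)
```

Setting both shape parameters to 1 recovers the GSM-VIF value, which gives a convenient regression check when experimenting with other shapes.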

5. Estimation of MGGD Parameters in Practice

For each subband and spatial block, the estimation of MGGD parameters $(\Sigma, \beta)$ is performed as follows:

  • Compute the sample covariance $\hat{C} = \frac{1}{N}\sum_{i} x_i x_i^T$ and the sample Mardia kurtosis $\hat{\gamma}_2$, taken here as the excess over the Gaussian value $n(n+2)$.
  • Solve, via root-finding, for $\beta$ from the theoretical MGGD kurtosis formula $\gamma_2 = n^2\,\frac{\Gamma((n+4)/(2\beta))\,\Gamma(n/(2\beta))}{\Gamma((n+2)/(2\beta))^2} - n(n+2)$, which vanishes at $\beta = 1$, by setting $\hat{\gamma}_2 = \gamma_2$.
  • Set $\Sigma = \hat{C} / m_2(\beta, n)$, where $m_2$ is the MGGD covariance factor from Section 3.
  • Repeat for the “noisy” observed blocks $E$ and $F$ to obtain $\beta_s$, $\beta_d$, $\Sigma_E$, and $\Sigma_F$ as required by the model.

Empirically, this estimation scheme allows $\beta$ to vary by subband and block, providing enhanced modeling capacity over the fixed-Gaussian assumption.
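The moment-matching step can be sketched with a simple bisection. This is a minimal sketch under stated assumptions: the kurtosis is the excess Mardia kurtosis $n^2\,\Gamma((n+4)/(2\beta))\,\Gamma(n/(2\beta))/\Gamma((n+2)/(2\beta))^2 - n(n+2)$, which is zero for a Gaussian; the bisection assumes this quantity decreases in $\beta$ over the search bracket; and the bracket $[0.05, 20]$ is an arbitrary choice. The sample kurtosis is supplied by the caller, and all function names are hypothetical.

```python
import math

def mggd_excess_kurtosis(beta, n):
    """Theoretical excess Mardia kurtosis of an n-dimensional MGGD (0 at beta = 1)."""
    a = n / (2.0 * beta)
    log_ratio = (math.lgamma(a + 2.0 / beta) + math.lgamma(a)
                 - 2.0 * math.lgamma(a + 1.0 / beta))
    return n * n * math.exp(log_ratio) - n * (n + 2)

def estimate_beta(sample_kurtosis, n, lo=0.05, hi=20.0, iters=200):
    """Solve mggd_excess_kurtosis(beta, n) = sample_kurtosis by bisection."""
    # Clamp to the bracket when the target lies outside the attainable range.
    if sample_kurtosis >= mggd_excess_kurtosis(lo, n):
        return lo
    if sample_kurtosis <= mggd_excess_kurtosis(hi, n):
        return hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mggd_excess_kurtosis(mid, n) > sample_kurtosis:
            lo = mid   # kurtosis still too large: the root has a larger beta
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mggd_m2(beta, n):
    """Covariance factor m2 with Cov(U) = m2 * Sigma (beta = 1 gives 1)."""
    a = n / (2.0 * beta)
    return 2.0 * math.exp(math.lgamma(a + 1.0 / beta) - math.lgamma(a)) / n

def scatter_from_covariance(cov, beta, n):
    """Rescale a sample covariance (list of rows) into the MGGD scatter matrix."""
    m2 = mggd_m2(beta, n)
    return [[value / m2 for value in row] for row in cov]

# Round trip: recover beta from the kurtosis it implies.
beta_recovered = estimate_beta(mggd_excess_kurtosis(0.5, 2), 2)
beta_gaussian = estimate_beta(0.0, 4)
scatter = scatter_from_covariance([[2.0, 0.0], [0.0, 2.0]], beta_gaussian, 2)
```

Clamping keeps the estimate finite when a small block produces an extreme sample kurtosis; a production implementation might instead regularize the kurtosis estimate itself.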

6. Comparison: GSM-VIF Versus GGSM-VIF

| Aspect | GSM-VIF | GGSM-VIF |
|---|---|---|
| Tail Modeling | $\beta = 1$ (pure Gaussian, fixed) | $\beta$ adaptively estimated, subband- and noise-specific |
| Distortion Handling | Sensitive mainly to Gaussian-like noise | Responsive to complex, heavy-tailed distortions |
| Empirical Performance | Noted limitations on UGC | Gains of 2–5 points in Spearman rank on UGC (preliminary) |

The GGSM-VIF generalization enhances sensitivity to local distributional changes, particularly in challenging user-generated content. This increased fidelity is attributed to its adaptive modeling of local kurtosis and tail behavior that are not captured under the original GSM model (Venkataramanan et al., 2023). Theoretically, GGSM-VIF is expected to yield improved discrimination of distortion-induced structure changes.

7. Implementation and Application Considerations

To compute the VIF (in its original or generalized form), the following workflow is performed:

  • Decompose both reference and distorted images using a wavelet or steerable-pyramid transform, organizing the coefficients by subband.
  • For each neighborhood, estimate the parameters $\Sigma$ and $\beta$ for both the reference and the observed/distorted sets.
  • Form the relevant $\psi$-function for each block and aggregate information across blocks and subbands as dictated by the formal equations.
  • Compute the VIF (or $\mathrm{VIF}_\text{GGSM}$) index as the ratio of summed information rates.
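The workflow above can be sketched end to end on a toy problem. The example below is deliberately simplified and rests on several assumptions that are not part of the text: it works on a 1-D signal with a single-level Haar detail band instead of an image pyramid, treats coefficients as scalars ($M = 1$, Gaussian case), uses the local variance of each block as the signal energy, and estimates the gain and distortion noise by least-squares regression of the distorted block on the reference block. All function names and constants are hypothetical.

```python
import math
import random

def haar_detail(signal):
    """Single-level Haar detail (high-pass) coefficients of a 1-D signal."""
    return [(signal[2 * i] - signal[2 * i + 1]) / math.sqrt(2.0)
            for i in range(len(signal) // 2)]

def block_moments(c, d):
    """Zero-mean second moments of a reference block c and a distorted block d."""
    n = len(c)
    var_c = sum(x * x for x in c) / n
    var_d = sum(y * y for y in d) / n
    cov_cd = sum(x * y for x, y in zip(c, d)) / n
    return var_c, var_d, cov_cd

def toy_vif(ref, dist, block=8, sigma_n2=0.1, eps=1e-10):
    """Scalar GSM-style VIF on one Haar subband of a 1-D signal (illustrative only)."""
    c_all, d_all = haar_detail(ref), haar_detail(dist)
    info_source = 0.0
    info_dist = 0.0
    for start in range(0, len(c_all) - block + 1, block):
        c = c_all[start:start + block]
        d = d_all[start:start + block]
        var_c, var_d, cov_cd = block_moments(c, d)
        g = cov_cd / (var_c + eps)                # least-squares gain estimate
        sigma_v2 = max(var_d - g * cov_cd, 0.0)   # residual (distortion noise) variance
        if g < 0.0:                               # assumed convention: treat as pure noise
            g, sigma_v2 = 0.0, var_d
        info_source += 0.5 * math.log2(1.0 + var_c / sigma_n2)
        info_dist += 0.5 * math.log2(1.0 + g * g * var_c / (sigma_v2 + sigma_n2))
    return info_dist / info_source

rng = random.Random(2)
reference = [rng.gauss(0.0, 1.0) for _ in range(256)]
noisy = [x + rng.gauss(0.0, 1.0) for x in reference]

vif_identical = toy_vif(reference, reference)
vif_noisy = toy_vif(reference, noisy)
```

Comparing a signal with itself yields a value of essentially 1, while added noise lowers the score. Extending the sketch to the generalized index would replace the two logarithmic terms with the $\psi$ function evaluated at per-block shape estimates.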

The comprehensive mathematical derivation, parameter estimation routines, and auxiliary formulas provide a self-contained framework for implementing both GSM-VIF and GGSM-VIF approaches to full-reference image quality assessment (Venkataramanan et al., 2023).
