Compound FID (CFID): Multi-Level Evaluation

Updated 27 January 2026

Compound FID (CFID) is a multi-level evaluation metric that integrates low-, mid-, and high-level features to assess GAN image quality.
It uses per-layer scaling and maximum aggregation to capture distortions from noise to semantic errors for enhanced sensitivity.
The abbreviation CFID is also used for two distinct concepts: the Conditional Fréchet Inception Distance for conditional generative models, and compound free induction decay analysis in hyperfine-resolved magnetometry.

The abbreviation CFID refers to several distinct concepts across machine learning and atomic physics. In contemporary research, it most commonly denotes (1) Compound Fréchet Inception Distance for comprehensive GAN image assessment (Nunn et al., 2021), (2) Conditional Fréchet Inception Distance for measuring performance of conditional generative models (Soloveitchik et al., 2021), and (3) Compound free induction decay analysis for hyperfine-resolved optically-pumped magnetometry (Hewatt et al., 2024). The three usages share a common methodological motif: extraction or integration of multi-component signal or feature structure for enhanced sensitivity or robustness.

1. Compound Fréchet Inception Distance in Image Quality Assessment

Compound Fréchet Inception Distance (CFID), introduced by Nunn et al., is an extension of the standard Fréchet Inception Distance (FID), developed to improve the evaluation of generative adversarial networks (GANs) (Nunn et al., 2021). While FID measures the distance between global feature distributions of real and generated images using the final average-pooled Inception-V3 activations, CFID incorporates features across three abstraction levels by extracting activations from multiple pooling layers:

CFID₁ (Low-level): MaxPool1 (64×73×73, $d_1=341\,056$ ) — captures local textures, fine gradients, highly responsive to noise and local artifacts.
CFID₂ (Mid-level): MaxPool2 (192×35×35, $d_2=235\,200$ ) — encodes edges and simple shapes, sensitive to moderate-scale structure and distortion.
CFID₃ (High-level): AvgPool (2048×1×1, $d_3=2048$ ) — corresponds to object/semantic content, identical to standard FID.

For datasets of real ( $X_r$ ) and generated ( $X_g$ ) images, the method computes sample means and covariances for each feature embedding, and evaluates a scaled Fréchet distance:

$\mathrm{CFID}_i(X_r,X_g) = \|\mu^{(i)}_r-\mu^{(i)}_g\|_2^2 + \mathrm{Tr}\Bigl(\alpha_i\Sigma^{(i)}_r+\alpha_i\Sigma^{(i)}_g-2(\alpha_i^2\Sigma^{(i)}_r\Sigma^{(i)}_g)^{1/2}\Bigr)$

with scaling $\alpha_i = d_i / d_3$ for $i=1,2$ , and $\alpha_3=1$ , to normalize the scale of covariance matrices across levels. The final CFID is defined as the maximum of the three per-level scores:

$\mathrm{CFID}(X_r,X_g) = \max_{i \in \{1,2,3\}} \mathrm{CFID}_i(X_r,X_g)$

This design increases sensitivity to a broad spectrum of distortions. Empirical results show CFID₁ and CFID₂ respond linearly to low-level disturbances (e.g. Gaussian or salt-and-pepper noise), while CFID₃ is more responsive to semantic distortions (e.g. blur, structural rearrangement). The aggregation strategy—taking the maximum—ensures that any severe deviation at any abstraction scale meaningfully impacts the final score.
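With the dimensions listed above, the scaling factors work out to $\alpha_1 = 341\,056/2048 = 166.53125$ and $\alpha_2 = 235\,200/2048 = 114.84375$. The per-level score and the maximum aggregation can be sketched as follows. This is a minimal illustration that follows the formulas in this section literally, not the authors' reference implementation; the function names are hypothetical, and dense covariances are assumed, which is only practical for small feature dimensions.

```python
import numpy as np
from scipy import linalg

# Feature dimensions and scaling factors as stated in the text.
DIMS = (64 * 73 * 73, 192 * 35 * 35, 2048)            # d_1, d_2, d_3
ALPHAS = (DIMS[0] / DIMS[2], DIMS[1] / DIMS[2], 1.0)  # alpha_1, alpha_2, alpha_3


def gaussian_stats(features):
    """Sample mean and covariance of an (n_samples, dim) feature array."""
    features = np.asarray(features, dtype=np.float64)
    return features.mean(axis=0), np.cov(features, rowvar=False)


def scaled_frechet(mu_r, sigma_r, mu_g, sigma_g, alpha=1.0):
    """||mu_r - mu_g||^2 + Tr(a*S_r + a*S_g - 2*(a^2 * S_r S_g)^(1/2))."""
    diff = mu_r - mu_g
    covmean = linalg.sqrtm((alpha ** 2) * (sigma_r @ sigma_g))
    covmean = np.real(covmean)  # drop tiny imaginary parts caused by round-off
    trace_term = (alpha * np.trace(sigma_r) + alpha * np.trace(sigma_g)
                  - 2.0 * np.trace(covmean))
    return float(diff @ diff + trace_term)


def compound_fid(real_levels, gen_levels, alphas=ALPHAS):
    """Return (max over levels, list of per-level scores).

    real_levels / gen_levels: sequences of three (n_samples, d_i) arrays,
    ordered low-, mid-, high-level.
    """
    scores = []
    for feats_r, feats_g, alpha in zip(real_levels, gen_levels, alphas):
        mu_r, sigma_r = gaussian_stats(feats_r)
        mu_g, sigma_g = gaussian_stats(feats_g)
        scores.append(scaled_frechet(mu_r, sigma_r, mu_g, sigma_g, alpha))
    return max(scores), scores
```

Returning the per-level scores alongside the maximum keeps the diagnostic information about which abstraction level triggered the final value.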

2. Algorithmic Workflow and Implementation Considerations

The CFID computation pipeline uses a pretrained Inception-V3 network:

Accumulate feature means and covariances independently for each abstraction level.
Apply per-level scaling to covariances to mitigate dimensionality mismatch.
Compute FID for each level, then aggregate by maximum.
For efficiency, use batch sizes of 50–100 and incremental or approximate methods for high-dimensional covariance computation.
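The incremental accumulation mentioned in the steps above can be sketched with a batched update of the mean and the sum of outer products of deviations (the standard pairwise-merge formula for combining statistics of two groups). The class name `RunningStats` is hypothetical, and the dense `dim x dim` accumulator is an assumption that only suits the high-level embedding or reduced feature dimensions.

```python
import numpy as np


class RunningStats:
    """Accumulate mean and covariance over mini-batches without storing all features."""

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # sum of outer products of deviations

    def update(self, batch):
        batch = np.asarray(batch, dtype=np.float64)
        b = batch.shape[0]
        batch_mean = batch.mean(axis=0)
        centered = batch - batch_mean
        delta = batch_mean - self.mean
        total = self.n + b
        # Merge the batch statistics into the running statistics.
        self.m2 += centered.T @ centered + np.outer(delta, delta) * self.n * b / total
        self.mean += delta * b / total
        self.n = total

    def covariance(self):
        """Unbiased sample covariance of everything seen so far."""
        return self.m2 / (self.n - 1)
```

Feeding batches of 50 to 100 feature vectors into `update` yields the same mean and covariance as a single pass over the full feature matrix, up to floating-point error.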

CFID maintains conceptual simplicity, requiring three forward passes per batch and conventional covariance statistics at each level. However, direct covariance storage at lower levels is infeasible; practitioners employ mini-batch, low-rank, or randomized sketching methods to avoid excessive memory usage (Nunn et al., 2021).
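One way to avoid forming a $d_i \times d_i$ covariance at all is to work in sample space. This is a substituted technique offered as an illustration, not necessarily the approach of the cited paper. Writing $\Sigma_r = A^\top A$ and $\Sigma_g = B^\top B$ with $A$, $B$ the centered feature matrices divided by $\sqrt{n-1}$, the nonzero eigenvalues of $\Sigma_r\Sigma_g$ equal those of $(AB^\top)(AB^\top)^\top$, so $\mathrm{Tr}\bigl((\Sigma_r\Sigma_g)^{1/2}\bigr)$ is the sum of singular values of the small $n_r \times n_g$ matrix $AB^\top$. The function names below are hypothetical.

```python
import numpy as np


def centered_factor(features):
    """Return A such that A.T @ A equals the unbiased sample covariance."""
    x = np.asarray(features, dtype=np.float64)
    return (x - x.mean(axis=0)) / np.sqrt(x.shape[0] - 1)


def frechet_sample_space(feat_r, feat_g, alpha=1.0):
    """Scaled Frechet distance computed without any d x d matrix.

    feat_r, feat_g: (n_samples, dim) arrays; alpha: per-level scaling.
    Uses (alpha^2 * S_r S_g)^(1/2) = alpha * (S_r S_g)^(1/2).
    """
    feat_r = np.asarray(feat_r, dtype=np.float64)
    feat_g = np.asarray(feat_g, dtype=np.float64)
    a = centered_factor(feat_r)
    b = centered_factor(feat_g)
    diff = feat_r.mean(axis=0) - feat_g.mean(axis=0)
    trace_r = np.sum(a * a)  # Tr(S_r)
    trace_g = np.sum(b * b)  # Tr(S_g)
    # Sum of singular values of the n_r x n_g cross-Gram matrix.
    trace_sqrt = np.linalg.svd(a @ b.T, compute_uv=False).sum()
    return float(diff @ diff + alpha * (trace_r + trace_g - 2.0 * trace_sqrt))
```

The cross-Gram matrix `a @ b.T` can also be accumulated over blocks of feature columns, so the full feature matrices need not be held in memory at once.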

3. Applications and Empirical Performance in GAN Evaluation

CFID targets failure modes of FID arising from its sole reliance on high-level features, which may miss local artifacts and subtle distortions in GAN outputs. Controlled artificial distortions demonstrate the following:

Gaussian noise: Only low- and mid-level CFID reflect increased distortion; high-level CFID (FID) quickly saturates and becomes uninformative.
Gaussian blur: CFID₃ (FID) climbs sharply, while CFID₁ remains insensitive.
Structural warps (e.g., spiral): Responsiveness depends on abstraction level, with mid-level (CFID₂) most sensitive for certain backgrounds.
Salt-and-pepper noise: Predominantly detected by CFID₁, poorly reflected in FID.

These outcomes indicate that each CFIDᵢ is attuned to distortions at its associated abstraction scale. Combining them via the $\max$ rule provides more robust quality assessment for generative image models (Nunn et al., 2021).
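The controlled distortions listed above can be reproduced on a batch of images with a few lines of NumPy and SciPy. This is a sketch under stated assumptions: images are float arrays in `[0, 1]` with shape `(N, H, W, C)`, and the function names and parameterization (noise standard deviation, corrupted-pixel fraction, blur width) are illustrative choices, not the settings of the cited experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def add_gaussian_noise(images, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    noisy = images + rng.normal(0.0, sigma, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)


def add_salt_and_pepper(images, fraction, rng):
    """Set roughly `fraction` of the pixels to pure black or pure white."""
    out = images.copy()
    draw = rng.random(images.shape[:3])    # one draw per pixel, shared by all channels
    out[draw < fraction / 2] = 0.0         # pepper
    out[draw > 1.0 - fraction / 2] = 1.0   # salt
    return out


def add_gaussian_blur(images, sigma):
    """Blur along the two spatial axes only (not across batch or channels)."""
    return gaussian_filter(images, sigma=(0, sigma, sigma, 0))
```

Sweeping the distortion strength and recording each per-level score separately shows which abstraction level reacts to which kind of degradation.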

4. Advantages, Limitations, and Comparative Context

CFID remedies the bias inherent in standard FID toward semantic or global differences, providing the ability to detect local image artifacts with commensurate rigor. Its architecture is modular, compatible with standard deep learning workflows, and straightforward to implement. Limitations include:

Computational cost due to large covariance matrices at low-level embeddings (up to $341\,056 \times 341\,056$ ).
The aggregation rule (maximum) is acknowledged as ad hoc; alternative schemes such as weighted sums remain unexplored as of the original publication.
No large-scale validation on end-to-end GAN evaluation is presented, and the qualitative nature of the reported results indicates the need for further quantitative validation.
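To put the first limitation in numbers, the short calculation below estimates the memory a dense covariance matrix would need at each level. It assumes 8-byte (float64) entries; the helper name is hypothetical.

```python
# Feature dimensions per level, as listed in Section 1.
DIMS = {"MaxPool1": 64 * 73 * 73, "MaxPool2": 192 * 35 * 35, "AvgPool": 2048}


def covariance_bytes(dim, bytes_per_value=8):
    """Bytes needed to store a dense dim x dim matrix."""
    return dim * dim * bytes_per_value


for name, dim in DIMS.items():
    gigabytes = covariance_bytes(dim) / 1e9
    print(f"{name}: d = {dim}, dense float64 covariance ~ {gigabytes:,.2f} GB")
```

Under this assumption the low-level covariance alone would take about 930 GB and the mid-level one about 443 GB, against roughly 0.03 GB for the AvgPool level, which is why direct storage is impractical at the lower levels.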

A plausible implication is that further research could refine aggregation strategies or extend CFID to learned, context-specific abstraction levels.

5. Related Concepts Sharing the CFID Abbreviation

It is essential to distinguish Compound FID from the Conditional Fréchet Inception Distance (similarly abbreviated "CFID") (Soloveitchik et al., 2021). The latter quantifies the average Fréchet (Wasserstein-2) distance between pairs of conditional distributions, applied chiefly to conditional generative models. Methodologically, conditional FID evaluates:

$\mathrm{CFID}^2(p,q) = \int_{y} W_2^2 \left[ \mathcal{N}(\mu_{p|y}, \Sigma_{p|y}), \mathcal{N}(\mu_{q|y}, \Sigma_{q|y}) \right] p_Y(dy)$

with practical computation achieved by empirical averaging over the conditioning variable $y$ . This metric is structurally distinct from "Compound FID" in its purpose and its design (Soloveitchik et al., 2021).
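For intuition, the empirical average can be sketched for the simplest case of a discrete conditioning variable, where the integral becomes a frequency-weighted sum over class labels. This is an illustrative simplification: it assumes every label occurs in both sets with enough samples to estimate a covariance, and the estimator in the cited paper may differ, in particular for continuous conditions. All function names are hypothetical.

```python
import numpy as np
from scipy import linalg


def gaussian_w2_squared(mu_p, sigma_p, mu_q, sigma_q):
    """Squared Wasserstein-2 (Frechet) distance between two Gaussians."""
    diff = mu_p - mu_q
    covmean = np.real(linalg.sqrtm(sigma_p @ sigma_q))
    return float(diff @ diff + np.trace(sigma_p) + np.trace(sigma_q)
                 - 2.0 * np.trace(covmean))


def conditional_fid_discrete(feat_p, labels_p, feat_q, labels_q):
    """Average per-label Gaussian W2^2, weighted by label frequency in the p set."""
    feat_p, feat_q = np.asarray(feat_p), np.asarray(feat_q)
    labels_p, labels_q = np.asarray(labels_p), np.asarray(labels_q)
    classes, counts = np.unique(labels_p, return_counts=True)
    weights = counts / counts.sum()  # empirical p_Y
    total = 0.0
    for label, weight in zip(classes, weights):
        xp = feat_p[labels_p == label]
        xq = feat_q[labels_q == label]
        total += weight * gaussian_w2_squared(
            xp.mean(axis=0), np.cov(xp, rowvar=False),
            xq.mean(axis=0), np.cov(xq, rowvar=False),
        )
    return total
```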

In a separate context, "compound FID" in experimental atomic physics refers to modeling the free induction decay signal of alkali atoms as a sum of two manifold-specific precessions ( $F=2$ and $F=1$ ) to address systematic errors in magnetometry (Hewatt et al., 2024). In this setting, the compound fit enables sub-nT precision by jointly fitting both frequency components, eliminating biases intrinsic to single-frequency approaches.
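The joint two-frequency fit can be illustrated on synthetic data. The model below is an assumption made for the sketch: two sinusoids with independent amplitude, frequency, and phase and a shared exponential decay time. The numerical values are arbitrary and do not represent any particular atom or field; the cited work may use a different parameterization.

```python
import numpy as np
from scipy.optimize import curve_fit


def compound_fid_signal(t, a1, f1, phi1, a2, f2, phi2, t2):
    """Sum of two precession components with a shared exponential decay (assumed model)."""
    decay = np.exp(-t / t2)
    return decay * (a1 * np.sin(2 * np.pi * f1 * t + phi1)
                    + a2 * np.sin(2 * np.pi * f2 * t + phi2))


# Synthetic measurement: arbitrary illustrative parameters plus white noise.
rng = np.random.default_rng(0)
t = np.arange(0.0, 0.5, 1.0 / 5000.0)
true_params = (1.0, 350.0, 0.2, 0.3, 352.5, -0.4, 0.1)
signal = compound_fid_signal(t, *true_params) + rng.normal(0.0, 0.01, t.size)

# Nonlinear least squares needs a starting point reasonably close to the solution.
initial_guess = (0.8, 349.9, 0.1, 0.2, 352.6, -0.3, 0.08)
fit, _ = curve_fit(compound_fid_signal, t, signal, p0=initial_guess)

print("fitted frequencies (Hz):", fit[1], fit[4])
```

Fitting both components together lets each frequency be estimated in the presence of the other, rather than forcing a single sinusoid to absorb the beat between them.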

6. Summary Table: CFID Variants

Variant / Abbreviation	Domain	Core Principle
Compound FID (CFID)	GAN Image Quality	Aggregates multi-level FID via Inception-V3 layers
Conditional FID (CFID)	Conditional Generation	Inception-space Wasserstein-2 for $p(\cdot \mid y)$
Compound FID (Physics)	Magnetometry	Sum of hyperfine precessions for FID signal fitting

Each variant targets different scientific questions but embodies the integration of multiple distributions, levels, or components to achieve superior evaluation granularity or accuracy.

7. Implementation Recommendations and Outlook

For image modeling, practitioners should:

Employ a pretrained Inception-V3 network and extract features at the specified MaxPool1, MaxPool2, and AvgPool layers.
Normalize covariance contributions using $\alpha_i = d_i/d_3$ for $i\in\{1,2\}$ .
Use batch sizes and incremental statistics appropriate for the sample size and feature dimensionality.
Aggregate using the $\max$ operator as per the original proposal, while remaining alert to future alternatives.

In evaluating GANs, CFID offers greater diagnostic coverage across a spectrum of image distortions, providing a basis for more discriminative quality assurance than FID alone. Additional empirical validation and exploration of aggregation schemes are identified as open areas for further research (Nunn et al., 2021).


References (3)

Nunn et al. (2021). Compound Frechet Inception Distance for Quality Assessment of GAN Created Images.

Soloveitchik et al. (2021). Conditional Frechet Inception Distance.

Hewatt et al. (2024). Investigating the hyperfine systematic error and relative phase in low spin-polarization alkali FID magnetometers.
