
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation

Published 23 Dec 2023 in cs.CV, cs.LG, and eess.IV | (2312.15289v3)

Abstract: Modern metrics for generative learning like Fréchet Inception Distance (FID) and DINOv2-Fréchet Distance (FD-DINOv2) demonstrate impressive performance. However, they suffer from various shortcomings, like a bias towards specific generators and datasets. To address this problem, we propose the Fréchet Wavelet Distance (FWD) as a domain-agnostic metric based on the Wavelet Packet Transform ($W_p$). FWD provides a sight across a broad spectrum of frequencies in images with a high resolution, preserving both spatial and textural aspects. Specifically, we use $W_p$ to project generated and real images to the packet coefficient space. We then compute the Fréchet distance with the resultant coefficients to evaluate the quality of a generator. This metric is general-purpose and dataset-domain agnostic, as it does not rely on any pre-trained network, while being more interpretable due to its ability to compute Fréchet distance per packet, enhancing transparency. We conclude with an extensive evaluation of a wide variety of generators across various datasets that the proposed FWD can generalize and improve robustness to domain shifts and various corruptions compared to other metrics.
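
The pipeline the abstract describes (project images to packet coefficients, then compute a Fréchet distance per packet) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Haar wavelet, the two decomposition levels, the helper names `haar_packets`, `frechet_distance`, and `packet_frechet_distances`, and the idea of reducing the per-packet values with a mean are all assumptions made here.

```python
import numpy as np


def haar_packets(images, levels=2):
    """Full 2D Haar wavelet packet decomposition (assumed wavelet choice).

    images: array of shape (N, H, W), with H and W divisible by 2**levels.
    Returns an array of shape (N, 4**levels, H / 2**levels, W / 2**levels).
    """
    packets = [np.asarray(images, dtype=np.float64)]
    for _ in range(levels):
        children = []
        # Unlike a plain wavelet transform, every packet is split again.
        for p in packets:
            a, b = p[:, 0::2, 0::2], p[:, 0::2, 1::2]
            c, d = p[:, 1::2, 0::2], p[:, 1::2, 1::2]
            children += [
                (a + b + c + d) / 2,  # low-pass in both directions
                (a - b + c - d) / 2,  # high-pass along the width
                (a + b - c - d) / 2,  # high-pass along the height
                (a - b - c + d) / 2,  # high-pass in both directions
            ]
        packets = children
    return np.stack(packets, axis=1)


def frechet_distance(x, y):
    """Squared Fréchet distance between Gaussians fitted to rows of x and y."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # tr((cov_x cov_y)^(1/2)) equals the sum of square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * trace_sqrt)


def packet_frechet_distances(real, fake, levels=2):
    """One Fréchet distance per packet; each packet is flattened to a vector."""
    pr, pf = haar_packets(real, levels), haar_packets(fake, levels)
    return np.array([
        frechet_distance(pr[:, k].reshape(len(pr), -1),
                         pf[:, k].reshape(len(pf), -1))
        for k in range(pr.shape[1])
    ])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 8, 8))
    fake = real + 1.0  # a pure brightness shift
    per_packet = packet_frechet_distances(real, fake)
    print("per-packet distances:", np.round(per_packet, 3))
    print("mean over packets (assumed reduction):", per_packet.mean())
```

In this toy setup a constant brightness shift only changes the lowest-frequency packet, which illustrates how a per-packet breakdown can localize where two image sets differ.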

Summary

  • The paper introduces the WPSKL metric to address FID's biases by integrating spatial and frequency-domain insights.
  • It employs a wavelet packet transform to compute KL divergence, capturing subtle differences undetected by conventional methods.
  • Experimental results on datasets like CIFAR10 and CelebAHQ show WPSKL’s superior robustness and alignment with human perception.

Wavelet Packet Power Spectrum Kullback-Leibler Divergence: A New Metric for Image Synthesis

The authors address limitations of the metrics currently used to evaluate generative neural networks, focusing on the Fréchet Inception Distance (FID). Although popular, FID is biased towards specific image datasets, sensitive to slight numerical changes, and overemphasizes low-frequency information. The paper proposes an alternative metric, the Wavelet Packet Power Spectrum Kullback-Leibler Divergence (WPSKL), which aims for a more comprehensive evaluation by combining spatial and frequency-domain information.

Problem Motivation and Existing Shortcomings

Existing metrics like FID depend heavily on pre-trained neural networks, which introduces biases that affect their accuracy across diverse datasets and architectures. FID also assumes that the extracted features follow a Gaussian distribution, which may not hold, and it is sensitive to computational details such as image resizing. As a result, scores can be inconsistent: small pixel changes that human observers cannot detect can cause large fluctuations in FID.
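
For reference, the Gaussian assumption enters through the standard closed form of the Fréchet distance between two Gaussians, which uses only the first two moments of each feature set. The symbols below are notation introduced here, not taken from the paper: $\mu_r, \Sigma_r$ are the mean and covariance of the real-image features and $\mu_g, \Sigma_g$ those of the generated-image features.

```latex
d^2\big(\mathcal{N}(\mu_r, \Sigma_r),\, \mathcal{N}(\mu_g, \Sigma_g)\big)
  = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\Big(\Sigma_r + \Sigma_g - 2\,\big(\Sigma_r \Sigma_g\big)^{1/2}\Big)
```

If the features are not well described by a mean and a covariance, two distributions that differ only in higher-order structure can receive the same score.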

Proposed Solution: WPSKL

WPSKL leverages the Wavelet Packet Transform (WPT) to evaluate images by capturing both spatial and frequency information. Wavelets provide a fine-grained analysis of image data by decomposing it into a set of high- and low-frequency components, preserving spatial attributes while offering frequency analysis. The WPSKL metric uses the wavelet power spectrum to compute the Kullback-Leibler (KL) divergence between the distributions of real and synthesized images. This frequency-domain focus allows for the detection of differences that are invisible to FID, particularly in cases where images are perceptually similar yet differ in frequency content.
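
A rough sketch of the idea in this paragraph, under stated assumptions: the summary does not give the exact definition of the power spectrum or the direction of the divergence, so the Haar wavelet, the mean squared coefficient per packet, the normalization to a probability vector, and the helper names `haar_packets`, `packet_power_spectrum`, and `kl_divergence` are illustrative choices made here rather than the authors' formulation.

```python
import numpy as np


def haar_packets(images, levels=2):
    """Full 2D Haar wavelet packet decomposition (assumed wavelet choice).

    images: array of shape (N, H, W), with H and W divisible by 2**levels.
    Returns an array of shape (N, 4**levels, H / 2**levels, W / 2**levels).
    """
    packets = [np.asarray(images, dtype=np.float64)]
    for _ in range(levels):
        children = []
        for p in packets:
            a, b = p[:, 0::2, 0::2], p[:, 0::2, 1::2]
            c, d = p[:, 1::2, 0::2], p[:, 1::2, 1::2]
            children += [
                (a + b + c + d) / 2,  # low-pass in both directions
                (a - b + c - d) / 2,  # high-pass along the width
                (a + b - c - d) / 2,  # high-pass along the height
                (a - b - c + d) / 2,  # high-pass in both directions
            ]
        packets = children
    return np.stack(packets, axis=1)


def packet_power_spectrum(images, levels=2, eps=1e-12):
    """Mean squared coefficient per packet, normalized to sum to one.

    The normalization and the small eps (to avoid log(0)) are assumptions.
    """
    coeffs = haar_packets(images, levels)
    power = np.mean(coeffs ** 2, axis=(0, 2, 3)) + eps
    return power / power.sum()


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with strictly positive entries."""
    return float(np.sum(p * np.log(p / q)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noisy = rng.normal(size=(64, 8, 8))  # energy spread over all packets
    flat = np.ones((64, 8, 8))           # energy only in the lowest packet
    p = packet_power_spectrum(noisy)
    q = packet_power_spectrum(flat)
    print("KL(noisy || noisy):", kl_divergence(p, p))
    print("KL(noisy || flat): ", kl_divergence(p, q))
```

Because KL divergence is not symmetric, a practical metric would have to fix a direction or symmetrize it; the summary does not say which, so the sketch leaves that choice to the caller.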

Experimental Evaluation

The paper applies WPSKL to a range of generative models, including GANs and diffusion models, on datasets such as CIFAR10, CelebAHQ, and LSUN. The results indicate that WPSKL aligns more consistently with human perception than existing metrics. It is also more robust to slight image perturbations, remaining stable under numerical rounding and dataset-specific biases that challenge FID. A user study corroborates this: WPSKL tends to agree with human judgments of image quality.

Implications and Future Directions

WPSKL improves the evaluation of generative models in scenarios where high-level feature-based assessments such as FID are inadequate. Because it accounts for both spatial and frequency detail, it could also inform improvements to generative model architectures and training. Future work could explore using WPSKL in real-time generative applications or adapting it to other domains, such as video synthesis, where temporal-frequency information could further extend its applicability.

Overall, the authors propose a frequency-sensitive metric intended as a more reliable benchmark for evaluating and improving generative neural networks, with assessments that align more closely with human visual judgment.
