Going beyond Compositions, DDPMs Can Produce Zero-Shot Interpolations

Published 29 May 2024 in cs.CV, cs.AI, and cs.NE | arXiv:2405.19201v2

Abstract: Denoising Diffusion Probabilistic Models (DDPMs) exhibit remarkable capabilities in image generation, with studies suggesting that they can generalize by composing latent factors learned from the training data. In this work, we go further and study DDPMs trained on strictly separate subsets of the data distribution with large gaps in the support of the latent factors. We show that such a model can effectively generate images in the unexplored, intermediate regions of the distribution. For instance, when trained on clearly smiling and non-smiling faces, we demonstrate a sampling procedure that can generate slightly smiling faces without reference images (zero-shot interpolation). We replicate these findings for other attributes as well as other datasets. Our code is available at https://github.com/jdeschena/ddpm-zero-shot-interpolation.
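
The abstract does not spell out the sampling procedure, so the sketch below is only a hedged illustration of the general idea: standard DDPM ancestral sampling (Ho et al., 2020) combined with a classifier-guidance-style gradient term that nudges samples toward an intermediate attribute score, e.g. halfway between "non-smiling" and "smiling". The noise predictor `eps_model` and the noise-aware attribute classifier `attr_classifier` are hypothetical stand-ins, and the quadratic guidance rule is a generic choice, not necessarily the paper's exact procedure.

```python
import torch

@torch.no_grad()
def guided_sample(eps_model, attr_classifier, betas, shape,
                  target=0.5, scale=1.0, device="cpu"):
    """DDPM ancestral sampling with a gradient nudge that pushes
    attr_classifier(x_t, t) toward `target` (an intermediate score)."""
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape, device=device)  # start from pure noise, x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        t_batch = torch.full((shape[0],), t, device=device, dtype=torch.long)
        eps = eps_model(x, t_batch)  # predicted noise at step t
        # DDPM posterior mean: (x_t - beta_t / sqrt(1 - abar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
               / torch.sqrt(alphas[t])
        # Guidance: gradient step shrinking (classifier score - target)^2,
        # steering samples into the unseen intermediate region.
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            loss = ((attr_classifier(x_in, t_batch) - target) ** 2).sum()
            grad = torch.autograd.grad(loss, x_in)[0]
        mean = mean - scale * grad
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise  # sigma_t^2 = beta_t variance choice
    return x
```

In practice one would tune `scale` and `target` jointly: too little guidance leaves samples near the two training modes, while too much degrades image quality.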
