
DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations

Published 23 Jan 2024 in cs.LG and stat.ML (arXiv:2401.12517v2)

Abstract: Recent studies have introduced a new class of generative models for synthesizing implicit neural representations (INRs), which capture arbitrary continuous signals across various domains. These models opened the door to domain-agnostic generative modeling, but they often fail to achieve high-quality generation. We observe that existing methods generate the weights of a neural network to parameterize an INR and then evaluate that network with fixed positional embeddings (PEs). Arguably, this architecture limits the expressive power of the generative model and results in low-quality INR generation. To address this limitation, we propose the Domain-agnostic Latent Diffusion Model for INRs (DDMI), which generates adaptive positional embeddings instead of neural network weights. Specifically, we develop a Discrete-to-continuous space Variational AutoEncoder (D2C-VAE) that seamlessly connects discrete data and continuous signal functions in a shared latent space. We also introduce a novel conditioning mechanism that evaluates INRs with hierarchically decomposed PEs to further enhance expressive power. Extensive experiments across four modalities (2D images, 3D shapes, Neural Radiance Fields, and videos) and seven benchmark datasets demonstrate the versatility of DDMI and its superior performance over existing INR generative models.
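The core idea in the abstract — a latent code that produces *adaptive* positional embeddings, decomposed hierarchically from coarse to fine, which a shared decoder then evaluates at arbitrary continuous coordinates — can be illustrated with a toy numpy sketch. This is a minimal illustration under stated assumptions, not the paper's actual D2C-VAE architecture: the class and function names (`ToyAdaptivePEDecoder`, `fourier_pe`), the multiplicative modulation scheme, and all dimensions are hypothetical choices made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_pe(coords, num_freqs):
    """Fourier positional embedding. coords: (N, d) in [0, 1];
    returns (N, 2 * d * num_freqs) of sin/cos features."""
    freqs = 2.0 ** np.arange(num_freqs)            # octave-spaced frequencies
    ang = 2 * np.pi * coords[:, :, None] * freqs   # (N, d, num_freqs)
    ang = ang.reshape(coords.shape[0], -1)
    return np.concatenate([np.sin(ang), np.cos(ang)], axis=-1)

class ToyAdaptivePEDecoder:
    """Toy sketch of latent-conditioned, hierarchical PEs: the latent z
    modulates the embedding at each scale (the 'adaptive' part), and a
    small shared MLP maps the modulated PEs to signal values at any
    query coordinate. Randomly initialized, purely illustrative."""

    def __init__(self, latent_dim=16, coord_dim=2,
                 num_scales=3, freqs_per_scale=2, hidden=32):
        self.num_scales = num_scales
        self.freqs_per_scale = freqs_per_scale
        pe_dim = 2 * coord_dim * freqs_per_scale
        # One latent-to-modulation map per scale of the hierarchy.
        self.W_mod = [rng.normal(0, 0.1, (latent_dim, pe_dim))
                      for _ in range(num_scales)]
        in_dim = num_scales * pe_dim
        self.W1 = rng.normal(0, 0.1, (in_dim, hidden)); self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0, 0.1, (hidden, 3));      self.b2 = np.zeros(3)

    def __call__(self, z, coords):
        feats = []
        for s in range(self.num_scales):
            # Coarse-to-fine: scale s shifts the frequency band up s octaves.
            pe = fourier_pe(coords * (2.0 ** s), self.freqs_per_scale)
            feats.append(pe * (1.0 + z @ self.W_mod[s]))  # latent-conditioned PE
        h = np.tanh(np.concatenate(feats, axis=-1) @ self.W1 + self.b1)
        return h @ self.W2 + self.b2    # (N, 3): e.g., RGB at each coordinate
```

Because the decoder is a function of continuous coordinates, one latent can be rendered at any resolution — the INR property the abstract relies on; what changes across samples is the positional embedding itself rather than the decoder weights.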
