LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models

Published 17 Apr 2024 in cs.CV (arXiv:2404.11098v4)

Abstract: In the era of AIGC, demand has emerged for low-budget and even on-device applications of diffusion models. Several approaches have been proposed for compressing the Stable Diffusion models (SDMs), most of which rely on handcrafted layer removal to obtain smaller U-Nets, together with knowledge distillation to recover network performance. However, such handcrafted layer removal is inefficient and lacks scalability and generalization, and the feature distillation employed during retraining suffers from an imbalance issue in which a few numerically large feature-loss terms dominate the others throughout the retraining process. To this end, we propose LAPTOP-Diff, layer pruning and normalized distillation for compressing diffusion models. We 1) introduce a layer pruning method that compresses the SDM's U-Net automatically, with an effective one-shot pruning criterion whose one-shot performance is guaranteed by its good additivity property, surpassing other layer pruning and handcrafted layer removal methods, and 2) propose normalized feature distillation for retraining, alleviating the imbalance issue. Using the proposed LAPTOP-Diff, we compress the U-Nets of SDXL and SDM-v1.5 to state-of-the-art performance, achieving a minimal 4.0% decline in PickScore at a pruning ratio of 50%, whereas the comparative methods' minimal PickScore decline is 8.2%.
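The two ideas in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the per-layer normalizer (teacher feature magnitude) and the greedy score-per-cost selection rule are assumptions chosen to make the additivity and normalization ideas concrete; features are flattened to plain lists for simplicity.

```python
def normalized_distillation_loss(teacher_feats, student_feats, eps=1e-8):
    """Sketch of normalized feature distillation.

    Each per-layer feature loss is divided by the (detached) magnitude of the
    corresponding teacher feature, so that no numerically large term dominates
    the total loss. Features are given as flat lists of floats, one list per
    distilled layer (a simplification of real 4-D feature maps).
    """
    total = 0.0
    for t, s in zip(teacher_feats, student_feats):
        n = len(t)
        term = sum((ti - si) ** 2 for ti, si in zip(t, s)) / n   # raw per-layer MSE
        scale = sum(ti ** 2 for ti in t) / n + eps               # normalizer (assumed form)
        total += term / scale
    return total


def one_shot_layer_selection(layer_scores, layer_costs, budget):
    """Sketch of a one-shot pruning criterion that exploits additivity.

    If the damage of removing a set of layers is approximately the sum of the
    individually measured per-layer damages (`layer_scores`), selection reduces
    to a simple greedy pass: remove the layers with the least damage per unit
    of cost until `budget` (e.g. parameters to prune) is reached.
    """
    order = sorted(range(len(layer_scores)),
                   key=lambda i: layer_scores[i] / layer_costs[i])
    removed, freed = [], 0
    for i in order:
        if freed >= budget:
            break
        removed.append(i)
        freed += layer_costs[i]
    return removed
```

For identical teacher and student features the loss is zero, and a uniform relative error contributes equally from every layer regardless of its raw feature magnitude, which is the point of the normalization.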
