Mechanisms of Projective Composition of Diffusion Models

Published 6 Feb 2025 in cs.LG | (2502.04549v3)

Abstract: We study the theoretical foundations of composition in diffusion models, with a particular focus on out-of-distribution extrapolation and length-generalization. Prior work has shown that composing distributions via linear score combination can achieve promising results, including length-generalization in some cases (Du et al., 2023; Liu et al., 2022). However, our theoretical understanding of how and why such compositions work remains incomplete. In fact, it is not even entirely clear what it means for composition to "work". This paper starts to address these fundamental gaps. We begin by precisely defining one possible desired result of composition, which we call projective composition. Then, we investigate: (1) when linear score combinations provably achieve projective composition, (2) whether reverse-diffusion sampling can generate the desired composition, and (3) the conditions under which composition fails. We connect our theoretical analysis to prior empirical observations where composition has either worked or failed, for reasons that were unclear at the time. Finally, we propose a simple heuristic to help predict the success or failure of new compositions.


Summary

The paper defines projective composition as a target for combining diffusion models: the composed distribution must preserve a chosen aspect of each component distribution, as specified by projection functions.
It identifies Factorized Conditionals and reparameterization-equivariance as the key mechanisms, and uses them to link the theoretical analysis to empirical observations.
The study proposes a heuristic, based on the orthogonality of mean vectors in a feature space, for predicting whether a composition will succeed.

This paper, "Mechanisms of Projective Composition of Diffusion Models" (2502.04549), studies the theory of composition in diffusion models. It addresses open questions about when linear score combinations of diffusion models achieve projective composition and length-generalization. The authors define projective composition, give conditions under which composition succeeds, and connect the theoretical analysis to empirical observations. The paper also proposes a practical heuristic for predicting whether a new composition will succeed.

Projective Composition Definition

The paper introduces projective composition: given distributions $\{p_i\}$ and associated projection functions $\{\Pi_i\}$, a distribution $\hat{p}$ is a projective composition if $\Pi_i \sharp \hat{p} = \Pi_i \sharp p_i$ for all $i$, where $\Pi_i \sharp p$ denotes the pushforward of $p$ under $\Pi_i$ (the distribution of $\Pi_i(x)$ when $x \sim p$). Composition is thus defined not purely as a function of the distributions, but in terms of which aspect of each distribution must be preserved. The authors emphasize that a projective composition can be truly out-of-distribution with respect to the distributions being composed.
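As a small illustration of the definition, the sketch below checks the projective-composition condition on a toy Gaussian example. The specific distributions, the choice of coordinate projections, and the helper names (`coordinate_marginal`, `same_gaussian`) are assumptions made for this example and do not come from the paper. For a Gaussian, the pushforward under a coordinate projection is simply the corresponding marginal, so the check is exact.

```python
import numpy as np

def coordinate_marginal(mean, cov, idx):
    """Pushforward of N(mean, cov) under the projection x -> x[idx]."""
    idx = np.atleast_1d(idx)
    return mean[idx], cov[np.ix_(idx, idx)]

def same_gaussian(a, b):
    """Compare two Gaussians given as (mean, cov) pairs."""
    return np.allclose(a[0], b[0]) and np.allclose(a[1], b[1])

# Toy setup (assumed): p_1 is shifted along coordinate 0, p_2 along coordinate 1.
eye = np.eye(2)
p1 = (np.array([3.0, 0.0]), eye)
p2 = (np.array([0.0, 3.0]), eye)
# Candidate composition: shifted along both coordinates.
p_hat = (np.array([3.0, 3.0]), eye)

# Pi_1 keeps coordinate 0, Pi_2 keeps coordinate 1.
is_projective = (
    same_gaussian(coordinate_marginal(*p_hat, 0), coordinate_marginal(*p1, 0))
    and same_gaussian(coordinate_marginal(*p_hat, 1), coordinate_marginal(*p2, 1))
)
print(is_projective)
```

In this toy case the candidate matches each component on its projected coordinate, yet it coincides with neither component, which is the sense in which a projective composition can lie outside the distributions being composed.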

Compositional Mechanisms

The paper sets out the conditions under which projective composition works in diffusion models. It identifies Factorized Conditionals as a key criterion: a property under which the distributions become independent when restricted to specific subsets of coordinates. When this property holds, a linear score combination defined by a "composition operator" yields a correct projective composition.
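The summary above does not spell out the composition operator, so the following sketch assumes one common form of linear score combination: the background score plus the sum of each concept score's difference from the background. The Gaussian distributions and the function names are hypothetical, chosen so that each concept differs from the background on its own coordinate only.

```python
import numpy as np

def gaussian_score(x, mean):
    """Score (gradient of the log density) of N(mean, I) at x."""
    return -(x - mean)

def composed_score(x, background_mean, concept_means):
    """Assumed linear combination: s_b + sum_i (s_i - s_b)."""
    s_b = gaussian_score(x, background_mean)
    return s_b + sum(gaussian_score(x, m) - s_b for m in concept_means)

mu_b = np.zeros(2)             # background
mu_1 = np.array([3.0, 0.0])    # concept 1 changes coordinate 0 only
mu_2 = np.array([0.0, 3.0])    # concept 2 changes coordinate 1 only

x = np.array([0.5, -1.0])
combined = composed_score(x, mu_b, [mu_1, mu_2])
# Score of the distribution that carries both shifts at once.
target = gaussian_score(x, np.array([3.0, 3.0]))
print(np.allclose(combined, target))
```

Under these assumptions the combined score coincides with the score of the distribution carrying both shifts, which matches each concept on the coordinate it controls.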

Figure 1: Composing diffusion models via score combination. Given two diffusion models, it is possible to sample in a way that composes content from one model with the style of another.

Further, the authors generalize this mechanism to feature spaces using diffeomorphic transformations. They demonstrate that projective composition can be achieved without explicitly knowing the feature space by leveraging the reparameterization-equivariance property of the composition operator.
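A rough numerical check of reparameterization-equivariance is sketched below for the simplest case, an invertible linear map applied to Gaussians. It reuses an assumed form of the composition operator (background score plus the sum of score differences); the matrix `A`, the means, and the helper names are illustrative choices and are not taken from the paper.

```python
import numpy as np

def gaussian_score(y, mean, cov):
    """Score of N(mean, cov) at y."""
    return -np.linalg.solve(cov, y - mean)

def composed_score(y, background, concepts):
    """Assumed linear combination: s_b + sum_i (s_i - s_b)."""
    s_b = gaussian_score(y, *background)
    return s_b + sum(gaussian_score(y, *c) - s_b for c in concepts)

# A non-orthogonal, invertible change of variables (assumed for illustration).
A = np.array([[2.0, 1.0], [0.0, 1.0]])
cov = A @ A.T

def push(mu):
    """Pushforward of N(mu, I) under x -> A x, as a (mean, cov) pair."""
    return A @ mu, cov

mu_b = np.zeros(2)
mu_1 = np.array([3.0, 0.0])
mu_2 = np.array([0.0, 3.0])

y = np.array([0.7, -1.2])
# Compose after reparameterizing each distribution ...
compose_then = composed_score(y, push(mu_b), [push(mu_1), push(mu_2)])
# ... versus reparameterizing the composition N(mu_1 + mu_2 - mu_b, I).
push_then = gaussian_score(y, *push(mu_1 + mu_2 - mu_b))
print(np.allclose(compose_then, push_then))
```

The two scores agree, so in this toy setting composing in the transformed coordinates gives the same distribution as transforming the composition, without the composition step ever using `A` explicitly.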

Sampling Challenges

Projective composition in general feature spaces makes sampling harder: standard diffusion processes may fail to generate the desired composition because the compositional paths can be non-smooth. The paper offers theoretical insight into why such compositions, while possible at $t=0$, may be difficult to sample with conventional methods such as reverse diffusion, particularly in non-orthogonal feature spaces.

Figure 2: Compositions of models trained on multiple objects with a learned background. The model is tested for length-generalization from 1-10 objects.

Practical Implications

The paper recommends orthogonal transformations for feasible sampling and highlights the importance of choosing the correct background distribution for successful composition. It examines empirical cases where existing compositional methods either succeeded or failed and proposes a heuristic based on the orthogonality of mean vectors in a disentangled feature space, such as that of CLIP. This heuristic gives a practical way to predict whether a score-based composition will be effective.
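One way such a heuristic might be computed is sketched below. The exact procedure is not given in this summary, so the details are assumptions: features are taken to be precomputed embeddings (for example from a CLIP-like encoder), each concept is represented by the shift of its mean feature away from the background mean, and near-zero cosine similarity between shifts is read as a favorable sign. The function name and the synthetic feature arrays are hypothetical.

```python
import numpy as np

def mean_shift_cosines(background_feats, concept_feats):
    """Pairwise cosine similarity between concept mean shifts.

    background_feats: array of shape (n, d).
    concept_feats: list of arrays, each of shape (n_i, d).
    Assumes every concept mean differs from the background mean.
    """
    mu_b = background_feats.mean(axis=0)
    shifts = np.stack([f.mean(axis=0) - mu_b for f in concept_feats])
    unit = shifts / np.linalg.norm(shifts, axis=1, keepdims=True)
    return unit @ unit.T

# Synthetic stand-ins for embeddings (not real CLIP features).
bg = np.zeros((4, 3))
concept_a = bg + np.array([2.0, 0.0, 0.0])
concept_b = bg + np.array([0.0, 5.0, 0.0])
concept_c = bg + np.array([1.0, 1.0, 0.0])

cos = mean_shift_cosines(bg, [concept_a, concept_b, concept_c])
print(np.round(cos, 3))
```

Here the shifts of `concept_a` and `concept_b` are orthogonal, while `concept_c` overlaps with both; under the assumed reading of the heuristic, the first pair would be the more promising candidate for composition.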

Conclusion

The authors contribute to a deeper theoretical understanding of diffusion model compositions, offering insight into the structural properties that allow for successful and meaningful compositions. This foundation can guide future research and practical applications in areas requiring complex generative models.