
Symmetric Conditional ELBO for VAEs

Updated 6 February 2026
  • The approach introduces a symmetric conditional ELBO that reformulates VAE training as a two-player game, enforcing bidirectional encoder–decoder agreement.
  • It integrates explicit and implicit priors with conditional extensions for semi-supervised and complex latent variable models.
  • Empirical results show improved consistency and sample quality, evidenced by FID scores and high classification accuracy in various settings.

A symmetric conditional evidence lower bound (ELBO) is a variational objective function for training variational autoencoders (VAEs) that treats the encoder and decoder as equal participants in a game-theoretic framework. This approach, termed symmetric equilibrium learning, extends the classical ELBO by enforcing bidirectional consistency and allowing training with implicit data or latent priors in both unconditional and conditional settings. The symmetric conditional ELBO forms the core of a Nash equilibrium training algorithm that leads to improved encoder–decoder consistency and admits learning in a broader range of latent variable models, including those with non-explicit priors or complex conditional dependencies (Flach et al., 2023).

1. Mathematical Formulation

In symmetric equilibrium learning, the VAE is framed as a two-player nonzero-sum game. Let $\{p_\theta(x|z)\}$ denote the decoder family, parameterized by $\theta$, mapping latent codes $z \in \mathcal{Z}$ to data $x \in \mathcal{X}$. The encoder family $\{q_\phi(z|x)\}$, parameterized by $\phi$, maps data to latent codes. Both the empirical data distribution $\pi(x)$ and the prior latent distribution $\pi(z)$ are assumed accessible via sampling. The two players, decoder ($\theta$) and encoder ($\phi$), maximize their respective utilities:

$$\begin{aligned} L_p(\theta,\phi) &= \mathbb{E}_{x \sim \pi(x)}\, \mathbb{E}_{z \sim q_\phi(z|x)} \left[\log p_\theta(x|z)\right], \\ L_q(\theta,\phi) &= \mathbb{E}_{z \sim \pi(z)}\, \mathbb{E}_{x \sim p_\theta(x|z)} \left[\log q_\phi(z|x)\right]. \end{aligned}$$

A Nash equilibrium $(\theta^*, \phi^*)$ is reached when neither player can improve its utility by a unilateral change.
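
Under hypothetical linear-Gaussian choices for both conditionals (all parameter values below are illustrative, not taken from the paper), the two utilities can be estimated by plain Monte Carlo sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(v, mean):
    # Log density of N(mean, 1) evaluated at v.
    return -0.5 * (np.log(2 * np.pi) + (v - mean) ** 2)

def L_p(theta, phi, n=10_000):
    """Decoder utility: E_{x~pi(x)} E_{z~q_phi(z|x)} [log p_theta(x|z)]."""
    x = rng.standard_normal(n)            # x ~ pi(x) = N(0, 1)
    z = phi * x + rng.standard_normal(n)  # z ~ q_phi(z|x) = N(phi*x, 1)
    return log_normal(x, theta * z).mean()

def L_q(theta, phi, n=10_000):
    """Encoder utility: E_{z~pi(z)} E_{x~p_theta(x|z)} [log q_phi(z|x)]."""
    z = rng.standard_normal(n)              # z ~ pi(z) = N(0, 1)
    x = theta * z + rng.standard_normal(n)  # x ~ p_theta(x|z) = N(theta*z, 1)
    return log_normal(z, phi * x).mean()

print(L_p(0.5, 0.5), L_q(0.5, 0.5))
```

Note that each utility is sampled along its own direction of the model: $L_p$ pushes real data through the encoder, $L_q$ pushes prior samples through the decoder.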

Training proceeds by simultaneous (or alternating) stochastic gradient updates:

$$\theta \leftarrow \theta + \alpha \nabla_\theta L_p(\theta, \phi), \qquad \phi \leftarrow \phi + \alpha \nabla_\phi L_q(\theta, \phi),$$

where each gradient is estimated using only the player's "own" conditional log-density and does not require differentiating through the other player's sampling operation.

2. Symmetric ELBO Construction

Both $L_p$ and $L_q$ can be rewritten in ELBO-like form, and their sum yields the symmetric ELBO. For the decoder utility, expanding the log-likelihood gives

$$L_p(\theta,\phi) = \mathbb{E}_{x \sim \pi(x)} \Big[\log p_\theta(x) - \mathrm{KL}\big(q_\phi(z|x)\,\|\,p_\theta(z|x)\big)\Big].$$

The encoder utility analogously takes the form

$$L_q(\theta,\phi) = \mathbb{E}_{z \sim \pi(z)} \Big[\log q_\phi(z) - \mathrm{KL}\big(p_\theta(x|z)\,\|\,q_\phi(x|z)\big)\Big].$$

Here $q_\phi(x|z)$ denotes the conditional implied by the data distribution and the encoder, i.e., by the joint $\pi(x)\,q_\phi(z|x)$.

Summing these yields the symmetric ELBO:

$$\begin{aligned} \mathcal{L}_{\mathrm{sym}}(\theta, \phi) = &\;\;\mathbb{E}_{x \sim \pi(x)}\,\mathbb{E}_{z \sim q_\phi(z|x)} \left[\log p_\theta(x|z) - \log q_\phi(z|x)\right] \\ &+\;\mathbb{E}_{z \sim \pi(z)}\,\mathbb{E}_{x \sim p_\theta(x|z)} \left[\log q_\phi(z|x) - \log p_\theta(z)\right]. \end{aligned}$$

Each term is a valid lower bound on the corresponding marginal log-likelihood, enforcing bidirectional consistency between encoder and decoder.
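
As a sanity check, the two bracketed terms can be estimated together on the same kind of toy linear-Gaussian model (a sketch with illustrative parameter values; a standard-normal prior stands in for $p_\theta(z)$):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_normal(v, mean):
    # Log density of N(mean, 1) evaluated at v.
    return -0.5 * (np.log(2 * np.pi) + (v - mean) ** 2)

def symmetric_elbo(theta, phi, n=20_000):
    # Wake term: x ~ pi(x), z ~ q_phi(z|x); bracket [log p - log q].
    x = rng.standard_normal(n)
    z = phi * x + rng.standard_normal(n)
    wake = (log_normal(x, theta * z) - log_normal(z, phi * x)).mean()
    # Sleep term: z ~ pi(z), x ~ p_theta(x|z); bracket [log q - log prior].
    zs = rng.standard_normal(n)
    xs = theta * zs + rng.standard_normal(n)
    sleep = (log_normal(zs, phi * xs) - log_normal(zs, 0.0)).mean()
    return wake + sleep

print(symmetric_elbo(0.5, 0.5))
```

The estimator mirrors the formula term by term: one pass over real data through the encoder, one pass over prior samples through the decoder.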

3. Conditional Extension

For semi-supervised or conditional generative modeling, side information yy (e.g., class labels) is incorporated into all distributions:

  • Conditional decoder $p_\theta(x|z, y)$
  • Conditional encoder $q_\phi(z|x, y)$
  • Conditional latent prior $p(z|y)$

The conditional symmetric ELBO is:

$$\begin{aligned} \mathcal{L}_{\mathrm{sym}}(\theta,\phi\,|\,y) = &\;\;\mathbb{E}_{x \sim \pi(x|y)}\,\mathbb{E}_{z \sim q_\phi(z|x,y)} \left[\log p_\theta(x|z,y) - \log q_\phi(z|x,y)\right] \\ &+\;\mathbb{E}_{z \sim p(z|y)}\,\mathbb{E}_{x \sim p_\theta(x|z,y)} \left[\log q_\phi(z|x,y) - \log p_\theta(z|y)\right]. \end{aligned}$$

This conditional objective supports learning with non-explicit conditional priors and enables direct application in settings such as semi-supervised learning.
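
A minimal conditional sketch, again with hypothetical linear-Gaussian conditionals: the label $y \in \{0,1\}$ shifts the latent prior mean and a decoder offset (the arrays `mu` and `b` and all numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical conditional toy model:
#   p(z|y)        = N(mu[y], 1)
#   p_theta(x|z,y) = N(theta*z + b[y], 1)
#   q_phi(z|x,y)   = N(phi*(x - b[y]), 1)
mu = np.array([-1.0, 1.0])
b = np.array([0.0, 3.0])

def log_normal(v, mean):
    return -0.5 * (np.log(2 * np.pi) + (v - mean) ** 2)

def cond_symmetric_elbo(theta, phi, y, x_samples, n=10_000):
    # Wake term: labelled data of class y through the conditional encoder.
    x = rng.choice(x_samples, size=n)
    z = phi * (x - b[y]) + rng.standard_normal(n)       # z ~ q_phi(z|x,y)
    wake = (log_normal(x, theta * z + b[y])
            - log_normal(z, phi * (x - b[y]))).mean()
    # Sleep term: conditional prior through the conditional decoder.
    zs = mu[y] + rng.standard_normal(n)                 # z ~ p(z|y)
    xs = theta * zs + b[y] + rng.standard_normal(n)     # x ~ p_theta(x|z,y)
    sleep = (log_normal(zs, phi * (xs - b[y]))
             - log_normal(zs, mu[y])).mean()
    return wake + sleep

# Toy labelled data for class y = 1, generated from the model family itself.
z_true = mu[1] + rng.standard_normal(5000)
x_data = 0.9 * z_true + b[1] + rng.standard_normal(5000)
print(cond_symmetric_elbo(0.8, 0.5, 1, x_data))
```

Only sampling access to $p(z|y)$ is needed, which is what allows non-explicit conditional priors.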

4. Algorithmic Implementation

Training follows a dual-objective stochastic optimization over data–label minibatches, using Monte Carlo sampling to approximate the expectations. No reparameterization trick is required for gradient estimation, since each gradient is taken only with respect to the player's own parameters:

For each minibatch:
    grad_theta ← 0;  grad_phi ← 0
    For each (x, y) in the minibatch:
        Sample z ~ q_phi(z|x, y)
        Accumulate: grad_theta += ∇_theta log p_theta(x|z, y)
    For each y in the minibatch:
        Sample z' ~ p(z|y)
        Sample x' ~ p_theta(x|z', y)
        Accumulate: grad_phi += ∇_phi log q_phi(z'|x', y)
    Update:
        theta ← theta + learning_rate_theta * grad_theta
        phi ← phi + learning_rate_phi * grad_phi
This approach does not require density ratio estimation, supports fully implicit models, and applies to both continuous and discrete latent spaces by back-propagating through encoder logits as needed.
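
A minimal runnable version of this loop for the unconditional case, using a hypothetical one-dimensional linear-Gaussian model with analytic per-sample gradients (all constants are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model: decoder p_theta(x|z) = N(theta*z, 1),
# encoder q_phi(z|x) = N(phi*x, 1), latent prior p(z) = N(0, 1).
# pi(x) is accessed only through samples, as the game requires.
z0 = rng.standard_normal(5000)
data = 0.8 * z0 + rng.standard_normal(5000)     # samples from pi(x)

theta, phi, lr = 0.5, 0.5, 0.01
for step in range(2000):
    # Wake phase: real data through the encoder updates the decoder.
    x = rng.choice(data, size=64)
    z = phi * x + rng.standard_normal(64)       # z ~ q_phi(z|x)
    grad_theta = np.mean((x - theta * z) * z)   # d/dtheta log p_theta(x|z)
    # Sleep phase: prior samples through the decoder update the encoder.
    zs = rng.standard_normal(64)                # z' ~ p(z)
    xs = theta * zs + rng.standard_normal(64)   # x' ~ p_theta(x|z')
    grad_phi = np.mean((zs - phi * xs) * xs)    # d/dphi log q_phi(z'|x')
    # Simultaneous ascent: each player climbs its own utility only.
    theta += lr * grad_theta
    phi += lr * grad_phi

print(theta, phi)
```

Neither update differentiates through the other player's sampler: the decoder gradient treats $z$ as a constant, and the encoder gradient treats $x'$ as a constant.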

5. Theoretical Properties and Guarantees

The game-theoretic framework admits several formal guarantees:

  • Equilibrium uniqueness: In an exponential-family extension, the multi-player game is diagonally strictly concave (in the Rosen sense), ensuring uniqueness and asymptotic stability of the Nash equilibrium.
  • Consistency regularisation: At equilibrium, $q_\phi(z|x) \approx p_\theta(z|x)$ and $p_\theta(x|z) \approx q_\phi(x|z)$, giving an improved match between encoder and decoder as measured, for instance, by FID scores for samples drawn from the stationary distribution of a Gibbs chain.
  • No discriminator/statistical ratio estimation: Neither adversarial objectives nor explicit density ratio estimation are required, in contrast with adversarial VAE frameworks.
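
The consistency property can be probed directly: for a mutually consistent encoder–decoder pair, the Gibbs chain $x \to z \to x \to \dots$ is stationary at the shared joint, so its $x$-marginal matches the model marginal. A sketch with a hand-constructed consistent Gaussian pair (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

# Both conditionals below come from the same joint Gaussian with
# Var[x] = 1.64, Var[z] = 1, Cov[x, z] = 0.8, so they are consistent.
def decoder_sample(z):
    # p(x|z) = N(0.8*z, 1.64 - 0.8**2) = N(0.8*z, 1.0)
    return 0.8 * z + rng.standard_normal(z.shape)

def encoder_sample(x):
    # q(z|x) = N((0.8/1.64)*x, 1 - 0.8**2/1.64)
    var = 1.0 - 0.8 ** 2 / 1.64
    return (0.8 / 1.64) * x + np.sqrt(var) * rng.standard_normal(x.shape)

# Run the Gibbs chain; samples from its stationary distribution are what
# the chain-sample FID evaluation is computed on.
z = rng.standard_normal(20_000)
for _ in range(50):
    x = decoder_sample(z)
    z = encoder_sample(x)

print(round(x.var(), 2))  # close to 1.64, the marginal Var[x] of the joint
```

An inconsistent pair would instead drift to a stationary $x$-marginal that differs from the model's, which is exactly what the chain-based FID comparison detects.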

6. Empirical Results and Application Scenarios

Empirical evaluations of symmetric equilibrium learning (Flach et al., 2023) demonstrate effectiveness across multiple settings:

  • Hierarchical VAEs (MNIST, Fashion-MNIST): Two-layer discrete latent encoders are compared under standard versus symmetric training, with symmetric models outperforming in FID for both random-sampled and chain stationary distribution samples. t-SNE plots show improved clustering and posterior-prior alignment.
  • Semi-supervised/conditional MNIST: Label information is encoded in the latent space; the decoder is trained on labelled pairs $(x, y)$ only, while the encoder learns solely from sleep samples. The encoder attains >99% classification accuracy without any explicit discriminative loss, and the internal representation disentangles class from class-irrelevant variation.
  • Generative semantic segmentation (CelebA-HQ): A nested three-player game trains a segmentation decoder, image decoder, and shared encoder. The model supports joint generation, segmentation from image, and image in-painting from partial information, achieving ≈90% segmentation accuracy on held-out data alongside plausible completions.

Observed advantages include superior encoder–decoder consistency, the ability to handle implicit and discrete latent distributions, and matching or exceeding the sample quality of standard ELBO-trained VAEs.

7. Context and Significance

The symmetric conditional ELBO generalizes the original variational formulation underlying the auto-encoding variational Bayes paradigm by treating the encoder and decoder symmetrically and aligning their induced conditionals. It relaxes the requirement for explicit priors, resolves discrepancies between encoder and decoder densities, and unifies approaches under a unique and stable equilibrium. This framework builds on connections to the wake-sleep algorithm and adversarial VAE architectures but avoids the reliance on discriminators or ratio estimation. A plausible implication is increased flexibility and robustness in training VAEs for complex data modalities, structured tasks, and semi-supervised learning scenarios (Flach et al., 2023).

References (1)
