Adversarial Symmetric VAE
- AS-VAE is a generative framework that unifies variational autoencoders and adversarial methods by replacing the traditional KL divergence with its symmetric variant.
- It employs adversarial training with a critic to approximate log-density ratios, enabling stable gradient updates and connections with GAN/ALI models.
- AS-VAE demonstrates robust performance on benchmarks like MNIST, CIFAR-10, and CelebA, achieving improved reconstruction and sample diversity.
The Adversarial Symmetric Variational Autoencoder (AS-VAE) is a generative modeling framework that unifies and generalizes variational autoencoders (VAEs) and adversarial learning techniques. AS-VAE replaces the traditional Kullback–Leibler divergence in VAEs with its symmetric variant, leading to an objective that enforces bidirectional matching between encoder and decoder joint distributions. The arising optimization problem admits an adversarial solution, providing gradient-stable training and naturally connecting to frameworks such as GANs and Adversarially Learned Inference (ALI) (Chen et al., 2017; Pu et al., 2017).
1. Symmetric KL Divergence and Model Formulation
AS-VAE is constructed upon the symmetric Kullback–Leibler divergence for probability densities $p$ and $q$:

$$\mathrm{KL}_s(p \,\|\, q) = \mathrm{KL}(p \,\|\, q) + \mathrm{KL}(q \,\|\, p).$$

For the encoder joint $q_\phi(x, z) = q(x)\, q_\phi(z \mid x)$ and the decoder joint $p_\theta(x, z) = p(z)\, p_\theta(x \mid z)$, the objective becomes the minimization of $\mathrm{KL}_s\!\left(q_\phi(x, z) \,\|\, p_\theta(x, z)\right)$.
The traditional VAE maximizes the evidence lower bound (ELBO):

$$\mathcal{L}_x(\theta, \phi) = \mathbb{E}_{q_\phi(x, z)}\big[\log p_\theta(x, z) - \log q_\phi(z \mid x)\big].$$

AS-VAE augments this with its dual, defined over the decoder joint:

$$\mathcal{L}_z(\theta, \phi) = \mathbb{E}_{p_\theta(x, z)}\big[\log q_\phi(x, z) - \log p_\theta(x \mid z)\big].$$

The sum forms a symmetric variational bound, equivalent (up to constants given by the entropies of $q(x)$ and $p(z)$) to the negative symmetric KL divergence between the encoder and decoder joints:

$$\mathcal{L}_x + \mathcal{L}_z = -\,\mathrm{KL}_s\!\left(q_\phi(x, z) \,\|\, p_\theta(x, z)\right) + \text{const}.$$

This formulation induces support matching between both marginals and conditionals.
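As a concrete numerical illustration of the symmetric divergence, the closed-form KL between two univariate Gaussians can be symmetrized; the helper functions below are illustrative sketches, not part of the AS-VAE model itself:

```python
import numpy as np

def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL( N(m1, s1^2) || N(m2, s2^2) )."""
    return np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

def sym_kl(m1, s1, m2, s2):
    """Symmetric KL: KL(p || q) + KL(q || p)."""
    return kl_gauss(m1, s1, m2, s2) + kl_gauss(m2, s2, m1, s1)

# Unlike the one-sided KL, the symmetric variant is invariant to
# swapping its arguments, and it vanishes only when the densities match.
a = sym_kl(0.0, 1.0, 2.0, 0.5)
b = sym_kl(2.0, 0.5, 0.0, 1.0)
```

Swapping the argument order leaves the value unchanged, which is exactly the bidirectional matching property the AS-VAE objective exploits.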
2. Adversarial Training and Likelihood-Free Optimization
To achieve likelihood-free optimization of the symmetric KL, AS-VAE introduces an adversarial critic $f_\psi(x, z)$, trained to distinguish samples from the two joints (with $\sigma$ denoting the sigmoid):

$$\max_\psi \; \mathbb{E}_{q_\phi(x, z)}\big[\log \sigma(f_\psi(x, z))\big] + \mathbb{E}_{p_\theta(x, z)}\big[\log\big(1 - \sigma(f_\psi(x, z))\big)\big].$$

At optimality, $f_{\psi^*}(x, z) = \log q_\phi(x, z) - \log p_\theta(x, z)$. The encoder and generator are then updated by substituting the critic for the intractable log-density ratio:

$$\min_{\theta, \phi} \; \mathbb{E}_{q_\phi(x, z)}\big[f_\psi(x, z)\big] - \mathbb{E}_{p_\theta(x, z)}\big[f_\psi(x, z)\big].$$

This adversarial procedure is directly analogous to the minimax objectives employed in GANs but specifically targets the joint distribution $q_\phi(x, z)$ versus $p_\theta(x, z)$. The approach can be specialized to recover ALI, GAN, or WGAN objectives as limiting cases, providing a principled unification (Chen et al., 2017).
In the dual-discriminator approach (Pu et al., 2017), two discriminators $\psi_1$ and $\psi_2$ are trained to approximate the log-density ratios of the encoder and decoder forms, each optimized with a GAN-style loss with respect to its own side.
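The optimal-critic identity underlying this ratio estimation, $\sigma(\log q - \log p) = q/(q + p)$, can be checked numerically; the two Gaussian densities below are arbitrary stand-ins for the encoder and decoder joints:

```python
import numpy as np

def log_normal(x, m, s):
    """Log-density of N(m, s^2) evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2)

x = np.linspace(-3.0, 3.0, 101)
log_q = log_normal(x, 0.0, 1.0)   # stand-in for log q_phi(x, z)
log_p = log_normal(x, 1.0, 1.5)   # stand-in for log p_theta(x, z)

# Optimal critic value: the log-density ratio.
f_star = log_q - log_p
# Passing it through a sigmoid recovers the Bayes-optimal
# classification probability q / (q + p).
sigmoid = 1.0 / (1.0 + np.exp(-f_star))
ratio = np.exp(log_q) / (np.exp(log_q) + np.exp(log_p))
```

This is why training the critic as a binary classifier between the two joints yields, at optimality, exactly the log-ratio needed by the symmetric KL objective.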
3. Network Architectures and Implementation
AS-VAE employs deep convolutional architectures, with encoder/decoder/discriminator parameterizations varying by dataset. All models are implemented in TensorFlow, using Xavier initialization and the Adam optimizer with a fixed learning rate and batch size 64.
MNIST (28×28 grayscale)
- Encoder $q_\phi(z \mid x)$: sequential convolutions (5×5, 16 filters, stride 2, ReLU + BatchNorm; 5×5, 32 filters, stride 2, ReLU + BatchNorm), followed by fully connected layers producing the latent code.
- Decoder $p_\theta(x \mid z)$: FC layers (1024, then 3136 units, with BatchNorm + ReLU), reshaped and followed by stride-2 deconvolution stacks (5×5 kernels, 64→1 filters, final sigmoid).
- Discriminator $f_\psi(x, z)$: convolutional stack on $x$ (32, 64, 128 filters), an MLP on $z$, the two feature vectors concatenated, then an MLP producing a single logit.
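The layer sizes above can be sanity-checked with simple shape arithmetic, assuming TensorFlow-style 'same' padding (an assumption; the source does not state the padding mode):

```python
import math

def conv_out(size, stride):
    # With 'same' padding, a strided convolution produces
    # output_size = ceil(input_size / stride).
    return math.ceil(size / stride)

h = 28               # MNIST input height/width
h = conv_out(h, 2)   # after 5x5 conv, 16 filters, stride 2
h = conv_out(h, 2)   # after 5x5 conv, 32 filters, stride 2
flat = h * h * 32    # flattened feature count entering the FC layers

# Decoder side: a 3136-unit FC layer is consistent with reshaping to a
# 7x7x64 tensor before the stride-2 deconvolutions upsample to 28x28.
decoder_units = 7 * 7 * 64
```

The two stride-2 convolutions reduce 28×28 to 7×7, and 7·7·64 = 3136 matches the decoder's second FC layer, so the encoder and decoder shapes are mutually consistent.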
CIFAR-10 and CelebA
Analogous architectures with deeper and wider convolutional layers, Leaky ReLU non-linearities, auxiliary Gaussian noise, and pervasive batch normalization are used.
See the summary table for key elements:
| Module | MNIST Example | CIFAR-10/CelebA Example |
|---|---|---|
| Encoder | Conv→Conv→FC | Conv stack→FC→$z$ |
| Decoder | FC→FC→Deconv stack | FC→Deconv stack |
| Discriminator | Conv + MLP | Conv (x-path) + FC (z-path) + concat→FC |
4. Training Procedure
Training proceeds in alternating steps:
- Sample minibatches of $x$ from the data distribution $q(x)$ and $z$ from the prior $p(z)$.
- Sample latent codes via the encoder, and reconstructions via the decoder.
- Update the discriminator to maximize its classification objective on joint samples drawn from the encoder and decoder sides.
- Update generator/encoder parameters to maximize the symmetric variational bound, or its augmented version in sVAE-r, which adds a reconstruction term weighted by $\lambda$ to strengthen per-sample fidelity.
- Repeat until convergence (Chen et al., 2017; Pu et al., 2017).
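The alternation above can be sketched end-to-end on a deliberately tiny stand-in problem. Everything here is illustrative, not the paper's setup: the "data" is $q = \mathcal{N}(0,1)$, the "generator" is $p = \mathcal{N}(\theta,1)$ with a single learnable mean, and a linear critic suffices because the true log-ratio of two unit-variance Gaussians is linear in $x$:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 2.0                # generator parameter (mean of p)
w = np.zeros(2)            # critic parameters [w1, w0], f(x) = w1*x + w0
lr_critic, lr_gen = 0.1, 0.1

def critic(x):
    return w[0] * x + w[1]

for step in range(200):
    # Critic update: logistic regression separating q (label 1)
    # from p (label 0); at its optimum f approximates log q - log p.
    for _ in range(5):
        xq = rng.normal(0.0, 1.0, 128)
        xp = rng.normal(theta, 1.0, 128)
        sq = 1.0 / (1.0 + np.exp(-critic(xq)))
        sp = 1.0 / (1.0 + np.exp(-critic(xp)))
        grad_w1 = np.mean((1.0 - sq) * xq) + np.mean((0.0 - sp) * xp)
        grad_w0 = np.mean(1.0 - sq) + np.mean(0.0 - sp)
        w += lr_critic * np.array([grad_w1, grad_w0])   # gradient ascent

    # Generator update: descend E_q[f] - E_p[f] with respect to theta.
    # With a linear critic, d/dtheta(-E_p[f]) = -w1, so the step is:
    theta += lr_gen * w[0]
```

After training, $\theta$ drifts toward the data mean (0), illustrating how the critic's log-ratio estimate supplies the generator's gradient in the symmetric-KL objective.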
5. Theoretical Insights and Connections
AS-VAE's symmetric KL divergence objective enforces bidirectional support matching, i.e., $p_\theta(x) \approx q(x)$ and $q_\phi(z) \approx p(z)$, directly addressing the "unrealistic-samples" artifact of standard VAEs. The adversarial ratio estimator provides stable gradients, sidestepping the vanishing-gradient pathology that binary cross-entropy induces in standard GAN training.
Crucially, the framework recovers ALI exactly under a sigmoidal critic, and GAN or WGAN when the encoder is omitted or additional constraints (e.g., Lipschitz continuity) are imposed. The optional reconstruction augmentation weight $\lambda$ improves cycle-consistency by strengthening per-sample reconstructions, a property not inherently present in vanilla ALI (Chen et al., 2017).
6. Empirical Evaluation
AS-VAE is evaluated on a range of benchmarks: Toy 2D Gaussian Mixture, MNIST, CelebA, and CIFAR-10. The main metrics include reconstruction MSE, log-likelihood (estimated via AIS/ELBO), and Inception Score (IS).
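The Inception Score used throughout is, in its standard formulation, the exponentiated mean KL between per-sample class posteriors $p(y \mid x)$ and the marginal $p(y)$; a minimal numpy sketch, with toy posterior matrices standing in for Inception-network outputs:

```python
import numpy as np

def inception_score(p_yx):
    """IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ) for a matrix of
    per-sample class posteriors with shape (n_samples, n_classes)."""
    p_y = p_yx.mean(axis=0, keepdims=True)                        # marginal
    kl = np.sum(p_yx * (np.log(p_yx) - np.log(p_y)), axis=1)      # per-sample KL
    return float(np.exp(kl.mean()))

# Confident AND diverse predictions score near n_classes (here 4) ...
confident = np.full((4, 4), 1e-6) + np.eye(4) * (1.0 - 4e-6)
# ... while uniform, uninformative predictions score exactly 1.
uniform = np.full((4, 4), 0.25)
```

Higher IS thus rewards generators whose samples are both individually recognizable and collectively diverse, which is why it complements reconstruction MSE in the evaluations below.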
- MNIST: sVAE-r achieves an AIS log-likelihood of $-79.26$ nats and an IS of $9.12$, outperforming ALI, GAN, WGAN-GP, and DCGAN, and matching PixelRNN in likelihood estimates.
- CelebA (64×64): sVAE-r produces both sharp generated samples and high-fidelity reconstructions, whereas baselines such as ALICE trade off reconstruction against sample quality.
- CIFAR-10: Unsupervised IS reaches $6.96$ (ALI: $5.34$; DCGAN: $6.16$; WGAN-GP: $6.56$). RMSE and NLL are likewise competitive or superior.
- Toy 2D GMM: over 576 random trials, sVAE-r exceeds ALICE in both IS and MSE, evidencing robustness to architecture and hyperparameter choices (Chen et al., 2017; Pu et al., 2017).
Summary of selected experimental metrics:
| Dataset | Metric | AS-VAE/sVAE(-r) | Baselines |
|---|---|---|---|
| MNIST | IS | 9.12 | WGAN-GP: 8.45, GAN: 8.34 |
| MNIST | AIS log-likelihood | –79.26 nats | PixelRNN: –79.2 |
| CIFAR-10 | IS | 6.96 | ALI: 5.34 |
| CelebA | Reconstruction & generation | Strong on both | ALICE: trade-off |
AS-VAE achieves reconstruction errors closely matching or surpassing VAE/ELBO methods while generating sharper, more diverse samples than GAN/ALI baselines.
7. Significance and Conclusions
AS-VAE establishes a theoretical and algorithmic bridge between the variational-inference orientation of VAEs and the adversarial dynamics of GANs. Its use of the symmetric KL as a foundational divergence leads to balanced learned distributions, improved sample quality, robust reconstructions, and enhanced stability in training (Chen et al., 2017; Pu et al., 2017). Empirical evaluations corroborate its strong performance across generative modeling benchmarks. The architecture is extensible to diverse modalities (images, codes), and the framework inherently sidesteps several classical pathologies of deep generative models.