Symmetry-Conditioned VAE

Updated 15 January 2026
  • Symmetry-Conditioned VAE is an unsupervised model that leverages learnable group symmetries to align latent factors without requiring factor labels.
  • It utilizes an extended ELBO combined with algebraic and equivariance regularizations to enforce disentangled, axis-aligned latent representations.
  • The approach employs a learnable symmetry codebook and composite group operators to achieve multi-factor disentanglement, evaluated via the m-FVMₖ metric.

A Symmetry-Conditioned Variational Autoencoder (CVAE), as defined in the Composite Factor-Aligned Symmetry Learning (CFASL) framework, is a class of unsupervised generative latent variable models in which disentanglement emerges from explicit, learnable group symmetries acting on the latent space. Unlike prior approaches that require factor labels or known generative factors, the CFASL methodology enables a VAE to discover and align latent dimensions with independent generative symmetries by integrating a suite of algebraic regularization and equivariance conditions, operationalized entirely through a learnable symmetry codebook and composite group actions inferred from pairs of data samples (Jung et al., 2024).

1. Loss Augmentation: ELBO with Symmetry and Equivariance Constraints

The foundational loss function of the CVAE is an extended evidence lower bound (ELBO), augmented with several families of regularization and alignment losses. The total loss is

$$\mathcal{L}(\phi,\theta) = \mathbb{E}_{q_\phi(z\mid x)}[\log p_\theta(x\mid z)] - \beta\, D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\|\,p(z)\big) + \lambda_c\mathcal{L}_c + \lambda_{pl}\mathcal{L}_{pl} + \lambda_{pd}\mathcal{L}_{pd} + \lambda_s\mathcal{L}_s + \lambda_p\mathcal{L}_p + \lambda_{ee}\mathcal{L}_{ee} + \lambda_{de}\mathcal{L}_{de}$$

where each $\lambda$ is a hyper-parameter and the terms $\mathcal{L}_c, \mathcal{L}_{pl}, \mathcal{L}_{pd}, \mathcal{L}_s, \mathcal{L}_p$ encode algebraic and statistical constraints aligned with symmetry discovery. $\mathcal{L}_{ee}$ and $\mathcal{L}_{de}$ enforce group equivariance in the encoder and decoder, respectively. This composite loss systematically injects inductive bias for the emergence of factor-aligned, axis-aligned, and group-structured latent representations.
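As a minimal sketch of how the augmented objective is assembled, the helper below combines a $\beta$-ELBO with weighted regularizers. The function name, dictionary keys, and weight values are illustrative assumptions, not the paper's implementation; the sign convention follows the formula above (regularizers added to the ELBO).

```python
# Hypothetical sketch of the CFASL augmented objective. Each regularizer is a
# stand-in scalar; keys mirror the loss subscripts in the text.
def cfasl_loss(recon_loglik, kl, reg_terms, beta=4.0, weights=None):
    """Combine the beta-ELBO with weighted symmetry/equivariance terms.

    reg_terms: dict mapping {'c','pl','pd','s','p','ee','de'} -> scalar loss.
    weights:   dict of lambda hyper-parameters for the same keys (default 1).
    """
    weights = weights or {k: 1.0 for k in reg_terms}
    elbo = recon_loglik - beta * kl                      # standard beta-VAE part
    penalty = sum(weights[k] * reg_terms[k] for k in reg_terms)
    return elbo + penalty                                # sign as in the text
```

A call such as `cfasl_loss(-10.0, 1.0, {'c': 0.5}, beta=2.0)` yields the $\beta$-ELBO of $-12$ plus the single regularizer $0.5$.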

2. Learnable Symmetry Codebook

The central algebraic structure is the symmetry codebook $S$. For latent dimension $d$, a library $\mathfrak{B} = \{B_1,\dots,B_B\} \subset \mathbb{R}^{d\times d}$ forms a basis for the relevant Lie algebra. The codebook $S$ is partitioned into $K$ sections, one per postulated generative factor, $S = \{\mathcal{G}^1, \mathcal{G}^2, \dots, \mathcal{G}^K\}$, with each $\mathcal{G}^i = \{\mathfrak{g}^i_1, \dots, \mathfrak{g}^i_M\}$ and each $\mathfrak{g}^i_j$ parameterized as a trainable combination of basis elements. Group elements are recovered via the matrix exponential, $g^i_j = \exp(\mathfrak{g}^i_j)$. Each section mediates transformations corresponding to a candidate latent factor, enabling unsupervised discovery and alignment of symmetries with latent dimensions through explicit parametrization and subsequent algebraic regularization.
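A minimal sketch of this parametrization, with assumed shapes and a truncated-series matrix exponential (the paper's implementation details may differ): each generator is a learnable coefficient vector over a shared basis of $d \times d$ matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_basis, K, M = 4, 6, 3, 2          # latent dim, basis size, factors, generators/section

basis = rng.standard_normal((n_basis, d, d))         # basis B_1..B_n of the Lie algebra
coeffs = rng.standard_normal((K, M, n_basis)) * 0.1  # trainable coefficients per generator

def generator(i, j):
    """Lie-algebra element for section i, slot j: sum_b c_b * B_b."""
    return np.einsum('b,bij->ij', coeffs[i, j], basis)

def expm(A, terms=20):
    """Truncated matrix-exponential series; adequate for small ||A||."""
    out, term = np.eye(A.shape[0]), np.eye(A.shape[0])
    for n in range(1, terms):
        term = term @ A / n
        out = out + term
    return out

g = expm(generator(0, 0))   # group element g^1_1 acting on the latent space
```

In practice a library routine such as `scipy.linalg.expm` would replace the hand-rolled series; it is written out here only to keep the sketch self-contained.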

3. Composite Symmetry Operators and Factor Selection

To capture multi-factor generative changes, CFASL infers composite group elements from paired data. Given images $(x^1, x^2)$, their posterior parameters $(\mu^1,\sigma^1)$ and $(\mu^2,\sigma^2)$ are concatenated to form a comparison vector $h$. An attention mechanism operates within each section $i$ to interpolate between the $M$ available generators, producing a composite direction $\mathfrak{g}_c^i$. Sectionwise on/off selection is implemented with Gumbel-Softmax switches $sw^i$, informed by pseudo-labels derived from latent coordinate differences, so that only factors undergoing change are activated. The resulting group element is

$$g_c = \exp\Bigl(\sum_{i=1}^K sw^i\, \mathfrak{g}_c^i\Bigr)$$

and maps $z^1$ to an estimate of $z^2$. This construction enables compositionality, equivariance, and data-driven selection of which factors contribute to a given sample transformation, without explicit knowledge of ground-truth factors.
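The composite construction above can be sketched as follows. The attention scores, switch logits, and shapes are stand-ins (the real values come from the comparison vector $h$ and pseudo-labels); the block shows only the mixing and gating arithmetic.

```python
import numpy as np

rng = np.random.default_rng(1)
d, K, M = 4, 3, 2
gens = rng.standard_normal((K, M, d, d)) * 0.1   # Lie-algebra elements g^i_j

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def gumbel_switch(logit, tau=0.5):
    """Binary Gumbel-Softmax relaxation of one section's on/off switch."""
    g = -np.log(-np.log(rng.uniform(size=2)))            # Gumbel(0,1) noise
    probs = softmax((np.array([logit, -logit]) + g) / tau)
    return probs[0]                                      # ~1 "on", ~0 "off"

attn = rng.standard_normal((K, M))        # per-section attention over M generators
sw = np.array([gumbel_switch(l) for l in [3.0, -3.0, 3.0]])  # toy switch logits

# sum_i sw^i * g_c^i, where g_c^i is the attention-weighted generator mix;
# the group element g_c is then the matrix exponential of this sum.
composite_algebra = sum(
    sw[i] * np.einsum('m,mij->ij', softmax(attn[i]), gens[i]) for i in range(K)
)
```

Applying `expm(composite_algebra)` to $z^1$ would give the estimate of $z^2$ described in the text.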

4. Encoder and Decoder Equivariance Integration

The CVAE architecture is structurally conventional, employing standard convolutional encoders and decoders (or optionally, a Spatial-Broadcast decoder). Uniquely, group equivariance conditions are imposed:

$$q_\phi(\psi_c \ast x^1) \approx g_c\, q_\phi(x^1)$$

$$p_\theta(g_c z^1) \approx \psi_c \ast p_\theta(z^1)$$

where $\psi_c$ is the (unknown) input-space symmetry corresponding to $g_c$. These constraints are enforced softly via mean-square penalties ($\mathcal{L}_{ee}$ on the latent and $\mathcal{L}_{de}$ on the decoded reconstructions). Data pairs are supplied to the same encoder and decoder parameterization, eschewing the need for auxiliary towers or siamese computation streams. This design induces equivariance between input-space and latent/decoded transformations.
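A toy sketch of the two soft penalties, with linear stand-ins for the encoder and decoder (in CFASL these are the VAE networks, and $x^2$ plays the role of $\psi_c \ast x^1$):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4

def mse(a, b):
    return float(np.mean((a - b) ** 2))

W_enc = rng.standard_normal((d, 8))    # toy "encoder" (mean head only)
W_dec = rng.standard_normal((8, d))    # toy "decoder"
encode = lambda x: W_enc @ x
decode = lambda z: W_dec @ z

g_c = np.eye(d) + 0.05 * rng.standard_normal((d, d))  # composite group element

x1 = rng.standard_normal(8)
x2 = rng.standard_normal(8)            # paired sample, i.e. psi_c * x1 for unknown psi_c
z1, z2 = encode(x1), encode(x2)

L_ee = mse(z2, g_c @ z1)               # encoder equivariance: q(psi_c*x) ~ g_c q(x)
L_de = mse(x2, decode(g_c @ z1))       # decoder equivariance: p(g_c z) ~ psi_c * p(z)
```

Both penalties vanish exactly when the diagrams commute, so minimizing them pushes the learned $g_c$ toward the true latent action of $\psi_c$.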

5. Algebraic Regularization and Disentanglement Losses

The symmetry codebook and factor organization are enforced with several loss terms:

  • Commutativity ($\mathcal{L}_c$): All learned Lie algebra elements are constrained to commute, ensuring group compositionality: $\exp(A)\exp(B)=\exp(A+B)$.
  • Parallelism within factors ($\mathcal{L}_{pl}$): Generators within the same factor section are driven to be parallel.
  • Orthogonality across factors ($\mathcal{L}_{pd}$): Sections corresponding to different factors are incentivized to be orthogonal.
  • Sparsity / axis-alignment ($\mathcal{L}_s$): Each one-parameter subgroup is regularized to move only one latent coordinate, establishing axis-alignment.
  • Factor prediction ($\mathcal{L}_p$): The section on/off mechanism is trained with cross-entropy against pseudo-labels derived from latent means.
  • Encoder and decoder equivariance ($\mathcal{L}_{ee}, \mathcal{L}_{de}$): Penalize deviations from group-action equivariance in both encoding and decoding.

Collectively, these losses effect a disentangled, group-structured latent representation aligned with independent generative axes, without recourse to supervision or labeled factor information.
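The algebraic penalties admit simple matrix-level instantiations. The formulas below are plausible sketches of the constraints described above (cosine similarity for parallelism/orthogonality, commutator norm for commutativity, off-dominant mass for sparsity), not the paper's exact definitions.

```python
import numpy as np

def commutativity_loss(A, B):
    """Squared Frobenius norm of the commutator [A, B]; 0 iff A, B commute."""
    return float(np.sum((A @ B - B @ A) ** 2))

def cosine(A, B):
    a, b = A.ravel(), B.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def parallel_loss(A, B):
    """Same-section generators: push |cos| toward 1 (parallel)."""
    return 1.0 - abs(cosine(A, B))

def orthogonal_loss(A, B):
    """Cross-section generators: push cos toward 0 (orthogonal)."""
    return cosine(A, B) ** 2

def sparsity_loss(A):
    """Axis-alignment proxy: penalize mass outside the dominant coordinate row."""
    row_mass = np.abs(A).sum(axis=1)
    return float(row_mass.sum() - row_mass.max())
```

For example, `parallel_loss(A, 2 * A)` is near zero for any generator `A`, while `orthogonal_loss` vanishes for generators with disjoint support.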

6. Training Procedure

The CFASL training protocol for the CVAE is as follows:

  1. Sample a minibatch of data $\{x_i\}_{i=1}^N$.
  2. Pair samples into $(x^1, x^2)$.
  3. For each pair:
    • Encode $(\mu^1, \sigma^1) = q_\phi(x^1)$ and $(\mu^2, \sigma^2) = q_\phi(x^2)$.
    • Draw latent codes $z^1, z^2$ from the respective posteriors.
    • Compute attention scores, section switches, and the aggregate composite symmetry.
    • Decode reconstructions and apply the symmetry transformation in latent space.
  4. Compute all terms in the augmented loss as above.
  5. Update all model and codebook parameters by backpropagation.
  6. Iterate until convergence.
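The steps above can be wired together in a minimal sketch of one training step. Every component here is a toy stand-in (deterministic "encoder", random composite element, threshold pseudo-labels) chosen only to show the data flow of a paired step.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4

def encode(x):
    """Toy Gaussian encoder: first d inputs as mean, fixed sigma."""
    return x[:d], np.full(d, 0.1)

def train_step(x1, x2):
    (mu1, s1), (mu2, s2) = encode(x1), encode(x2)
    z1 = mu1 + s1 * rng.standard_normal(d)          # reparameterized samples
    z2 = mu2 + s2 * rng.standard_normal(d)
    h = np.concatenate([mu1, s1, mu2, s2])          # comparison vector for attention
    pseudo = (np.abs(mu2 - mu1) > 0.5).astype(float)  # pseudo-labels: did factor i change?
    g_c = np.eye(d) + 0.01 * rng.standard_normal((d, d))  # composite element (stand-in)
    z2_hat = g_c @ z1                               # symmetry-transported latent
    return float(np.mean((z2 - z2_hat) ** 2)), pseudo

loss, pseudo = train_step(rng.standard_normal(8), rng.standard_normal(8))
```

In the full method this latent-matching error feeds the equivariance terms, the pseudo-labels supervise the switches, and all parameters (networks and codebook) are updated jointly by backpropagation.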

The method is designed to induce symmetry-based disentanglement in fully unsupervised fashion, without ever specifying factor names or identities during training (Jung et al., 2024).

7. Multi-Factor Disentanglement Evaluation: m-FVMₖ

To quantify multi-factor disentanglement, an extended evaluation metric, multi-factor fixed-variance match (m-FVMₖ), generalizes the FactorVAE metric to scenarios where $k > 1$ factors are held fixed. For each sub-experiment:

  • $k$ factors are held constant.
  • Minibatches are constructed so that these factors do not vary; the standard deviation of each latent coordinate is computed, and the $k$ smallest-variance coordinates are compared to the known fixed factors.
  • The score tallies coincidences with ground truth over all factor combinations and training epochs.

This metric rigorously measures disentanglement performance when controlling for multiple simultaneous generative axes and confirms that the symmetry-conditioned VAE recovers latent disentanglement under both single- and multi-factor variations (Jung et al., 2024).
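One sub-experiment of the m-FVMₖ idea can be sketched as a variance check (the function name, threshold-free scoring, and coordinate assignment are assumptions for illustration):

```python
import numpy as np

def m_fvm_hit(latents, fixed_coords, k):
    """One m-FVM_k sub-experiment: do the k lowest-variance latent
    coordinates coincide with the coordinates assigned to the k fixed factors?

    latents:      (N, d) codes from a batch where k factors were held constant.
    fixed_coords: set of latent coordinates assigned to those factors.
    """
    std = latents.std(axis=0)
    lowest = set(np.argsort(std)[:k].tolist())
    return lowest == set(fixed_coords)

# Constructed example: coordinates 1 and 3 barely vary, mimicking two
# fixed generative factors in an axis-aligned representation.
rng = np.random.default_rng(4)
z = rng.standard_normal((256, 5))
z[:, [1, 3]] *= 0.01
hit = m_fvm_hit(z, {1, 3}, k=2)   # True for this constructed batch
```

The full metric averages such hits over all $\binom{K}{k}$ factor combinations and over training, rather than scoring a single batch.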
