Semi-Conditional Normalizing Flow (SCNF)
- SCNF is a semi-supervised model that explicitly defines the joint distribution over inputs and labels using a two-stage (unconditional and conditional) flow architecture.
- It leverages conditional affine coupling layers to enable tractable density computation and efficient marginal likelihood estimation via a log-sum-exp formulation.
- Empirical results on benchmarks like MNIST demonstrate SCNF’s state-of-the-art performance with low error rates, validating its design and optimization strategy.
Semi-Conditional Normalizing Flow (SCNF) is a class of normalizing flow models designed for semi-supervised learning through explicit modeling of the joint distribution over inputs and discrete labels. By employing a two-stage (semi-conditional) flow architecture—comprising an unconditional flow followed by a conditional component—SCNF efficiently leverages both labeled and unlabeled data. The architecture enables efficient computation of marginal likelihoods, supports principled parameter learning using exact joint and marginal maximum likelihood, and yields state-of-the-art performance in semi-supervised settings on canonical benchmarks (Atanov et al., 2019).
1. Joint Density Model and Decomposition
SCNF constructs an explicit model of the joint distribution over input data $x$ and discrete labels $y \in \{1, \dots, K\}$,

$$p(x, y) = p(y)\, p(x \mid y),$$

where $p(y) = 1/K$ (uniform prior), and $p(x \mid y)$ is defined by a normalizing flow with a latent Gaussian base. Introducing an invertible mapping $f(\cdot, y)\colon x \mapsto z$ yields, via change of variables,

$$p(x \mid y) = p_Z\big(f(x, y)\big)\left|\det \frac{\partial f(x, y)}{\partial x}\right|,$$

with $p_Z = \mathcal{N}(0, I)$. The joint log-density thus decomposes as

$$\log p(x, y) = \log p(y) + \log \mathcal{N}\big(f(x, y); 0, I\big) + \log \left|\det \frac{\partial f(x, y)}{\partial x}\right|.$$
This explicit formulation allows the model to maximize both joint and marginal likelihoods in a unified semi-supervised learning objective.
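The decomposition above can be sketched numerically. The following is a minimal toy example, assuming a 1-D per-class affine flow $f(x, y) = (x - \mu_y)/\sigma_y$ (illustrative only, not the paper's architecture), a uniform label prior, and a standard-normal base:

```python
import numpy as np

# Toy change-of-variables joint density: log p(x, y) = log p(y)
# + log N(f(x, y); 0, 1) + log |df/dx|, with p(y) = 1/K uniform.
K = 3                                   # number of classes (illustrative)
mu = np.array([-2.0, 0.0, 2.0])         # per-class shift
sigma = np.array([0.5, 1.0, 1.5])       # per-class scale

def log_std_normal(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def log_joint(x, y):
    z = (x - mu[y]) / sigma[y]          # z = f(x, y)
    log_det = -np.log(sigma[y])         # log |df/dx| for this affine map
    return -np.log(K) + log_std_normal(z) + log_det

# For this toy flow, p(x | y) is exactly N(x; mu[y], sigma[y]^2):
x = 1.3
closed_form = (-np.log(K) - 0.5 * ((x - mu[1]) / sigma[1]) ** 2
               - np.log(sigma[1] * np.sqrt(2 * np.pi)))
assert np.isclose(log_joint(x, 1), closed_form)
```

The same three-term structure (prior, base density, log-determinant) carries over unchanged to deep flows; only $f$ and its Jacobian become more elaborate.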
2. Semi-Conditional Architecture and Mapping Structure
SCNF divides the invertible mapping $f(\cdot, y)$ into two cascaded components:
- Unconditional flow $f_u$: maps the input $x$ to a semantic latent $z_1$ and an auxiliary latent $z_2$, with $f_u(x) = (z_1, z_2)$.
- Conditional flow $f_c$: maps $z_1$ to $u_1$ and conditions explicitly on the label $y$.
Formally,

$$f(x, y) = \big(f_c(z_1, y),\, z_2\big), \qquad (z_1, z_2) = f_u(x).$$

The Jacobian determinant for the composition factorizes as

$$\det \frac{\partial f(x, y)}{\partial x} = \det \frac{\partial f_u(x)}{\partial x} \cdot \det \frac{\partial f_c(z_1, y)}{\partial z_1}.$$

This two-stage structure underpins efficient marginalization over classes, with $f_u$ computed once per input and $f_c$ applied $K$ times (once for each label class).
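The factorization can be checked on a toy instance. Here the stages are simple linear maps (the matrix `A` and per-class scaling are illustrative assumptions, not the paper's networks): the unconditional stage is shared across classes, the conditional stage acts only on the semantic part, and the two log-Jacobians add.

```python
import numpy as np

# Two-stage toy flow: f_u(x) = A @ x, then f_c(z1, y) = class_scale[y] * z1.
rng = np.random.default_rng(0)
D, D1, K = 4, 2, 3                         # input dim, semantic dim, classes
A = rng.normal(size=(D, D))                # invertible with high probability
class_scale = np.array([0.5, 1.0, 2.0])    # per-class conditional scaling

def f(x, y):
    z = A @ x
    z1, z2 = z[:D1], z[D1:]                # f_u computed once, reused per y
    u1 = class_scale[y] * z1               # conditional stage depends on y
    log_det = (np.log(abs(np.linalg.det(A)))      # unconditional contribution
               + D1 * np.log(class_scale[y]))     # conditional contribution
    return np.concatenate([u1, z2]), log_det

# The factorized log|det| matches the Jacobian of the fully composed map:
x = rng.normal(size=D)
_, log_det = f(x, 2)
J_full = np.diag([2.0, 2.0, 1.0, 1.0]) @ A
assert np.isclose(log_det, np.log(abs(np.linalg.det(J_full))))
```

Because only the conditional stage depends on $y$, marginalizing over $K$ classes reuses one evaluation of the (typically much deeper) unconditional stage.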
3. Conditional Affine Coupling Layers
The conditional flow and portions of the unconditional flow are constructed from conditional affine-coupling blocks. Each block partitions its input $v$ into $(v_a, v_b)$ and applies the transformation

$$w_a = v_a, \qquad w_b = v_b \odot \exp\big(s(v_a, y)\big) + t(v_a, y),$$

where $s$ and $t$ are neural networks conditioned on $v_a$ and the one-hot encoded label $y$. The inverse transformation is

$$v_a = w_a, \qquad v_b = \big(w_b - t(w_a, y)\big) \odot \exp\big(-s(w_a, y)\big).$$

The log-Jacobian determinant for a single block is $\sum_i s(v_a, y)_i$. This parameterization allows tractable density computation and invertibility for both conditional and unconditional components (unconditional blocks simply omit the dependence on $y$).
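A coupling block of this form can be implemented in a few lines. In the sketch below, `s_net` and `t_net` are tiny illustrative stand-ins for the residual MLPs described later; invertibility holds exactly because both networks see only the untouched half $v_a$ (equal to $w_a$) and the label.

```python
import numpy as np

# Conditional affine coupling: v_a passes through; v_b is scaled and shifted
# by networks conditioned on v_a and a one-hot label.
rng = np.random.default_rng(1)
K, d = 3, 4                                   # classes, per-half dimension
Ws = rng.normal(size=(d + K, d)) * 0.1        # toy weights for the scale net
Wt = rng.normal(size=(d + K, d)) * 0.1        # toy weights for the shift net

def s_net(v_a, y_onehot):
    return np.tanh(np.concatenate([v_a, y_onehot]) @ Ws)

def t_net(v_a, y_onehot):
    return np.concatenate([v_a, y_onehot]) @ Wt

def coupling_forward(v, y_onehot):
    v_a, v_b = v[:d], v[d:]
    s, t = s_net(v_a, y_onehot), t_net(v_a, y_onehot)
    w_b = v_b * np.exp(s) + t                    # elementwise affine transform
    return np.concatenate([v_a, w_b]), s.sum()   # log|det| = sum_i s_i

def coupling_inverse(w, y_onehot):
    w_a, w_b = w[:d], w[d:]
    s, t = s_net(w_a, y_onehot), t_net(w_a, y_onehot)
    return np.concatenate([w_a, (w_b - t) * np.exp(-s)])

v = rng.normal(size=2 * d)
y = np.eye(K)[1]                              # one-hot label
w, log_det = coupling_forward(v, y)
assert np.allclose(coupling_inverse(w, y), v)  # exact invertibility
```

Stacking such blocks while alternating which half is transformed yields an expressive, still exactly invertible, conditional flow.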
4. Marginal Likelihood and Computational Efficiency
The marginal likelihood for unlabeled instances is

$$p(x) = \sum_{y=1}^{K} p(y)\, p(x \mid y) = \frac{1}{K} \sum_{y=1}^{K} \mathcal{N}\big(f(x, y); 0, I\big) \left|\det \frac{\partial f(x, y)}{\partial x}\right|.$$

Given that $f_u$ does not depend on $y$, its computation is executed once. For each possible label $y$, the semantic latent $z_1$ is passed through $f_c(\cdot, y)$ to obtain $u_1$. The marginal log-likelihood thus becomes

$$\log p(x) = \log \left|\det \frac{\partial f_u(x)}{\partial x}\right| + \log \mathcal{N}(z_2; 0, I) - \log K + \log \sum_{y=1}^{K} \exp\left[\log \mathcal{N}\big(f_c(z_1, y); 0, I\big) + \log \left|\det \frac{\partial f_c(z_1, y)}{\partial z_1}\right|\right].$$

This log-sum-exp formulation allows efficient exact computation of both the value and its gradients, with posterior responsibilities $p(y \mid x)$ facilitating gradient computation with respect to the model parameters.
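The class-dependent part of the computation reduces to a numerically stabilized log-sum-exp over per-class terms. A minimal 1-D sketch, assuming a toy conditional map $f_c(z_1, y) = (z_1 - \mu_y)/\sigma_y$ (illustrative, not the paper's architecture):

```python
import numpy as np

# Marginal log-likelihood of the semantic latent z1 via log-sum-exp over y.
K = 3
mu, sigma = np.array([-2.0, 0.0, 2.0]), np.array([0.5, 1.0, 1.5])

def log_std_normal(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def log_marginal(z1):
    # Per-class terms: log N(f_c(z1, y); 0, 1) + log |d f_c / d z1|.
    terms = log_std_normal((z1 - mu) / sigma) - np.log(sigma)
    m = terms.max()                              # stabilized log-sum-exp
    return -np.log(K) + m + np.log(np.exp(terms - m).sum())

# Agrees with the naive average over classes of p(z1 | y):
z1 = 0.7
naive = np.log(np.mean(np.exp(log_std_normal((z1 - mu) / sigma)) / sigma))
assert np.isclose(log_marginal(z1), naive)
```

In the full model the shared terms (the $f_u$ log-determinant and the $z_2$ base density) are simply added outside the sum, since they are identical for every class.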
5. Training Objective and Optimization Strategy
SCNF maximizes the exact joint log-likelihood on labeled data and the exact marginal log-likelihood on unlabeled data:

$$\mathcal{L}(\theta) = \sum_{(x, y) \in \mathcal{D}_{\ell}} \log p_\theta(x, y) + \sum_{x \in \mathcal{D}_{u}} \log p_\theta(x).$$

For labeled pairs, the calculation follows the full joint density expression, while for unlabeled data, the marginal (log-sum-exp) form is used. Stochastic gradient ascent (e.g., the Adam optimizer) is applied directly to this objective. An EM-SGD variant—alternating between computing the class posteriors and performing a parameter update—yields similar performance. No variational approximations or bounds are required.
On more complex datasets, it is sometimes beneficial to introduce an auxiliary classification loss on the semantic latent $z_1$ to promote label separation, though this was unnecessary for MNIST.
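The combined objective can be sketched end to end with the same toy per-class Gaussian model standing in for the flow (all names here are illustrative):

```python
import numpy as np

# Semi-supervised objective: exact joint log-likelihood on labeled pairs plus
# exact marginal log-likelihood on unlabeled points; no variational bound.
K = 3
mu, sigma = np.array([-2.0, 0.0, 2.0]), np.array([0.5, 1.0, 1.5])

def log_joint(x, y):                     # log p(x, y), uniform p(y) = 1/K
    z = (x - mu[y]) / sigma[y]
    return (-np.log(K) - 0.5 * (z ** 2 + np.log(2 * np.pi))
            - np.log(sigma[y]))

def log_marginal(x):                     # log p(x) via log-sum-exp over y
    terms = np.array([log_joint(x, y) for y in range(K)])
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())

def objective(labeled, unlabeled):
    return (sum(log_joint(x, y) for x, y in labeled)
            + sum(log_marginal(x) for x in unlabeled))

# A gradient-based optimizer (e.g., Adam) would maximize this directly.
value = objective([(-1.9, 0), (0.1, 1)], [2.2, -0.3])
assert np.isfinite(value)
```

In the EM-SGD variant, one would instead compute the posterior responsibilities $p(y \mid x)$ from the per-class terms and weight the per-class joint gradients accordingly; both views optimize the same exact objective.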
6. Model Architecture and Hyperparameters
The SCNF architecture and associated hyperparameters employ the following components:
- Data preprocessing: Inputs (MNIST pixel intensities) are dequantized with uniform noise and rescaled to $[0, 1]$, then transformed via the logit map $x \mapsto \operatorname{logit}\big(\alpha + (1 - 2\alpha)x\big)$ with a small constant $\alpha$.
- Unconditional flow $f_u$: Multi-scale Glow-style network with three levels of "squeezing", ActNorm, invertible $1 \times 1$ convolutions, and affine-coupling layers. Each coupling layer uses 4-layer residual MLPs (hidden width 64) for the scale and shift networks $s$ and $t$.
- Conditional flow $f_c$: Four channel-wise conditional coupling layers, also parameterized by residual MLPs processing the input and the one-hot label $y$. The dimension of the semantic latent $z_1$ can be reduced by factoring out features at two points; the best-performing dimension was selected by the ablation reported in Section 7.
- Training: Adam optimizer, batch size $100$ (half labeled, half unlabeled), no weight decay. The model is trained for $100$K iterations per MNIST split.
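The preprocessing step can be sketched as below. The value of $\alpha$ here is an illustrative assumption (the paper's exact constant is not reproduced); the pipeline is uniform dequantization of 8-bit pixels followed by the logit transform, which maps $[0, 1]$ data onto the real line:

```python
import numpy as np

# Dequantize 8-bit pixels with uniform noise, rescale, then apply logit.
rng = np.random.default_rng(2)
alpha = 1e-6                               # illustrative small offset (assumed)

def preprocess(x_uint8):
    x = (x_uint8.astype(np.float64) + rng.uniform(size=x_uint8.shape)) / 256.0
    s = alpha + (1 - 2 * alpha) * x        # squeeze into (alpha, 1 - alpha)
    return np.log(s) - np.log1p(-s)        # logit(s), maps to the real line

def deprocess(t):                          # inverse, up to dequantization noise
    s = 1.0 / (1.0 + np.exp(-t))
    return (s - alpha) / (1 - 2 * alpha)

pixels = np.array([0, 128, 255], dtype=np.uint8)
t = preprocess(pixels)
assert np.all(np.isfinite(t))              # logit stays finite thanks to alpha
```

Both the rescaling and the logit contribute known log-Jacobian terms, so densities reported in bits/dim can be converted back to the original pixel space exactly.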
A summary of the key architectural choices from the MNIST setup follows:

| Component | Parameterization |
|---|---|
| Unconditional flow $f_u$ | Glow-style, 3 scales; 4-layer residual MLPs (hidden width 64) for $s$ and $t$ |
| Conditional flow $f_c$ | 4 conditional affine-coupling layers; residual MLPs conditioned on one-hot $y$ |
| Optimizer | Adam; batch size $100$; $100$K iterations |
7. Empirical Evaluation and Ablation Findings
Comprehensive empirical analysis demonstrates the effectiveness of SCNF in semi-supervised scenarios:
- Toy 2D classification (moons, circles, with a small number of labeled points): SCNF-GLOW attains substantially lower test error and negative log-likelihood than SCNF-GMM and unconditional flows.
- MNIST (100 labels): SCNF-GLOW attains lower test error than the semi-supervised VAE of Kingma et al., while the simpler SCNF-GMM latent model proves insufficient. EM-SGD and direct SGD yielded identical performance.
- Ablation on the semantic latent dimension: dimensions that are too small underfit (high test error), an intermediate dimension is optimal, and dimensions approaching the full input size of $784$ lead to overfitting.
- Data obfuscation/fairness: a classifier trained on the conditional-flow output $u_1$ overfits and generalizes poorly, whereas a classifier on the semantic latent $z_1$ attains near-perfect test accuracy, indicating that the conditional flow removes class information when mapping $z_1$ to $u_1$. t-SNE visualizations confirm class separation in $z_1$ and class mixing in $u_1$.
Collectively, these results show that the two-stage semi-conditional flow architecture supports exact joint/marginal likelihood training, efficient inference in semi-supervised settings, and improved classification performance over VAE-based baselines on MNIST (Atanov et al., 2019).