Semi-Conditional Normalizing Flow (SCNF)
- SCNF is a semi-supervised model that explicitly defines the joint distribution over inputs and labels using a two-stage (unconditional and conditional) flow architecture.
- It leverages conditional affine coupling layers to enable tractable density computation and efficient marginal likelihood estimation via a log-sum-exp formulation.
- Empirical results on benchmarks like MNIST demonstrate SCNF’s state-of-the-art performance with low error rates, validating its design and optimization strategy.
Semi-Conditional Normalizing Flow (SCNF) is a class of normalizing flow models designed for semi-supervised learning through explicit modeling of the joint distribution over inputs and discrete labels. By employing a two-stage (semi-conditional) flow architecture—comprising an unconditional flow followed by a conditional component—SCNF efficiently leverages both labeled and unlabeled data. The architecture enables efficient computation of marginal likelihoods, supports principled parameter learning using exact joint and marginal maximum likelihood, and yields state-of-the-art performance in semi-supervised settings on canonical benchmarks (Atanov et al., 2019).
1. Joint Density Model and Decomposition
SCNF constructs an explicit model of the joint distribution over input data $x$ and discrete labels $y \in \{1, \dots, K\}$,

$$p(x, y) = p(y)\, p(x \mid y),$$

where $p(y) = 1/K$ (uniform prior), and $p(x \mid y)$ is defined by a normalizing flow with a latent Gaussian base. Introducing an invertible mapping $f(\cdot, y)\colon x \mapsto z$ yields, via change of variables,

$$p(x \mid y) = p_Z\big(f(x, y)\big)\left|\det \frac{\partial f(x, y)}{\partial x}\right|,$$

with $p_Z = \mathcal{N}(0, I)$. The joint log-density thus decomposes as

$$\log p(x, y) = \log p(y) + \log \mathcal{N}\big(f(x, y); 0, I\big) + \log \left|\det \frac{\partial f(x, y)}{\partial x}\right|.$$
This explicit formulation allows the model to maximize both joint and marginal likelihoods in a unified semi-supervised learning objective.
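The decomposition above can be sketched numerically. The following is a minimal toy example, assuming a 1-D per-class affine flow $f(x, y) = (x - \mu_y)/\sigma_y$ (illustrative only, not the paper's architecture), a uniform label prior, and a standard-normal base:

```python
import numpy as np

# Toy change-of-variables joint density: log p(x, y) = log p(y)
# + log N(f(x, y); 0, 1) + log |df/dx|, with p(y) = 1/K uniform.
K = 3                                   # number of classes (illustrative)
mu = np.array([-2.0, 0.0, 2.0])         # per-class shift
sigma = np.array([0.5, 1.0, 1.5])       # per-class scale

def log_std_normal(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def log_joint(x, y):
    z = (x - mu[y]) / sigma[y]          # z = f(x, y)
    log_det = -np.log(sigma[y])         # log |df/dx| for this affine map
    return -np.log(K) + log_std_normal(z) + log_det

# For this toy flow, p(x | y) is exactly N(x; mu[y], sigma[y]^2):
x = 1.3
closed_form = (-np.log(K) - 0.5 * ((x - mu[1]) / sigma[1]) ** 2
               - np.log(sigma[1] * np.sqrt(2 * np.pi)))
assert np.isclose(log_joint(x, 1), closed_form)
```

The same three-term structure (prior, base density, log-determinant) carries over unchanged to deep flows; only $f$ and its Jacobian become more elaborate.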
2. Semi-Conditional Architecture and Mapping Structure
SCNF divides the invertible mapping $f(\cdot, y)$ into two cascaded components:
- Unconditional flow $f_u$: maps the input $x$ to a semantic latent $z_1$ and an auxiliary latent $z_2$, with $f_u(x) = (z_1, z_2)$.
- Conditional flow $f_c$: maps $z_1$ to $u_1$ and conditions explicitly on the label $y$.
Formally,

$$f(x, y) = \big(f_c(z_1, y),\, z_2\big), \qquad (z_1, z_2) = f_u(x).$$

The Jacobian determinant for the composition factorizes as

$$\det \frac{\partial f(x, y)}{\partial x} = \det \frac{\partial f_u(x)}{\partial x} \cdot \det \frac{\partial f_c(z_1, y)}{\partial z_1}.$$

This two-stage structure underpins efficient marginalization over classes, with $f_u$ computed once per input and $f_c$ applied $K$ times (once for each label class).
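The factorization can be checked on a toy instance. Here the stages are simple linear maps (the matrix `A` and per-class scaling are illustrative assumptions, not the paper's networks): the unconditional stage is shared across classes, the conditional stage acts only on the semantic part, and the two log-Jacobians add.

```python
import numpy as np

# Two-stage toy flow: f_u(x) = A @ x, then f_c(z1, y) = class_scale[y] * z1.
rng = np.random.default_rng(0)
D, D1, K = 4, 2, 3                         # input dim, semantic dim, classes
A = rng.normal(size=(D, D))                # invertible with high probability
class_scale = np.array([0.5, 1.0, 2.0])    # per-class conditional scaling

def f(x, y):
    z = A @ x
    z1, z2 = z[:D1], z[D1:]                # f_u computed once, reused per y
    u1 = class_scale[y] * z1               # conditional stage depends on y
    log_det = (np.log(abs(np.linalg.det(A)))      # unconditional contribution
               + D1 * np.log(class_scale[y]))     # conditional contribution
    return np.concatenate([u1, z2]), log_det

# The factorized log|det| matches the Jacobian of the fully composed map:
x = rng.normal(size=D)
_, log_det = f(x, 2)
J_full = np.diag([2.0, 2.0, 1.0, 1.0]) @ A
assert np.isclose(log_det, np.log(abs(np.linalg.det(J_full))))
```

Because only the conditional stage depends on $y$, marginalizing over $K$ classes reuses one evaluation of the (typically much deeper) unconditional stage.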
3. Conditional Affine Coupling Layers
The conditional flow and portions of the unconditional flow are constructed from conditional affine-coupling blocks. Each block partitions its input $v$ into $(v_a, v_b)$ and applies the transformation

$$w_a = v_a, \qquad w_b = v_b \odot \exp\big(s(v_a, y)\big) + t(v_a, y),$$

where $s$ and $t$ are neural networks conditioned on $v_a$ and the one-hot encoded label $y$. The inverse transformation is

$$v_a = w_a, \qquad v_b = \big(w_b - t(w_a, y)\big) \odot \exp\big(-s(w_a, y)\big).$$

The log-Jacobian determinant for a single block is $\sum_i s(v_a, y)_i$. This parameterization allows tractable density computation and invertibility for both conditional and unconditional components (unconditional blocks simply omit the dependence on $y$).
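A coupling block of this form can be implemented in a few lines. In the sketch below, `s_net` and `t_net` are tiny illustrative stand-ins for the residual MLPs described later; invertibility holds exactly because both networks see only the untouched half $v_a$ (equal to $w_a$) and the label.

```python
import numpy as np

# Conditional affine coupling: v_a passes through; v_b is scaled and shifted
# by networks conditioned on v_a and a one-hot label.
rng = np.random.default_rng(1)
K, d = 3, 4                                   # classes, per-half dimension
Ws = rng.normal(size=(d + K, d)) * 0.1        # toy weights for the scale net
Wt = rng.normal(size=(d + K, d)) * 0.1        # toy weights for the shift net

def s_net(v_a, y_onehot):
    return np.tanh(np.concatenate([v_a, y_onehot]) @ Ws)

def t_net(v_a, y_onehot):
    return np.concatenate([v_a, y_onehot]) @ Wt

def coupling_forward(v, y_onehot):
    v_a, v_b = v[:d], v[d:]
    s, t = s_net(v_a, y_onehot), t_net(v_a, y_onehot)
    w_b = v_b * np.exp(s) + t                    # elementwise affine transform
    return np.concatenate([v_a, w_b]), s.sum()   # log|det| = sum_i s_i

def coupling_inverse(w, y_onehot):
    w_a, w_b = w[:d], w[d:]
    s, t = s_net(w_a, y_onehot), t_net(w_a, y_onehot)
    return np.concatenate([w_a, (w_b - t) * np.exp(-s)])

v = rng.normal(size=2 * d)
y = np.eye(K)[1]                              # one-hot label
w, log_det = coupling_forward(v, y)
assert np.allclose(coupling_inverse(w, y), v)  # exact invertibility
```

Stacking such blocks while alternating which half is transformed yields an expressive, still exactly invertible, conditional flow.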
4. Marginal Likelihood and Computational Efficiency
The marginal likelihood for unlabeled instances is

$$p(x) = \sum_{y=1}^{K} p(y)\, p(x \mid y) = \frac{1}{K} \sum_{y=1}^{K} \mathcal{N}\big(f(x, y); 0, I\big) \left|\det \frac{\partial f(x, y)}{\partial x}\right|.$$

Given that $f_u$ does not depend on $y$, its computation is executed once. For each possible label $y$, the semantic latent $z_1$ is passed through $f_c(\cdot, y)$ to obtain $u_1$. The marginal log-likelihood thus becomes

$$\log p(x) = \log \left|\det \frac{\partial f_u(x)}{\partial x}\right| + \log \mathcal{N}(z_2; 0, I) - \log K + \log \sum_{y=1}^{K} \exp\left[\log \mathcal{N}\big(f_c(z_1, y); 0, I\big) + \log \left|\det \frac{\partial f_c(z_1, y)}{\partial z_1}\right|\right].$$

This log-sum-exp formulation allows efficient exact computation of both the value and its gradients, with posterior responsibilities $p(y \mid x)$ facilitating gradient computation with respect to the model parameters.
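The class-dependent part of the computation reduces to a numerically stabilized log-sum-exp over per-class terms. A minimal 1-D sketch, assuming a toy conditional map $f_c(z_1, y) = (z_1 - \mu_y)/\sigma_y$ (illustrative, not the paper's architecture):

```python
import numpy as np

# Marginal log-likelihood of the semantic latent z1 via log-sum-exp over y.
K = 3
mu, sigma = np.array([-2.0, 0.0, 2.0]), np.array([0.5, 1.0, 1.5])

def log_std_normal(z):
    return -0.5 * (z ** 2 + np.log(2 * np.pi))

def log_marginal(z1):
    # Per-class terms: log N(f_c(z1, y); 0, 1) + log |d f_c / d z1|.
    terms = log_std_normal((z1 - mu) / sigma) - np.log(sigma)
    m = terms.max()                              # stabilized log-sum-exp
    return -np.log(K) + m + np.log(np.exp(terms - m).sum())

# Agrees with the naive average over classes of p(z1 | y):
z1 = 0.7
naive = np.log(np.mean(np.exp(log_std_normal((z1 - mu) / sigma)) / sigma))
assert np.isclose(log_marginal(z1), naive)
```

In the full model the shared terms (the $f_u$ log-determinant and the $z_2$ base density) are simply added outside the sum, since they are identical for every class.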
5. Training Objective and Optimization Strategy
SCNF maximizes the exact joint log-likelihood on labeled data and the exact marginal log-likelihood on unlabeled data:

$$\mathcal{L}(\theta) = \sum_{(x, y) \in \mathcal{D}_{\ell}} \log p_\theta(x, y) + \sum_{x \in \mathcal{D}_{u}} \log p_\theta(x).$$

For labeled pairs, the calculation follows the full joint density expression, while for unlabeled data, the marginal (log-sum-exp) form is used. Stochastic gradient ascent (e.g., the Adam optimizer) is applied directly to this objective. An EM-SGD variant—alternating between computing the class posteriors and performing a parameter update—yields similar performance. No variational approximations or bounds are required.
On more complex datasets, it is sometimes beneficial to introduce an auxiliary classification loss on the semantic latent $z_1$ to promote label separation, though this was unnecessary for MNIST.
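The combined objective can be sketched end to end with the same toy per-class Gaussian model standing in for the flow (all names here are illustrative):

```python
import numpy as np

# Semi-supervised objective: exact joint log-likelihood on labeled pairs plus
# exact marginal log-likelihood on unlabeled points; no variational bound.
K = 3
mu, sigma = np.array([-2.0, 0.0, 2.0]), np.array([0.5, 1.0, 1.5])

def log_joint(x, y):                     # log p(x, y), uniform p(y) = 1/K
    z = (x - mu[y]) / sigma[y]
    return (-np.log(K) - 0.5 * (z ** 2 + np.log(2 * np.pi))
            - np.log(sigma[y]))

def log_marginal(x):                     # log p(x) via log-sum-exp over y
    terms = np.array([log_joint(x, y) for y in range(K)])
    m = terms.max()
    return m + np.log(np.exp(terms - m).sum())

def objective(labeled, unlabeled):
    return (sum(log_joint(x, y) for x, y in labeled)
            + sum(log_marginal(x) for x in unlabeled))

# A gradient-based optimizer (e.g., Adam) would maximize this directly.
value = objective([(-1.9, 0), (0.1, 1)], [2.2, -0.3])
assert np.isfinite(value)
```

In the EM-SGD variant, one would instead compute the posterior responsibilities $p(y \mid x)$ from the per-class terms and weight the per-class joint gradients accordingly; both views optimize the same exact objective.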
6. Model Architecture and Hyperparameters
The SCNF architecture and associated hyperparameters employ the following components:
- Data preprocessing: Inputs (MNIST pixel intensities) are dequantized with uniform noise and rescaled to $[0, 1]$, then transformed via the logit map $x \mapsto \operatorname{logit}\big(\alpha + (1 - 2\alpha)x\big)$ with a small constant $\alpha$.
- Unconditional flow $f_u$: Multi-scale Glow-style network with three levels of "squeezing", ActNorm, invertible $1 \times 1$ convolutions, and affine-coupling layers. Each coupling layer uses 4-layer residual MLPs (hidden width 64) for the scale and shift networks $s$ and $t$.
- Conditional flow $f_c$: Four channel-wise conditional coupling layers, also parameterized by residual MLPs processing the input and the one-hot label $y$. The dimension of the semantic latent $z_1$ can be reduced by factoring out features at two points; the best-performing dimension was selected by the ablation reported in Section 7.
- Training: Adam optimizer, batch size $100$ (half labeled, half unlabeled), no weight decay. The model is trained for $100$K iterations per MNIST split.
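The preprocessing step can be sketched as below. The value of $\alpha$ here is an illustrative assumption (the paper's exact constant is not reproduced); the pipeline is uniform dequantization of 8-bit pixels followed by the logit transform, which maps $[0, 1]$ data onto the real line:

```python
import numpy as np

# Dequantize 8-bit pixels with uniform noise, rescale, then apply logit.
rng = np.random.default_rng(2)
alpha = 1e-6                               # illustrative small offset (assumed)

def preprocess(x_uint8):
    x = (x_uint8.astype(np.float64) + rng.uniform(size=x_uint8.shape)) / 256.0
    s = alpha + (1 - 2 * alpha) * x        # squeeze into (alpha, 1 - alpha)
    return np.log(s) - np.log1p(-s)        # logit(s), maps to the real line

def deprocess(t):                          # inverse, up to dequantization noise
    s = 1.0 / (1.0 + np.exp(-t))
    return (s - alpha) / (1 - 2 * alpha)

pixels = np.array([0, 128, 255], dtype=np.uint8)
t = preprocess(pixels)
assert np.all(np.isfinite(t))              # logit stays finite thanks to alpha
```

Both the rescaling and the logit contribute known log-Jacobian terms, so densities reported in bits/dim can be converted back to the original pixel space exactly.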
A summary of the key architectural choices from the MNIST setup follows:

| Component | Parameterization |
|---|---|
| Unconditional flow $f_u$ | Glow-style, 3 scales; 4-layer residual MLPs (hidden width 64) for $s$ and $t$ |
| Conditional flow $f_c$ | 4 conditional affine-coupling layers; residual MLPs conditioned on one-hot $y$ |
| Optimizer | Adam; batch size $100$; $100$K iterations |
7. Empirical Evaluation and Ablation Findings
Comprehensive empirical analysis demonstrates the effectiveness of SCNF in semi-supervised scenarios:
- Toy 2D classification (moons, circles, with a small number of labeled points): SCNF-GLOW attains substantially lower test error and negative log-likelihood than SCNF-GMM and unconditional flows.
- MNIST (100 labels): SCNF-GLOW attains lower test error than the semi-supervised VAE of Kingma et al., while the simpler SCNF-GMM latent model proves insufficient. EM-SGD and direct SGD yielded identical performance.
- Ablation on the semantic latent dimension: dimensions that are too small underfit (high test error), an intermediate dimension is optimal, and dimensions approaching the full input size of $784$ lead to overfitting.
- Data obfuscation/fairness: a classifier trained on the conditional-flow output $u_1$ overfits and generalizes poorly, whereas a classifier on the semantic latent $z_1$ attains near-perfect test accuracy, indicating that the conditional flow removes class information when mapping $z_1$ to $u_1$. t-SNE visualizations confirm class separation in $z_1$ and class mixing in $u_1$.
Collectively, these results show that the two-stage semi-conditional flow architecture supports exact joint/marginal likelihood training, efficient inference in semi-supervised settings, and improved classification performance over VAE-based baselines on MNIST (Atanov et al., 2019).