
Semi-Conditional Normalizing Flow (SCNF)

Updated 14 December 2025
  • SCNF is a semi-supervised model that explicitly defines the joint distribution over inputs and labels using a two-stage (unconditional and conditional) flow architecture.
  • It leverages conditional affine coupling layers to enable tractable density computation and efficient marginal likelihood estimation via a log-sum-exp formulation.
  • Empirical results on benchmarks like MNIST demonstrate SCNF’s state-of-the-art performance with low error rates, validating its design and optimization strategy.

Semi-Conditional Normalizing Flow (SCNF) is a class of normalizing flow models designed for semi-supervised learning through explicit modeling of the joint distribution over inputs and discrete labels. By employing a two-stage (semi-conditional) flow architecture—comprising an unconditional flow followed by a conditional component—SCNF efficiently leverages both labeled and unlabeled data. The architecture enables efficient computation of marginal likelihoods, supports principled parameter learning using exact joint and marginal maximum likelihood, and yields state-of-the-art performance in semi-supervised settings on canonical benchmarks (Atanov et al., 2019).

1. Joint Density Model and Decomposition

SCNF constructs an explicit model of the joint distribution $p_0(x, y)$ for input data $x \in \mathbb{R}^d$ and discrete labels $y \in \{1, \ldots, K\}$,

$$p_0(x, y) = p_0(y) \, p_0(x \mid y)$$

where $p_0(y) = 1/K$ (uniform prior), and $p_0(x \mid y)$ is defined by a normalizing flow with a latent Gaussian base. Introducing an invertible mapping $z = f_0(x; y, \theta)$ yields, via change of variables,

$$\log p_0(x \mid y) = \log p_z(z; y) + \log \left| \det \frac{\partial f_0(x; y)}{\partial x} \right|$$

with $p_z(z; y) = \mathcal{N}(z \mid 0, I)$. The joint log-density thus decomposes as

$$\log p_0(x, y) = \log p_0(y) + \log \mathcal{N}(z \mid 0, I) + \log |\det J_{f_0}(x; y)|$$

This explicit formulation allows the model to maximize both joint and marginal likelihoods in a unified semi-supervised learning objective.
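
To make the decomposition concrete, here is a minimal numerical sketch in which a toy class-conditional affine map stands in for the flow $f_0$; the helper names (`log_gauss`, `log_joint`) and all parameter values are illustrative, not from the paper:

```python
import numpy as np

def log_gauss(z):
    """log N(z | 0, I) for the standard Gaussian base."""
    return -0.5 * (z @ z + len(z) * np.log(2.0 * np.pi))

def log_joint(x, y, K, scales, shifts):
    """log p0(x, y) = log p0(y) + log N(z | 0, I) + log|det J_f0(x; y)|,
    with a toy class-conditional affine map z = scales[y] * x + shifts[y]
    standing in for the flow f0."""
    z = scales[y] * x + shifts[y]
    log_det = np.sum(np.log(np.abs(scales[y])))  # log |det d f0 / d x|
    return -np.log(K) + log_gauss(z) + log_det   # p0(y) = 1/K (uniform)

K, d = 3, 2
rng = np.random.default_rng(0)
scales = rng.uniform(0.5, 2.0, size=(K, d))
shifts = rng.normal(size=(K, d))
x = rng.normal(size=d)
for y in range(K):
    print(y, log_joint(x, y, K, scales, shifts))
```

Because the change of variables is exact, $\exp(\log p_0(x \mid y))$ integrates to one over $x$ for each class, which can be checked numerically in one dimension.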

2. Semi-Conditional Architecture and Mapping Structure

SCNF divides the invertible mapping into two cascaded components:

  1. Unconditional flow $f_v(x; w)$: maps the input $x$ to a semantic latent $z_f \in \mathbb{R}^{d_0}$ and an auxiliary latent $z_\mathrm{aux} \in \mathbb{R}^{d_1}$, with $d_0 + d_1 = d$.
  2. Conditional flow $h_0(z_f, y; \phi)$: maps $z_f$ to $z_h$ and conditions explicitly on $y$.

Formally,

$$(z_f, z_\mathrm{aux}) = f_v(x; w)$$

$$z_h = h_0(z_f, y; \phi)$$

The Jacobian determinant for the composition factorizes as

$$|\det J_{h_0 \circ f_v}(x; y)| = |\det J_{h_0}(z_f; y)| \cdot |\det J_{f_v}(x)|$$

This two-stage structure underpins efficient marginalization over classes, with $f_v$ computed once per input and $h_0$ applied $K$ times (once for each label class).

3. Conditional Affine Coupling Layers

The conditional flow $h_0$ and portions of the unconditional flow $f_v$ are constructed from conditional affine-coupling blocks. Each block partitions the input $v$ into $(v_1, v_2)$ and applies the transformation

$$\begin{aligned} u_1 &= v_1 \\ u_2 &= v_2 \cdot \exp(s(v_1, y)) + t(v_1, y) \end{aligned}$$

where $s(\cdot, y)$ and $t(\cdot, y)$ are neural networks conditioned on $v_1$ and the one-hot encoded label $y$. The inverse transformation is

$$\begin{aligned} v_1 &= u_1 \\ v_2 &= [u_2 - t(u_1, y)] \cdot \exp(-s(u_1, y)) \end{aligned}$$

The log-Jacobian determinant of a single block is $\sum_i s(v_1, y)_i$. This parameterization allows tractable density computation and invertibility for both conditional and unconditional components.
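
The forward/inverse pair and its log-determinant can be checked numerically. The sketch below uses a tiny one-hidden-layer network as a stand-in for the $s$ and $t$ networks; the names `mlp`, `coupling_forward`, and `coupling_inverse` are illustrative, not the paper's implementation:

```python
import numpy as np

def mlp(v1, y_onehot, W1, b1, W2, b2):
    """Tiny one-hidden-layer stand-in for the s(.) / t(.) networks."""
    h = np.tanh(np.concatenate([v1, y_onehot]) @ W1 + b1)
    return h @ W2 + b2

def coupling_forward(v, y_onehot, params):
    """u1 = v1, u2 = v2 * exp(s(v1, y)) + t(v1, y); returns (u, log|det J|)."""
    m = len(v) // 2
    v1, v2 = v[:m], v[m:]
    s = mlp(v1, y_onehot, *params["s"])
    t = mlp(v1, y_onehot, *params["t"])
    return np.concatenate([v1, v2 * np.exp(s) + t]), np.sum(s)

def coupling_inverse(u, y_onehot, params):
    """Exact inverse: v2 = (u2 - t(u1, y)) * exp(-s(u1, y))."""
    m = len(u) // 2
    u1, u2 = u[:m], u[m:]
    s = mlp(u1, y_onehot, *params["s"])
    t = mlp(u1, y_onehot, *params["t"])
    return np.concatenate([u1, (u2 - t) * np.exp(-s)])

rng = np.random.default_rng(1)
d, K, hidden = 4, 3, 8
def init():  # random weights for one stand-in network
    return (0.1 * rng.normal(size=(d // 2 + K, hidden)), np.zeros(hidden),
            0.1 * rng.normal(size=(hidden, d // 2)), np.zeros(d // 2))
params = {"s": init(), "t": init()}
v = rng.normal(size=d)
y = np.eye(K)[1]  # one-hot label
u, log_det = coupling_forward(v, y, params)
v_rec = coupling_inverse(u, y, params)
print(np.allclose(v, v_rec))
```

The inverse only re-evaluates $s$ and $t$ at $u_1 = v_1$, which is why invertibility holds regardless of how complex the conditioning networks are.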

4. Marginal Likelihood and Computational Efficiency

The marginal likelihood for unlabeled instances is computed as

$$p_0(x) = \sum_{y=1}^K p_0(x, y) = \sum_{y=1}^K p_0(y) \, p_0(x \mid y)$$

Given that $f_v(x; w)$ does not depend on $y$, the flow computation $(z_f, z_\mathrm{aux}) = f_v(x; w)$ is executed only once. For each possible label $y$, $z_f$ is passed through $h_0$ to obtain $z_h(y)$. The marginal log-likelihood thus becomes

$$\log p_0(x) = \log \sum_{y=1}^K p_0(y) \exp\big[ \log \mathcal{N}(z_\mathrm{aux} \mid 0, I) + \log |\det J_{f_v}(x)| + \log |\det J_{h_0}(z_f; y)| + \log \mathcal{N}(z_h(y) \mid 0, I) \big]$$

This log-sum-exp formulation allows efficient exact computation of both the value and its gradients, with posterior responsibilities $r_y \propto \exp[A_y]$, where $A_y$ denotes the per-class term inside the exponential, facilitating gradient computation with respect to the model parameters.
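
The marginalization strategy, running $f_v$ once and $h_0$ once per class before a log-sum-exp, can be sketched as follows. The stand-in flows (`f_v` as an identity split with zero log-determinant, `h_0` as a per-class scaling) are illustrative assumptions, not the actual SCNF networks:

```python
import numpy as np

def log_gauss(z):
    """log N(z | 0, I) for a standard Gaussian base distribution."""
    return -0.5 * (z @ z + len(z) * np.log(2.0 * np.pi))

def marginal_log_likelihood(x, K, f_v, h_0):
    """log p0(x): run f_v once, then h_0 once per class, then log-sum-exp."""
    (z_f, z_aux), log_det_f = f_v(x)          # label-independent, computed once
    terms = []
    for y in range(K):                        # conditional part, K passes
        z_h, log_det_h = h_0(z_f, y)
        terms.append(-np.log(K) + log_gauss(z_aux) + log_det_f
                     + log_gauss(z_h) + log_det_h)
    a = np.array(terms)
    m = a.max()                               # numerically stable log-sum-exp
    return m + np.log(np.sum(np.exp(a - m)))

# Toy invertible stand-ins (illustrative assumptions, not the SCNF networks):
scales = np.array([0.5, 1.0, 2.0])
def f_v(x):                                   # identity split: log|det| = 0
    d0 = len(x) // 2
    return (x[:d0], x[d0:]), 0.0
def h_0(z_f, y):                              # per-class scaling of z_f
    return scales[y] * z_f, len(z_f) * np.log(scales[y])

x = np.array([0.3, -1.2, 0.7, 0.1])
print(marginal_log_likelihood(x, 3, f_v, h_0))
```

The per-class terms inside the loop are exactly the $A_y$ of the responsibilities $r_y \propto \exp[A_y]$.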

5. Training Objective and Optimization Strategy

SCNF maximizes the exact joint log-likelihood on labeled data and the marginal log-likelihood on unlabeled data:

$$\mathcal{L}(\theta) = \sum_{(x_i, y_i) \in L} \log p_0(x_i, y_i) + \sum_{x_j \in U} \log p_0(x_j)$$

For labeled pairs, the calculation follows the full joint density expression, while for unlabeled data the marginal (log-sum-exp) form is used. Stochastic gradient ascent (e.g., the Adam optimizer) is applied directly to this objective. An EM-SGD variant, alternating between computing the class posteriors $q(y) = p_0(y \mid x)$ and performing a parameter update, yields similar performance. No variational approximations or bounds are required.
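
A minimal sketch of this objective, with hypothetical `log_joint_fn` / `log_marginal_fn` callables standing in for the actual flow densities (here a fixed 1-D model with two equally likely Gaussian classes):

```python
import numpy as np

def semi_supervised_objective(labeled, unlabeled, log_joint_fn, log_marginal_fn):
    """L(theta) = sum over labeled (x, y) of log p0(x, y)
                + sum over unlabeled x of log p0(x)."""
    labeled_term = sum(log_joint_fn(x, y) for x, y in labeled)
    unlabeled_term = sum(log_marginal_fn(x) for x in unlabeled)
    return labeled_term + unlabeled_term  # maximized directly, no bounds

# Hypothetical stand-ins: a fixed 1-D model with K = 2 equally likely
# Gaussian classes (not the SCNF flow densities themselves).
def log_joint_fn(x, y):
    mu = (-1.0, 1.0)[y]
    return np.log(0.5) - 0.5 * ((x - mu) ** 2 + np.log(2.0 * np.pi))

def log_marginal_fn(x):  # exact marginal via log-sum-exp over the two classes
    return np.logaddexp(log_joint_fn(x, 0), log_joint_fn(x, 1))

labeled = [(0.9, 1), (-1.1, 0)]
unlabeled = [0.2, -0.5, 1.3]
print(semi_supervised_objective(labeled, unlabeled, log_joint_fn, log_marginal_fn))
```

In practice each term would be the exact flow density from Sections 1 and 4, and the sum would be replaced by mini-batch averages for stochastic gradient ascent.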

On more complex datasets, it is sometimes beneficial to introduce an auxiliary classification loss on zfz_f to promote label separation, though this was unnecessary for MNIST.

6. Model Architecture and Hyperparameters

The SCNF architecture and associated hyperparameters employ the following components:

  • Data preprocessing: inputs (MNIST) are dequantized to $[0, 1]$, then transformed via $\mathrm{logit}(\alpha + (1 - 2\alpha)x)$ with $\alpha = 10^{-6}$.
  • Unconditional flow $f_v(x; w)$: multi-scale Glow-style network with three levels of "squeezing", ActNorm, invertible $1 \times 1$ convolutions, and affine-coupling layers. Each coupling layer uses 4-layer residual MLPs (hidden width 64) for $s, t$.
  • Conditional flow $h_0(z_f, y; \phi)$: four channel-wise conditional coupling layers, also parameterized by residual MLPs processing the input and the one-hot $y$. The dimension is reduced by factoring out features at two points, but best results used $d_0 = 196$.
  • Training: Adam optimizer, learning rate $10^{-4}$, batch size 100 (half labeled, half unlabeled), weight decay 0. The model is trained for 100K iterations per MNIST split.

A summary table of key architecture values from the MNIST setup follows:

Component | Parameterization | Typical Value
Unconditional flow $f_v$ | Glow, 3 scales, 4-layer residual MLPs for $(s, t)$ | $d_0 = 196$
Conditional flow $h_0$ | 4 conditional affine-coupling layers, MLPs | $d_0 = 196$
Optimizer | Adam (learning rate, batch size) | $10^{-4}$, 100
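
For reference, the hyperparameters above can be collected into a single configuration dictionary; the key names below are illustrative, not taken from any released code:

```python
# Hypothetical configuration dict collecting the MNIST hyperparameters above;
# key names are illustrative, not from the paper's released code.
scnf_mnist_config = {
    "preprocess": {"dequantize": True, "logit_alpha": 1e-6},
    "unconditional_flow": {"type": "glow", "scales": 3,
                           "coupling_mlp": {"layers": 4, "hidden": 64}},
    "conditional_flow": {"coupling_layers": 4, "d0": 196},
    "train": {"optimizer": "adam", "lr": 1e-4, "batch_size": 100,
              "weight_decay": 0.0, "iterations": 100_000},
}
print(scnf_mnist_config["conditional_flow"]["d0"])
```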

7. Empirical Evaluation and Ablation Findings

Comprehensive empirical analysis demonstrates the effectiveness of SCNF in semi-supervised scenarios:

  • Toy 2D classification (moons, circles, $1\%$ labeled): SCNF-GLOW achieves $0.3\%$–$0.6\%$ test error and NLL $\approx 1.12$–$1.15$, significantly outperforming SCNF-GMM ($1$–$5\%$ error) and unconditional flows.
  • MNIST (100 labels): Kingma et al.'s VAE achieves $3.3\%$ test error; SCNF-GMM yields $14.2\%$ (insufficient), whereas SCNF-GLOW attains $1.9\% \pm 0.3$ error at $1.145 \pm 0.004$ bits/dim. EM-SGD and direct SGD yielded identical performance.
  • Ablation on latent dimension $d_0$: $d_0 = 49$ underfits with $61\%$ test error; $d_0 = 98$ achieves $2.0\%$; $d_0 = 196$ yields the optimal $1.9\%$; $d_0 = 392$ or $784$ leads to overfitting.
  • Data obfuscation/fairness: a classifier trained on $z_h$ overfits and generalizes poorly, while a classifier on $z_f$ attains near-$100\%$ test accuracy, indicating that $h_0$ removes class information when mapping $z_f$ to $z_h$. t-SNE confirms class separation in $z_f$ and class mixing in $z_h$.

Collectively, these results show that the two-stage semi-conditional flow architecture supports exact joint/marginal likelihood training, efficient inference in semi-supervised settings, and improved classification performance over VAE-based baselines on MNIST (Atanov et al., 2019).
