- The paper introduces three new analytic bijections with globally smooth, closed-form inverses that enhance the expressivity and tractability of normalizing flows.
- It develops radial flow architectures that directly parameterize the radial coordinate, resulting in notable improvements in training stability, interpretability, and parameter efficiency.
- Empirical evaluations demonstrate superior likelihood performance and effective mode collapse mitigation, particularly in challenging applications like lattice field theory.
Analytic Bijections and Radial Architectures for Smooth Normalizing Flows
Introduction and Motivation
Normalizing flows (NFs) are explicit density models constructed via invertible mappings from a simple base distribution to a complex target, leveraging the change-of-variables formula for exact log-likelihood optimization. A primary architectural bottleneck in classical flows lies in scalar bijections—the functions used to transform individual coordinates or groups within coupling or autoregressive structures. Existing scalar bijection classes, such as affine, monotonic splines, and neural-based residual flows, come with fundamental trade-offs: smoothness vs. expressivity, domain coverage vs. invertibility in closed form, and computational tractability vs. local control.
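For reference, the change-of-variables identity mentioned above can be written as follows. The symbol names are ours rather than the paper's: $f$ is the invertible map from data space to base space and $p_Z$ is the base density.

$$\log p_X(x) = \log p_Z\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|$$

When $f$ acts on a single coordinate through a monotone scalar bijection $h$, the determinant term reduces to $\log h'(x)$, which is why scalar maps with cheap derivatives and inverses matter so much in practice.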
This work, "Analytic Bijections for Smooth and Interpretable Normalizing Flows" (2601.10774), introduces three new analytic bijection families that are globally smooth ($C^\infty$), defined on all of $\mathbb{R}$, and equipped with tractable Jacobians and closed-form inverses. These bijections support both local and global transformations of probability mass. The work also develops "radial flows": models that transform the radial coordinate directly, optionally with angle dependence, which improves stability, interpretability, and parameter efficiency relative to standard coupling-based flows.
Analytic Bijection Construction
The bijections proposed fall into three classes:
- Cubic Rational: Localized transformation via a rational function with cubic numerator and quadratic denominator, constraining the perturbation to be local with $h(x) \to x$ as $|x| \to \infty$:
$$h(x) = x + \frac{\lambda\,(x-\gamma)}{1 + \sigma^2 (x-\gamma)^2}$$
Subject to $-1 < \lambda < 8$ and $\sigma > 0$ to ensure strict monotonicity and invertibility (the inverse is the real root of a cubic).
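- Sinh (Conjugation): Global shift and local deformation via conjugation with the strictly monotonic sinh function:
$$h(x) = \sigma \operatorname{arcsinh}\!\left(e^{\mu}\left(e^{\nu} \sinh\!\left(\frac{x-\gamma}{\sigma}\right) + \delta\right)\right) + \gamma$$
Parameters $(\delta, \mu, \nu)$ enable flexible control over local and global distribution transport.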
- Cubic Polynomial Conjugation: Localized deformation via conjugation with $g(x) = ax + bx^3$:
$$h(x) = g^{-1}\big(g(x-\gamma) + \delta\big) + \gamma$$
This class allows for arbitrarily local polynomial deformations with tractable inversion using the analytic solution for cubic roots.
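The three families can be sketched in plain Python as below. This is an illustrative sketch, not the authors' implementation. It reads the maps as h(x) = x + λ(x−γ)/(1+σ²(x−γ)²), h(x) = σ·arcsinh(e^μ(e^ν·sinh((x−γ)/σ) + δ)) + γ, and h(x) = g⁻¹(g(x−γ)+δ) + γ with g(x) = ax + bx³. The function names, the Cardano-style helper `_real_root`, and the assumption a, b > 0 for the cubic polynomial are ours.

```python
import math

def _cbrt(z):
    """Real cube root, valid for negative inputs too."""
    return math.copysign(abs(z) ** (1.0 / 3.0), z)

def _real_root(p, q):
    """Real root of t^3 + p*t + q = 0, assuming exactly one real root exists."""
    disc = max(q * q / 4.0 + p ** 3 / 27.0, 0.0)  # clamp round-off
    a = -math.copysign(1.0, q) * _cbrt(abs(q) / 2.0 + math.sqrt(disc))
    b = -p / (3.0 * a) if a != 0.0 else 0.0  # second Cardano term via a*b = -p/3
    return a + b

def cubic_rational(x, lam, sigma, gamma):
    """Forward map and log h'(x); assumes -1 < lam < 8 and sigma > 0."""
    u = x - gamma
    s = (sigma * u) ** 2
    log_det = math.log1p(lam * (1.0 - s) / (1.0 + s) ** 2)
    return x + lam * u / (1.0 + s), log_det

def cubic_rational_inv(y, lam, sigma, gamma):
    """Inverse: with v = y - gamma, u = x - gamma solves
    u^3 - v*u^2 + ((1 + lam) / sigma^2)*u - v / sigma^2 = 0."""
    v = y - gamma
    a2, a1, a0 = -v, (1.0 + lam) / sigma ** 2, -v / sigma ** 2
    p = a1 - a2 * a2 / 3.0
    q = 2.0 * a2 ** 3 / 27.0 - a2 * a1 / 3.0 + a0
    return _real_root(p, q) - a2 / 3.0 + gamma

def sinh_conj(x, delta, mu, nu, sigma, gamma):
    """Forward map and log h'(x); not overflow-safe for very large |x|."""
    u = (x - gamma) / sigma
    w = math.exp(mu) * (math.exp(nu) * math.sinh(u) + delta)
    log_det = mu + nu + math.log(math.cosh(u)) - 0.5 * math.log1p(w * w)
    return sigma * math.asinh(w) + gamma, log_det

def sinh_conj_inv(y, delta, mu, nu, sigma, gamma):
    w = math.sinh((y - gamma) / sigma)
    return sigma * math.asinh(math.exp(-nu) * (math.exp(-mu) * w - delta)) + gamma

def _g(t, a, b):
    return a * t + b * t ** 3

def _g_inv(c, a, b):
    """Solve b*t^3 + a*t = c; a, b > 0 (our assumption) gives one real root."""
    return _real_root(a / b, -c / b)

def cubic_conj(x, delta, a, b, gamma):
    """Forward map and log h'(x) = log g'(u) - log g'(w)."""
    u = x - gamma
    w = _g_inv(_g(u, a, b) + delta, a, b)
    log_det = math.log(a + 3.0 * b * u * u) - math.log(a + 3.0 * b * w * w)
    return w + gamma, log_det

def cubic_conj_inv(y, delta, a, b, gamma):
    return _g_inv(_g(y - gamma, a, b) - delta, a, b) + gamma
```

Each forward function returns the transformed value together with the log-derivative needed by the change-of-variables formula. Under this reading of the sinh map, h(x) − x approaches σ(μ+ν) for large positive x, which matches the "global shift without asymptotic slope change" behaviour described in the text.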
Desiderata and Properties
Compared with established methods, all presented bijections are globally smooth, defined on $\mathbb{R}$, possess analytic inverses, and allow both local modifications (e.g., local "bumps" in the density) and global modifications (e.g., shifting mass to distant regions without changing the asymptotic slope).
Figure 1: The three bijection classes cover local deformation (cubic rational), global shift (sinh conjugation), and uniform scaling (affine) regimes, as shown by their effect on transforming a standard normal density.
Parametric stability is achieved by enforcing soft constraints through reparameterizations, such as softplus activation for scale parameters and bounded sigmoid for constrained intervals, ensuring models remain well within the injectivity region during training.
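A minimal sketch of such a reparameterization for the cubic rational parameters is shown below. It assumes the interval −1 < λ < 8 and σ > 0 stated earlier. The function name `constrain_cubic_rational`, the `margin` that keeps λ strictly inside the interval, and the `floor` on σ are hypothetical choices of ours, not values from the paper.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def constrain_cubic_rational(raw_lam, raw_sigma, margin=1e-3, floor=1e-6):
    """Map unconstrained reals to lam in (-1, 8) and sigma > 0.

    margin and floor are illustrative safety buffers (our assumption).
    """
    lo, hi = -1.0 + margin, 8.0 - margin
    lam = lo + (hi - lo) * sigmoid(raw_lam)   # bounded sigmoid for the interval
    sigma = softplus(raw_sigma) + floor       # softplus for the positive scale
    return lam, sigma
```

An optimizer can then update `raw_lam` and `raw_sigma` freely, while the bijection only ever sees parameters for which it is strictly monotone.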
Flow Architectures: Coupling, Radial, and Angular
Coupling Flows
Coupling layers, in which a subset of coordinates is transformed conditioned on the remainder, are a standard choice for high-dimensional flows. The analytic bijections can directly replace affine or spline transformations, adding expressivity and smoothness while keeping strictly analytic inverses in all cases. Empirically, increasing the stack depth of bijections within coupling layers monotonically improves likelihood performance, with cubic-based bijections often outperforming splines and affine maps for the same model class and parameter count.
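To make the coupling construction concrete, here is a small two-coordinate sketch with a stack of scalar bijections. It uses a simplified sinh-type map, asinh(e^ν·sinh(x) + δ), and a linear `toy_conditioner` standing in for the neural network. Both are our illustrative assumptions, not the paper's configuration.

```python
import math

def sinh_forward(x, delta, nu):
    """Simplified sinh-type bijection y = asinh(e^nu * sinh(x) + delta) and log dy/dx."""
    w = math.exp(nu) * math.sinh(x) + delta
    log_det = nu + math.log(math.cosh(x)) - 0.5 * math.log1p(w * w)
    return math.asinh(w), log_det

def sinh_inverse(y, delta, nu):
    return math.asinh(math.exp(-nu) * (math.sinh(y) - delta))

def toy_conditioner(x1, weights):
    """Hypothetical stand-in for a neural conditioner: x1 -> list of (delta, nu)."""
    return [(wd * x1 + bd, math.tanh(wn * x1 + bn)) for (wd, bd, wn, bn) in weights]

def coupling_forward(x1, x2, weights):
    """x1 passes through unchanged; x2 goes through the stacked bijections."""
    log_det = 0.0
    for delta, nu in toy_conditioner(x1, weights):
        x2, ld = sinh_forward(x2, delta, nu)
        log_det += ld  # log-determinants of stacked bijections add up
    return x1, x2, log_det

def coupling_inverse(y1, y2, weights):
    """Recompute the parameters from y1 = x1, then undo the stack in reverse order."""
    for delta, nu in reversed(toy_conditioner(y1, weights)):
        y2 = sinh_inverse(y2, delta, nu)
    return y1, y2

weights = [(0.5, 0.1, 0.3, -0.2), (-0.4, 0.0, 0.8, 0.1)]  # arbitrary example values
y1, y2, log_det = coupling_forward(0.7, -1.2, weights)
x1, x2 = coupling_inverse(y1, y2, weights)
```

Because the conditioning coordinate is untouched, the inverse can rebuild the same parameters, and every step of the stack has a closed-form inverse.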

Figure 2: Effective Sample Size improves steadily as the number of stacked analytic bijections increases within the flow.
Figure 3: Training dynamics for 1D flows show consistent, stable convergence with increasing bijection stack depth, confirming the optimization amenability of the analytic recursions.
Radial Flows
Novel to this work is the direct parameterization in polar coordinates: radial flows transform vectors by modifying $r = \|x\|$, preserving direction. The scalar radial transformation may be purely radial or also angle-dependent (i.e., $f(r, \hat{x})$), with parameters directly learned (not predicted by a conditioner NN). The key advantages include:
- Training Stability: Radial flows allow order-of-magnitude higher learning rates due to non-crossing of paths and implicit regularization via geometric constraints.
- Interpretability: The learned transformation on r and the location of focus centers are directly inspectable, enabling geometric reasoning about behavior.
- Parameter Efficiency: For distributions with strong radial structure, high fidelity is achieved with two to three orders of magnitude fewer parameters than coupling flows.
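The geometry of a purely radial layer can be sketched as follows. The radial profile f(r) = r + λr/(1+σ²r²) is a hypothetical choice of ours (the cubic rational map centred at zero), and the explicit `center` argument reflects the "focus centers" mentioned above. The log-determinant formula, log f′(r) + (d−1)·log(f(r)/r), is the standard result for maps that rescale the radius while preserving direction.

```python
import math

def radial_profile(r, lam, sigma):
    """Hypothetical radial map f(r) and its derivative; needs -1 < lam < 8, sigma > 0."""
    s = (sigma * r) ** 2
    f = r + lam * r / (1.0 + s)
    df = 1.0 + lam * (1.0 - s) / (1.0 + s) ** 2
    return f, df

def radial_flow(x, center, lam, sigma):
    """Move x along the ray from center: y = center + f(r) * (x - center) / r."""
    d = len(x)
    diff = [xi - ci for xi, ci in zip(x, center)]
    r = math.sqrt(sum(v * v for v in diff))
    if r == 0.0:
        # At the center the Jacobian is (1 + lam) times the identity.
        return list(x), d * math.log1p(lam)
    f, df = radial_profile(r, lam, sigma)
    scale = f / r
    y = [ci + scale * v for ci, v in zip(center, diff)]
    # One radial eigenvalue f'(r) and d - 1 tangential eigenvalues f(r) / r.
    log_det = math.log(df) + (d - 1) * math.log(scale)
    return y, log_det
```

Since the only learned quantities are the center and the two profile parameters, the effect of the layer can be read off directly, which is the interpretability argument made in the list above.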
Figure 4: Radial flows preserve geometric source-target associations, maintaining the radial structure, in contrast to coupling flows which result in substantial mixing.
Figure 5: Log-likelihood evolution visualizes rapid convergence to smooth spiral structure for the radial flow, with fewer training artifacts compared to coupling.
Angular Dependence and Fourier Flows
The authors further extend radial flows with angle-dependent transformations. In 2D, this is naturally accomplished via a learned Fourier expansion in the angular coordinate. Even with very few parameters (as few as 49 for the angle-independent case, up to 319 with several modes), radial Fourier flows capture complex non-linear, non-convex density geometry while remaining highly interpretable, in stark contrast to the deep neural conditioners used in coupling flows.
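One way to picture an angle-dependent radial layer in 2D is to let a profile parameter vary with the angle through a truncated Fourier series. The sketch below does this for a hypothetical profile f(r, θ) = r + λ(θ)·r/(1+σ²r²), with λ(θ) squashed into (−1, 8) by a sigmoid. The paper's actual parameterization and parameter counts are not reproduced here. Because the angle is preserved, the 2D log-determinant is still log ∂f/∂r + log(f/r).

```python
import math

def fourier_series(theta, a0, cos_coeffs, sin_coeffs):
    """a0 + sum_k (a_k cos(k*theta) + b_k sin(k*theta)), k = 1..K."""
    total = a0
    for k, (a, b) in enumerate(zip(cos_coeffs, sin_coeffs), start=1):
        total += a * math.cos(k * theta) + b * math.sin(k * theta)
    return total

def angular_radial_flow(x, a0, cos_coeffs, sin_coeffs, sigma):
    """Angle-dependent radial map in 2D; assumes x is not the origin."""
    r = math.hypot(x[0], x[1])
    theta = math.atan2(x[1], x[0])
    raw = fourier_series(theta, a0, cos_coeffs, sin_coeffs)
    lam = -1.0 + 9.0 / (1.0 + math.exp(-raw))  # sigmoid into (-1, 8)
    s = (sigma * r) ** 2
    f = r + lam * r / (1.0 + s)
    df_dr = 1.0 + lam * (1.0 - s) / (1.0 + s) ** 2
    y = [f * math.cos(theta), f * math.sin(theta)]
    # Polar Jacobian is triangular because theta is unchanged.
    log_det = math.log(df_dr) + math.log(f / r)
    return y, log_det
```

With K angular modes this layer has 2K + 2 learned numbers (the Fourier coefficients plus σ), so each coefficient has a direct geometric reading.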
Figure 6: Fourier-parameterized radial flows efficiently express complex angular structure and spiral detail with limited angular modes.
Quantitative and Qualitative Results
1D and 2D Density Estimation
Bijections are benchmarked in 1D (oscillating multimodal) and 2D (spiral and mixture of Gaussians) synthetic targets. Key findings include:
- Monotonic improvement in reverse and forward KL with number of bijection stack layers (Figure 2).
- Analytic bijections in coupling flows consistently exceed affine and monotonic spline benchmarks on the 2D spiral (Figure 7).
- Radial flows, given sufficient centers, closely match multimodal GMM targets with orders of magnitude fewer parameters, demonstrating significant parameter efficiency in structured targets (Figure 8).
- Artifacts introduced by coupling flows are notably absent in radial flows, with the latter maintaining globally smooth densities (Figures 5, 7).
Training Dynamics and Hyperparameter Effects
Ablation studies reveal that increasing centers in radial flows or stacking additional bijections in all settings yields consistent log-likelihood improvement (Figure 9). However, excessively deep stacks without careful parameter initialization and suppression can induce optimization instability.
Figure 9: Forward KL versus number of centers and bijection stacks in radial flows, confirming monotonic improvement before reaching the optimization–expressivity trade-off frontier.
Application to Lattice Field Theory
The analytic bijections are further incorporated in coupling flows for 2D $\phi^4$ lattice field theory (400 dimensions). Two regimes are examined:
- Unimodal regime: Analytic bijections augment standard coupling flows, yielding 10% higher ESS than affine-only baselines.
- Bimodal regime and Mode Collapse Mitigation: A problem-specific architecture is introduced: a symmetric zero-mode bijection (applied to the spatial field mean), pretrained to maintain $\mathbb{Z}_2$ symmetry and then integrated with deep coupling flows. This design precludes mode collapse and captures both modes, in contrast to standard coupling flows, which collapse to a single mode despite high apparent ESS.
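For readers unfamiliar with the metric, the effective sample size quoted above is commonly computed from importance weights as in the sketch below. The normalized estimator (Σw)² / (N·Σw²) is the usual convention in flow-based sampling; we assume, without confirmation from the source, that the paper uses this form. The function name is ours.

```python
import math

def effective_sample_size(log_weights):
    """Normalized ESS in (0, 1] from log importance weights.

    log_weights[i] = log p_unnormalized(x_i) - log q(x_i) for samples x_i ~ q.
    """
    m = max(log_weights)                             # shift for numerical stability
    w = [math.exp(lw - m) for lw in log_weights]     # ESS is invariant to this shift
    n = len(w)
    return sum(w) ** 2 / (n * sum(wi * wi for wi in w))

uniform_ess = effective_sample_size([0.0] * 100)            # all weights equal
dominated_ess = effective_sample_size([0.0] * 99 + [50.0])  # one weight dominates
```

Because the samples come from the model itself, a flow that never visits one mode can still produce nearly uniform weights and hence a high ESS. This is why the text pairs ESS with a magnetization symmetry check.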

Figure 10: ESS and magnetization symmetry, demonstrating the efficacy of zero-mode pretraining in preventing mode collapse in the bimodal regime.
Figure 11: Visual field samples before and after pretraining confirm successful symmetric mode support with the problem-specific architecture.
Implications and Future Directions
This work demonstrates that the space of tractable and expressive scalar bijections for normalizing flows is richer than previously utilized. The proposed families fill critical gaps left by affine, spline, and neural residual alternatives. They enable coupling and autoregressive layers with smooth invertible maps, efficient parameter usage, and improved optimization performance.
Radial and angular flows introduce new design paradigms—especially relevant in low to intermediate dimensions and for targets with strong geometric or rotational structure. The exceptional interpretability of direct parameterizations (notably under Fourier expansions) offers an attractive model class for scientific and physical applications where transparency is at a premium.
Prospective Developments
Hybridization of radial and coupling flows, use of analytic bijections in autoregressive frameworks, and extension of angular parameterizations to higher-dimensional spherical harmonics are promising avenues. In high dimensions, radial flows may be best used as interpretable components layered with coupling blocks, leveraging their stability for coarse geometric alignment and delegating fine detail to coupling compositions.
Conclusion
This paper advances the theoretical and practical state of the art in normalizing flows by introducing analytic, closed-form, and highly expressive bijection families, as well as direct-parameterization flow architectures for stable, interpretable, and efficient density estimation. These contributions have direct implications for physics-inspired density modelling, generative learning, and any application demanding smooth, tractable transformations on $\mathbb{R}^d$.