
Generative FlowNet Objectives

Updated 9 February 2026
  • Generative Flow Networks (GFlowNets) are frameworks defined on state-action graphs to generate compositional objects with terminal distributions proportional to unnormalized rewards.
  • They optimize objectives like flow matching, detailed balance, and trajectory balance to enforce global and local flow consistency in the generative process.
  • These objectives extend to continuous and hybrid settings, offering flexible credit assignment and tuning of exploration-exploitation trade-offs in complex probabilistic models.

Generative Flow Network (GFlowNet) Objectives

Generative Flow Networks (GFlowNets) are frameworks for learning stochastic policies that generate complex compositional objects via sequential steps, ensuring that the distribution over final objects (terminal states) is proportional to a prespecified unnormalized reward. GFlowNets constitute a distinct family of amortized variational inference algorithms, generative models, and exploration mechanisms, bridging ideas from energy-based models, Markov chain Monte Carlo, and reinforcement learning. The training objectives for GFlowNets formalize and optimize constraints on "flows" through a state-action graph, ensuring correct marginalization to the reward-matching distribution. This article provides an in-depth exposition of the principal GFlowNet objectives—including both their theoretical foundation and their variants for practical training and control—across discrete, continuous, and hybrid settings.

1. Foundations: Flows, Policies, and the GFlowNet Objective Family

A GFlowNet is defined on a directed acyclic graph (DAG) or, more generally, a "measurable pointed graph" $G = (\bar{\mathcal S}, \Sigma, s_0, \bot, \kappa, \kappa^b, \nu)$ in generalized settings (Lahlou et al., 2023). States $s$ are partial or complete objects, with $s_0$ denoting the unique source (root), and terminal (sink) states $x \in \mathcal X$ are assigned an unnormalized reward $R(x) \geq 0$. Flows are nonnegative measures $F$ over nodes, edges, or trajectories, encoding step-by-step generative processes.

A Markovian (forward) policy $P_F(s'|s)$ is defined by normalized edge-flow: $P_F(s'|s) = F(s \to s') / F(s)$. Sometimes, a backward policy $P_B(s|s')$ parameterizes the reverse process. The principal objective is to construct a $P_F$ so that the resulting marginal distribution over terminals $x$ satisfies $P_F^\top(x) \propto R(x)$. This is achieved by enforcing global and local consistency constraints on flow, leading to several mathematically equivalent but practically distinct objective functions.
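As a toy illustration of the policy definition above (the graph and flow values are hypothetical, chosen only for this example), the forward policy can be read directly off a table of edge flows:

```python
# Hypothetical edge flows on a tiny DAG: s0 -> {a, b} -> x, with x terminal.
edge_flow = {
    ("s0", "a"): 3.0,
    ("s0", "b"): 1.0,
    ("a", "x"): 3.0,
    ("b", "x"): 1.0,
}

def state_outflow(s):
    """F(s): total flow leaving s (equals the in-flow when flows are consistent)."""
    return sum(f for (parent, _), f in edge_flow.items() if parent == s)

def forward_policy(s, s_next):
    """P_F(s'|s) = F(s -> s') / F(s)."""
    return edge_flow[(s, s_next)] / state_outflow(s)

p_a = forward_policy("s0", "a")  # 3.0 / 4.0 = 0.75
```

With consistent flows, sampling forward from $s_0$ according to $P_F$ reaches each terminal with probability proportional to the flow arriving there.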

GFlowNet objectives and their loss functions can be instantiated at varying levels: per-state (flow-matching), per-edge (detailed-balance), or per-trajectory (trajectory-balance). The architectural and algorithmic choices (discrete or continuous; local or global; degree of regularization; nature of loss metric) shape the credit assignment, exploration-exploitation tradeoffs, and empirical scalability (Bengio et al., 2021, Lahlou et al., 2023, Hu et al., 2024).

2. Principal Objectives: Flow Matching, Detailed-Balance, and Trajectory-Balance

The three primary GFlowNet objective classes, each of which yields the reward-matching distribution at optimum, are as follows.

Flow Matching (FM)

The flow-matching constraint enforces conservation of probability mass at every non-terminal state. In discrete settings, for every $s \notin \mathcal X \cup \{s_0\}$,

$$\sum_{t} F(t \to s) = F(s) = \sum_{u} F(s \to u)$$

i.e., in-flow equals out-flow. The standard loss is a mean squared error (MSE) surrogate, often written in log-space for numerical stability:

$$L_{\text{FM}} = \sum_{s} \Bigl(\log \sum_{t} F(t \to s) - \log \bigl[R(s)\mathbf{1}_{s \in \mathcal X} + \sum_{u} F(s \to u)\bigr]\Bigr)^2$$

For continuous/hybrid spaces, integrals over flows replace the sums, and the loss takes the form (Lahlou et al., 2023):

$$L_{\text{FM}}(s';\theta) = \left[\log \frac{\int u(s;\theta)\, p_F(s, s';\theta)\, \kappa^b(s', ds)}{u(s';\theta)}\right]^2$$
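The discrete log-space FM surrogate at a single state can be sketched in a few lines (the flow values below are made up purely for illustration):

```python
import math

def flow_matching_loss(in_flows, out_flows, reward=0.0):
    """Squared log-space FM residual at one state:
    (log sum_t F(t->s) - log [R(s)*1_terminal + sum_u F(s->u)])^2.
    Pass reward > 0 only for terminal states."""
    inflow = sum(in_flows)
    outflow = reward + sum(out_flows)
    return (math.log(inflow) - math.log(outflow)) ** 2

loss_balanced = flow_matching_loss([3.0, 1.0], [2.0, 2.0])  # in = out = 4, loss 0
loss_violated = flow_matching_loss([3.0, 1.0], [2.0])       # in 4 vs out 2
```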

Detailed Balance (DB)

Detailed balance imposes pairwise consistency between forward and backward transitions at each edge:

$$F(s)\, P_F(s'|s) = F(s')\, P_B(s|s')$$

The squared log-ratio loss is

$$L_{\text{DB}}(s, s') = \left[\log F(s) + \log P_F(s'|s) - \log F(s') - \log P_B(s|s')\right]^2$$

In continuous/hybrid settings, densities appear in the numerator and denominator accordingly (Lahlou et al., 2023).
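In the log-parameterization commonly used in practice, the DB residual is a simple sum of log-terms; the numbers below are illustrative only:

```python
import math

def detailed_balance_loss(log_F_s, log_pf, log_F_sp, log_pb):
    """Squared log-ratio DB residual at one edge:
    [log F(s) + log P_F(s'|s) - log F(s') - log P_B(s|s')]^2."""
    return (log_F_s + log_pf - log_F_sp - log_pb) ** 2

# A consistent edge: F(s)=4, P_F(s'|s)=0.5, F(s')=2, P_B(s|s')=1, since 4*0.5 = 2*1.
loss = detailed_balance_loss(math.log(4.0), math.log(0.5), math.log(2.0), math.log(1.0))
```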

Trajectory-Balance (TB)

Trajectory-balance globally enforces that the forward path probability, scaled by a learned normalizer $Z$, matches the trajectory's reward-weighted backward probability:

$$Z \prod_{t=0}^{n-1} P_F(s_{t+1}|s_t) = R(s_n) \prod_{t=0}^{n-1} P_B(s_t|s_{t+1})$$

with empirical loss

$$L_{\text{TB}}(\tau) = \left[\log Z + \sum_{t=0}^{n-1} \log P_F(s_{t+1}|s_t) - \log R(s_n) - \sum_{t=0}^{n-1} \log P_B(s_t|s_{t+1})\right]^2$$

TB loss admits efficient credit assignment along sampled paths and is widely used in regimes with long or combinatorial trajectories (Zimmermann et al., 2022, Shen et al., 2023). Subtrajectory-balance (SubTB) further allows for partial path constraints, interpolating between DB and TB (Pan et al., 2023, Chen et al., 2 Feb 2026).
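A minimal TB loss over one sampled trajectory, assuming the per-step log-probabilities are already available (all numbers are toy values):

```python
import math

def trajectory_balance_loss(log_Z, log_pf_steps, log_reward, log_pb_steps):
    """Squared TB residual for one complete trajectory:
    [log Z + sum log P_F - log R(s_n) - sum log P_B]^2."""
    residual = log_Z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return residual ** 2

# Z=2, forward probs (0.5, 0.5), reward R(s_n)=0.5, deterministic backward (probs 1):
# 2 * 0.25 = 0.5 * 1, so the residual vanishes.
loss = trajectory_balance_loss(math.log(2.0),
                               [math.log(0.5), math.log(0.5)],
                               math.log(0.5),
                               [0.0, 0.0])
```

Note that a single scalar loss covers the whole path, which is what gives TB its global, low-granularity credit assignment.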

3. Generalization to Continuous, Hybrid, and Local-Credit Settings

Historically, GFlowNet objectives were formulated on discrete state spaces. The theory for continuous or hybrid (mixed discrete/continuous) state-action spaces is developed in (Lahlou et al., 2023) and in "CFlowNets" (Li et al., 2023), leveraging measurable graph structures, σ-finite kernels, and Radon–Nikodym densities.

In these settings, flows are measures (not just vectors), and constraints are expressed as integral equations. The FM, DB, and TB losses above generalize via integrals, with forward and backward reference kernels blending Lebesgue and counting measures as appropriate.

Further, "forward-looking" or local-credit GFlowNet objectives leverage intermediate state energies, allowing training from incomplete trajectories by reweighting state flows with accrued energy and modifying the standard DB/TB equations to include edgewise energy differences (Pan et al., 2023). This increases training signal density, improves credit assignment, and accelerates convergence, especially in sparse-reward or long-horizon problems.
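One plausible reading of this forward-looking modification (a hedged sketch, not necessarily the exact parameterization of Pan et al., 2023) reparameterizes the flow as $F(s) = \tilde F(s)\,e^{-E(s)}$, so the DB residual picks up an edgewise energy difference $E(s') - E(s)$ that supplies local training signal at every transition:

```python
import math

def forward_looking_db_loss(log_Ftil_s, log_pf, energy_s,
                            log_Ftil_sp, log_pb, energy_sp):
    """DB residual under the assumed reparameterization
    log F(s) = log Ftil(s) - E(s); the energy terms act as local rewards."""
    residual = (log_Ftil_s + log_pf - energy_s) - (log_Ftil_sp + log_pb - energy_sp)
    return residual ** 2

# With zero energies this reduces to plain DB: Ftil(s)=1, P_F=0.5, Ftil(s')=0.5, P_B=1.
loss = forward_looking_db_loss(0.0, math.log(0.5), 0.0, math.log(0.5), 0.0, 0.0)
```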

4. Variational, Regularized, and Generalized Loss Functions

Recent theoretical analyses have shown that GFlowNet objectives admit a variational reinterpretation as divergence minimization between forward and backward path distributions (Zimmermann et al., 2022, Hu et al., 2024). The standard quadratic/log-space loss corresponds to minimizing the reverse Kullback-Leibler (KL) divergence; alternative regression losses instantiate different ff-divergences.

Specifically, the general loss form is

$$\mathcal L = \sum_{o \in \mathcal O} \mu(o)\, g\!\left(\log \frac{\hat p_B(o)}{\hat p_F(o)}\right)$$

where $g$ is the chosen regression function and $\hat p_B(o), \hat p_F(o)$ are the backward/forward flows on the object $o$ (a state, edge, or subtrajectory).

The recent taxonomy of losses (Hu et al., 2024) is summarized as:

Name          g(t)                  Zero-Forcing   Zero-Avoiding
Quadratic     (1/2) t^2             ✓
Linex(1)      e^t - t - 1                          ✓
Linex(1/2)    4 e^{t/2} - 2t - 4
Shifted-Cosh  e^t + e^{-t} - 2      ✓              ✓

Zero-forcing losses (e.g., quadratic, shifted-cosh) promote exploitation: concentration on high-reward mass and avoidance of spurious support. Zero-avoiding losses (Linex(1), shifted-cosh) encourage exploration, maintaining nonzero probability wherever the backward measure is positive. The choice of loss thus governs the exploration-exploitation balance, convergence behavior, and diversity profile of the GFlowNet (Hu et al., 2024).
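For reference, the four regression functions in the table above can be written directly as functions of the log-ratio $t = \log(\hat p_B/\hat p_F)$; all of them vanish at perfect balance ($t = 0$):

```python
import math

def g_quadratic(t):
    return 0.5 * t * t

def g_linex1(t):
    return math.exp(t) - t - 1.0

def g_linex_half(t):
    return 4.0 * math.exp(t / 2.0) - 2.0 * t - 4.0

def g_shifted_cosh(t):
    return math.exp(t) + math.exp(-t) - 2.0

losses = (g_quadratic, g_linex1, g_linex_half, g_shifted_cosh)
at_balance = [g(0.0) for g in losses]  # all zero: no gradient pressure at t = 0
```

Note how shifted-cosh grows exponentially in both directions of $t$, consistent with it appearing in both the zero-forcing and zero-avoiding families.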

The variational viewpoint also clarifies the connection of TB under forward sampling to reverse KL minimization, and FKL/RKL convex combinations interpolate between mode-covering, mode-seeking, and hybrid credit assignment (Zimmermann et al., 2022).

5. Markov Chain and RL Perspectives: Exploration-Exploitation Control

Standard GFlowNet objectives can be interpreted as enforcing reversibility of a Markov chain formed by an equal mixture of $P_F$ and $P_B$ (Chen et al., 2 Feb 2026). This perspective formalizes the exploration-exploitation trade-off via an $\alpha$-parameterized mixture:

$$P_\alpha(s'|s) = \alpha P_F(s'|s) + (1-\alpha) P_B(s'|s)$$

The $\alpha$-GFlowNet ($\alpha$-GFN) objectives generalize all standard losses (recovered at $\alpha = 0.5$) and allow direct tuning toward exploration ($\alpha < 0.5$) or exploitation ($\alpha > 0.5$). The corresponding detailed-balance loss is:

$$L_{\alpha\text{-DB}}(s, s') = \left[\log \frac{\alpha F(s) P_F(s'|s)}{(1-\alpha) F(s') P_B(s|s')}\right]^2$$

This construction is rigorously justified: for any fixed $\alpha \in (0,1)$, the optimized flow is unique (up to normalization), and convergence is guaranteed by the irreducibility/recurrence of $P_\alpha$ (Chen et al., 2 Feb 2026). Empirical results show that staged schedules, with $\alpha$ annealed toward $0.5$ after an initial exploratory or exploitative regime, substantially increase mode discovery and the diversity of the learned distributions.
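The $\alpha$-DB residual is a one-line change to standard DB (toy numbers again; at $\alpha = 0.5$ the mixture weights cancel and the standard DB loss is recovered):

```python
import math

def alpha_db_loss(log_F_s, log_pf, log_F_sp, log_pb, alpha=0.5):
    """Squared residual of the alpha-DB objective:
    [log(alpha F(s) P_F(s'|s)) - log((1-alpha) F(s') P_B(s|s'))]^2."""
    lhs = math.log(alpha) + log_F_s + log_pf
    rhs = math.log(1.0 - alpha) + log_F_sp + log_pb
    return (lhs - rhs) ** 2

# An edge satisfying standard DB: F(s)=4, P_F=0.5, F(s')=2, P_B=1.
loss_half = alpha_db_loss(math.log(4.0), math.log(0.5), math.log(2.0), 0.0, alpha=0.5)
# The same edge is penalized under an exploitative setting alpha = 0.8.
loss_expl = alpha_db_loss(math.log(4.0), math.log(0.5), math.log(2.0), 0.0, alpha=0.8)
```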

Furthermore, there is a deep equivalence between GFlowNet training and entropy-regularized RL: standard GFlowNet objectives correspond to soft Bellman consistency equations, with the entropy temperature $\alpha$ controlling distributional spread and diversity (Tiapkin et al., 2023). Parameter settings with $\alpha = 1$ recover the canonical GFlowNet objectives, while varying $\alpha$ interpolates toward RL's deterministic objectives.

6. Training Design: Credit Assignment, Parameterization, and Extensions

GFlowNet objectives can be viewed as regression tasks between forward and backward flows for various objects (state, edge, trajectory), with loss structure dictating the granularity and propagation of credit:

Objective   Granularity      Credit Assignment          Complexity/Features
FM          per-state        local edges                needs parent/child sums
DB          per-edge         edgewise                   local, supports parallelism
TB          per-trajectory   global, sampled paths      one loss per trajectory
SubTB       subtrajectory    windowed local-global      interpolates DB/TB
Guided TB   per-trajectory   guided, structured paths   corrects under-credited regions

Relative edge-flow parameterization, where each action's probability depends on both parent and child, improves generalization and accelerates convergence (Shen et al., 2023). The guided trajectory-balance (GTB) objective leverages designer-specified or learned non-Markovian trajectory guides to address under-crediting of infrequent substructures, particularly for complex compositional domains such as molecular design (Shen et al., 2023).
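A hedged sketch of the idea behind relative edge-flow parameterization: score each (parent, child) pair with one shared function and normalize over the children of a state, rather than predicting a per-state output head. The scoring values here are stand-ins, not the architecture of Shen et al. (2023):

```python
import math

def children_policy(pair_scores):
    """Softmax over scores f(s, s') for the children of one state s,
    yielding P_F(.|s) from pairwise (parent, child) features."""
    m = max(pair_scores.values())  # subtract the max to stabilize exponentials
    exps = {child: math.exp(v - m) for child, v in pair_scores.items()}
    z = sum(exps.values())
    return {child: e / z for child, e in exps.items()}

# Toy scores f(s, s') for two children of some state s.
probs = children_policy({"child_a": 1.0, "child_b": 0.0})
```

Because the score function sees the child as well as the parent, structurally similar transitions in different parts of the DAG can share parameters, which is the claimed source of improved generalization.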

Key training extensions include prioritized replay, maximum-entropy backward kernels, mixed forward/backward sampling, and explicit variance reduction via control variates and leave-one-out baselines (Zimmermann et al., 2022, Shen et al., 2023).

7. Theoretical Guarantees and Empirical Validation

Satisfaction of any of the principal objectives (FM, DB, TB, or their continuous/generalized analogs) almost everywhere implies that the GFlowNet forward policy asymptotically draws samples from the desired normalized reward distribution (Lahlou et al., 2023, Bengio et al., 2021, Bengio et al., 2021). Trajectory-level objectives yield lower-variance, globally coherent signal, while local objectives are practical when state degrees are moderate.

Recent empirical ablation and benchmark studies confirm that the choice of objective (and particularly its loss function design) directly impacts exploration, sample diversity, rate of mode discovery, reward concentration, and robustness to under-sampled or rare-support regions (Hu et al., 2024, Shen et al., 2023, Chen et al., 2 Feb 2026). The modularity of GFlowNet objectives—together with policy parameterization, guide function specification, and loss family choice—makes them a powerful toolkit for diverse probabilistic modeling and generative design tasks.


References: (Lahlou et al., 2023, Tiapkin et al., 2023, Zimmermann et al., 2022, Pan et al., 2023, Bengio et al., 2021, Zhang et al., 2022, Li et al., 2023, Chen et al., 2 Feb 2026, Bengio et al., 2021, Hu et al., 2024, Shen et al., 2023)
