
torchgfn: A PyTorch GFlowNet library

Published 24 May 2023 in cs.LG (arXiv:2305.14594v2)

Abstract: The growing popularity of generative flow networks (GFlowNets or GFNs) from a range of researchers with diverse backgrounds and areas of expertise necessitates a library which facilitates the testing of new features such as training losses that can be easily compared to standard benchmark implementations, or on a set of common environments. torchgfn is a PyTorch library that aims to address this need. It provides users with a simple API for environments and useful abstractions for samplers and losses. Multiple examples are provided, replicating and unifying published results. The code is available at https://github.com/saleml/torchgfn.

Citations (4)

Summary

  • The paper introduces a modular PyTorch library that simplifies GFlowNet pipelines by decoupling environment definitions, sampling, and loss components.
  • It offers hands-on examples in both discrete and continuous settings, including environments like Hypergrid, DiscreteEBM, and Box.
  • The toolkit integrates multiple GFlowNet loss formulations, enabling robust comparisons between novel and established methods.

torchgfn: A PyTorch GFlowNet Library

The paper introduces torchgfn, a PyTorch library designed to streamline research and experimentation with Generative Flow Networks (GFlowNets, or GFNs). GFlowNets are probabilistic models that construct compositional objects through sequences of discrete actions, learning to sample them with probability proportional to a reward. The library addresses growing demand from researchers across domains for a framework in which new features, such as training losses, can be tested on standard benchmarks or custom environments without reimplementing core GFlowNet machinery.

Library Overview

The primary contribution of this work lies in its simplification of the GFlowNet pipeline. The library is modular by design, decoupling components such as environment definitions, sampling procedures, and the parametrizations used by GFN losses. This is achieved through a small set of APIs that are easy to use yet flexible enough to accommodate both established and novel GFlowNet algorithms.

The library comes bundled with three example environments:

  1. Hypergrid: A simple discrete environment in which every state is a valid terminating state.
  2. DiscreteEBM: A discrete environment in which all trajectories have the same length but only a subset of states can terminate a trajectory.
  3. Box: A continuous environment in which the set of valid actions depends on the current state.

These environments serve a dual purpose: they give users practical examples that deepen their understanding of GFNs, and they demonstrate how to extend the library to new settings.

Key Components

Defining an Environment

The torchgfn library emphasizes flexible environment definition: a user specifies an initial state tensor and either a reward or a log-reward function. This setup separates the environment from the neural-network-based learning components, a modular design that accommodates tweaks and extensions without disrupting core functionality.
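To illustrate this split, a minimal Hypergrid-style environment can be sketched in plain Python. The names here (`TinyHyperGrid`, `step`, `log_reward`) are illustrative stand-ins, not torchgfn's actual API:

```python
import math

class TinyHyperGrid:
    """Illustrative Hypergrid-style environment (not torchgfn's API)."""

    def __init__(self, ndim=2, side=4):
        self.ndim = ndim
        self.side = side
        self.s0 = (0,) * ndim           # initial state: the origin

    def step(self, state, action):
        """Action i < ndim increments coordinate i; action == ndim exits."""
        if action == self.ndim:
            return state, True          # exit action: state is terminal
        coords = list(state)
        coords[action] += 1
        if coords[action] >= self.side:
            raise ValueError("action moves off the grid")
        return tuple(coords), False

    def log_reward(self, state):
        # Toy log-reward: larger near the far corner of the grid.
        return math.log(1e-3 + sum(state) / (self.ndim * (self.side - 1)))
```

The learning components never need more than the initial state, the transition rule, and the (log-)reward, which is what keeps the environment swappable.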

States and Actions

States are the fundamental objects that GFlowNet operations, such as losses, act upon. The library provides abstract classes to be specialized for each environment's states, encapsulating the initial state and the sink state of the Directed Acyclic Graph (DAG) over states. Action classes mirror this setup, representing the transitions along the DAG's edges.
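The idea can be sketched with two small containers. All names here are hypothetical (the real torchgfn classes differ): a batched states container that knows the DAG's two distinguished states, and an actions container that knows the exit action.

```python
from dataclasses import dataclass

@dataclass
class States:
    """Batched states plus the DAG's two distinguished states."""
    batch: list        # one environment-specific state per batch element
    s0: object         # shared initial state of the DAG
    sf: object         # shared sink (post-terminal) state

    def is_initial(self, i):
        return self.batch[i] == self.s0

    def is_sink(self, i):
        return self.batch[i] == self.sf

@dataclass
class Actions:
    """Batched actions plus the distinguished exit action."""
    batch: list            # one action per batch element
    exit_action: object    # the action that moves a state to the sink

    def is_exit(self, i):
        return self.batch[i] == self.exit_action
```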

Modules and Samplers

Modules, specifically GFNModules, wrap function approximators (typically neural networks) and check that their output dimensions conform to what each estimation task requires. DiscretePolicyEstimators model the forward and backward policy distributions needed to evaluate trajectories. Samplers complement these by defining how actions are drawn from a GFlowNet policy during rollout.
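To make the sampler's role concrete, here is a minimal rollout loop in plain Python. It assumes a `policy(state)` callable returning candidate actions and their probabilities, with `None` standing in for the exit action; this is an illustrative sketch, not torchgfn's Sampler interface.

```python
import random

def sample_trajectory(step, policy, s0, max_steps=100):
    """Roll out one trajectory by repeatedly sampling from the forward
    policy until the sentinel exit action (None) is drawn."""
    trajectory, state = [s0], s0
    for _ in range(max_steps):
        actions, probs = policy(state)
        action = random.choices(actions, weights=probs)[0]
        if action is None:              # exit: trajectory is complete
            return trajectory
        state = step(state, action)
        trajectory.append(state)
    raise RuntimeError("trajectory did not terminate")

# Toy 1-D chain: states 0..3, one "+1" action plus the exit action.
step = lambda s, a: s + 1
policy = lambda s: ([1, None], [0.5, 0.5]) if s < 3 else ([None], [1.0])
trajectory = sample_trajectory(step, policy, s0=0)
```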

Loss Framework

torchgfn integrates multiple existing GFlowNet loss formulations within a unified structure, which is crucial for researchers comparing objectives and their effects on learning dynamics. Supported losses include flow matching, detailed balance, and trajectory balance, among others.
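As one concrete example, the trajectory balance objective penalizes, for a single trajectory ending in x, the squared residual (log Z + Σ log P_F) − (log R(x) + Σ log P_B). A plain-Python sketch with a hypothetical helper (not torchgfn's loss class, which operates on batched tensors):

```python
import math

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Squared trajectory-balance residual for one trajectory.

    log_pf / log_pb: per-step forward / backward log-probabilities
    along the trajectory; log_reward: log R(x) of the terminal state.
    """
    residual = log_Z + sum(log_pf) - log_reward - sum(log_pb)
    return residual ** 2

# A perfectly balanced one-step trajectory has (numerically) zero loss:
loss = trajectory_balance_loss(
    log_Z=math.log(2.0),            # partition function Z = 2
    log_pf=[math.log(0.5)],         # forward step taken with prob 0.5
    log_pb=[math.log(1.0)],         # deterministic backward step
    log_reward=math.log(1.0),       # terminal reward R(x) = 1
)
```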

Conclusion and Future Prospects

torchgfn presents a significant step towards a standardized framework for GFlowNet research. Its modular nature is poised to make it an invaluable asset for comparing novel GFlowNet methodologies against established baselines. Future developments are expected to enrich the library with more replicated tasks and increasingly complex real-world environments, catering to broader research needs.

Beyond practical development benefits, torchgfn can support theoretical advances by providing a shared platform where new ideas are tested quickly across consistent environments. This standardization could underpin progress in applying GFlowNets to a range of problems, including sequential sampling tasks and probabilistic modeling challenges.

In summary, torchgfn is a carefully designed tool for fostering innovation within the GFlowNet research community, supporting established methods and new explorations with equal facility.
