
Efficient and Accurate Gradients for Neural SDEs

Published 27 May 2021 in cs.LG, cs.AI, math.DS, and stat.ML | (2105.13493v3)

Abstract: Neural SDEs combine many of the best qualities of both RNNs and SDEs: memory efficient training, high-capacity function approximation, and strong priors on model space. This makes them a natural choice for modelling many types of temporal dynamics. Training a Neural SDE (either as a VAE or as a GAN) requires backpropagating through an SDE solve. This may be done by solving a backwards-in-time SDE whose solution is the desired parameter gradients. However, this has previously suffered from severe speed and accuracy issues, due to high computational cost and numerical truncation errors. Here, we overcome these issues through several technical innovations. First, we introduce the reversible Heun method. This is a new SDE solver that is algebraically reversible: eliminating numerical gradient errors, and the first such solver of which we are aware. Moreover it requires half as many function evaluations as comparable solvers, giving up to a 1.98x speedup. Second, we introduce the Brownian Interval: a new, fast, memory efficient, and exact way of sampling and reconstructing Brownian motion. With this we obtain up to a 10.6x speed improvement over previous techniques, which in contrast are both approximate and relatively slow. Third, when specifically training Neural SDEs as GANs (Kidger et al. 2021), we demonstrate how SDE-GANs may be trained through careful weight clipping and choice of activation function. This reduces computational cost (giving up to a 1.87x speedup) and removes the numerical truncation errors associated with gradient penalty. Altogether, we outperform the state-of-the-art by substantial margins, with respect to training speed, and with respect to classification, prediction, and MMD test metrics. We have contributed implementations of all of our techniques to the torchsde library to help facilitate their adoption.


Summary

  • The paper presents the reversible Heun method, which accurately computes gradients by aligning the forward and backward passes through algebraic reversibility.
  • It introduces the Brownian Interval technique that enables exact Brownian motion sampling with constant memory usage and average O(1) time complexity.
  • It replaces the gradient penalty in SDE-GANs with careful weight clipping and choice of activation function, giving up to a 1.87x speedup and removing the numerical truncation errors associated with the gradient penalty.


The paper introduces several innovations aimed at improving the computation of gradients in Neural Stochastic Differential Equations (Neural SDEs). These advances focus on eliminating numerical gradient errors and improving training speed for models that use these techniques. The paper highlights three contributions in particular: the reversible Heun method, the Brownian Interval, and an efficient SDE-GAN training strategy.

Reversible Heun Method

The reversible Heun method is presented as a novel SDE solver designed to address the challenges of computing gradients by backpropagation through an SDE solve. Its key feature is algebraic reversibility: each solver step can be exactly inverted, so the backward pass retraces the forward pass's numerical trajectory precisely, eliminating gradient truncation error.

Benefits and Implementation:

  • Algebraic Reversibility: Ensures gradients obtained via the continuous adjoint method match exactly with the gradients of the numerically discretized forward pass, overcoming the main limitation of continuous adjoint methods.
  • Computational Efficiency: Requires half as many vector field evaluations per step as comparable solvers such as the midpoint or standard Heun method, giving up to a 1.98x speedup.
  • Convergence and Accuracy: Converges strongly with order 0.5, comparable with other Stratonovich solvers; when the noise is constant, the order of convergence improves from 0.5 to 1 (Figure 1).

    Figure 1: Log-log plot for the strong error estimator S_N computed with 10^7 Brownian sample paths.

The method has been shown in experiments to effectively speed up training while maintaining or improving model accuracy.
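To make the reversibility concrete, the solver's forward and backward steps can be sketched in NumPy for a scalar Stratonovich SDE. The scheme below follows the paper's paired-state update (y, z); the particular drift and diffusion functions and step count are illustrative choices, not from the paper:

```python
import numpy as np

def reversible_heun_step(y, z, f, g, dt, dW):
    """One forward step of the reversible Heun scheme for dY = f(Y) dt + g(Y) dW.
    The state is the pair (y, z); the update is algebraically invertible."""
    z_new = 2 * y - z + f(z) * dt + g(z) * dW
    y_new = y + 0.5 * (f(z) + f(z_new)) * dt + 0.5 * (g(z) + g(z_new)) * dW
    return y_new, z_new

def reversible_heun_step_back(y_new, z_new, f, g, dt, dW):
    """Exact algebraic inverse of the forward step: recovers (y, z)."""
    z = 2 * y_new - z_new - f(z_new) * dt - g(z_new) * dW
    y = y_new - 0.5 * (f(z) + f(z_new)) * dt - 0.5 * (g(z) + g(z_new)) * dW
    return y, z

# Illustrative scalar SDE: drift f and diffusion g are arbitrary smooth choices.
f = lambda x: -x
g = lambda x: 0.5 * np.cos(x)

rng = np.random.default_rng(0)
dt = 0.01
y0, z0 = 1.0, 1.0
y, z = y0, z0
dWs = rng.normal(0.0, np.sqrt(dt), size=100)
for dW in dWs:                      # solve forwards
    y, z = reversible_heun_step(y, z, f, g, dt, dW)
for dW in dWs[::-1]:                # reverse the whole trajectory exactly
    y, z = reversible_heun_step_back(y, z, f, g, dt, dW)
print(abs(y - y0))  # reconstruction error is at floating-point roundoff level
```

This is the property that matters for adjoint backpropagation: the backward solve reconstructs the exact forward trajectory rather than re-approximating it, so no extra truncation error enters the gradients.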

Brownian Interval

This section details the Brownian Interval technique, designed to efficiently sample and reconstruct Brownian motion, critical for SDE solvers.

Key Features:

  • Memory Efficiency: Maintains constant GPU memory usage, requiring only a small, fixed-size LRU cache.
  • Exactness and Speed: Offers exact sampling with average-case O(1) time complexity, faster than previous techniques, which are both approximate and relatively slow.
  • Implementation: Utilizes a splittable pseudo-random number generator, operating on a dynamic binary tree structure to manage Brownian interval sampling efficiently.
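The core idea behind exact reconstruction can be sketched with Brownian bridge conditioning. The toy sampler below caches queried points in a plain dict and derives per-query randomness by hashing the query time; the real Brownian Interval instead uses a binary tree over intervals, a splittable PRNG, and an LRU cache, so treat this as a simplified stand-in:

```python
import numpy as np

class BrownianBridgeSampler:
    """Toy sketch of exact Brownian motion sampling on [t0, t1] via Brownian
    bridges. Queries are deterministic: asking for the same time twice returns
    the same value, which is what the backward pass of an SDE solve needs."""

    def __init__(self, t0, t1, seed=0):
        self.seed = seed
        self.cache = {t0: 0.0}
        rng = np.random.default_rng(seed)
        self.cache[t1] = rng.normal(0.0, np.sqrt(t1 - t0))

    def __call__(self, t):
        if t in self.cache:
            return self.cache[t]
        # Nearest cached times bracketing t.
        s = max(u for u in self.cache if u < t)
        u = min(v for v in self.cache if v > t)
        ws, wu = self.cache[s], self.cache[u]
        # Brownian bridge: conditional law of W(t) given W(s) and W(u).
        mean = ws + (t - s) / (u - s) * (wu - ws)
        var = (t - s) * (u - t) / (u - s)
        rng = np.random.default_rng(hash((self.seed, t)) % (2**32))
        w = rng.normal(mean, np.sqrt(var))
        self.cache[t] = w
        return w
```

For example, `BrownianBridgeSampler(0.0, 1.0)(0.3)` samples W(0.3) conditioned on the endpoints, and repeating the query reproduces the same value, so the same Brownian path can be reconstructed during the backward solve.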

These advances significantly enhance the speed and practicality of deploying Neural SDEs in real-world scenarios, particularly for high-dimensional problems.

Training SDE-GANs without Gradient Penalty

The paper further explores the training of Neural SDEs configured as Generative Adversarial Networks (GANs) and addresses inefficiencies and errors introduced by current gradient penalty methods.

Approach:

  • Clipping: Replaces the gradient penalty with a hard Lipschitz constraint, enforced through careful weight clipping together with the LipSwish activation function, keeping the discriminator's Lipschitz constant bounded as required.
  • Performance Gains: Demonstrates substantial improvements in training speed (up to a 1.87x speedup) and in model performance metrics, achieving a better fit to the data without the numerical truncation errors introduced by the gradient penalty.
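The two ingredients can be sketched as follows. LipSwish is Swish rescaled so its slope never exceeds 1; for the clipping, the row-wise rescaling below (bounding each layer's l-infinity operator norm) is one simple variant, and the paper's exact clipping scheme may differ in detail:

```python
import numpy as np

def lipswish(x):
    # LipSwish: Swish (x * sigmoid(x)) scaled by 0.909, which bounds its
    # maximum derivative (about 1.0998 for Swish) by 1.
    return 0.909 * x / (1.0 + np.exp(-x))

def clip_rows(W, max_norm=1.0):
    """Rescale each row of a weight matrix so that its absolute row sum is
    <= max_norm, bounding the layer's l-infinity operator norm. Rows already
    within the bound are left unchanged."""
    row_sums = np.maximum(np.abs(W).sum(axis=1, keepdims=True), max_norm)
    return W * (max_norm / row_sums)
```

In a training loop, each discriminator weight matrix would be clipped after every optimizer step; composing 1-Lipschitz linear maps with a 1-Lipschitz activation keeps the whole discriminator 1-Lipschitz, which is the hard constraint replacing the gradient penalty.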

Experiments and Results

Benchmark Datasets:

The paper evaluates its methodologies on several datasets, including samples from a time-varying Ornstein–Uhlenbeck process and air quality time series data, reporting:

  • Performance Metrics: Such as classification accuracy, mean prediction error, and Maximum Mean Discrepancy (MMD) against test datasets.
  • Training Speed: The Brownian Interval and the reversible Heun method together yield up to a 10.6x speed improvement in certain scenarios (Figure 2).

    Figure 2: Log-log plots for the weak error estimators computed with 10^7 Brownian sample paths.
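For reference, a generic biased (V-statistic) estimator of squared MMD with a Gaussian kernel is shown below; the kernel choice and bandwidth here are assumptions, not necessarily those used in the paper's evaluation:

```python
import numpy as np

def mmd2(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimator of squared Maximum Mean Discrepancy
    between samples X and Y, using a Gaussian kernel. Always >= 0; equals 0
    exactly when X and Y are the same sample."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * bandwidth ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()

# Example: distinguish two Gaussians with different means.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(100, 2))
Y = rng.normal(2.0, 1.0, size=(100, 2))
print(mmd2(X, X), mmd2(X, Y))  # first is exactly 0; second is clearly positive
```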

Conclusion

This research presents significant advancements in the efficiency and accuracy of Neural SDEs, particularly through the reversible Heun method, the Brownian Interval for exact sampling, and improved SDE-GAN training techniques. These methodologies have the potential to substantially reduce computational costs and improve the practicality of using Neural SDEs in various domains, such as financial modelling and scientific computing. Future work may explore further enhancements and applications in diverse real-world scenarios.
