
Robust Differentiable SVD

Published 8 Apr 2021 in cs.CV (arXiv:2104.03821v1)

Abstract: Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. This makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this using SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying in practice on an iterative process and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.

Citations (26)

Summary

  • The paper introduces a robust differentiable SVD method that uses Taylor expansion for stable eigenvector gradient computation.
  • It eliminates iterative error accumulation compared to Power Iteration, resulting in faster and more accurate deep network training.
  • Practical experiments in image classification and style transfer demonstrate enhanced feature decorrelation and visual quality.


The paper "Robust Differentiable SVD" introduces an approach to enhancing the numerical stability of differentiable Singular Value Decomposition (SVD), specifically for eigenvector computations, which are commonly used in computer vision tasks like image classification and style transfer. The authors address numerical instability issues that arise in deep networks when eigenvalues are close to each other, impacting the convergence and accuracy of such models.

Methodology

Taylor Expansion for SVD

The key innovation is leveraging the Taylor expansion of the SVD gradient, which offers a more stable alternative to both direct analytical derivatives and Power Iteration (PI) methods. The Taylor expansion of the gradient avoids the iterative deflation process required by PI and reduces accumulated errors, enhancing the gradient accuracy.

Consider the partial derivatives expressed as:

$$\frac{\partial L}{\partial M} = V\left( \left( \widetilde{K}^{\top} \circ \left(V^{\top}\frac{\partial L}{\partial V} \right) \right)+\left( \frac{\partial L}{\partial \Lambda} \right)_{diag} \right) V^{\top}$$

where $\widetilde{K}_{i,j} = \frac{1}{\lambda_i - \lambda_j}$, which becomes problematic when $\lambda_i \approx \lambda_j$. Applying a Taylor expansion to $\widetilde{K}$ stabilizes the computation when eigenvalue differences are small.
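As an illustration, the problematic off-diagonal entries of $\widetilde{K}$ can be approximated by a truncated geometric (Taylor) series, which keeps them bounded when eigenvalues nearly coincide. Below is a minimal NumPy sketch of this idea; the function name and truncation order are illustrative choices, not taken from the paper:

```python
import numpy as np

def ktilde_taylor(eigvals, order=9):
    """Approximate K~_ij = 1/(lam_i - lam_j) with a truncated Taylor
    (geometric) series. For lam_i > lam_j > 0:
      1/(lam_i - lam_j) = (1/lam_i) * 1/(1 - lam_j/lam_i)
                       ~= (1/lam_i) * sum_{k=0}^{order} (lam_j/lam_i)^k
    Truncating the sum keeps entries bounded when lam_i ~ lam_j.
    Assumes eigenvalues are positive and sorted in descending order."""
    lam = np.asarray(eigvals, dtype=float)
    n = lam.size
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):          # here lam_i >= lam_j
            r = lam[j] / lam[i]
            partial = sum(r**k for k in range(order + 1))
            K[i, j] = partial / lam[i]
            K[j, i] = -K[i, j]             # K~ is antisymmetric
    return K
```

For well-separated eigenvalues the truncated sum is close to the exact $1/(\lambda_i - \lambda_j)$; for nearly equal ones it stays finite instead of blowing up, which is the stabilizing effect described above.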

Comparison with Power Iteration

The Taylor series method is shown to be theoretically equivalent to using PI for gradient computation up to a given expansion degree, but it avoids iterative processing, which leads to faster computation and reduced numerical error (Figure 1).

Figure 1: Original gradient descent direction and the directions obtained after gradient clipping and after Taylor expansion. The direction is better preserved with the Taylor expansion.
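For contrast, the iterative deflation that PI requires to recover multiple eigenvectors can be sketched as follows. Each deflation step reuses the previous (approximate) eigenvector, which is how errors accumulate across eigenvectors. This is a hypothetical NumPy sketch, not the paper's implementation:

```python
import numpy as np

def power_iteration_deflation(M, k, iters=100):
    """Approximate the top-k eigenvectors of a symmetric matrix via
    power iteration with deflation: after each eigenvector is found,
    its rank-1 component is subtracted from M, so any error in an
    early eigenvector propagates into all later ones."""
    M = M.copy()
    vecs = []
    for _ in range(k):
        v = np.random.default_rng(0).standard_normal(M.shape[0])
        for _ in range(iters):
            v = M @ v
            v /= np.linalg.norm(v)
        lam = v @ M @ v                     # Rayleigh quotient
        vecs.append(v)
        M = M - lam * np.outer(v, v)        # deflation step
    return np.stack(vecs, axis=1)
```

When eigenvalues are close, each inner loop converges slowly, and the deflation step compounds the resulting error; the Taylor formulation avoids this loop entirely.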

Practical Applications and Experiments

Image Classification

Using decorrelated batch normalization via ZCA whitening, the authors demonstrate superior performance and stability over previous methods and baselines on the CIFAR and ImageNet datasets. The more accurate gradients lead to better convergence and higher overall accuracy.

The table below summarizes results on CIFAR-10:

Method      d=4    d=8    d=16   d=32   d=64
SVD-PI      100%   100%   100%   100%   100%
SVD-Taylor  100%   100%   100%   100%   100%
SVD-Clip    100%   100%   100%   100%   100%
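The decorrelated batch normalization used here relies on ZCA whitening, which decorrelates feature dimensions through the eigendecomposition of the covariance matrix. A minimal NumPy sketch of the forward operation (the function name and the eps regularizer are illustrative assumptions):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA whitening of features X (n_samples x d): decorrelates the
    dimensions using the eigendecomposition of the covariance matrix,
    so the whitened features have (approximately) identity covariance.
    A sketch of the operation at the core of decorrelated batch
    normalization, not the paper's implementation."""
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / X.shape[0]
    lam, V = np.linalg.eigh(cov)                      # symmetric eigendecomp
    W = V @ np.diag(1.0 / np.sqrt(lam + eps)) @ V.T   # ZCA transform
    return Xc @ W
```

It is the backward pass through `np.linalg.eigh` here that is numerically fragile when eigenvalues cluster, which is precisely what the Taylor-expanded gradient addresses.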

Style Transfer

For image style transfer, the paper applies the proposed SVD method to the whitening and coloring transformations that are key to transferring style effectively. The experiments show sharper and more color-accurate style transfer results (Figure 2).

Figure 2: Qualitative comparisons on the Artworks dataset. Images generated with SVD show better detail preservation.
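The whitening and coloring transform (WCT) underlying this pipeline can be sketched in NumPy as follows. The function names and the eps floor are illustrative assumptions; the paper's contribution is making the eigendecompositions inside such a transform differentiable and stable, not the transform itself:

```python
import numpy as np

def wct(content_feat, style_feat, eps=1e-5):
    """Whitening and Coloring Transform (sketch). Features are (d, n):
    d channels, n spatial positions. Content features are whitened
    with their own covariance, then colored with the style covariance,
    so the output matches the style's second-order statistics."""
    def _stats(F):
        Fc = F - F.mean(axis=1, keepdims=True)
        cov = Fc @ Fc.T / F.shape[1]
        lam, V = np.linalg.eigh(cov)           # symmetric eigendecomp
        lam = np.clip(lam, eps, None)          # floor tiny eigenvalues
        inv_sqrt = V @ np.diag(lam ** -0.5) @ V.T
        sqrt = V @ np.diag(lam ** 0.5) @ V.T
        return Fc, inv_sqrt, sqrt

    Cc, C_inv_sqrt, _ = _stats(content_feat)
    _, _, S_sqrt = _stats(style_feat)
    colored = S_sqrt @ (C_inv_sqrt @ Cc)       # whiten, then color
    return colored + style_feat.mean(axis=1, keepdims=True)
```

Training through this transform requires gradients of the eigendecomposition, which is where the Taylor-expanded SVD gradient yields the sharper results reported above.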

Conclusion

The proposed method provides a robust framework to integrate SVD into deep learning models, overcoming challenges of numerical instability with minimal computational overhead. Its efficacy in tasks requiring accurate eigendecomposition, such as feature decorrelation and style transfer, paves the way for further research into improving the speed of forward-pass computations, potentially broadening the application scope of differentiable SVD.
