Adaptive Stochastic Variance Reduction for Subsampled Newton Method with Cubic Regularization

Published 28 Nov 2018 in math.OC | (1811.11637v1)

Abstract: The cubic regularized Newton method of Nesterov and Polyak has become increasingly popular for non-convex optimization because it can find an approximate local solution with second-order guarantees. Several recent works have extended this method to the setting of minimizing the average of N smooth functions by replacing the exact gradients and Hessians with subsampled approximations. It has been shown that, by leveraging stochastic variance reduction techniques, the total Hessian sample complexity can be made sublinear in N per iteration. We present an adaptive variance reduction scheme for the subsampled Newton method with cubic regularization, and show that the expected Hessian sample complexity is $O(N + N^{2/3}\epsilon^{-3/2})$ for finding an $(\epsilon, \epsilon^{1/2})$-approximate local solution (in terms of first- and second-order guarantees, respectively). Moreover, we show that the same Hessian sample complexity is retained with fixed sample sizes if exact gradients are used. Our analysis differs from previous works in that it does not rely on high-probability bounds based on matrix concentration inequalities. Instead, we derive and use bounds on the third- and fourth-order moments of the average of random matrices, which are of independent interest.
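To make the setting concrete, the following is a minimal NumPy sketch of a subsampled cubic-regularized Newton iteration that uses a generic SVRG-style Hessian estimator: a full Hessian is computed at a snapshot point $\tilde x$, and each inner step corrects it with a fixed-size minibatch $S$, giving $H = \nabla^2 f(\tilde x) + \frac{1}{|S|}\sum_{i \in S}[\nabla^2 f_i(x) - \nabla^2 f_i(\tilde x)]$, while gradients are exact. Each step then minimizes the Nesterov-Polyak cubic model $g^T s + \frac{1}{2} s^T H s + \frac{M}{6}\|s\|^3$. This is an illustration under stated assumptions, not the paper's adaptive scheme: the toy robust loss, the helper names (`make_problem`, `hess_avg`, `solve_cubic`, `svrc_sketch`) and the parameters `M`, `epochs`, `inner` and `batch` are hypothetical. The subproblem solver (eigendecomposition plus bisection on $\|s\|$) is only practical in small dimensions.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): f(x) = (1/N) sum_i phi(a_i^T x - b_i)
# with the non-convex robust loss phi(r) = r^2 / (1 + r^2).

def make_problem(n=200, d=5, seed=0):
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n, d)), rng.standard_normal(n)

def full_loss(A, b, x):
    r = A @ x - b
    return np.mean(r**2 / (1 + r**2))

def full_grad(A, b, x):
    r = A @ x - b
    w = 2 * r / (1 + r**2) ** 2                  # phi'(r_i)
    return A.T @ w / len(b)

def hess_avg(A, b, x, idx):
    """Average of the per-sample Hessians phi''(r_i) a_i a_i^T over the indices idx."""
    Ai = A[idx]
    r = Ai @ x - b[idx]
    w = (2 - 6 * r**2) / (1 + r**2) ** 3          # phi''(r_i)
    return (Ai.T * w) @ Ai / len(idx)

def solve_cubic(g, H, M):
    """Minimize m(s) = g^T s + 0.5 s^T H s + (M/6)||s||^3.

    With H = Q diag(lam) Q^T, the minimizer is s(r) = -(H + (M/2) r I)^{-1} g with
    ||s(r)|| = r and r > max(0, -2 lam_min / M); r is found by bisection.
    The degenerate "hard case" (g orthogonal to the bottom eigenvector) is not handled.
    """
    H = 0.5 * (H + H.T)                            # guard against round-off asymmetry
    lam, Q = np.linalg.eigh(H)                     # eigenvalues in ascending order
    c = Q.T @ g

    def step(r):
        return -Q @ (c / (lam + 0.5 * M * r))

    lo = max(0.0, -2.0 * lam[0] / M)
    hi = lo + 1.0
    while np.linalg.norm(step(hi)) > hi:           # ||s(r)|| - r is decreasing in r
        lo, hi = hi, 2.0 * hi
    for _ in range(200):
        if hi - lo <= 1e-12 * max(1.0, hi):
            break
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > mid:
            lo = mid
        else:
            hi = mid
    return step(hi)

def svrc_sketch(A, b, x0, M=10.0, epochs=5, inner=10, batch=20, seed=1):
    """Generic variance-reduced cubic Newton loop (illustrative, not the paper's adaptive scheme)."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = x0.copy()
    for _ in range(epochs):
        x_snap = x.copy()
        H_snap = hess_avg(A, b, x_snap, np.arange(n))    # full Hessian: N samples per epoch
        for _ in range(inner):
            idx = rng.choice(n, size=batch)              # uniform, with replacement
            # Unbiased estimate of the full Hessian at x; its variance shrinks as x nears x_snap.
            H = H_snap + hess_avg(A, b, x, idx) - hess_avg(A, b, x_snap, idx)
            g = full_grad(A, b, x)                       # exact gradient
            x = x + solve_cubic(g, H, M)
    return x

A, b = make_problem()
x0 = np.zeros(A.shape[1])
x_final = svrc_sketch(A, b, x0)
print(full_loss(A, b, x0), full_loss(A, b, x_final))
```

With `batch` much smaller than N, each inner step evaluates only 2·`batch` per-sample Hessians instead of N, and the full snapshot Hessian is paid once per epoch; choosing the batch size adaptively is exactly what this sketch leaves out.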