Residual Networks Behave Like Boosting Algorithms

Published 25 Sep 2019 in stat.ML and cs.LG | (1909.11790v1)

Abstract: We show that Residual Networks (ResNet) is equivalent to boosting feature representation, without any modification to the underlying ResNet training algorithm. A regret bound based on Online Gradient Boosting theory is proved and suggests that ResNet could achieve Online Gradient Boosting regret bounds through neural network architectural changes with the addition of a shrinkage parameter in the identity skip-connections and using residual modules with max-norm bounds. Through this relation between ResNet and Online Boosting, novel feature representation boosting algorithms can be constructed based on altering residual modules. We demonstrate this through proposing decision tree residual modules to construct a new boosted decision tree algorithm and demonstrating generalization error bounds for both approaches; relaxing constraints within BoostResNet algorithm to allow it to be trained in an out-of-core manner. We evaluate convolution ResNet with and without shrinkage modifications to demonstrate its efficacy, and demonstrate that our online boosted decision tree algorithm is comparable to state-of-the-art offline boosted decision tree algorithms without the drawback of offline approaches.