SVSL improves NCC mismatch versus vanilla cross-entropy
Show that, for suitable hyperparameters α and γ, optimizing the Stochastic Variability-Simplification Loss (cross-entropy augmented by a penalty, summed over layers j ≥ γ, on the squared distance between each sample's activation g^{(j)}(x_i) and its batch class mean) reduces the NCC mismatch at intermediate layers compared to training with vanilla cross-entropy, on both the training and test sets during the Terminal Phase of Training.
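The two quantities in the statement can be made concrete with a short sketch. Below is a minimal NumPy illustration (not the paper's implementation) of the SVSL penalty, i.e. the per-layer squared distance of each activation to its batch class mean scaled by α and summed over layers j ≥ γ, and of NCC mismatch, taken here to mean the fraction of samples whose nearest-class-center prediction at a layer disagrees with the network's prediction; the function names and the `preds` argument are assumptions for illustration.

```python
import numpy as np

def svsl_penalty(acts, labels):
    """Per-layer variability penalty: mean squared distance of each
    sample's activation to the mean activation of its class in the batch."""
    penalty = 0.0
    for c in np.unique(labels):
        cls_acts = acts[labels == c]
        mu = cls_acts.mean(axis=0)           # batch class mean for class c
        penalty += ((cls_acts - mu) ** 2).sum()
    return penalty / len(acts)

def svsl_loss(ce_loss, layer_acts, labels, alpha=0.01, gamma=0):
    """SVSL sketch: cross-entropy plus alpha times the variability
    penalty summed over layers j >= gamma (alpha, gamma as in the task)."""
    return ce_loss + alpha * sum(
        svsl_penalty(a, labels)
        for j, a in enumerate(layer_acts) if j >= gamma
    )

def ncc_mismatch(acts, labels, preds):
    """Fraction of samples whose nearest-class-center prediction at this
    layer differs from the network's prediction (assumed definition)."""
    classes = np.unique(labels)
    means = np.stack([acts[labels == c].mean(axis=0) for c in classes])
    dists = ((acts[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    ncc_pred = classes[dists.argmin(axis=1)]
    return float((ncc_pred != preds).mean())
```

With fully collapsed activations (every sample sitting on its class mean) the penalty is zero and the NCC mismatch against correct predictions is zero, which is the regime the penalty pushes intermediate layers toward.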
References
Conjecture[SVSL improves NCC mismatch] Using the properly defined hyperparameters α, γ, the Stochastic Variability-Simplification Loss encourages lower train and test NCC mismatch in intermediate layers.
— Ben-Shaul et al., "Nearest Class-Center Simplification through Intermediate Layers" (arXiv:2201.08924, 2022), Section 4.2 (Decreasing NCC Mismatch using Stochastic Variability-Simplification Loss)