SVSL improves NCC mismatch versus vanilla cross-entropy

Show that, for suitable hyperparameters α and γ, optimizing the Stochastic Variability-Simplification Loss (cross-entropy augmented by a penalty, over layers j ≥ γ, on the squared distance between each sample's activation g^{(j)}(x_i) and its batch class-mean) reduces NCC mismatch at intermediate layers relative to training with vanilla cross-entropy, on both the training and test sets, during the Terminal Phase of Training (TPT).
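Under the definitions above, the loss can be sketched in LaTeX as follows; the per-layer normalization (here 1/N) and the use of a single weight α shared across layers are assumptions, since the statement does not fix them:

```latex
\mathcal{L}_{\mathrm{SVSL}}
  = \mathcal{L}_{\mathrm{CE}}
  + \alpha \sum_{j \ge \gamma} \frac{1}{N} \sum_{i=1}^{N}
      \bigl\| g^{(j)}(x_i) - \mu^{(j)}_{y_i} \bigr\|_2^2,
```

where μ^{(j)}_{y_i} denotes the batch class-mean of class y_i at layer j and N is the batch size.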

Background

The Stochastic Variability-Simplification Loss (SVSL) adds to cross-entropy a variance-collapsing term that penalizes the squared distance between intermediate-layer activations and their batch class-means across selected layers. The intent is to encourage class clustering and thereby reduce NCC mismatch throughout the network.
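The loss described above can be sketched as follows; this is a minimal numpy illustration, not the paper's implementation, and the function name, array layout, and per-batch normalization are assumptions:

```python
import numpy as np

def svsl_loss(logits, labels, activations, alpha=0.1, gamma=2):
    """Sketch of SVSL: cross-entropy plus, for each layer j >= gamma, the
    squared distance between each sample's activation and its batch
    class-mean, averaged over the batch and weighted by alpha."""
    # Cross-entropy term via a numerically stable log-softmax.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -log_probs[np.arange(len(labels)), labels].mean()

    # Variance-collapsing penalty over the selected layers j >= gamma.
    penalty = 0.0
    for j, g in enumerate(activations):   # g: (batch, d_j) activations at layer j
        if j < gamma:
            continue
        for c in np.unique(labels):
            idx = labels == c
            mu_c = g[idx].mean(axis=0)    # batch class-mean of class c at layer j
            penalty += ((g[idx] - mu_c) ** 2).sum()
    penalty /= len(labels)

    return ce + alpha * penalty
```

With α = 0 the penalty vanishes and the loss reduces to vanilla cross-entropy, which is the baseline the conjecture compares against.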

This conjecture claims that, with appropriate α and γ, SVSL yields lower NCC mismatch than vanilla cross-entropy across intermediate layers on both train and test during TPT, thereby promoting more consistent geometric alignment between representations and class centers.
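The quantity being compared, NCC mismatch at a given layer, can be sketched as the fraction of samples whose nearest class-center assignment in that layer's feature space disagrees with the network's own prediction; the function name and signature below are illustrative assumptions:

```python
import numpy as np

def ncc_mismatch(feats, labels, net_preds):
    """Sketch of NCC mismatch at one layer: compute class-centers from the
    layer's features, classify each sample by its nearest center, and
    return the fraction of samples where this disagrees with the
    network's prediction."""
    classes = np.unique(labels)
    centers = np.stack([feats[labels == c].mean(axis=0) for c in classes])
    # Squared distance of every sample to every class-center.
    d = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    ncc_preds = classes[d.argmin(axis=1)]
    return float((ncc_preds != net_preds).mean())
```

The conjecture asserts that, during TPT, this quantity is lower at intermediate layers for SVSL-trained networks than for cross-entropy-trained ones, on both train and test data.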

References

Conjecture [SVSL improves NCC mismatch]. With suitably chosen hyperparameters α and γ, the Stochastic Variability-Simplification Loss encourages lower train and test NCC mismatch in intermediate layers.

Nearest Class-Center Simplification through Intermediate Layers (2201.08924 - Ben-Shaul et al., 2022), Section 4.2 (Decreasing NCC Mismatch using Stochastic Variability-Simplification Loss)