Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks

Published 5 May 2025 in cs.LG, cs.AI, cs.CV, cs.IT, cs.NE, and math.IT | (2505.02369v3)

Abstract: Sharpness-Aware Minimization (SAM) improves neural network generalization by optimizing the worst-case loss within a neighborhood of parameters, yet it perturbs parameters using the entire gradient vector, including components with low statistical significance. We introduce ZSharp, a refined sharpness-aware optimization method that incorporates layer-wise Z-score normalization followed by percentile-based filtering. This process selects only the most statistically significant gradient components-those with large standardized magnitudes-for constructing the perturbation direction. ZSharp retains the standard two-phase SAM structure of ascent and descent while modifying the ascent step to focus on sharper, curvature-relevant directions. We evaluate ZSharp on CIFAR-10, CIFAR-100, and Tiny-ImageNet using a range of models including ResNet, VGG, and Vision Transformers. Across all architectures and datasets, ZSharp consistently achieves higher test accuracy compared to SAM, ASAM, and Friendly-SAM. These results indicate that Z-score-based gradient filtering can enhance the sharpness sensitivity of the update direction, leading to improved generalization in deep neural network training.