Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization

Published 28 Sep 2025 in cs.LG and stat.ML (arXiv:2509.23898v3)

Abstract: Structured sparsity regularization offers a principled way to compact neural networks, but its non-differentiability breaks compatibility with conventional stochastic gradient descent and requires either specialized optimizers or additional post-hoc pruning without formal guarantees. In this work, we propose $D$-Gating, a fully differentiable structured overparameterization that splits each group of weights into a primary weight vector and multiple scalar gating factors. We prove that any local minimum under $D$-Gating is also a local minimum using non-smooth structured $L_{2,2/D}$ penalization, and further show that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit. Together, our results show that $D$-Gating is theoretically equivalent to solving the original group sparsity problem, yet induces distinct learning dynamics that evolve from a non-sparse regime into sparse optimization. We validate our theory across vision, language, and tabular tasks, where $D$-Gating consistently delivers strong performance-sparsity tradeoffs and outperforms both direct optimization of structured penalties and conventional pruning baselines.

Summary

  • The paper introduces D-Gating, a method that achieves structured sparsity via differentiable overparameterization, eliminating the need for specialized optimization routines.
  • It details how decomposing group weights into primary vectors and scalar gating factors enables smooth $L_2$ regularization that induces the non-smooth $L_{2,2/D}$ sparsity penalty.
  • Experimental results demonstrate that D-Gating outperforms traditional pruning techniques by offering superior performance-sparsity tradeoffs in linear regression and neural network tasks.

Introduction

The paper "Differentiable Sparsity via $D$-Gating: Simple and Versatile Structured Penalization" addresses a significant challenge in structured sparsity regularization for neural networks. The non-differentiability of traditional sparsity penalties can inhibit compatibility with conventional optimization methods like SGD, necessitating specialized solutions. This paper proposes a novel method named $D$-Gating, which introduces a differentiable structured overparameterization strategy. This approach aims to effectively achieve structured sparsity in neural networks using standard SGD without additional complexity.

Methodology

$D$-Gating Mechanism: The core of this research is the $D$-Gating method, which decomposes each weight group into a primary weight vector and multiple scalar gating factors. This decomposition allows a smooth $L_2$ penalty on the components to induce the desired non-differentiable sparsity penalty on the original weights. The differentiability of this approach is critical, as it allows $D$-Gating to be seamlessly integrated into existing architectures and training regimes.
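As a concrete illustration, the decomposition and its penalty can be sketched in a few lines of NumPy (a hypothetical minimal setup with depth $D = 3$, i.e. one primary vector and two scalar gates; function names are illustrative, not the paper's implementation). At a balanced factorization, the smooth $L_2$ penalty on the components equals $D \cdot \|w\|_2^{2/D}$, the non-smooth penalty up to the constant $D$:

```python
import numpy as np

def d_gating_penalty(v, gates):
    """Smooth L2 penalty on the overparameterized components."""
    return np.sum(v**2) + sum(float(g)**2 for g in gates)

def effective_weights(v, gates):
    """Collapse the factorization back to the original weight group."""
    return v * np.prod(gates)

# A weight group of depth D = 3: one primary vector and D - 1 scalar gates.
D = 3
w = np.array([0.3, -1.2, 0.8])           # target weight group
norm = np.linalg.norm(w)

# Balanced factorization: each factor carries norm**(1/D) of the magnitude.
v = w / norm * norm**(1.0 / D)
gates = [norm**(1.0 / D)] * (D - 1)
assert np.allclose(effective_weights(v, gates), w)

# At the balanced point, the smooth penalty equals D * ||w||_2**(2/D),
# i.e. the non-smooth L_{2,2/D} penalty up to the constant D.
smooth = d_gating_penalty(v, gates)
nonsmooth = D * norm**(2.0 / D)
print(smooth, nonsmooth)                  # the two values coincide
```

The overparameterized weights reproduce the original group exactly, so the sketch only changes how the penalty is expressed, not what the network computes.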

Figure 1: Parameter trajectories showing the failure of direct GD and convergence using $D$-Gating.

Theoretical Equivalence and Optimization Dynamics: The authors establish the theoretical equivalence between the local minima of the $D$-Gating objective and those of the original non-smooth sparsity regularization. They prove that any local minimum under $D$-Gating corresponds to a local minimum of the non-smooth structured $L_{2,2/D}$ penalization. Additionally, the paper demonstrates that the $D$-Gating objective converges at least exponentially fast to the $L_{2,2/D}$-regularized loss in the gradient flow limit.
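The equivalence can be made plausible with a standard AM-GM argument (a sketch consistent with the stated result, not the paper's proof): writing a weight group as $w_g = v_g \prod_{i=1}^{D-1} g_i$, the smooth penalty on the factors is bounded below by the non-smooth norm,

```latex
\|v_g\|_2^2 + \sum_{i=1}^{D-1} g_i^2
\;\ge\; D \left( \|v_g\|_2^2 \prod_{i=1}^{D-1} g_i^2 \right)^{1/D}
\;=\; D \, \Big\| v_g \prod_{i=1}^{D-1} g_i \Big\|_2^{2/D}
\;=\; D \, \|w_g\|_2^{2/D},
```

with equality exactly when $\|v_g\|_2 = |g_1| = \dots = |g_{D-1}|$. Minimizing the smooth penalty over all factorizations of $w_g$ therefore recovers the $L_{2,2/D}$ penalty up to the constant factor $D$.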

Numerical Experiments

The experiments validate $D$-Gating across vision, language, and tabular domains, consistently showing superior performance-sparsity tradeoffs compared to traditional methods. Notably, $D$-Gating outperforms baselines such as direct optimization of structured penalties and conventional pruning techniques. This advantage also appears in sparse linear regression, where $D$-Gating achieves better regularization paths and improved generalization.

Figure 2: Evolution of imbalance during SGD training for varying $D$-Gating depths.

Figure 3: Comparison of regularization paths in sparse linear regression using $D$-Gating.
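To make the sparse linear regression setting concrete, the following self-contained NumPy sketch applies 2-Gating (depth $D = 2$, i.e. $w_j = v_j g_j$ per coefficient, which induces an $L_1$ penalty) and recovers a sparse signal with plain gradient descent. All names and hyperparameters here are illustrative, not the paper's configuration:

```python
import numpy as np

# Toy sparse linear regression with 2-Gating: each coefficient w_j is
# factored as v_j * g_j, and a smooth L2 penalty on (v, g) plays the role
# of the non-smooth L1 penalty on w.
rng = np.random.default_rng(0)
n, p = 400, 10
w_true = np.zeros(p)
w_true[0], w_true[1] = 3.0, 1.5           # only two active coefficients
X = rng.standard_normal((n, p))
y = X @ w_true                             # noiseless for a clean illustration

lam, lr, iters = 0.2, 0.1, 3000
# Balanced positive initialization; in general v would be initialized
# randomly so that the factorization can also represent negative weights.
v = np.full(p, 0.5)
g = np.full(p, 0.5)

for _ in range(iters):
    w = v * g
    grad_w = X.T @ (X @ w - y) / n         # gradient of the data-fit term
    v, g = (v - lr * (grad_w * g + lam * v),
            g - lr * (grad_w * v + lam * g))

w_hat = v * g
print(np.round(w_hat, 3))                  # inactive coordinates shrink to ~0
```

Plain gradient descent on this smooth objective drives the inactive coordinates toward zero, while the active ones converge to mildly shrunken estimates of the true coefficients, mirroring the regularization-path behavior described above.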

Implications and Future Work

The implications of this work are twofold. Practically, $D$-Gating reduces the complexity associated with achieving structured sparsity in large neural networks, potentially leading to more efficient models with reduced computational overhead and better interpretability. Theoretically, it opens new pathways in the study of differentiable optimization techniques applicable to non-smooth problems, potentially influencing future neural network design and training methodologies.

Future work could explore the integration of $D$-Gating into various architectures, including more complex networks like Transformer models, as well as its application in real-time and embedded systems where efficiency is paramount. An interesting direction would be examining the interplay between $D$-Gating and other optimization techniques to improve learning dynamics further.

Conclusion

Differentiable sparsity via $D$-Gating offers a promising and practical approach to inducing structured sparsity in deep learning models. By leveraging differentiable overparameterization, it eschews the need for specialized optimization routines and additional pruning steps, simplifying implementation while maintaining robust theoretical underpinnings. The experiments underscore its versatility and effectiveness across diverse tasks, marking it as a valuable tool in the arsenal of neural network optimization techniques.
