OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators

Published 15 Dec 2023 in cs.LG, cs.AI, cs.CL, and cs.CV | arXiv:2312.09411v1

Abstract: Compressing a predefined deep neural network (DNN) into a compact sub-network with competitive performance is crucial in the efficient machine learning realm. This topic spans various techniques, from structured pruning to neural architecture search, encompassing both the pruning and erasing operator perspectives. Despite advancements, existing methods suffer from complex, multi-stage processes that demand substantial engineering and domain knowledge, limiting their broader applications. We introduce the third-generation Only-Train-Once (OTOv3), which first automatically trains and compresses a general DNN through pruning and erasing operations, creating a compact and competitive sub-network without the need for fine-tuning. OTOv3 simplifies and automates the training and compression process and minimizes the engineering effort required from users. It offers key technological advancements: (i) automatic search space construction for general DNNs based on dependency graph analysis; (ii) Dual Half-Space Projected Gradient (DHSPG) and its enhanced version with hierarchical search (H2SPG) to reliably solve (hierarchical) structured sparsity problems and ensure sub-network validity; and (iii) automated sub-network construction using solutions from DHSPG/H2SPG and dependency graphs. Our empirical results demonstrate the efficacy of OTOv3 across various benchmarks in structured pruning and neural architecture search. OTOv3 produces sub-networks that match or exceed the state of the art. The source code will be available at https://github.com/tianyic/only_train_once.

Summary

  • The paper introduces OTOv3, an architecture-agnostic framework that automates deep neural network training and compression using both structured pruning and operator erasing.
  • It employs novel dependency graph analyses and innovative sparse optimizers, DHSPG and H2SPG, to generate compact sub-networks efficiently.
  • Empirical results show that OTOv3 achieves significant reductions in FLOPs and parameters while maintaining competitive accuracy for resource-constrained deployments.

Overview of OTOv3: Automatically Pruning and Erasing Operators in DNNs

In the rapidly evolving field of deep learning, the scalability of models often clashes with practical deployment in resource-constrained environments. Addressing this challenge, the paper titled "OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators" introduces the third-generation Only-Train-Once (OTOv3) framework. OTOv3 automates the training and compression of Deep Neural Networks (DNNs) into compact sub-networks via both pruning and erasing operations, providing a streamlined, architecture-agnostic approach.
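To make the pipeline concrete before diving into the contributions, the sketch below mirrors the three steps the paper describes: trace the network once to build a dependency graph and partition its variables, train once with a structured-sparsity optimizer, then carve out the compact sub-network. Every name here (`build_dependency_graph`, `DHSPG`, `construct_subnetwork`, `train_loader`) is an illustrative stand-in for the described concepts, not the actual API of the only_train_once repository.

```python
import torch
import torchvision

model = torchvision.models.resnet50(num_classes=1000)
dummy_input = torch.randn(1, 3, 224, 224)

# Step 1 (hypothetical): trace the model once to build its dependency
# graph and partition trainable variables into zero-invariant groups.
graph = build_dependency_graph(model, dummy_input)
groups = graph.zero_invariant_groups()

# Step 2 (hypothetical): train once with a structured-sparsity optimizer
# that drives a target fraction of the groups exactly to zero.
optimizer = DHSPG(groups, lr=0.1, target_group_sparsity=0.7)
for inputs, labels in train_loader:  # assumed DataLoader
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Step 3 (hypothetical): remove the zeroed groups; zero-invariance means
# the compact sub-network reproduces the trained model's outputs.
compact_model = construct_subnetwork(model, graph)
```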

Key Contributions

OTOv3 advances the landscape of DNN compression through several pivotal contributions:

  1. Automated Architecture-Agnostic Framework: OTOv3 automatically trains and compresses general DNNs, generating compact sub-networks. It supports two modes: structured pruning, which slims operators while maintaining their presence, and erasing, which removes redundant operators altogether.
  2. Automated Search Space Generation: The framework introduces novel dependency graph analyses to automatically construct search spaces for general DNNs, greatly reducing the engineering effort that existing methods demand for manual search-space design.
  3. Innovative Sparse Optimizers:
    • Dual Half-Space Projected Gradient (DHSPG): For pruning, OTOv3 employs DHSPG for effective structured sparse optimization, offering reliable sparsity control and enhanced generalization (a sketch of the core half-space projection follows this list).
    • Hierarchical Half-Space Projected Gradient (H2SPG): For erasing, H2SPG is introduced as possibly the first optimizer to address hierarchical structured sparsity problems, ensuring the validity of the resulting sub-network architecture.
  4. Automated Sub-Network Construction: Upon deriving a high-quality solution, OTOv3 automatically constructs the compact sub-network. Because trainable variables are partitioned into zero-invariant groups (ZIGs), removing the zeroed groups leaves the network's output unchanged, which obviates the need for further fine-tuning in many cases.
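The projection sketch referenced above illustrates the mechanism shared by the HSPG family: after an ordinary gradient trial step, a variable group is set exactly to zero whenever the trial point leaves a half-space defined by the current iterate. This is a simplified single-group illustration under that assumption; the full DHSPG additionally partitions groups into important and redundant sets and steers toward a target group-sparsity level.

```python
import torch

def half_space_project(x_g, x_trial, eps=1e-3):
    """Simplified half-space projection for one variable group.

    The trial point survives only if it stays in the half-space
    {x : <x, x_g> >= eps * ||x_g||^2}; otherwise the whole group is
    projected to exact zero, yielding group-level sparsity without
    thresholding individual weights.
    """
    if torch.dot(x_trial.flatten(), x_g.flatten()) < eps * x_g.norm() ** 2:
        return torch.zeros_like(x_trial)
    return x_trial

# Example: a trial step that crosses the origin zeroes the entire group.
x_g = torch.tensor([0.05, -0.02, 0.01])               # current group weights
x_trial = x_g - 0.1 * torch.tensor([1.0, -0.4, 0.2])  # gradient trial step
print(half_space_project(x_g, x_trial))               # tensor([0., 0., 0.])
```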

Empirical Results

The paper demonstrates OTOv3’s efficacy across various benchmarks:

  • In structured pruning, it maintains competitive accuracy while achieving substantial reductions in FLOPs and parameters in networks such as VGG16-BN and ResNet50.
  • In the erasing mode, OTOv3 effectively identifies redundant structures, achieving competitive accuracy on architectures such as StackedUnets and the DARTS search space with substantial reductions in parameters and computational cost.
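A distinguishing requirement of the erasing mode is that the surviving operators must still form a functioning network; H2SPG's hierarchical search exists precisely to respect this constraint. A minimal way to picture it is a connectivity check over the operator graph, as sketched below on a hypothetical two-branch cell; this is an illustrative simplification, not the paper's algorithm.

```python
from collections import deque

def subnet_is_valid(edges, erased, source, sink):
    """Return True if erasing `erased` still leaves a source-to-sink path.

    `edges` maps each operator to its successors in the network's DAG.
    A set of zeroed groups is only acceptable for erasing if at least
    one input-to-output path survives (a simplified stand-in for the
    hierarchy that H2SPG must honor).
    """
    frontier, seen = deque([source]), {source}
    while frontier:
        node = frontier.popleft()
        if node == sink:
            return True
        for nxt in edges.get(node, ()):
            if nxt not in erased and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# Hypothetical cell: two parallel branches from input to output.
edges = {"in": ["conv3x3", "skip"], "conv3x3": ["out"], "skip": ["out"]}
print(subnet_is_valid(edges, {"skip"}, "in", "out"))             # True
print(subnet_is_valid(edges, {"skip", "conv3x3"}, "in", "out"))  # False
```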

Implications and Future Directions

The implications of OTOv3 are extensive. It addresses a critical need in deploying deep learning models in environments with limited computational resources, such as mobile devices and edge computing. By automating the compression process, OTOv3 democratizes model optimization, making it accessible to broader applications without requiring extensive domain expertise.

Furthermore, OTOv3’s paradigm provides a foundational framework that could be integrated into AutoML systems to streamline model optimization, influencing future directions in automated learning systems and large-scale neural architecture search. Its novel treatment of hierarchical structured sparsity could inspire further algorithmic advancements in network compression.

In conclusion, while OTOv3 may not render carefully handcrafted model compression pipelines obsolete, it marks a step toward automated, general-purpose neural network optimization suitable for a wide array of applications, with benefits for both research and practical deployment.
