Global Channel Gating (GCG)
- Global Channel Gating (GCG) is a neural network pruning framework that uses per-channel hard gating and hypergraph dependency modeling for hardware-agnostic compression.
- It employs an adaptive auxiliary loss and iterative pruning with fine-tuning to maintain accuracy while reducing FLOPs and latency, especially in complex architectures like ResNet.
- Empirical evaluations on ImageNet demonstrate that GCG achieves significant computational and latency reductions with minimal drops in top-1 accuracy.
Global Channel Gating (GCG) is a structured neural network pruning framework introduced to address the need for hardware-agnostic network compression with state-of-the-art accuracy-resource trade-offs. GCG leverages per-channel hard gating, hypergraph-based dependency modeling, and an adaptive auxiliary loss targeting computational, memory, or empirical latency cost, yielding fine control over pruning behavior even in architectures with complex connectivity patterns such as ResNets. Notably, it can remove entire non-sequential blocks and consistently enforces identical pruning across skip-linked layers. On standard benchmarks such as ResNet-50/ImageNet, GCG demonstrates substantial FLOPs and latency reductions while preserving accuracy competitive with or exceeding prior methods (Passov et al., 2022).
1. Channel-wise Hard Gating Mechanism
GCG introduces a learnable gating system parameterized per channel. Let $x$ denote the activation (input or output) of a convolution. Each channel $c$ is associated with a learnable scalar $\phi_c$, forming the basis for a Bernoulli hard gate $g_c \in \{0, 1\}$. Gate sampling employs a binary Gumbel-logistic distribution:

$$\tilde{g}_c = \sigma\!\left(\frac{\phi_c + \epsilon}{\tau}\right), \qquad \epsilon = \log u - \log(1 - u), \quad u \sim \mathcal{U}(0, 1), \qquad g_c = \mathbb{1}\!\left[\tilde{g}_c > \tfrac{1}{2}\right].$$

With a fixed temperature $\tau$ in practice, training uses a straight-through estimator: the forward pass applies the hard threshold $g_c$, while gradients are backpropagated through the smooth sigmoid $\tilde{g}_c$. The gated activation is $\hat{x}_c = g_c \, x_c$. To prune output channels, the gate is applied post-convolution; for input channels, it is applied pre-convolution.
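The sampling and straight-through trick can be sketched in plain Python for a single channel; the helper names and the unit temperature below are illustrative choices, not the paper's implementation:

```python
import math
import random

def sample_logistic_noise():
    # Logistic(0, 1) noise via the inverse CDF: log(u) - log(1 - u).
    u = min(max(random.random(), 1e-9), 1.0 - 1e-9)  # clamp away from {0, 1}
    return math.log(u) - math.log(1.0 - u)

def hard_gate(phi, tau=1.0, noise=None):
    """Straight-through Gumbel-logistic gate for one channel.

    Forward pass: hard 0/1 threshold of the noisy sigmoid.
    Backward pass (straight-through): gradients flow through the smooth
    sigmoid, so the surrogate derivative w.r.t. phi is returned as well.
    """
    eps = sample_logistic_noise() if noise is None else noise
    soft = 1.0 / (1.0 + math.exp(-(phi + eps) / tau))  # relaxed gate in (0, 1)
    hard = 1.0 if soft > 0.5 else 0.0                  # value used in the forward pass
    grad_soft = soft * (1.0 - soft) / tau              # surrogate gradient for phi
    return hard, soft, grad_soft

# A strongly negative phi closes the gate (channel dropped in this pass),
# a strongly positive one keeps it open.
closed, _, _ = hard_gate(phi=-8.0, noise=0.0)
opened, _, _ = hard_gate(phi=8.0, noise=0.0)
```

Gating an activation then amounts to multiplying channel $c$ of the tensor by the returned hard value, while the optimizer updates $\phi_c$ through the smooth surrogate.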
2. Auxiliary Cost and Loss Formulation
The pruning process is regularized with an auxiliary cost function reflecting resource consumption, combined with the original training loss (e.g., cross-entropy) as

$$\mathcal{L} = \mathcal{L}_{\text{orig}} + \lambda \, \mathcal{L}_{\text{aux}},$$

where $\lambda$ manages the trade-off between accuracy and pruning aggressiveness. Edge-wise channel dependencies are grouped as hyperedges $e \in E$, with per-edge gate vectors $g^{(e)}$. The number of active channels on edge $e$ at iteration $t$ is $n_e(t) = \sum_c g^{(e)}_c(t)$. Each edge is assigned a per-channel cost $c_e$, parameterizable for memory, theoretical FLOPs, or empirical latency. For instance, the edge-wise per-channel FLOPs cost is

$$c_e^{\text{FLOPs}} = \sum_{\ell \in \mathrm{convs}(e)} \frac{H_\ell W_\ell}{s_\ell^{2}} \, k_\ell^{2} \, n_\ell,$$

where $s_\ell$ encodes down-sampling, $k_\ell$ is the kernel size, $H_\ell \times W_\ell$ is the spatial resolution, and $n_\ell$ counts active channels on the opposite side of convolution $\ell$. The normalized cost per edge is

$$\hat{c}_e = \frac{c_e}{\sum_{e' \in E} c_{e'}}, \qquad \text{so that} \qquad \mathcal{L}_{\text{aux}} = \sum_{e \in E} \hat{c}_e \, n_e.$$
To ensure balanced pruning across varied edges, the learning rate for the gating parameters is rescaled per edge, inversely to the normalized cost $\hat{c}_e$:

$$\eta_e = \frac{\eta_0}{\hat{c}_e},$$

mitigating premature pruning of shallower layers, whose large spatial resolution gives them an outsized contribution to the auxiliary loss.
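A minimal numeric sketch of this cost bookkeeping, with made-up layer shapes; `edge_costs`, `aux_loss`, and `gate_learning_rates` are hypothetical helper names, not the Gator API:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def edge_costs(edges):
    """Normalized per-channel FLOPs-style cost for each dependency edge.

    `edges` maps an edge name to (spatial_size, stride, kernel, fan):
    illustrative stand-ins for H*W, down-sampling s, kernel k, and the
    active channel count on the other side of the convolution.
    """
    raw = {e: (hw / s**2) * k**2 * fan for e, (hw, s, k, fan) in edges.items()}
    total = sum(raw.values())
    return {e: c / total for e, c in raw.items()}  # normalized to sum to 1

def aux_loss(costs, phis):
    # Expected active channels per edge, weighted by the edge's normalized cost.
    return sum(costs[e] * sum(sigmoid(p) for p in phis[e]) for e in costs)

def gate_learning_rates(costs, base_lr=0.1):
    # Rescale the gate learning rate inversely to edge cost, so cheap deep
    # edges and expensive shallow edges are pruned at a comparable pace.
    return {e: base_lr / costs[e] for e in costs}

# Toy network: a shallow high-resolution conv and a deeper strided conv.
edges = {"conv1": (56 * 56, 1, 3, 64), "conv2": (28 * 28, 2, 3, 128)}
costs = edge_costs(edges)
lrs = gate_learning_rates(costs)
loss = aux_loss(costs, {"conv1": [2.0, -1.0], "conv2": [0.5]})
```

The shallow `conv1` edge dominates the normalized cost, so its gates receive a proportionally smaller learning rate.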
3. Hypergraph-based Modeling of Channel Dependencies
GCG's pruning must respect architectural dependencies, especially in non-sequential designs such as ResNets, in which a channel may simultaneously propagate through the principal path and skip connections. For this, an undirected hypergraph $H = (V, E)$ encodes channel dependencies:
- $V$: the union of all convolution input/output channel indices.
- Each hyperedge $e \in E$: a set of vertices representing a "dependency group", i.e., input/output channels across convolutions that share the same tensor dimension.
For ResNet-50, this results in $37$ hyperedges: $32$ sequential (within bottlenecks) and $5$ large skip-path edges spanning multiple layers. Pruning is enacted by setting the gates for all channels in a dependency group to zero, ensuring consistent structural reduction across all affected pathways.
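The dependency-group bookkeeping can be sketched with plain dictionaries; the layer names, the two hyperedges, and the four-channel toy width below are hypothetical:

```python
# Hypothetical channel-dependency bookkeeping for one residual stage.
# Each hyperedge groups (layer, role) pairs that share the same tensor
# dimension, so pruning channel c must apply to every member at once.
hyperedges = {
    # Sequential edge inside a bottleneck: conv1 outputs feed conv2 inputs.
    "bottleneck1_mid": [("conv1", "out"), ("conv2", "in")],
    # Skip-path edge: block output, identity shortcut, and the next block's
    # input all share one channel dimension and must be pruned identically.
    "stage_skip": [("conv3", "out"), ("shortcut", "out"), ("next_block", "in")],
}

def prune_channel(gates, hyperedges, edge, channel):
    """Zero the gate for `channel` at every member of the dependency group."""
    for layer, role in hyperedges[edge]:
        gates[(layer, role)][channel] = 0
    return gates

# Gates start fully open (1 = channel kept) for a 4-channel toy example.
gates = {member: [1, 1, 1, 1]
         for members in hyperedges.values() for member in members}
gates = prune_channel(gates, hyperedges, "stage_skip", 2)
```

Pruning channel 2 on the skip edge removes it from the block output, the shortcut, and the next block's input simultaneously, which is exactly the consistency the hypergraph enforces.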
4. Iterative Pruning and Fine-Tuning Algorithm
GCG applies pruning in staged iterations, each comprising gating, hard-channel removal, and fine-tuning. The process is described as follows:
- Initialization: Learning rates are reset; the gating parameters $\phi$ are retained from the previous iteration.
- Gating Phase (a fixed number of epochs): For each batch, sample Gumbel-logistic noise for every gate, compute the gates $g_c$ as above, apply gating in the forward pass, evaluate the combined loss $\mathcal{L} = \mathcal{L}_{\text{orig}} + \lambda \, \mathcal{L}_{\text{aux}}$, and backpropagate using the per-edge learning rates.
- Prune: After the gating phase, compute each gate's noise-free activation probability $p_c = \sigma(\phi_c)$. If $p_c < \tfrac{1}{2}$, set $\phi_c \to -\infty$ (the channel is permanently pruned).
- Fine-Tuning Phase (a fixed number of epochs): Fix the pruned channels and train only the remaining weights under the original loss $\mathcal{L}_{\text{orig}}$.

This is repeated for several iterations, gradually increasing $\lambda$ to drive increased sparsity while recovering accuracy through the intermediate fine-tuning.
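The staged loop reduces to a short control-flow skeleton; the actual gradient updates are stubbed out as comments, the pruning threshold follows the description above, and the $\lambda$ growth factor is an illustrative choice:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pruning_schedule(phis, num_iterations=3, lam=0.1, lam_growth=2.0):
    """Skeleton of the staged gate / prune / fine-tune loop.

    `phis` holds one gate parameter per channel. Training steps are
    placeholders; the point here is the control flow of the algorithm.
    """
    pruned = set()
    for _ in range(num_iterations):
        # Gating phase: would train phis under L_orig + lam * L_aux here.
        # Prune: a gate whose noise-free activation probability falls below
        # 0.5 is removed for good by pinning its parameter to -inf.
        for c, phi in enumerate(phis):
            if c not in pruned and sigmoid(phi) < 0.5:
                phis[c] = -math.inf
                pruned.add(c)
        # Fine-tuning phase: would train the surviving weights under L_orig.
        lam *= lam_growth  # raise pruning pressure for the next iteration
    return phis, pruned

# Channels 1 and 3 have sub-threshold gate parameters and get pruned.
phis, pruned = pruning_schedule([2.0, -1.5, 0.3, -0.1])
```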
5. Empirical Evaluation on ResNet-50 / ImageNet
GCG achieves notable compression and speedup on standard benchmarks. Table 1 summarizes main empirical results as reported for ResNet-50 on ImageNet, with two tuning objectives: FLOPs reduction and latency speedup.
| Configuration | FLOPs Reduction | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Relative Speedup (×) |
|---|---|---|---|---|
| Baseline (ResNet-50) | – | 76.15 | 92.87 | 1.00 |
| Gator flops 0.5 | 49.9% | 75.19 (–0.96) | 92.61 (–0.26) | 1.44 |
| Gator latency 0.5 | 48.3% | 75.28 (–0.87) | 92.50 (–0.37) | 1.65 |
| Gator flops 1.0 | 62.6% | 74.14 (–2.01) | 91.99 (–0.88) | 1.61 |
| Gator latency 1.0 | 61.2% | 74.24 (–1.91) | 91.95 (–0.92) | 1.86 |
| Gator flops 2.0 | 76.6% | 72.36 (–3.79) | 90.97 (–1.90) | 2.11 |
In the regime of approximately 50% FLOPs reduction, GCG incurs under a 1-point absolute drop in top-1 accuracy, with real-world GPU speedups of 1.44-1.65×, surpassing prior structured-pruning approaches in both theoretical (FLOPs) and empirical (latency) metrics.
6. Context and Significance within Neural Network Compression
GCG generalizes channel pruning by introducing flexible, differentiable hard gating that composes with architectures exhibiting complex connectivity. Its auxiliary cost can be tuned explicitly for mission-specific resource constraints, whether FLOPs, memory footprint, or device-specific latency, an advance over prior pruning schemes tied rigidly to FLOPs or layerwise strategies. The hypergraph dependency formalism makes pruning feasible for architectures with multi-path connections, notably without the need to disentangle skip-linked tensors manually. A plausible implication is the broadening of structured pruning's applicability to emerging model classes with intricate topologies.
7. Availability and Extensions
The implementation codebase for GCG, designated as "Gator," is publicly available at https://github.com/EliPassov/gator. This facilitates reproducibility and adaptation to alternative architectures and deployment restrictions (Passov et al., 2022). While the core mechanism targets convolutional networks, the underlying dependency modeling is not tied to a specific layer type, potentially enabling adaptation to other structured tensor factorizations or reparameterizations. Further empirical studies may clarify efficacy across non-vision modalities and in conjunction with quantization and knowledge distillation.
For detailed algorithms, model dependencies, and supplementary results, refer to "Gator: Customizable Channel Pruning of Neural Networks with Gating" (Passov et al., 2022).