
Grouped Dilated Depthwise Conv (GDBlock)

Updated 3 February 2026
  • Grouped Dilated Depthwise Convolution Block (GDBlock) is a neural network primitive that decouples receptive field expansion from channel mixing using grouped and dilated convolutions.
  • It employs independent dilated depthwise convolutions per channel group to capture local, mid-range, and long-range contexts, enhancing multi-scale feature extraction.
  • The design integrates an aggregator module with grouped point-wise convolutions to efficiently fuse multi-scale features while reducing parameters and FLOPs.

Grouped Dilated Depthwise Convolution Block (GDBlock) is a neural network primitive designed to decouple receptive field expansion from feature recombination, maximizing multi-scale context extraction while minimizing computation and parameter overhead. Originating in the context of real-time, resource-constrained vision systems, particularly UAV-based emergency monitoring, GDBlock integrates channel grouping, dilated depthwise convolution, and efficient aggregation to achieve state-of-the-art accuracy-efficiency tradeoffs without introducing global attention or self-attention modules (Nedeljković, 8 Dec 2025).

1. Architectural Definition and Mathematical Formulation

GDBlock operates on an input tensor $X \in \mathbb{R}^{h \times w \times C}$ using a defined cardinality $m$, partitioning the $C$ channels into $m$ non-overlapping groups of size $c = C/m$. Each group $g$ is assigned a distinct dilation rate $d_g$. A typical dilation schedule used in GlimmerNet is $\{d_1, d_2, d_3, d_4\} = \{1, 2, 2, 3\}$ for $m = 4$.

For each group $g$, the operation is:

$$Y^g = \mathrm{ReLU6}\left(\mathrm{BN}\left(\mathrm{DWConv}(X^g;\, k=3,\, \mathrm{dilation}=d_g)\right)\right), \quad g = 1, \ldots, m$$

where $\mathrm{DWConv}$ denotes channel-wise (depthwise) convolution with a $3 \times 3$ kernel and dilation $d_g$, followed by batch normalization and ReLU6 activation.

Outputs $Y^g$ are concatenated along the channel dimension:

$$Y = \mathrm{Concat}(Y^1, Y^2, \ldots, Y^m) \in \mathbb{R}^{h \times w \times C}$$

A residual connection yields the final output: $X + Y$.
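The formulation above can be sketched in NumPy. This is a minimal illustration, not the reference implementation: batch normalization is omitted, and the per-group kernels are passed in explicitly rather than learned.

```python
import numpy as np

def dw_conv3x3(x, w, dilation):
    """Depthwise 3x3 conv with zero padding that preserves spatial size.

    x : (h, w, c) input; w : (3, 3, c) per-channel kernels.
    """
    h, wd, c = x.shape
    p = dilation  # 'same' padding for a dilated 3x3 kernel
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += w[i, j] * xp[i*dilation:i*dilation + h,
                                j*dilation:j*dilation + wd, :]
    return out

def gdblock(x, weights, dilations):
    """GDBlock forward pass (BN omitted for brevity).

    x : (h, w, C); weights : one (3, 3, C//m) kernel per group;
    dilations : one dilation rate per group, e.g. [1, 2, 2, 3].
    """
    m = len(dilations)
    groups = np.split(x, m, axis=-1)          # m groups of c = C/m channels
    outs = [np.clip(dw_conv3x3(g, w, d), 0.0, 6.0)  # ReLU6
            for g, w, d in zip(groups, weights, dilations)]
    return x + np.concatenate(outs, axis=-1)  # residual connection
```

Each group sees only its own $c$ channels and its own dilation rate, so the per-group loops run at plain depthwise-convolution cost.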

2. Channel Partitioning, Dilation, and Multi-scale Feature Extraction

Channel partitioning in GDBlock enables independent processing of feature groups at different receptive field scales in a single convolutional block. Dilated depthwise convolution within each group allows for unique spatial spans without parameter increase. For example, groups assigned $d=1$ process local context, $d=2$ captures mid-range structure, and $d=3$ addresses long-range dependencies, simultaneously and disjointly within $X$.

This design avoids the need for stacking multiple convolutional blocks for multi-scale context or for global attention modules with high computational cost. Each block delivers multi-scale features at the same overall FLOPs and parameter budget as a single standard depthwise convolution (Nedeljković, 8 Dec 2025).
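The spatial span each group covers follows from the standard dilated-convolution formula: a $k \times k$ kernel with dilation $d$ spans $d(k-1)+1$ pixels per axis. A quick check for the GlimmerNet schedule:

```python
def effective_span(k, d):
    """Spatial extent (per axis) covered by a k x k kernel with dilation d."""
    return d * (k - 1) + 1

# GlimmerNet's dilation schedule {1, 2, 2, 3} with k = 3:
spans = {d: effective_span(3, d) for d in (1, 2, 3)}
print(spans)  # → {1: 3, 2: 5, 3: 7}
```

So a single GDBlock covers 3-, 5-, and 7-pixel contexts in parallel while every kernel stays $3 \times 3$.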

3. Aggregator Module: Cross-group Fusion

To recombine and align multi-dilated features, the Aggregator module is introduced. It operates as follows:

  1. Cross-dilation regrouping: Channels sharing the same within-group index across all groups are transposed to be adjacent, aligning features extracted from the same spatial context but at different dilation scales and forming $c$ groups of $m$ channels.
  2. Mixed concatenation: This regrouped tensor $R$ is interleaved with the original block input $X_0$ to yield $M \in \mathbb{R}^{h \times w \times 2C}$.
  3. Grouped point-wise convolution: To mix features efficiently, a grouped $1 \times 1$ convolution is applied with group size $2m$, equivalent to $c = C/m$ groups, each mapping $2m$ input channels to $m$ output channels. This yields the final cross-group fused output.
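The three steps can be sketched with tensor reshapes and one `einsum`. The exact channel-interleaving order in step 2 and the weight layout are assumptions for illustration; only the group structure ($c$ groups of $2m \to m$ channels, i.e. $2mC$ weights) follows directly from the description above.

```python
import numpy as np

def aggregator(y, x0, W, m):
    """Aggregator sketch: regroup, interleave with input, grouped 1x1 conv.

    y, x0 : (h, w, C) arrays (GDBlock output and block input);
    W     : (c, 2m, m) grouped pointwise weights, c = C // m,
            i.e. 2mC parameters in total.
    """
    h, w, C = y.shape
    c = C // m
    # 1. Cross-dilation regrouping: channels sharing a within-group index
    #    across the m dilation groups become adjacent.
    r = y.reshape(h, w, m, c).transpose(0, 1, 3, 2).reshape(h, w, C)
    # 2. Mixed concatenation with the block input X0 (this channel-wise
    #    interleave is an assumption about the exact mixing order).
    mix = np.concatenate(
        [r.reshape(h, w, c, m), x0.reshape(h, w, c, m)], axis=-1
    )  # (h, w, c, 2m): c groups of 2m channels each
    # 3. Grouped 1x1 convolution: each of the c groups maps 2m -> m channels.
    return np.einsum('hwci,cio->hwco', mix, W).reshape(h, w, C)
```

Because the `einsum` contracts only within each of the $c$ groups, the weight tensor has $c \cdot 2m \cdot m = 2mC$ entries rather than the $C^2$ of a dense pointwise layer.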

The grouped structure leads to a parameter count of $2mC$ instead of $C^2$ for a dense $1 \times 1$ convolution. For $m=4$ and $C=80$, the Aggregator uses $640$ weights versus $6400$, a tenfold reduction (Nedeljković, 8 Dec 2025).

4. Computational Complexity and Efficiency

GDBlock’s computational characteristics arise from the use of depthwise convolutions and grouped aggregation:

  • Per block: Parameters and FLOPs for GDBlock are unchanged from a baseline depthwise convolution ($Ck^2$ parameters and $hwCk^2$ FLOPs, with $k=3$).
  • Aggregator: Contributes $2mC$ parameters and $2mChw$ FLOPs per application.
  • Comparison: For $C=80$, $m=4$, a standard $1 \times 1$ convolution incurs $6400$ parameters; the grouped Aggregator requires $640$. For GlimmerNet on AIDERv2, the total parameter count is $31{,}204$ with $22.26$M FLOPs, a $16.6\%$ reduction in parameters and a $29\%$ FLOPs reduction relative to the TakuNet baseline, while achieving a weighted F1-score of $0.966$ ($+0.008$ over baseline) (Nedeljković, 8 Dec 2025).
| Layer type | Parameters ($C=80$, $m=4$) | FLOPs per spatial position |
|---|---|---|
| Standard $1 \times 1$ | $6400$ | $6400$ |
| Grouped PWConv | $640$ | $640$ |
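The table entries follow directly from the group arithmetic; a small helper reproduces them (names are illustrative, not from the paper):

```python
def pointwise_params(C, m):
    """Parameter counts for the Aggregator's 1x1 stage with C channels, m groups."""
    c = C // m
    grouped = c * (2 * m) * m  # c groups, each mapping 2m -> m channels = 2mC
    dense = C * C              # standard dense 1x1 mapping C -> C
    return grouped, dense

grouped, dense = pointwise_params(80, 4)
print(grouped, dense)  # → 640 6400
```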

This efficiency enables real-time onboard deployment for edge and UAV use cases, with ablation experiments confirming strong performance gains at fixed cost.

5. Hardware Considerations and Accelerator Support

Efficient implementation of grouped dilated depthwise convolution is advantageous for dedicated hardware accelerators. A unified dataflow supports both regular and dilated depthwise convolution (DDC) layers at full processing element (PE) utilization (Chen et al., 2021):

  • Key hardware features: SRAM banks for feature maps and weights, address generators per layer, CLPU with PEs and parallel MAC lanes, APLPU for activation/quantization.
  • Dual-mode operation: Regular convolution mode for dense $1 \times 1$; DDC mode for grouped/dilated/depthwise convolutions. Large kernels and dilation incur no extra PE overhead due to parallel multi-offset MAC lanes.
  • Throughput: A 512-MAC array at $400$ MHz delivers a measured $180$ FPS (VGA input, RetinaFace with DDC).

This shows that GDBlock layering can be integrated directly into high-throughput embedded systems for vision inference (Chen et al., 2021).

6. Empirical Results and Application Impact

When deployed in GlimmerNet for emergency response UAV imagery, the GDBlock framework enables high accuracy with minimal computational and memory footprint (Nedeljković, 8 Dec 2025):

  • AIDERv2 dataset: State-of-the-art weighted F1-score of $0.966$ with $31$K parameters and $29\%$ fewer FLOPs than prior best models.
  • Ablation: Adding grouped dilated depthwise convolution to a baseline DWConv backbone increased F1-score from $0.928$ to $0.933$; further application of the Aggregator raised it to $0.936$ (at $4.7$M FLOPs).

In related edge vision domains, DDC replacement of regular convolutions (face detection, image classification) yields $20$–$30\%$ model size and computation savings with negligible accuracy loss, or up to a $1\%$ improvement for larger receptive fields (Chen et al., 2021).

7. Theoretical and Practical Significance

Grouped dilated depthwise convolution provides a principled decoupling of receptive field expansion and channel mixing. Partitioning channels into groups separately leverages short, medium, and long-range context, while the Aggregator ensures global cross-scale awareness with parameter efficiency. Visualizations confirm that each group specializes in distinct context scales, and the Aggregator aligns them into globally coherent, spatially-aware activations, even on sparse emergency cues (e.g., separated fire/smoke) (Nedeljković, 8 Dec 2025).

A plausible implication is the broad applicability of GDBlock for real-time, embedded, or resource-constrained settings, especially where global context is critical but full global attention is computationally prohibitive. The generic design is amenable to quantization and adapts to dedicated hardware with full utilization, supporting kernels up to $7 \times 7$ and dilation $d \leq 3$ while sustaining $>200$ GOPS (Chen et al., 2021).
