GhostConv: Efficient Convolution Module
- GhostConv is a convolution module that decomposes standard operations into a lightweight intrinsic stage and an efficient ghost feature generation stage.
- It significantly reduces parameters and FLOPs by combining 1x1 convolutions with inexpensive depthwise convolutions.
- GhostConv has been successfully integrated into architectures like GRAN and YOLO11-4K, improving performance in super-resolution and high-resolution object detection.
GhostConv is a computationally efficient convolutional module that decomposes the standard convolutional operation into two stages: extraction of a reduced set of intrinsic feature maps using a lightweight convolution (typically $1\times1$), followed by generation of the remaining (ghost) feature maps using inexpensive linear transformations such as depthwise convolutions. Originally introduced to address feature redundancy and high computational cost in convolutional neural networks (CNNs), GhostConv underpins modern efficient architectures in tasks such as single-image super-resolution and real-time object detection on high-resolution images (Niu et al., 2023, Hafeez et al., 18 Dec 2025).
1. Mathematical Formulation and Core Principle
Let $X \in \mathbb{R}^{c \times h \times w}$ denote an input feature tensor, with $c$ channels and spatial dimensions $h \times w$. The standard convolution with $n$ output channels and kernel size $k \times k$ is defined as $Y = X * W$, where $W \in \mathbb{R}^{n \times c \times k \times k}$. The parameter count is $n \cdot c \cdot k^2$, and runtime complexity is similarly proportional to $n \cdot c \cdot k^2$ (times the output spatial resolution $h' \cdot w'$).
GhostConv modifies this using an expansion ratio $s$ (or $q$ in some notations). The process consists of:
- Stage 1: Compute $m = n/s$ "intrinsic" feature maps using a $1\times1$ convolution: $Y' = X * W'$, where $W' \in \mathbb{R}^{m \times c \times 1 \times 1}$.
- Stage 2: For each intrinsic map $y'_i$, generate $s$ total output maps via linear operators $\Phi_{i,j}$: $y_{i,j} = \Phi_{i,j}(y'_i)$ for $j = 1, \dots, s$.
$\Phi_{i,1}$ is typically the identity; the remaining $s - 1$ operators are inexpensive, such as depthwise convolutions with kernel size $d \times d$.
The output $Y \in \mathbb{R}^{n \times h' \times w'}$ is assembled by stacking all intrinsic and ghost maps: $Y = [y_{1,1}, y_{1,2}, \dots, y_{m,s}]$.
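The two-stage decomposition can be sketched directly with functional convolutions. The sizes below ($c = 16$, $n = 32$, $s = 2$, $d = 3$) are illustrative assumptions, chosen only to make the shape bookkeeping concrete:

```python
import torch
import torch.nn.functional as F

c, n, s, d = 16, 32, 2, 3          # input channels, output channels, ratio, cheap kernel
m = n // s                          # number of intrinsic maps
x = torch.randn(1, c, 8, 8)

# Stage 1: intrinsic maps via a 1x1 convolution.
w_primary = torch.randn(m, c, 1, 1)
y_intrinsic = F.conv2d(x, w_primary)                                # (1, m, 8, 8)

# Stage 2: ghost maps via a cheap depthwise d x d convolution (groups=m).
w_cheap = torch.randn((s - 1) * m, 1, d, d)
y_ghost = F.conv2d(y_intrinsic, w_cheap, padding=d // 2, groups=m)  # (1, (s-1)*m, 8, 8)

# Stack intrinsic and ghost maps to obtain all n output channels.
y = torch.cat([y_intrinsic, y_ghost], dim=1)                        # (1, n, 8, 8)
assert y.shape[1] == n
```

Note that the depthwise stage touches only the $m$ intrinsic maps, which is what makes the ghost features cheap.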
2. Parameter and Computational Efficiency
By replacing the full convolution with a bottlenecked intrinsic convolution plus cheap operators, GhostConv reduces both parameter and compute cost:
- Parameter count: $c \cdot m + (s-1)\, m\, d^2$, versus $n \cdot c \cdot k^2$ for the standard convolution.
For typical settings with $d = 3$ or $5$, $s = 2$, and large channel counts, the reduction exceeds a factor of $s$ (Niu et al., 2023).
- FLOPs: $h' w' \big( c \cdot m + (s-1)\, m\, d^2 \big)$, versus $h' w' \cdot n \cdot c \cdot k^2$ for the standard convolution.
For example, with illustrative values $c = n = 64$, $k = 3$, $s = 2$, and $d = 3$, the comparison is: - Standard: $64 \cdot 64 \cdot 3^2 = 36{,}864$ parameters - GhostConv: $64 \cdot 32 + 32 \cdot 3^2 = 2{,}336$ parameters (about $6\%$ of standard), with a similar FLOPs reduction.
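The parameter formulas can be checked with a few lines of arithmetic; the values $c = n = 64$, $k = 3$, $s = 2$, $d = 3$ used here are illustrative assumptions, not settings reported in the papers:

```python
# Parameter count of a standard k x k convolution vs. GhostConv
# (1x1 intrinsic convolution + cheap depthwise ghost stage).
c, n, k, s, d = 64, 64, 3, 2, 3   # illustrative values, not from the papers
m = n // s                         # intrinsic channels

standard_params = n * c * k * k                 # n * c * k^2
ghost_params = c * m + (s - 1) * m * d * d      # c*m + (s-1)*m*d^2

print(standard_params)                           # 36864
print(ghost_params)                              # 2336
print(round(ghost_params / standard_params, 3))  # 0.063
```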
- Empirical results: GRAN achieves roughly a $90\%$ reduction in parameters and an $83\%$ reduction in FLOPs compared to RCAN on single-image super-resolution tasks, with negligible quality loss (Niu et al., 2023).
3. Implementation Variants and Instantiations
In GRAN for Super-Resolution
GhostConv is embedded within Ghost Residual Attention Blocks (GRAB). Each GRAB applies:
- GhostConv
- ReLU activation
- Second GhostConv
- Channel-and-Spatial Attention Module (CSAM)
- Residual addition
GRABs are grouped into Ghost Residual Groups (GRGs), and the network structure further integrates skip connections at both group and global levels (Niu et al., 2023).
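The GRAB composition above can be sketched as follows. This is a structural sketch only: the GhostConv uses a fixed ratio of 2, and the squeeze-and-excitation-style channel attention is a simplified stand-in for the paper's CSAM (both simplifications are assumptions):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Minimal GhostConv: 1x1 intrinsic conv + depthwise ghost stage (ratio fixed at 2)."""
    def __init__(self, ch):
        super().__init__()
        m = ch // 2
        self.primary = nn.Conv2d(ch, m, kernel_size=1, bias=False)
        self.cheap = nn.Conv2d(m, m, kernel_size=3, padding=1, groups=m, bias=False)

    def forward(self, x):
        p = self.primary(x)
        return torch.cat([p, self.cheap(p)], dim=1)

class GRAB(nn.Module):
    """Structural sketch: GhostConv -> ReLU -> GhostConv -> attention -> residual add."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = GhostConv(ch)
        self.conv2 = GhostConv(ch)
        self.act = nn.ReLU(inplace=True)
        # Channel-attention stand-in for CSAM (an assumption, not the paper's exact module)
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // 4, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch // 4, ch, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        y = self.conv2(self.act(self.conv1(x)))
        return x + y * self.attn(y)
```

The residual add requires the block to preserve channel count, which the ratio-2 GhostConv does for even `ch`.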
In YOLO11-4K for 4K Object Detection
GhostConv replaces several early convolutions in the backbone. Here:
- The "ghost ratio" is often set to 2 (half channels intrinsic, half ghost).
- Cheap operators are depthwise convolutions.
- The output is constructed by concatenating the intrinsic and ghost maps, with cropping as necessary to hit the desired number of channels (Hafeez et al., 18 Dec 2025).
Pseudocode Example (as used in YOLO11-4K)
```python
import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, in_channels, out_channels, ratio=2, dw_kernel=5):
        super().__init__()
        # Intrinsic channels: m = ceil(out_channels / ratio)
        m = math.ceil(out_channels / ratio)
        self.primary_conv = nn.Conv2d(
            in_channels, m, kernel_size=1, stride=1, padding=0, bias=False)
        # Cheap depthwise convolution that generates the ghost features
        self.ghost_conv = nn.Conv2d(
            m, m, kernel_size=dw_kernel, stride=1,
            padding=dw_kernel // 2, groups=m, bias=False)
        self.out_channels = out_channels

    def forward(self, x):
        x_p = self.primary_conv(x)           # intrinsic feature maps
        x_g = self.ghost_conv(x_p)           # ghost feature maps
        ghost_needed = self.out_channels - x_p.shape[1]
        x_g = x_g[:, :ghost_needed, :, :]    # crop to the required channel count
        out = torch.cat([x_p, x_g], dim=1)
        return out
```
4. Empirical Evaluation and Practical Impact
Both GRAN and YOLO11-4K demonstrate that GhostConv can significantly decrease model size and latency with minimal performance trade-offs.
- In GRAN (super-resolution, Set5, $\times 2$ upscaling):
- RCAN: $22.89$M params, $29.96$G FLOPs, PSNR $38.27$, SSIM $0.9614$
- GRAN: $2.24$M params, $4.95$G FLOPs, PSNR $38.16$, SSIM $0.9654$
- In YOLO11-4K (object detection, 4K panoramic images):
- mAP: $0.95$ (YOLO11-4K) vs. $0.908$ (YOLO11)
- Inference latency: $28.3$ ms vs. $112.3$ ms (roughly a $75\%$ reduction)
- (Niu et al., 2023, Hafeez et al., 18 Dec 2025)
A plausible implication is that GhostConv enables practical deployment of CNN-based vision systems in resource-constrained or real-time environments by reducing computational demands without compromising accuracy.
5. Design Guidelines and Generalization
GhostConv serves as a drop-in replacement for standard convolutional layers, especially in settings where channel-wise redundancy is present. Design guidelines include:
- Select the ghost ratio $s$ (or $q$) so that $m = n/s$ is integer-valued.
- Use $1\times1$ convolutions for intrinsic map extraction.
- Employ depthwise convolutions with small kernels (e.g., $3\times3$, $5\times5$) for ghost feature generation.
- In extremely resource-sensitive contexts, increasing $q$ further reduces parameters and latency; both scale approximately as $1/q$.
- GhostConv layers integrate natively with existing attention mechanisms, normalization, and residual connections. (Niu et al., 2023, Hafeez et al., 18 Dec 2025)
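The approximate $1/q$ scaling can be checked numerically; the channel and kernel settings below ($c = n = 64$, $d = 3$) are illustrative assumptions:

```python
# GhostConv parameter count as the ghost ratio q grows
# (c = n = 64, d = 3 are illustrative assumptions, not from the papers).
c, n, d = 64, 64, 3
counts = []
for q in (2, 4, 8):
    m = n // q                               # intrinsic channels shrink as 1/q
    counts.append(c * m + (q - 1) * m * d * d)
print(counts)  # [2336, 1456, 1016]
```

The intrinsic $c \cdot m$ term scales exactly as $1/q$, while the cheap depthwise term approaches a small constant, so the total savings taper off for very large $q$.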
6. Applicability, Limitations, and Extensions
GhostConv has been deployed in super-resolution and high-resolution object detection networks, but the principle is broadly applicable to CNN architectures where feature redundancy is suspected. When used appropriately, GhostConv can provide near state-of-the-art accuracy with an order-of-magnitude savings in both parameters and FLOPs. Limitations may arise in contexts where information loss due to the reduction in learned filters cannot be compensated by cheap linear operators; model designers should tune $s$ and the kernel sizes in accordance with dataset complexity (Niu et al., 2023, Hafeez et al., 18 Dec 2025).
References
- GRAN: Ghost Residual Attention Network for Single Image Super Resolution (Niu et al., 2023)
- YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images (Hafeez et al., 18 Dec 2025)