
GhostConv: Efficient Convolution Module

Updated 18 January 2026
  • GhostConv is a convolution module that decomposes standard operations into a lightweight intrinsic stage and an efficient ghost feature generation stage.
  • It significantly reduces parameters and FLOPs by combining 1×1 convolutions with inexpensive depthwise convolutions.
  • GhostConv has been successfully integrated into architectures like GRAN and YOLO11-4K, improving performance in super-resolution and high-resolution object detection.

GhostConv is a computationally efficient convolutional module that decomposes the standard convolutional operation into two stages: extraction of a reduced set of intrinsic feature maps using a lightweight convolution (typically 1×1), followed by generation of the remaining ("ghost") feature maps using inexpensive linear transformations such as depthwise convolutions. Originally introduced to address feature redundancy and high computational cost in convolutional neural networks (CNNs), GhostConv underpins modern efficient architectures for tasks such as single-image super-resolution and real-time object detection on high-resolution images (Niu et al., 2023, Hafeez et al., 18 Dec 2025).

1. Mathematical Formulation and Core Principle

Let X ∈ ℝ^{C×H×W} denote an input feature tensor with C channels and spatial dimensions H×W. The standard convolution with N output channels and kernel size K is defined as:

Y = X ⊗ f + b,  f ∈ ℝ^{N×C×K×K}

The parameter count is N·C·K², and the runtime complexity is similarly proportional to N·C·K²·H·W.

GhostConv modifies this using an expansion ratio q (written s in some notations). The process consists of:

  • Stage 1: Compute m = N/q "intrinsic" feature maps using a 1×1 convolution:

Y′ = X ⊗ f′,  f′ ∈ ℝ^{m×C×1×1}

  • Stage 2: For each intrinsic map y′_i, generate q total output maps via linear operators {Ψ_{i,j}}:

{y_{i,j}} = {Ψ_{i,j}(y′_i)},  j = 1, …, q

Ψ_{i,1} is typically the identity; the remaining operators are inexpensive, such as depthwise convolutions with kernel size d ≪ K.

The output Y ∈ ℝ^{N×H′×W′} is assembled by stacking all intrinsic and ghost maps: Y = [Ψ_1(Y′), Ψ_2(Y′), …, Ψ_q(Y′)]
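As a concrete illustration, the two stages above can be sketched in NumPy with random weights. This is a shapes-only sketch (`ghost_conv` is a hypothetical helper, not code from either cited paper), with the identity as the first operator and depthwise convolutions as the cheap ghost operators:

```python
import numpy as np

def ghost_conv(x, N, q=2, d=3, rng=None):
    """GhostConv sketch on a single image x of shape (C, H, W).

    Stage 1: 1x1 convolution producing m = N // q intrinsic maps.
    Stage 2: (q - 1) cheap depthwise d x d convolutions per intrinsic
    map; the identity branch keeps the intrinsic maps themselves.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    C, H, W = x.shape
    m = N // q

    # Stage 1: a 1x1 convolution is just channel mixing at every pixel.
    f_primary = rng.standard_normal((m, C)) * 0.1
    intrinsic = np.einsum('mc,chw->mhw', f_primary, x)

    # Stage 2: depthwise convolution = one d x d filter per channel.
    pad = d // 2
    padded = np.pad(intrinsic, ((0, 0), (pad, pad), (pad, pad)))
    ghosts = []
    for _ in range(q - 1):
        f_dw = rng.standard_normal((m, d, d)) * 0.1
        out = np.zeros_like(intrinsic)
        for i in range(d):
            for j in range(d):
                out += f_dw[:, i, j][:, None, None] * padded[:, i:i+H, j:j+W]
        ghosts.append(out)

    # Assemble: identity branch first, then the ghost branches.
    return np.concatenate([intrinsic] + ghosts, axis=0)
```

For N divisible by q, the output has exactly N channels at the input spatial resolution.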

2. Parameter and Computational Efficiency

By replacing the full convolution with a bottlenecked intrinsic convolution plus cheap operators, GhostConv reduces both parameter and compute cost:

  • Parameter count:

Params_GhostConv = (N/q)·C + (q−1)·(N/q)·d²

For typical settings with d ≈ 3 or 5, q = 2–4, and C ≫ d², the reduction can exceed 10× (Niu et al., 2023).

  • FLOPs:

FLOPs_GhostConv = H′·W′·[(N/q)·C + (q−1)·(N/q)·d²]

For q = 3, C = N = 128, K = 3, d = 3, the comparison is:

    • Standard: 128 · 128 · 3 · 3 = 147,456 parameters
    • GhostConv: ≈ 6,229 parameters (≈ 4.2% of standard), with a similar FLOPs reduction.

  • Empirical results: GRAN achieves a 10.2× reduction in parameters and a 6× reduction in FLOPs compared to RCAN on single-image super-resolution tasks, with negligible quality loss (Niu et al., 2023).
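The parameter arithmetic above can be checked with a few lines of Python (the helper names are illustrative, not from either paper):

```python
def standard_params(N, C, K):
    # Dense convolution: one K x K filter per (output, input) channel pair.
    return N * C * K * K

def ghost_params(N, C, q, d):
    # m = N/q intrinsic 1x1 filters, plus (q - 1) depthwise d x d
    # filters for each intrinsic map.
    m = N / q
    return m * C + (q - 1) * m * d * d

std = standard_params(128, 128, 3)
ghost = round(ghost_params(128, 128, 3, 3))
print(std, ghost, f"{ghost / std:.1%}")  # 147456 6229 4.2%
```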

3. Implementation Variants and Instantiations

In GRAN for Super-Resolution

GhostConv is embedded within Ghost Residual Attention Blocks (GRAB). Each GRAB applies:

  1. GhostConv (C → C)
  2. ReLU activation
  3. Second GhostConv
  4. Channel-and-Spatial Attention Module (CSAM)
  5. Residual addition

GRABs are grouped into Ghost Residual Groups (GRGs), and the network further integrates skip connections at both the group and global levels (Niu et al., 2023).
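A minimal PyTorch sketch of the block structure above, assuming the standard two-branch GhostConv; the paper's CSAM (channel-and-spatial attention) is stood in for here by a simple channel gate, which is an assumption rather than the published module:

```python
import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    # Two-branch GhostConv: 1x1 primary conv + depthwise cheap conv.
    def __init__(self, c_in, c_out, ratio=2, dw_kernel=3):
        super().__init__()
        m = math.ceil(c_out / ratio)
        self.primary = nn.Conv2d(c_in, m, 1, bias=False)
        self.cheap = nn.Conv2d(m, m, dw_kernel, padding=dw_kernel // 2,
                               groups=m, bias=False)
        self.c_out = c_out

    def forward(self, x):
        p = self.primary(x)
        g = self.cheap(p)
        return torch.cat([p, g], dim=1)[:, :self.c_out]

class GRAB(nn.Module):
    """Ghost Residual Attention Block sketch: GhostConv -> ReLU ->
    GhostConv -> attention -> residual addition. The channel gate
    below is a stand-in for CSAM, not the published attention module."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(GhostConv(c, c), nn.ReLU(inplace=True),
                                  GhostConv(c, c))
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(c, c, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.body(x)
        return x + y * self.gate(y)  # attention, then residual addition
```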

In YOLO11-4K for 4K Object Detection

GhostConv replaces several early 3×3 convolutions in the backbone. Here:

  • The "ghost ratio" s is often set to 2 (half the channels intrinsic, half ghost).
  • The cheap operators φ are 5×5 depthwise convolutions.
  • The output is constructed as [Conv_{1×1}(X), DWConv_{5×5}(Conv_{1×1}(X))], with cropping as necessary to hit the desired number of channels (Hafeez et al., 18 Dec 2025).

Pseudocode Example (as used in YOLO11-4K)

import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, in_channels, out_channels, ratio=2, dw_kernel=5):
        super().__init__()
        # Number of intrinsic feature maps (Stage 1 output channels).
        m = math.ceil(out_channels / ratio)
        # Stage 1: pointwise convolution producing the intrinsic maps.
        self.primary_conv = nn.Conv2d(
            in_channels, m, kernel_size=1, stride=1, padding=0, bias=False)
        # Stage 2: cheap depthwise convolution producing the ghost maps.
        self.ghost_conv = nn.Conv2d(
            m, m, kernel_size=dw_kernel, stride=1, padding=dw_kernel // 2,
            groups=m, bias=False)
        self.out_channels = out_channels

    def forward(self, x):
        x_p = self.primary_conv(x)   # intrinsic maps
        x_g = self.ghost_conv(x_p)   # ghost maps
        # Crop the ghost maps so the concatenation hits out_channels exactly.
        ghost_needed = self.out_channels - x_p.shape[1]
        x_g = x_g[:, :ghost_needed, :, :]
        return torch.cat([x_p, x_g], dim=1)
(Hafeez et al., 18 Dec 2025)

4. Empirical Evaluation and Practical Impact

Both GRAN and YOLO11-4K demonstrate that GhostConv can significantly decrease model size and latency with minimal performance trade-offs.

  • In GRAN (super-resolution, Set5, 2× upscaling):
    • RCAN: 22.89M params, 29.96G FLOPs, PSNR 38.27, SSIM 0.9614
    • GRAN: 2.24M params, 4.95G FLOPs, PSNR 38.16, SSIM 0.9654
  • In YOLO11-4K (object detection, 4K panoramic images), replacing early backbone convolutions with GhostConv reduces model size and latency while preserving detection accuracy (Hafeez et al., 18 Dec 2025).

A plausible implication is that GhostConv enables practical deployment of CNN-based vision systems in resource-constrained or real-time environments by reducing computational demands without compromising accuracy.

5. Design Guidelines and Generalization

GhostConv serves as a drop-in replacement for standard convolutional layers, especially where channel-wise redundancy is present. Design guidelines include:

  • Select the ghost ratio q (or s) so that m = N/q (or n/s) is integer-valued.
  • Use 1×1 convolutions for intrinsic map extraction.
  • Employ depthwise convolutions with small kernels (e.g., 3×3, 5×5) for ghost feature generation.
  • In extremely resource-sensitive contexts, increasing q further reduces parameters and latency; both scale approximately as 1/q.
  • GhostConv layers integrate natively with existing attention mechanisms, normalization, and residual connections (Niu et al., 2023, Hafeez et al., 18 Dec 2025).
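The approximate 1/q scaling noted above can be seen numerically by evaluating the parameter formula from Section 2 at a few ratios (an illustrative helper, not library code):

```python
def ghost_params(N, C, q, d=3):
    # Parameter formula from Section 2: intrinsic 1x1 filters plus
    # (q - 1) depthwise d x d filters per intrinsic map.
    m = N / q
    return m * C + (q - 1) * m * d * d

for q in (2, 3, 4):
    print(f"q={q}: ~{round(ghost_params(256, 256, q)):,} params")
```

The intrinsic term shrinks as 1/q, while the cheap depthwise term stays small whenever C ≫ d², so total cost falls roughly as 1/q.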

6. Applicability, Limitations, and Extensions

GhostConv has been deployed in super-resolution and high-resolution object detection networks, but the principle is broadly applicable to CNN architectures where feature redundancy is suspected. When used appropriately, GhostConv can provide near state-of-the-art accuracy with an order-of-magnitude savings in both parameters and FLOPs. Limitations may arise in contexts where information loss due to reduction in learned filters cannot be compensated by cheap linear operators; model designers should tune qq and kernel sizes in accordance with dataset complexity (Niu et al., 2023, Hafeez et al., 18 Dec 2025).


References

  • GRAN: Ghost Residual Attention Network for Single Image Super Resolution (Niu et al., 2023)
  • YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images (Hafeez et al., 18 Dec 2025)
