
GhostConv: Efficient Convolution Module

Updated 18 January 2026
  • GhostConv is a convolution module that decomposes standard operations into a lightweight intrinsic stage and an efficient ghost feature generation stage.
  • It significantly reduces parameters and FLOPs by combining 1×1 convolutions with inexpensive depthwise convolutions.
  • GhostConv has been successfully integrated into architectures like GRAN and YOLO11-4K, improving performance in super-resolution and high-resolution object detection.

GhostConv is a computationally efficient convolutional module that decomposes the standard convolutional operation into two stages: extraction of a reduced set of intrinsic feature maps using a lightweight convolution (typically 1×1), followed by generation of the remaining ("ghost") feature maps using inexpensive linear transformations such as depthwise convolutions. Originally introduced to address feature redundancy and high computational cost in convolutional neural networks (CNNs), GhostConv underpins modern efficient architectures for tasks such as single-image super-resolution and real-time object detection on high-resolution images (Niu et al., 2023, Hafeez et al., 18 Dec 2025).

1. Mathematical Formulation and Core Principle

Let X ∈ ℝ^{C×H×W} denote an input feature tensor with C channels and spatial dimensions H×W. The standard convolution with N output channels and kernel size K is defined as:

Y = X ⊗ f + b,  f ∈ ℝ^{N×C×K×K}

The parameter count is N·C·K², and the runtime complexity is similarly proportional to N·C·K²·H·W.

GhostConv modifies this using an expansion ratio q (written s in some notations). The process consists of:

  • Stage 1: Compute m = N/q "intrinsic" feature maps using a 1×1 convolution:

Y′ = X ⊗ f′,  f′ ∈ ℝ^{m×C×1×1}

  • Stage 2: For each intrinsic map y′_i, generate q total output maps via linear operators {Ψ_{i,j}}:

{y_{i,j}} = {Ψ_{i,j}(y′_i)},  j = 1, …, q

Ψ_{i,1} is typically the identity; the remaining operators are inexpensive, such as depthwise convolutions with kernel size d ≪ K.

The output Y ∈ ℝ^{N×H′×W′} is assembled by stacking all intrinsic and ghost maps: Y = [Ψ_1(Y′), Ψ_2(Y′), …, Ψ_q(Y′)]
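As a concrete illustration, the two stages above can be sketched in NumPy with random weights. This is a shapes-only sketch (`ghost_conv` is a hypothetical helper, not code from either cited paper), with the identity as the first operator and depthwise convolutions as the cheap ghost operators:

```python
import numpy as np

def ghost_conv(x, N, q=2, d=3, rng=None):
    """GhostConv sketch on a single image x of shape (C, H, W).

    Stage 1: 1x1 convolution producing m = N // q intrinsic maps.
    Stage 2: (q - 1) cheap depthwise d x d convolutions per intrinsic
    map; the identity branch keeps the intrinsic maps themselves.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    C, H, W = x.shape
    m = N // q

    # Stage 1: a 1x1 convolution is just channel mixing at every pixel.
    f_primary = rng.standard_normal((m, C)) * 0.1
    intrinsic = np.einsum('mc,chw->mhw', f_primary, x)

    # Stage 2: depthwise convolution = one d x d filter per channel.
    pad = d // 2
    padded = np.pad(intrinsic, ((0, 0), (pad, pad), (pad, pad)))
    ghosts = []
    for _ in range(q - 1):
        f_dw = rng.standard_normal((m, d, d)) * 0.1
        out = np.zeros_like(intrinsic)
        for i in range(d):
            for j in range(d):
                out += f_dw[:, i, j][:, None, None] * padded[:, i:i+H, j:j+W]
        ghosts.append(out)

    # Assemble: identity branch first, then the ghost branches.
    return np.concatenate([intrinsic] + ghosts, axis=0)
```

For N divisible by q, the output has exactly N channels at the input spatial resolution.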

2. Parameter and Computational Efficiency

By replacing the full convolution with a bottlenecked intrinsic convolution plus cheap operators, GhostConv reduces both parameter and compute cost:

  • Parameter count:

Params_GhostConv = (N/q)·C + (q−1)·(N/q)·d²

For typical settings with d ≈ 3 or 5, q = 2–4, and C ≫ d², the reduction can exceed 10× (Niu et al., 2023).

  • FLOPs:

FLOPs_GhostConv = H′·W′·[(N/q)·C + (q−1)·(N/q)·d²]

For q = 3, C = N = 128, K = 3, d = 3, the comparison is:

    • Standard: 128 · 128 · 3 · 3 = 147,456 parameters
    • GhostConv: ≈ 6,229 parameters (≈ 4.2% of standard), with a similar FLOPs reduction.

  • Empirical results: GRAN achieves a 10.2× reduction in parameters and a 6× reduction in FLOPs compared to RCAN on single-image super-resolution tasks, with negligible quality loss (Niu et al., 2023).
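The parameter arithmetic above can be checked with a few lines of Python (the helper names are illustrative, not from either paper):

```python
def standard_params(N, C, K):
    # Dense convolution: one K x K filter per (output, input) channel pair.
    return N * C * K * K

def ghost_params(N, C, q, d):
    # m = N/q intrinsic 1x1 filters, plus (q - 1) depthwise d x d
    # filters for each intrinsic map.
    m = N / q
    return m * C + (q - 1) * m * d * d

std = standard_params(128, 128, 3)
ghost = round(ghost_params(128, 128, 3, 3))
print(std, ghost, f"{ghost / std:.1%}")  # 147456 6229 4.2%
```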

3. Implementation Variants and Instantiations

In GRAN for Super-Resolution

GhostConv is embedded within Ghost Residual Attention Blocks (GRAB). Each GRAB applies:

  1. GhostConv (C → C)
  2. ReLU activation
  3. Second GhostConv
  4. Channel-and-Spatial Attention Module (CSAM)
  5. Residual addition

GRABs are grouped into Ghost Residual Groups (GRGs), and the network further integrates skip connections at both the group and global levels (Niu et al., 2023).
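A minimal PyTorch sketch of the block structure above, assuming the standard two-branch GhostConv; the paper's CSAM (channel-and-spatial attention) is stood in for here by a simple channel gate, which is an assumption rather than the published module:

```python
import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    # Two-branch GhostConv: 1x1 primary conv + depthwise cheap conv.
    def __init__(self, c_in, c_out, ratio=2, dw_kernel=3):
        super().__init__()
        m = math.ceil(c_out / ratio)
        self.primary = nn.Conv2d(c_in, m, 1, bias=False)
        self.cheap = nn.Conv2d(m, m, dw_kernel, padding=dw_kernel // 2,
                               groups=m, bias=False)
        self.c_out = c_out

    def forward(self, x):
        p = self.primary(x)
        g = self.cheap(p)
        return torch.cat([p, g], dim=1)[:, :self.c_out]

class GRAB(nn.Module):
    """Ghost Residual Attention Block sketch: GhostConv -> ReLU ->
    GhostConv -> attention -> residual addition. The channel gate
    below is a stand-in for CSAM, not the published attention module."""
    def __init__(self, c):
        super().__init__()
        self.body = nn.Sequential(GhostConv(c, c), nn.ReLU(inplace=True),
                                  GhostConv(c, c))
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(c, c, 1), nn.Sigmoid())

    def forward(self, x):
        y = self.body(x)
        return x + y * self.gate(y)  # attention, then residual addition
```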

In YOLO11-4K for 4K Object Detection

GhostConv replaces several early 3×3 convolutions in the backbone. Here:

  • The "ghost ratio" s is often set to 2 (half the channels intrinsic, half ghost).
  • The cheap operators φ are 5×5 depthwise convolutions.
  • The output is constructed as [Conv_{1×1}(X), DWConv_{5×5}(Conv_{1×1}(X))], with cropping as necessary to hit the desired number of channels (Hafeez et al., 18 Dec 2025).

Pseudocode Example (as used in YOLO11-4K)

import math
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, in_channels, out_channels, ratio=2, dw_kernel=5):
        super().__init__()
        # Number of intrinsic feature maps (Stage 1 output channels).
        m = math.ceil(out_channels / ratio)
        # Stage 1: pointwise convolution producing the intrinsic maps.
        self.primary_conv = nn.Conv2d(
            in_channels, m, kernel_size=1, stride=1, padding=0, bias=False)
        # Stage 2: cheap depthwise convolution producing the ghost maps.
        self.ghost_conv = nn.Conv2d(
            m, m, kernel_size=dw_kernel, stride=1, padding=dw_kernel // 2,
            groups=m, bias=False)
        self.out_channels = out_channels

    def forward(self, x):
        x_p = self.primary_conv(x)   # intrinsic maps
        x_g = self.ghost_conv(x_p)   # ghost maps
        # Crop the ghost maps so the concatenation hits out_channels exactly.
        ghost_needed = self.out_channels - x_p.shape[1]
        x_g = x_g[:, :ghost_needed, :, :]
        return torch.cat([x_p, x_g], dim=1)
(Hafeez et al., 18 Dec 2025)

4. Empirical Evaluation and Practical Impact

Both GRAN and YOLO11-4K demonstrate that GhostConv can significantly decrease model size and latency with minimal performance trade-offs.

  • In GRAN (super-resolution, Set5, 2× upscaling):
    • RCAN: 22.89M params, 29.96G FLOPs, PSNR 38.27, SSIM 0.9614
    • GRAN: 2.24M params, 4.95G FLOPs, PSNR 38.16, SSIM 0.9654
  • In YOLO11-4K (object detection, 4K panoramic images), replacing early backbone convolutions with GhostConv reduces model size and latency while preserving detection accuracy (Hafeez et al., 18 Dec 2025).

A plausible implication is that GhostConv enables practical deployment of CNN-based vision systems in resource-constrained or real-time environments by reducing computational demands without compromising accuracy.

5. Design Guidelines and Generalization

GhostConv serves as a drop-in replacement for standard convolutional layers, especially where channel-wise redundancy is present. Design guidelines include:

  • Select the ghost ratio q (or s) so that m = N/q (or n/s) is integer-valued.
  • Use 1×1 convolutions for intrinsic map extraction.
  • Employ depthwise convolutions with small kernels (e.g., 3×3, 5×5) for ghost feature generation.
  • In extremely resource-sensitive contexts, increasing q further reduces parameters and latency; both scale approximately as 1/q.
  • GhostConv layers integrate natively with existing attention mechanisms, normalization, and residual connections (Niu et al., 2023, Hafeez et al., 18 Dec 2025).
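The approximate 1/q scaling noted above can be seen numerically by evaluating the parameter formula from Section 2 at a few ratios (an illustrative helper, not library code):

```python
def ghost_params(N, C, q, d=3):
    # Parameter formula from Section 2: intrinsic 1x1 filters plus
    # (q - 1) depthwise d x d filters per intrinsic map.
    m = N / q
    return m * C + (q - 1) * m * d * d

for q in (2, 3, 4):
    print(f"q={q}: ~{round(ghost_params(256, 256, q)):,} params")
```

The intrinsic term shrinks as 1/q, while the cheap depthwise term stays small whenever C ≫ d², so total cost falls roughly as 1/q.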

6. Applicability, Limitations, and Extensions

GhostConv has been deployed in super-resolution and high-resolution object detection networks, but the principle is broadly applicable to CNN architectures where feature redundancy is suspected. When used appropriately, GhostConv can provide near state-of-the-art accuracy with an order-of-magnitude savings in both parameters and FLOPs. Limitations may arise in contexts where information loss due to reduction in learned filters cannot be compensated by cheap linear operators; model designers should tune qq and kernel sizes in accordance with dataset complexity (Niu et al., 2023, Hafeez et al., 18 Dec 2025).


References

  • GRAN: Ghost Residual Attention Network for Single Image Super Resolution (Niu et al., 2023)
  • YOLO11-4K: An Efficient Architecture for Real-Time Small Object Detection in 4K Panoramic Images (Hafeez et al., 18 Dec 2025)
