
YOLOv8-seg Model for Instance Segmentation

Updated 15 February 2026
  • YOLOv8-seg is an instance segmentation model that employs a modular, three-stage architecture with a CSP-Darknet backbone, PAN-style neck, and dynamic mask head.
  • It integrates efficient methodologies like attention mechanisms, composite loss functions, and advanced convolutions to balance accuracy, speed, and model size.
  • Its competitive performance in agriculture, transportation, and autonomous navigation is validated by high mAP scores and ultra-fast inference times.

YOLOv8-seg is an anchor-free, one-stage instance segmentation model from the Ultralytics YOLO family, combining efficient object detection with precise instance-level mask prediction. Its modular, scalable architecture and competitive trade-off between accuracy, speed, and model size have established YOLOv8-seg as a production-ready solution for diverse real-time applications in agriculture, transportation, and autonomous navigation (Sapkota et al., 2024, Gamani et al., 2024, Guo et al., 2024, Yurdakul et al., 7 May 2025).

1. Model Architecture

YOLOv8-seg is structured as a three-stage vision model with distinct backbone, neck, and head components:

  • Backbone:

The backbone is a CSP-Darknet-derived stack featuring a Focus layer, cascaded C2f (Cross Stage Partial with enhanced feature fusion) modules, and a Spatial Pyramid Pooling–Fast (SPPF) block. The Focus stem partitions the 640×640 RGB input into channel-rich low-resolution feature maps. The SPPF block aggregates context across scales, enabling robust spatial encoding (Sapkota et al., 2024, Yurdakul et al., 7 May 2025).

  • Neck:

A PAN-style feature pyramid network (FPN) fuses multi-scale information via lateral 1×1 and 3×3 convolutions combined with upsampling and downsampling routines. This produces multi-resolution “P3”, “P4”, and “P5” feature maps, each suitable for detection and segmentation at a specific object scale (Sapkota et al., 2024, Gamani et al., 2024).

  • Head:

YOLOv8-seg employs a decoupled, anchor-free detection head split into separate classification and box regression branches, and attaches a parallel mask segmentation branch to each detection scale (Sapkota et al., 2024). For segmentation, a dynamic mask head processes fused features to output a predicted mask for each instance. In certain variants, a prototype network produces global mask bases combined with dynamically predicted coefficients for per-instance mask synthesis (Gamani et al., 2024).
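In the prototype-based variants, each instance mask is obtained by applying a sigmoid to a coefficient-weighted sum of the shared prototype maps. A minimal pure-Python sketch with toy sizes (prototype values and coefficients here are hypothetical, chosen only to illustrate the mechanism):

```python
import math

def synthesize_mask(prototypes, coeffs):
    """Combine K global prototype maps with per-instance coefficients.

    prototypes: list of K HxW grids (lists of lists of floats)
    coeffs: list of K floats predicted for one instance
    Returns an HxW grid of per-pixel mask probabilities in (0, 1).
    """
    K = len(prototypes)
    H, W = len(prototypes[0]), len(prototypes[0][0])
    mask = [[0.0] * W for _ in range(H)]
    for k in range(K):
        for y in range(H):
            for x in range(W):
                mask[y][x] += coeffs[k] * prototypes[k][y][x]
    # sigmoid turns the linear combination into per-pixel probabilities
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in mask]

# toy example: two 2x2 prototypes, one instance with coefficients (1, 2)
protos = [[[1.0, -1.0], [0.0, 2.0]],
          [[0.5, 0.5], [-1.0, 0.0]]]
mask = synthesize_mask(protos, [1.0, 2.0])
```

Because the prototypes are shared across all instances, the per-instance cost reduces to predicting K coefficients, which keeps the mask head fast.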

This design is consistent across all size variants (n, s, m, l, x), differing in depth and width scaling. Table 1, adapted from (Gamani et al., 2024), summarizes typical configuration parameters.

Variant       Layers   Parameters (M)   GFLOPs
YOLOv8n-seg   195      3.26             12.0
YOLOv8s-seg   195      11.78            42.4
YOLOv8m-seg   245      27.22            110.0
YOLOv8l-seg   295      45.91            220.1
YOLOv8x-seg   295      71.72            343.7

2. Loss Functions and Training Objectives

Segmentation training in YOLOv8-seg involves a composite objective L = L_{\text{cls}} + L_{\text{box}} + L_{\text{mask}}, explicitly combining:

  • Classification loss (L_{\text{cls}}):

Standard binary cross-entropy over C classes:

L_{\text{cls}} = -\sum_{i} \left[ t_i \log p_i + (1 - t_i) \log(1 - p_i) \right]
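The classification term is plain binary cross-entropy over the class outputs; a minimal sketch with a one-vs-all target vector (values are illustrative):

```python
import math

def bce_loss(targets, probs):
    """Binary cross-entropy summed over class outputs, as in L_cls."""
    eps = 1e-12  # guard against log(0)
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(targets, probs))

# one-hot target over 3 classes vs. predicted class probabilities
loss = bce_loss([1, 0, 0], [0.9, 0.1, 0.2])
```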

  • Box regression loss (L_{\text{box}}):

By default, YOLOv8-seg uses Complete-IoU (CIoU) loss:

L_{\text{box}} = 1 - \mathrm{CIoU}(B, \hat{B})

where

\mathrm{CIoU}(B, \hat{B}) = \mathrm{IoU}(B, \hat{B}) - \frac{\rho^2(b, \hat{b})}{c^2} - \alpha v

with \rho the Euclidean distance between box centers b and \hat{b}, c the diagonal of the smallest enclosing box, v the aspect-ratio consistency term, and \alpha a weighting term (Sapkota et al., 2024, Gamani et al., 2024). Several improved variants substitute WIoU (Guo et al., 2024), which weights difficult predictions by a factor \omega(\mathrm{IoU}), further focusing optimization on hard examples:

L_{\mathrm{WIoU}} = \omega(\mathrm{IoU}) \, (1 - \mathrm{IoU})
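A pure-Python sketch of the CIoU loss for boxes in (x1, y1, x2, y2) form, following the definition above; the WIoU weight ω(IoU) is left out because its exact schedule is variant-specific:

```python
import math

def iou(b1, b2):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def ciou_loss(pred, gt):
    """1 - CIoU: IoU penalized by center distance and aspect-ratio terms."""
    i = iou(pred, gt)
    # squared center distance rho^2
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_g, cy_g = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    rho2 = (cx_p - cx_g) ** 2 + (cy_p - cy_g) ** 2
    # squared diagonal c^2 of the smallest enclosing box
    ex1, ey1 = min(pred[0], gt[0]), min(pred[1], gt[1])
    ex2, ey2 = max(pred[2], gt[2]), max(pred[3], gt[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # aspect-ratio consistency v and its trade-off weight alpha
    v = (4 / math.pi ** 2) * (math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
                              - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))) ** 2
    alpha = v / (1 - i + v + 1e-12)
    return 1 - (i - rho2 / c2 - alpha * v)

loss = ciou_loss((10.0, 10.0, 50.0, 60.0), (12.0, 8.0, 48.0, 62.0))
```

Unlike plain IoU loss, the center-distance term keeps gradients informative even for non-overlapping boxes, which is why the loss can exceed 1 in that regime.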

  • Mask loss (L_{\text{mask}}):

Per-pixel binary cross-entropy and optionally Dice loss:

L_{\text{mask}} = -\frac{1}{N} \sum_{j=1}^{N} \left[ m_j \log \hat{m}_j + (1 - m_j) \log(1 - \hat{m}_j) \right]
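Both the per-pixel BCE term and the optional Dice term operate on flattened masks; a minimal sketch (the Dice formulation here is the common soft 2|A∩B|/(|A|+|B|) variant, an assumption rather than a fixed YOLOv8-seg detail):

```python
import math

def mask_losses(gt, pred, eps=1e-12):
    """Per-pixel BCE (L_mask) and an optional soft Dice loss over flat masks.

    gt: flat list of 0/1 ground-truth pixels
    pred: flat list of predicted mask probabilities
    """
    n = len(gt)
    bce = -sum(m * math.log(p + eps) + (1 - m) * math.log(1 - p + eps)
               for m, p in zip(gt, pred)) / n
    inter = sum(m * p for m, p in zip(gt, pred))
    dice = 1 - (2 * inter + eps) / (sum(gt) + sum(pred) + eps)
    return bce, dice

bce, dice = mask_losses([1, 0], [0.8, 0.2])
```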

YOLOv8-seg is trained predominantly with SGD or AdamW; early stopping and extensive data augmentation (mosaic, flip, HSV jitter, scale, translation) are standard (Gamani et al., 2024, Sapkota et al., 2024, Yurdakul et al., 7 May 2025).
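The augmentation and optimizer choices above map onto hyperparameter keys in the Ultralytics training configuration; a representative fragment (the values shown are common defaults, not the exact settings of the cited studies):

```yaml
# illustrative training/augmentation hyperparameters (not from the cited papers)
optimizer: SGD      # or AdamW
epochs: 300
patience: 50        # early stopping after 50 epochs without improvement
mosaic: 1.0         # mosaic augmentation probability
fliplr: 0.5         # horizontal flip probability
hsv_h: 0.015        # HSV hue jitter fraction
hsv_s: 0.7          # HSV saturation jitter fraction
hsv_v: 0.4          # HSV value jitter fraction
scale: 0.5          # random scale gain
translate: 0.1      # random translation fraction
```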

3. Application Domains and Quantitative Performance

YOLOv8-seg has been benchmarked across diverse detection and segmentation tasks:

  • Agricultural Instance Segmentation:

In green fruit segmentation on immature apples (“All” and occluded/non-occluded classes), YOLOv8l-seg attains box mAP@50 of 0.873 and mask mAP@50 of 0.848 (“All”); mask precision/recall at 0.806/0.798, and inference time (YOLOv8n-seg) as low as 3.3 ms per image (Sapkota et al., 2024). For strawberry maturity stages, YOLOv8n-seg achieves mAP@50 = 0.809, outperforming larger variants in both accuracy and inference speed (24.2 ms/image), demonstrating optimal trade-offs for embedded, real-time agri-robotics (Gamani et al., 2024).

  • Autonomous Driving and Road Defect Detection:

In pothole segmentation, YOLOv8n-seg yields baseline precision 91.9%, recall 85.2%, mAP@50 91.9%; with structural enhancements (DSConv, SimAM, GELU), precision improves to 93.7% and mAP@50 to 93.8% with an inference speed of 110 FPS and parameter count 4.1 M (Yurdakul et al., 7 May 2025). In vehicle and pedestrian segmentation, detection accuracy for “car/person/motorcycle” classes is 94.9%/83.4%/83.2% (YOLOv8n-seg), with improved models surpassing these by 4–6 points depending on class, notably outperforming YOLOv9 on several metrics (Guo et al., 2024).

4. Architectural Enhancements and Research Directions

Numerous modifications enhance YOLOv8-seg’s capacity and efficiency:

  • Backbone Replacement:

Substituting the original CSP-Darknet C2 blocks with FasterNet’s Partial Convolutions reduces computational load and memory by roughly 24%, while simultaneously improving detection accuracy and speed (Guo et al., 2024).
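A quick way to see the saving: a partial convolution touches only the first c_p = rc channels and leaves the rest untouched, so its multiply-accumulate count falls by (c_p/c)² relative to a dense convolution on the same map. A back-of-the-envelope sketch (feature-map sizes are illustrative; the ratio r = 1/4 is the value commonly used in FasterNet, not a figure from the cited paper):

```python
def conv_flops(h, w, c_in, c_out, k):
    """Multiply-accumulate count of a dense k x k convolution (stride 1, same padding)."""
    return h * w * c_in * c_out * k * k

def pconv_flops(h, w, c, k, ratio=0.25):
    """FasterNet-style partial convolution: convolve only c_p = ratio * c channels."""
    cp = int(c * ratio)
    return conv_flops(h, w, cp, cp, k)

full = conv_flops(80, 80, 256, 256, 3)
partial = pconv_flops(80, 80, 256, 3)   # 16x fewer MACs at ratio 1/4
```

The whole-model saving reported above is smaller than this per-layer factor because only some blocks are replaced.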

  • Attention Mechanisms:
    • CBAM (Convolutional Block Attention Module) on neck outputs increases recall on small/occluded instances by joint channel/spatial re-weighting (Guo et al., 2024).
    • SimAM, an efficient parameter-free attention, further refines backbone and neck representations for irregular shape delineation, especially effective on non-rigid or edge-rich targets (Yurdakul et al., 7 May 2025).
  • Convolutional Advances:

DSConv layers substituted into the backbone and neck improve delineation of elongated, irregular structures such as pothole margins at low parameter cost, contributing +0.7 mAP in ablation (Yurdakul et al., 7 May 2025).

  • Activation Functions:

GELU activation layers expedite convergence and boost boundary consistency on complex textures, replacing SiLU/Swish routines (Yurdakul et al., 7 May 2025).
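Of the modules above, SimAM is simple enough to sketch in full: its gate follows a closed-form energy function with no learned parameters. A sketch over one flattened channel (λ is the regularizer from the SimAM formulation; input values are illustrative):

```python
import math

def simam(channel, lam=1e-4):
    """Parameter-free SimAM attention over one flattened channel (sketch)."""
    n = len(channel) - 1                      # neurons other than the target
    mu = sum(channel) / len(channel)
    d = [(x - mu) ** 2 for x in channel]
    var = sum(d) / n
    # inverse energy: larger for neurons that stand out from their channel
    e_inv = [di / (4 * (var + lam)) + 0.5 for di in d]
    # re-weight activations with a sigmoid gate
    return [x / (1 + math.exp(-e)) for x, e in zip(channel, e_inv)]

out = simam([0.1, 0.1, 0.1, 5.0])
```

Distinctive neurons receive a stronger gate than background ones, which is the mechanism behind the improved delineation of irregular targets.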

Ablation studies confirm each module’s additive effect: DSConv (+0.7 mAP), SimAM (+0.5 mAP), their combination (+1.4 mAP), and all together (+1.9 mAP) versus the YOLOv8n baseline (Yurdakul et al., 7 May 2025).

5. Evaluation Metrics and Inference Trade-offs

YOLOv8-seg employs standard instance segmentation metrics:

  • Intersection over Union (IoU):

\mathrm{IoU} = \frac{|B_{\mathrm{pred}} \cap B_{\mathrm{gt}}|}{|B_{\mathrm{pred}} \cup B_{\mathrm{gt}}|}
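For segmentation, the same ratio is computed over pixel sets rather than box areas; a minimal sketch on flat binary masks (values are illustrative):

```python
def mask_iou(pred, gt):
    """IoU between two binary masks given as flat 0/1 lists."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 1.0

iou_val = mask_iou([1, 1, 0, 0], [1, 0, 1, 0])   # intersection 1, union 3
```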

  • Precision/Recall:

P = \frac{\text{TP}}{\text{TP} + \text{FP}}, \quad R = \frac{\text{TP}}{\text{TP} + \text{FN}}

  • Mean Average Precision (mAP@50):

\mathrm{mAP@50} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{AP}_{\mathrm{IoU}=0.5}(c)
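These definitions reduce to simple arithmetic over detection counts and per-class AP values; a sketch (the counts and AP values are illustrative):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from raw true/false positive and false negative counts."""
    return tp / (tp + fp), tp / (tp + fn)

def map50(ap_per_class):
    """mAP@50: mean of per-class AP computed at IoU threshold 0.5."""
    return sum(ap_per_class) / len(ap_per_class)

p, r = precision_recall(tp=90, fp=10, fn=20)
m = map50([0.95, 0.83, 0.83])
```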

  • Speed/Complexity:

Variants span 3–70 M parameters and 10–340 GFLOPs, with YOLOv8n-seg delivering <4 ms per-image inference and YOLOv8x-seg requiring >20 ms (on the green fruit dataset). Larger models offer minor absolute mAP gains at substantial cost in complexity and memory (Sapkota et al., 2024, Gamani et al., 2024).

Key observations:

  • Smaller models (YOLOv8n-seg) often provide the best accuracy-latency trade-off, especially for edge or real-time deployment (Gamani et al., 2024, Sapkota et al., 2024).
  • Marginal segmentation accuracy gains from larger models rarely justify their 2–3x slower inference for embedded tasks.
  • For occluded or low-contrast targets, accuracy drops by 1–3 mAP points compared to fully visible instances across model sizes.

6. Limitations, Failure Modes, and Future Work

YOLOv8-seg, while competitive, exhibits several empirically identified limitations:

  • Failure Modes:
    • False positives on dense or cluttered backgrounds, e.g., mislabeling canopy foliage as fruit (Sapkota et al., 2024).
    • Under-segmentation of heavily occluded instances, resulting in partial masks.
    • Lower recall and segmentation accuracy for visually ambiguous or low-contrast targets (e.g., unripe fruit, indistinct pothole margins) (Gamani et al., 2024, Yurdakul et al., 7 May 2025).
  • Improvements and Research Trends:

Model deployment on embedded systems with further quantization and hardware-specific optimizations remains an open area (Guo et al., 2024). Continual validation under diverse environmental and lighting conditions is necessary to confirm generalization.

7. Summary of Significance

YOLOv8-seg advances the line of efficient, instance-segmentation models by coupling a scalable, compound architecture with leading inference speeds, anchor-free detection, and a fast, effective mask head. Its adaptability—demonstrated in specialized agricultural segmentation (Sapkota et al., 2024, Gamani et al., 2024), road defect detection (Yurdakul et al., 7 May 2025), and autonomous driving (Guo et al., 2024)—originates in its composable design, enabling targeted enhancements via attention, convolutional structure, and loss weighting.

YOLOv8-seg’s strengths include sub-5 ms inference (nano variant), moderate parameter counts (<5 M for n-seg), and competitive segmentation accuracy for real-time systems. With further improvements and domain-specific customization, YOLOv8-seg remains foundational in instance segmentation systems deployed in constrained, latency-critical environments.
