
UltraLBM-UNet: Ultralight Lesion Segmentation

Updated 1 January 2026
  • The paper introduces UltraLBM-UNet, an ultralight U-Net architecture that integrates bidirectional Mamba-based global context and multi-branch local feature perception to achieve high segmentation fidelity.
  • Its six-stage encoder–decoder design, featuring variable kernel sizes and shared Mamba weights, delivers robust performance with over 79% IoU and 88% DSC on benchmarks like ISIC17 and PH².
  • The architecture supports real-time deployment in constrained environments with sub-0.06 GFLOPs and a memory footprint under 0.14 MB, validated through extensive ablation studies and hybrid knowledge distillation.

UltraLBM-UNet is an ultralight variant of the U-Net architecture designed for high-performance and resource-efficient skin lesion segmentation. It incorporates a bidirectional Mamba-based global modeling mechanism with multi-branch local feature perception, resulting in a model that delivers robust segmentation accuracy with extremely low computational complexity. The architecture supports deployment in point-of-care scenarios, where memory and inference latency are critical constraints, without sacrificing segmentation fidelity (Fan et al., 25 Dec 2025).

1. Architectural Principles and Configuration

UltraLBM-UNet utilizes a six-stage encoder–decoder configuration. The encoder channels are [8, 16, 24, 32, 48, 64], mirrored in the decoder, enabling hierarchical feature extraction. Early encoder stages (I–III) employ conventional Conv–ReLU–Conv–ReLU blocks with max-pooling for local representation learning. Shallow encoder stage III integrates a Local Multi-Branch Perception (LMBP) module comprising three DwConv branches (kernel sizes: 3, 5, 7 dependent on depth) and an identity branch, emphasizing local feature detail.

Deep encoder stages (IV–VI) and the first three decoder stages feature Global–Local Multi-Branch Perception (GLMBP) modules. These modules merge bidirectional Mamba-based global context with depthwise-separable convolution branches, balancing spatial reach and edge preservation. Skip-connections are implemented via element-wise addition with a learnable scalar scale factor $k$, expressed as $X_i = \hat{X}_i + k\, t_i$. Bilinear interpolation is used for upsampling in the decoder.
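The scaled additive skip connection can be sketched as follows. This is a minimal illustration, not the paper's code; the function name `skip_fuse` is hypothetical, and $k$ is a plain float here where the real model learns it during training.

```python
import numpy as np

def skip_fuse(decoder_feat: np.ndarray, skip_feat: np.ndarray, k: float) -> np.ndarray:
    """X_i = X_hat_i + k * t_i: element-wise addition of the skip
    feature t_i, scaled by the scalar k, onto the decoder feature."""
    return decoder_feat + k * skip_feat
```

Because the fusion is a single scaled addition rather than concatenation, it adds no channels and almost no compute, which matters at this parameter budget.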

The distilled model, UltraLBM-UNet-T, retains an identical topology but halves all channel widths ([4, 8, 12, 16, 24, 32]), lowering both parameter count and FLOPs for further resource efficiency.
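The relationship between the two variants' channel widths is simple enough to state in code. The lists below are taken from the paper; the variable names are illustrative only.

```python
# Encoder (and mirrored decoder) channel widths from the paper.
FULL_CHANNELS = [8, 16, 24, 32, 48, 64]   # UltraLBM-UNet
TINY_CHANNELS = [c // 2 for c in FULL_CHANNELS]  # UltraLBM-UNet-T halves each stage
print(TINY_CHANNELS)  # [4, 8, 12, 16, 24, 32]
```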

2. Bidirectional Mamba-Based Global Modeling

UltraLBM-UNet exploits the Mamba state-space model (SSM) for linear-time, long-range dependency modeling within feature maps. Given a feature map $X \in \mathbb{R}^{B \times C \times H \times W}$, the tensor is flattened to a sequence of length $N = H \cdot W$. LayerNorm is applied, and channels are split into four parts, $\{X_1, X_2, X_3, X_4\} \in \mathbb{R}^{B \times N \times (C/4)}$.

Branches 1 and 2 perform global context extraction via Mamba modules in a bidirectional configuration:

$$M_{i,\text{forward}} = M(X_i), \quad M_{i,\text{backward}} = \text{Flip}(M(\text{Flip}(X_i)))$$

Feature fusion is performed by $X_i^{g} = M_{i,\text{forward}} + M_{i,\text{backward}} + \gamma X_i$, with $\gamma$ a learnable scalar. Shared Mamba weights ensure parameter efficiency while doubling contextual information. This strategy enables non-causal global modeling over the spatial domain of the image.
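The flip trick that turns one shared causal module into a bidirectional one can be sketched as below. This is an assumption-laden illustration: a simple exponential causal scan stands in for the actual Mamba block (which lives in the `mamba_ssm` package), and `gamma` is a fixed float rather than a learned parameter.

```python
import numpy as np

def causal_scan(x: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """Stand-in for a Mamba block: a causal exponential scan over the
    sequence axis of a [B, N, C] tensor. Each position sees only the past."""
    out = np.zeros_like(x)
    state = np.zeros_like(x[:, 0])
    for t in range(x.shape[1]):
        state = decay * state + (1.0 - decay) * x[:, t]
        out[:, t] = state
    return out

def bidirectional_context(x: np.ndarray, gamma: float = 0.1) -> np.ndarray:
    """X^g = M(x) + Flip(M(Flip(x))) + gamma * x, reusing the SAME scan
    for both directions (the paper's shared-weight trick)."""
    fwd = causal_scan(x)
    bwd = np.flip(causal_scan(np.flip(x, axis=1)), axis=1)
    return fwd + bwd + gamma * x
```

Running the one causal module forward and on the reversed sequence gives every position access to both past and future context without a second set of weights.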

3. Multi-Branch Local Feature Perception and Multi-Receptive-Field Design

Branch 3 processes local features via DwConv, reshaping $X_3$ to 2D, then computing $X_3^l = \text{DwConv}_{\text{2d}}(\text{reshape}(X_3)) + \gamma X_3$. Branch 4 employs an identity shortcut: $X_4^i = X_4 + \gamma X_4$. Final multi-branch fusion concatenates outputs along the channel dimension: $X_{\text{fuse}} = [X_1^g \,\|\, X_2^g \,\|\, X_3^l \,\|\, X_4^i]$.
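The split–process–concatenate pattern can be written down compactly. This is a hedged sketch, not the reference implementation: `global_fn` and `local_fn` stand in for the bidirectional Mamba and DwConv branches, and `gamma` is a fixed scalar.

```python
import numpy as np

def glmbp_fuse(x, global_fn, local_fn, gamma: float = 0.1) -> np.ndarray:
    """Four-branch fusion on a [B, N, C] map (C divisible by 4):
    two global-context branches, one local-conv branch, one identity
    shortcut, each with a gamma-scaled residual, concatenated on C."""
    x1, x2, x3, x4 = np.split(x, 4, axis=-1)
    return np.concatenate([
        global_fn(x1) + gamma * x1,  # branch 1: global context
        global_fn(x2) + gamma * x2,  # branch 2: global context
        local_fn(x3) + gamma * x3,   # branch 3: local DwConv
        x4 + gamma * x4,             # branch 4: identity shortcut
    ], axis=-1)
```

Splitting channels four ways means each branch operates on $C/4$ channels, keeping the per-branch cost a quarter of a full-width operation.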

Multi-receptive-field design is enforced via variable kernel sizes in GLMBP modules:

  • Encoder stages IV, V, VI: DwConv kernels $\{3, 5, 7\}$
  • Decoder stages I, II, III: kernels $\{7, 5, 3\}$

This composition maximizes localization detail while maintaining parameter compactness.
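A depthwise convolution at these kernel sizes can be illustrated naively as follows. This is a sketch under assumptions: a fixed $k \times k$ averaging kernel replaces the learned per-channel weights, and the stage-to-kernel maps are spelled out from the bullets above.

```python
import numpy as np

def dwconv2d(x: np.ndarray, k: int) -> np.ndarray:
    """Naive depthwise 2D convolution on a [C, H, W] tensor with a k x k
    averaging kernel and 'same' zero padding. Each channel is filtered
    independently, which is what makes the operation 'depthwise'."""
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = xp[:, i:i + k, j:j + k].mean(axis=(1, 2))
    return out

# Stage-to-kernel assignment from the paper's multi-receptive-field design.
ENCODER_KERNELS = {"IV": 3, "V": 5, "VI": 7}
DECODER_KERNELS = {"I": 7, "II": 5, "III": 3}
```

Because depthwise kernels cost only $k^2$ parameters per channel (versus $k^2 C$ for a standard convolution), widening the receptive field to 7 remains cheap.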

4. Hybrid Knowledge Distillation to UltraLBM-UNet-T

UltraLBM-UNet-T is trained using a hybrid knowledge distillation regime from the full model. The total loss is:

$$L_{\text{distill}} = \lambda_h L_{\text{hard}} + \lambda_s L_{\text{DKD}} + \lambda_a L_{\text{AT}} + \lambda_g L_{\text{grad}}$$

where:

  • $L_{\text{hard}}$ is the standard BCE + Dice loss on segmentation outputs,
  • $L_{\text{DKD}}$ (Decoupled Knowledge Distillation) aligns pixel probability distributions via KL divergence,
  • $L_{\text{AT}}$ (Attention Transfer) aligns spatial attention maps,
  • $L_{\text{grad}}$ matches gradient boundaries via the Sobel edge operator between student and teacher predictions.
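The boundary-matching term can be sketched as below. This is an illustrative reconstruction, not the paper's code: the Sobel filtering is a naive loop, the distance is assumed to be an L1 mean, and the function names are hypothetical.

```python
import numpy as np

def sobel_edges(p: np.ndarray) -> np.ndarray:
    """Gradient magnitude of a [H, W] prediction map via Sobel filters."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    padded = np.pad(p, 1, mode="edge")
    H, W = p.shape
    gx = np.zeros_like(p)
    gy = np.zeros_like(p)
    for i in range(H):
        for j in range(W):
            win = padded[i:i + 3, j:j + 3]
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    return np.hypot(gx, gy)

def grad_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """L_grad: L1 distance between the Sobel edge maps of the two
    predictions, encouraging the student to match teacher boundaries."""
    return float(np.abs(sobel_edges(student) - sobel_edges(teacher)).mean())
```

Matching edge maps rather than raw probabilities focuses the distillation signal on lesion boundaries, where small models tend to lose precision.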

Ablations on the ISIC2017 dataset demonstrate incremental improvements: the base student (without distillation) achieves 77.30% IoU / 87.20% DSC, adding DKD alone raises this to 78.19% / 87.76%, and the full loss yields 78.57% IoU / 88.00% DSC.

5. Computational Complexity and Resource Profiles

Model complexity is strictly constrained:

  • UltraLBM-UNet: 0.034M parameters (≈0.14 MB), 0.060 GFLOPs for $256 \times 256$ inputs.
  • UltraLBM-UNet-T: 0.011M parameters (≈0.044 MB), 0.019 GFLOPs.
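The parameter-to-memory conversion behind these figures is worth making explicit. A minimal sketch, assuming FP32 storage (4 bytes per weight); the function name is illustrative.

```python
def model_size_mb(n_params: float, bytes_per_param: int = 4) -> float:
    """Memory footprint of a weight tensor in MB, assuming FP32 storage."""
    return n_params * bytes_per_param / 1e6

print(round(model_size_mb(0.034e6), 3))  # 0.136  (the ~0.14 MB full model)
print(round(model_size_mb(0.011e6), 3))  # 0.044  (the -T student)
```

Under INT8 quantization the same counts would shrink a further 4x, which is why the paper emphasizes quantization-friendly design choices.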

Module breakdown:

| Module                             | Params (M) | FLOPs (G) |
|------------------------------------|:----------:|:---------:|
| Stem & shallow encoder (I–III)     | 0.012      | 0.020     |
| Encoder/decoder GLMBP (6 total)    | 0.016      | 0.030     |
| Skip-scale, upsampling, LayerNorm  | 0.006      | 0.010     |

In terms of resource profile, UltraLBM-UNet and its student variant require exceedingly small memory and compute, making them suitable for real-time operation on embedded GPUs (e.g., NVIDIA Jetson, mobile NPUs) with sub-10 ms inference and a <1 MB footprint.

6. Segmentation Performance Benchmarks

Extensive evaluation was conducted on ISIC2017, ISIC2018, and PH² datasets. The following table presents segmentation accuracy and resource usage for existing lightweight models and UltraLBM-UNet variants:

| Model               | ISIC17 IoU/DSC | ISIC18 IoU/DSC | PH² IoU/DSC  | Params (M) | FLOPs (G) |
|---------------------|:--------------:|:--------------:|:------------:|:----------:|:---------:|
| MAL-UNet            | 78.71/88.09    | 79.42/88.53    | 83.83/91.20  | 0.178      | 0.083     |
| EGE-UNet            | 78.32/87.84    | 79.45/88.55    | 83.36/90.93  | 0.053      | 0.072     |
| UltraLight VM-UNet  | 77.93/87.59    | 78.93/88.23    | 83.31/90.89  | 0.045      | 0.069     |
| UltraLBM-UNet-T     | 78.57/88.00    | 78.82/88.15    | 84.92/91.85  | 0.011      | 0.019     |
| UltraLBM-UNet       | 79.82/88.78    | 79.94/88.85    | 84.41/91.54  | 0.034      | 0.060     |

On ISIC2017, UltraLBM-UNet exhibits the highest IoU (79.82%) and DSC (88.78%) among all sub-1M parameter models. On PH², the distilled student model achieves the top average IoU/DSC (84.92%/91.85%) (Fan et al., 25 Dec 2025). This suggests that ultra-compact architectures can match or surpass larger models in segmentation fidelity when equipped with efficient context modeling and knowledge transfer strategies.

7. Deployment Advantage and Design Rationale

UltraLBM-UNet is explicitly designed for point-of-care scenarios requiring high throughput, low latency, and minimal compute resources. Model weights of <0.14 MB and ≤0.06 GFLOPs enable stable operation on low-power devices, including embedded platforms and mobile processors. The absence of matrix-heavy self-attention ensures deterministic, predictable resource usage. Multi-branch identity shortcuts stabilize gradients and preserve detail, while learnable skip-scales and fixed branching favor fixed-point inference (e.g., INT8 quantization).

Balanced fusion between bidirectional Mamba-based global modeling and multi-kernel depthwise convolution supports robust edge detection and contextual reasoning. Shared Mamba weights maintain model simplicity despite bidirectional traversal. Hybrid knowledge distillation further compresses the student model without runtime trade-off, transferring structural, spatial, and boundary cues from the teacher.

UltraLBM-UNet thus establishes a new Pareto frontier for model accuracy versus resource cost in skin lesion segmentation, specifically tailored for real-time clinical and embedded deployment contexts (Fan et al., 25 Dec 2025).
