UltraLBM-UNet: Ultralight Lesion Segmentation
- The paper introduces UltraLBM-UNet, an ultralight U-Net architecture that integrates bidirectional Mamba-based global context and multi-branch local feature perception to achieve high segmentation fidelity.
- Its six-stage encoder–decoder design, featuring variable kernel sizes and shared Mamba weights, delivers robust performance with over 79% IoU and 88% DSC on benchmarks like ISIC17 and PH².
- The architecture supports real-time deployment in constrained environments with sub-0.06 GFLOPs and a memory footprint under 0.14 MB, validated through extensive ablation studies and hybrid knowledge distillation.
UltraLBM-UNet is an ultralight variant of the U-Net architecture designed for high-performance and resource-efficient skin lesion segmentation. It incorporates a bidirectional Mamba-based global modeling mechanism with multi-branch local feature perception, resulting in a model that delivers robust segmentation accuracy with extremely low computational complexity. The architecture supports deployment in point-of-care scenarios, where memory and inference latency are critical constraints, without sacrificing segmentation fidelity (Fan et al., 25 Dec 2025).
1. Architectural Principles and Configuration
UltraLBM-UNet utilizes a six-stage encoder–decoder configuration. The encoder channels are [8, 16, 24, 32, 48, 64], mirrored in the decoder, enabling hierarchical feature extraction. Early encoder stages (I–III) employ conventional Conv–ReLU–Conv–ReLU blocks with max-pooling for local representation learning. The shallow encoder additionally integrates a Local Multi-Branch Perception (LMBP) module at stage III, comprising three DwConv branches (kernel sizes 3, 5, and 7, varied with depth) and an identity branch, emphasizing local feature detail.
Deep encoder stages (IV–VI) and the first three decoder stages feature Global–Local Multi-Branch Perception (GLMBP) modules. These modules merge bidirectional Mamba-based global context with depthwise-separable convolution branches, balancing spatial reach and edge preservation. Skip-connections are implemented via element-wise addition with a learnable scalar scale factor α: given an encoder feature e and a decoder feature d, the fused output is y = d + α·e. Bilinear interpolation is used for upsampling in the decoder.
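The scaled additive skip can be sketched in a few lines (a hypothetical helper; NumPy arrays stand in for framework tensors):

```python
import numpy as np

def scaled_skip(decoder_feat, encoder_feat, alpha):
    """Element-wise skip fusion y = d + alpha * e, with a learnable scalar alpha."""
    return decoder_feat + alpha * encoder_feat

d = np.ones((1, 24, 8, 8))          # decoder feature map (N, C, H, W)
e = np.full((1, 24, 8, 8), 2.0)     # matching encoder feature map
y = scaled_skip(d, e, alpha=0.5)    # every element becomes 1 + 0.5 * 2 = 2.0
```

In training, α would be a learnable parameter per skip connection; a scalar (rather than a gating map) keeps the fusion essentially free in parameters and FLOPs.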
The distilled model, UltraLBM-UNet-T, retains an identical topology but halves all channel widths ([4, 8, 12, 16, 24, 32]), lowering both parameter count and FLOPs for further resource efficiency.
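The halved-width student follows mechanically from the teacher's stage widths; since a convolution's parameter count scales with C_in·C_out, halving every width roughly quarters per-layer parameters:

```python
# Stage channel widths for the two variants (from the paper);
# the distilled student halves every stage width.
teacher_channels = [8, 16, 24, 32, 48, 64]
student_channels = [c // 2 for c in teacher_channels]

def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

t = conv_params(3, teacher_channels[0], teacher_channels[1])  # 3x3 conv, stage I -> II
s = conv_params(3, student_channels[0], student_channels[1])
print(student_channels)  # [4, 8, 12, 16, 24, 32]
print(t // s)            # 4: halving widths quarters conv parameters
```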
2. Bidirectional Mamba-Based Global Modeling
UltraLBM-UNet exploits the Mamba state-space model (SSM) for linear-time, long-range dependency modeling within feature maps. Given a feature map X ∈ ℝ^(C×H×W), the tensor is flattened to a sequence of length L = H×W. LayerNorm is applied, and the channels are split into four equal parts, X₁, X₂, X₃, X₄.
Branches 1 and 2 serve global context extraction via Mamba modules in a bidirectional configuration: one branch scans the flattened sequence in the forward direction, the other in reverse. Feature fusion sums the two scan outputs with a learnable scalar weight, e.g. F = F_fwd + γ·F_bwd. Shared Mamba weights ensure parameter efficiency while doubling contextual information. This strategy enables non-causal global modeling over the spatial domain of the image.
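The bidirectional scan with shared weights can be illustrated with a toy linear recurrence standing in for the Mamba SSM (all names and coefficients here are illustrative, not the paper's parameterization):

```python
import numpy as np

def ssm_scan(x, a=0.9, b=0.1):
    """Toy linear state-space recurrence h_t = a*h_{t-1} + b*x_t (stand-in for Mamba)."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out

def bidirectional_global(x_seq, gamma=1.0):
    """Forward and reversed scans share the same (a, b) weights; fuse with scalar gamma."""
    fwd = ssm_scan(x_seq)
    bwd = ssm_scan(x_seq[::-1])[::-1]   # scan the reversed sequence, then flip back
    return fwd + gamma * bwd

H, W, C = 4, 4, 8
feat = np.random.randn(H * W, C)        # flatten the H x W grid to a length-L sequence
fused = bidirectional_global(feat)
```

Because a single causal scan only propagates information forward, the reversed pass is what lets every spatial position attend to positions later in the flattening order, giving the non-causal global receptive field at linear cost.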
3. Multi-Branch Local Feature Perception and Multi-Receptive-Field Design
Branch 3 processes local features via DwConv: X₃ is reshaped back to 2D and convolved depthwise, X₃′ = DwConv(X₃). Branch 4 employs an identity shortcut, X₄′ = X₄. Final multi-branch fusion concatenates the four branch outputs along the channel dimension: Y = Concat(X₁′, X₂′, X₃′, X₄′).
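A minimal sketch of the four-branch split-and-fuse pattern, with a uniform-kernel depthwise filter standing in for the learned DwConv (names and kernel are illustrative):

```python
import numpy as np

def four_way_split(x):
    """Split the channels of an (L, C) sequence into four equal parts."""
    return np.split(x, 4, axis=1)

def dwconv1d(x, k=3):
    """Toy depthwise conv along the sequence axis: each channel filtered independently."""
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(k) / k            # uniform kernel as placeholder for learned weights
    return np.stack(
        [np.convolve(xp[:, c], kernel, mode="valid") for c in range(x.shape[1])],
        axis=1,
    )

x = np.random.randn(16, 8)             # L=16 tokens, C=8 channels
x1, x2, x3, x4 = four_way_split(x)
x3_local = dwconv1d(x3)                # Branch 3: depthwise convolution
x4_id = x4                             # Branch 4: identity shortcut
y = np.concatenate([x1, x2, x3_local, x4_id], axis=1)  # channel-wise fusion
```

The identity branch passes its channel slice through untouched, which is what stabilizes gradients and preserves fine detail in the fused output.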
Multi-receptive-field design is enforced via variable kernel sizes in GLMBP modules:
- Encoder stages IV, V, VI: DwConv kernels drawn from the {3, 5, 7} set, varied with stage depth
- Decoder stages I, II, III: DwConv kernels likewise varied per stage

This composition maximizes localization detail while maintaining parameter compactness.
4. Hybrid Knowledge Distillation to UltraLBM-UNet-T
UltraLBM-UNet-T is trained using a hybrid knowledge distillation regime from the full model. The total loss combines four weighted terms:

L_total = L_seg + λ₁·L_DKD + λ₂·L_AT + λ₃·L_edge

where:
- L_seg is the standard BCE + Dice loss on segmentation outputs,
- L_DKD (Decoupled Knowledge Distillation) aligns pixel probability distributions via KL divergence,
- L_AT (Attention Transfer) aligns spatial attention maps,
- L_edge matches gradient boundaries via a Sobel edge operator between student and teacher predictions.
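The composition of these terms can be sketched as follows; the helper implementations and λ weights are illustrative assumptions (NumPy stands in for a deep-learning framework, and DKD is reduced to a plain pixel-wise KL term):

```python
import numpy as np

def bce_dice(pred, target, eps=1e-7):
    """L_seg: binary cross-entropy plus Dice on the segmentation output."""
    bce = -np.mean(target * np.log(pred + eps) + (1 - target) * np.log(1 - pred + eps))
    dice = 1 - (2 * np.sum(pred * target) + eps) / (np.sum(pred) + np.sum(target) + eps)
    return bce + dice

def dkd_kl(student, teacher, eps=1e-7):
    """L_DKD (simplified here): pixel-wise KL divergence between probability maps."""
    return np.mean(
        teacher * np.log((teacher + eps) / (student + eps))
        + (1 - teacher) * np.log((1 - teacher + eps) / (1 - student + eps))
    )

def attention_map(feat, eps=1e-7):
    """Channel-pooled spatial attention map, L2-normalized (for L_AT)."""
    a = np.sum(feat ** 2, axis=0)
    return a / (np.linalg.norm(a) + eps)

def sobel_mag(x):
    """Gradient magnitude via Sobel filters (for the boundary term L_edge)."""
    kx = np.array([[1., 0., -1.], [2., 0., -2.], [1., 0., -1.]])
    ky = kx.T
    H, W = x.shape
    gx = np.zeros((H - 2, W - 2)); gy = np.zeros_like(gx)
    for i in range(H - 2):
        for j in range(W - 2):
            win = x[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(win * kx); gy[i, j] = np.sum(win * ky)
    return np.sqrt(gx ** 2 + gy ** 2)

def hybrid_loss(s_prob, t_prob, s_feat, t_feat, gt, lam=(1.0, 1.0, 1.0)):
    """L_total = L_seg + l1*L_DKD + l2*L_AT + l3*L_edge (lambda values assumed)."""
    l_seg = bce_dice(s_prob, gt)
    l_dkd = dkd_kl(s_prob, t_prob)
    l_at = np.mean((attention_map(s_feat) - attention_map(t_feat)) ** 2)
    l_edge = np.mean(np.abs(sobel_mag(s_prob) - sobel_mag(t_prob)))
    return l_seg + lam[0] * l_dkd + lam[1] * l_at + lam[2] * l_edge

rng = np.random.default_rng(0)
s_prob = np.clip(rng.random((8, 8)), 0.01, 0.99)   # student probability map
t_prob = np.clip(rng.random((8, 8)), 0.01, 0.99)   # teacher probability map
gt = (rng.random((8, 8)) > 0.5).astype(float)      # ground-truth mask
s_feat = rng.standard_normal((4, 8, 8))            # student intermediate features
t_feat = rng.standard_normal((4, 8, 8))            # teacher intermediate features
loss = hybrid_loss(s_prob, t_prob, s_feat, t_feat, gt)
```

Note that when student and teacher agree exactly, the three distillation terms vanish and only L_seg remains, so the regime reduces gracefully to plain supervised training.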
Ablations on the ISIC2017 dataset demonstrate incremental improvements: the base student (without distillation) achieves 77.30% IoU / 87.20% DSC, adding DKD alone raises this to 78.19% / 87.76%, and the full loss yields 78.57% IoU / 88.00% DSC.
5. Computational Complexity and Resource Profiles
Model complexity is strictly constrained:
- UltraLBM-UNet: 0.034M parameters (≈0.14 MB), 0.060 GFLOPs per forward pass.
- UltraLBM-UNet-T: 0.011M parameters (≈0.044 MB), 0.019 GFLOPs.
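The memory figures follow directly from FP32 storage (4 bytes per parameter), as a quick sanity check:

```python
def fp32_mb(params_millions):
    """FP32 weight footprint in MB: params (millions) x 4 bytes each."""
    return params_millions * 1e6 * 4 / 1e6

print(fp32_mb(0.034))  # 0.136 MB for UltraLBM-UNet
print(fp32_mb(0.011))  # 0.044 MB for UltraLBM-UNet-T
```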
Module breakdown:

| Module                             | Params (M) | FLOPs (G) |
|------------------------------------|:----------:|:---------:|
| Stem & shallow encoder (I–III)     | 0.012      | 0.020     |
| Encoder/decoder GLMBP (6 total)    | 0.016      | 0.030     |
| Skip-scale, upsampling, LayerNorm  | 0.006      | 0.010     |
In terms of resource profile, UltraLBM-UNet and its student variant require exceedingly small memory and compute, making them suitable for real-time operation on embedded GPUs (e.g., NVIDIA Jetson, mobile NPUs) with sub-10 ms inference and a <1 MB footprint.
6. Segmentation Performance Benchmarks
Extensive evaluation was conducted on ISIC2017, ISIC2018, and PH² datasets. The following table presents segmentation accuracy and resource usage for existing lightweight models and UltraLBM-UNet variants:
| Model | ISIC17 IoU/DSC | ISIC18 IoU/DSC | PH² IoU/DSC | Params (M) | FLOPs (G) |
|---|---|---|---|---|---|
| MAL-UNet | 78.71/88.09 | 79.42/88.53 | 83.83/91.20 | 0.178 | 0.083 |
| EGE-UNet | 78.32/87.84 | 79.45/88.55 | 83.36/90.93 | 0.053 | 0.072 |
| UltraLight VM-UNet | 77.93/87.59 | 78.93/88.23 | 83.31/90.89 | 0.045 | 0.069 |
| UltraLBM-UNet-T | 78.57/88.00 | 78.82/88.15 | 84.92/91.85 | 0.011 | 0.019 |
| UltraLBM-UNet | 79.82/88.78 | 79.94/88.85 | 84.41/91.54 | 0.034 | 0.060 |
On ISIC2017, UltraLBM-UNet exhibits the highest IoU (79.82%) and DSC (88.78%) among all sub-1M parameter models. On PH², the distilled student model achieves the top average IoU/DSC (84.92%/91.85%) (Fan et al., 25 Dec 2025). This suggests that ultra-compact architectures can match or surpass larger models in segmentation fidelity when equipped with efficient context modeling and knowledge transfer strategies.
7. Deployment Advantage and Design Rationale
UltraLBM-UNet is explicitly designed for point-of-care scenarios requiring high throughput, low latency, and minimal compute resources. Model weights under 0.14 MB and ≤0.06 GFLOPs enable stable operation on low-power devices, including embedded platforms and mobile processors. The absence of matrix-heavy self-attention ensures deterministic and predictable resource usage. Multi-branch identity shortcuts stabilize gradients and preserve detail, while learnable skip-scales and fixed branching favor fixed-point inference (e.g., INT8 quantization).
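The fixed-branching, scalar-scaled design maps naturally onto INT8 deployment; a minimal sketch of symmetric per-tensor quantization (a generic scheme, not the paper's specific pipeline):

```python
import numpy as np

def int8_quantize(w):
    """Symmetric per-tensor INT8 quantization: one fixed scale, no runtime branching."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_dequantize(q, scale):
    """Recover approximate FP32 weights from INT8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(64).astype(np.float32) * 0.1  # a small weight tensor
q, s = int8_quantize(w)
w_hat = int8_dequantize(q, s)                      # round-trip error is at most scale/2
```

Because the architecture's data flow is static (no input-dependent attention patterns), a single calibrated scale per tensor suffices, which is what makes such fixed-point deployment predictable.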
Balanced fusion between bidirectional Mamba-based global modeling and multi-kernel depthwise convolution supports robust edge detection and contextual reasoning. Shared Mamba weights maintain model simplicity despite bidirectional traversal. Hybrid knowledge distillation further compresses the student model without runtime trade-off, transferring structural, spatial, and boundary cues from the teacher.
UltraLBM-UNet thus establishes a new Pareto frontier for model accuracy versus resource cost in skin lesion segmentation, specifically tailored for real-time clinical and embedded deployment contexts (Fan et al., 25 Dec 2025).