UltraLBM-UNet-T: Ultra-Compact Segmentation
- UltraLBM-UNet-T is an ultra-compact model leveraging bidirectional Mamba state-space modules and multi-branch feature fusion for efficient skin lesion segmentation.
- It achieves competitive accuracy with only 0.011M parameters and 0.019 GFLOPs by employing a hybrid knowledge distillation protocol to mimic its teacher model.
- The model is designed for real-time, point-of-care applications, enabling deployment on resource-constrained mobile devices for robust dermatological analysis.
UltraLBM-UNet-T is an ultra-compact deep learning model designed for skin lesion segmentation in dermatological imaging. It is the distilled student variant of UltraLBM-UNet, characterized by its bidirectional Mamba-based state-space modeling, multi-branch local–global feature perception, and hybrid knowledge distillation protocol. With only 0.011 million parameters and 0.019 GFLOPs, UltraLBM-UNet-T maintains competitive segmentation accuracy compared to both its teacher model and other lightweight baselines, making it well-suited for resource-constrained point-of-care deployments (Fan et al., 25 Dec 2025).
1. Architectural Topology
UltraLBM-UNet-T adopts a six-stage encoder–decoder U-Net backbone with multi-branch local–global modules and bidirectional state-space blocks. The input dimension is 256×256×3. Channel widths at each encoder stage are halved compared to the teacher, yielding [4, 8, 12, 16, 24, 32] output dimensions per stage.
- Encoder:
- Stages I–III: Standard Conv+MaxPool blocks.
- Stages IV–VI: Global-Local Multi-Branch Perception (GLMBP) modules; stages IV and V employ 2×2 max-pool for downsampling, stage VI operates at 1/32 spatial resolution.
- Decoder: Symmetric to the encoder.
- Stages I–III: GLMBP modules with bilinear upsampling.
- Stages IV–V: Conv+bilinear upsampling.
- Stage VI: 1×1 convolution mapping to a single-channel probability output.
- Skip Connections: Connect encoder/decoder stages at matching spatial resolutions. Fusion is via F_fuse = F_dec + α·F_enc, with α a learnable scalar.
Parameter count and computational cost are reduced by halving channels compared to UltraLBM-UNet, resulting in approximately one-quarter the resource requirements (0.011M parameters, 0.019 GFLOPs for 256×256 input).
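The "roughly one-quarter" claim follows from convolution cost scaling with C_in × C_out. A minimal sketch, assuming 3×3 convolutions, counting only the encoder conv stack, and assuming the teacher's stage widths are exactly double the student's [4, 8, 12, 16, 24, 32]:

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Parameter count of a standard 2D conv: weights + biases."""
    return c_in * c_out * k * k + c_out

def stack_params(widths, c_in=3):
    """Total conv parameters for a sequential stack of stages."""
    total = 0
    for c_out in widths:
        total += conv_params(c_in, c_out)
        c_in = c_out
    return total

student = [4, 8, 12, 16, 24, 32]            # widths given in the text
teacher = [2 * c for c in student]          # assumed: teacher doubles each width

ratio = stack_params(student) / stack_params(teacher)
print(round(ratio, 2))  # → 0.25: halving every width quarters the conv cost
```

Because both the input and output channel counts of every interior conv are halved, each layer's cost shrinks by ~4×, matching the 0.011M-vs-0.034M parameter gap up to the fixed-size stem and head.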
2. Multi-Branch Feature Modules
GLMBP Module
Each GLMBP block receives input X ∈ R^(H×W×C), reshapes it to a token sequence in R^(L×C) (where L = H·W), and layer-normalizes along the channel axis. Channels are split evenly into four tensors X_1, …, X_4 with X_i ∈ R^(L×C/4), i = 1, …, 4.
- Global branches (X_1, X_2): Each uses bidirectional Mamba.
- Forward: Y_i^f = Mamba(X_i).
- Backward: Y_i^b = flip(Mamba(flip(X_i))).
- Output: Y_i = X_i + γ·(Y_i^f + Y_i^b) (with γ learnable).
- Local-perception branch (X_3): Depthwise-separable conv applied to X_3 after reshaping to 2D, followed by residual addition.
- Identity branch (X_4): Y_4 = X_4.
Final block output is a channel-concatenation: Y = Concat(Y_1, Y_2, Y_3, Y_4).
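The four-way split-and-concatenate layout can be sketched at the shape level in NumPy; the branch bodies below are illustrative stand-ins, not the paper's actual Mamba and DwConv implementations:

```python
import numpy as np

def glmbp_skeleton(x: np.ndarray, gamma: float = 0.1) -> np.ndarray:
    """Shape-level sketch of the GLMBP four-way split/concat.
    x: (L, C) token sequence with C divisible by 4. The real block
    uses bidirectional Mamba for the two global branches and a
    depthwise-separable conv for the local one; tanh is a placeholder."""
    x1, x2, x3, x4 = np.split(x, 4, axis=1)   # even channel split
    g1 = x1 + gamma * np.tanh(x1)             # stand-in: global branch 1
    g2 = x2 + gamma * np.tanh(x2)             # stand-in: global branch 2
    loc = x3 + np.tanh(x3)                    # stand-in: local DwConv + residual
    idn = x4                                  # identity branch passes through
    return np.concatenate([g1, g2, loc, idn], axis=1)

x = np.random.default_rng(0).standard_normal((64 * 64, 16))  # L = H*W, C = 16
y = glmbp_skeleton(x)
print(y.shape)  # → (4096, 16): split + concat preserves the channel count
```

Note that the identity branch makes a quarter of the channels free to propagate unchanged, which is part of why the block stays so cheap.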
LMBP Module (Stage III)
Identical four-way split and residual design as GLMBP, but both global branches are replaced with additional local DwConv branches (kernel size = 3), omitting Mamba; the output is again a channel-concatenation of the four branch outputs.
Bidirectional Mamba State-Space Modeling
Mamba modules implement a linear recurrent state-space layer:
- Forward recurrence: h_t = Ā·h_{t−1} + B̄·x_t, y_t = C·h_t, for t = 1, …, L (input sequence length L).
- Backward recurrence: obtained by reversing the input sequence; parameters (Ā, B̄, C) are shared between directions.
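A minimal NumPy sketch of the bidirectional recurrence, assuming the backward pass simply runs the same scan on the reversed sequence with shared parameters and that the two directions are summed (the exact combination rule is an assumption):

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Forward linear recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t.
    x: (L, d_in); A: (d_h, d_h); B: (d_h, d_in); C: (d_out, d_h)."""
    h = np.zeros(A.shape[0])
    ys = np.empty((x.shape[0], C.shape[0]))
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]
        ys[t] = C @ h
    return ys

def bidirectional_ssm(x, A, B, C):
    """Backward pass = forward scan on the reversed input, un-reversed,
    with parameters shared across directions; outputs summed."""
    fwd = ssm_scan(x, A, B, C)
    bwd = ssm_scan(x[::-1], A, B, C)[::-1]
    return fwd + bwd

rng = np.random.default_rng(0)
L, d = 8, 4
x = rng.standard_normal((L, d))
A, B, C = 0.9 * np.eye(d), np.eye(d), np.eye(d)
y = bidirectional_ssm(x, A, B, C)
print(y.shape)  # → (8, 4)
```

Because the parameters are shared, the bidirectional output is equivariant to sequence reversal, which is exactly the symmetry a direction-agnostic spatial scan wants.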
3. Knowledge Distillation Protocol
UltraLBM-UNet-T is optimized using a hybrid knowledge distillation framework, transferring representational and predictive knowledge from the UltraLBM-UNet teacher.
- Teacher: Full UltraLBM-UNet (0.034M params, 0.060 GFLOPs) trained with BCE+Dice loss.
- Student: UltraLBM-UNet-T (channels halved throughout).
Distillation Losses
Aggregate distillation objective:
L_KD = L_hard + λ_DKD·L_DKD + λ_AT·L_AT + λ_grad·L_grad,
with all weighting hyperparameters λ set to 1 (from ablation).
- Hard-label loss: BCE + Dice between the student's prediction and the ground-truth mask.
- Decoupled Knowledge Distillation (DKD): Applied to the teacher and student logits.
- Attention Transfer (AT): KL divergence between the spatial attention maps of teacher and student, computed with a temperature-τ softmax.
- Gradient-matching loss: Difference in Sobel edge magnitudes.
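The attention-transfer term can be sketched as a KL divergence between temperature-softened spatial attention maps; the squared-activation attention map and the function names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def spatial_attention(feat: np.ndarray) -> np.ndarray:
    """Channel-summed squared activations -> flattened spatial map.
    feat: (C, H, W). A common choice for attention transfer (assumed here)."""
    return (feat ** 2).sum(axis=0).ravel()

def softmax(z: np.ndarray, tau: float) -> np.ndarray:
    z = z / tau
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def at_kl_loss(f_teacher, f_student, tau=1.0):
    """KL(teacher || student) between temperature-softened attention maps."""
    p = softmax(spatial_attention(f_teacher), tau)
    q = softmax(spatial_attention(f_student), tau)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

rng = np.random.default_rng(1)
ft = rng.standard_normal((8, 16, 16))        # fake teacher features (C, H, W)
loss_same = at_kl_loss(ft, ft)               # identical maps -> loss = 0
loss_diff = at_kl_loss(ft, rng.standard_normal((8, 16, 16)))
print(loss_same < 1e-9, loss_diff > 0)  # → True True
```

Since the attention map collapses the channel axis, this loss lets the half-width student match the teacher's spatial focus without requiring matching channel counts.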
Training
AdamW optimizer (learning rate 1e-3, weight decay 1e-2), CosineAnnealingLR (T_max = 50), batch size 8, 300 epochs, input 256×256, with random flips and rotations. Segmentation is evaluated using IoU and DSC, averaged over 3 seeds.
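The schedule above follows the standard CosineAnnealingLR formula; η_min = 0 is assumed here, since its value is not stated:

```python
import math

def cosine_lr(t: int, lr0: float = 1e-3, t_max: int = 50, eta_min: float = 0.0) -> float:
    """CosineAnnealingLR: lr(t) = eta_min + 0.5*(lr0 - eta_min)*(1 + cos(pi*t/t_max)).
    eta_min = 0.0 is an assumption (its value is not given in the text)."""
    return eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * t / t_max))

print(cosine_lr(0))    # 0.001 at the start of a cycle
print(cosine_lr(50))   # ~0.0: annealed to eta_min at T_max
```

With T_max = 50 and 300 epochs, the learning rate completes several cosine cycles over training if the scheduler restarts, or a single 50-epoch decay otherwise; the text does not specify which.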
4. Quantitative Performance
Empirical results assess segmentation accuracy and model complexity across three benchmark datasets (ISIC 2017, ISIC 2018, PH²), providing direct comparison to the teacher and several recent lightweight architectures.
| Model | ISIC 2017 IoU/DSC | ISIC 2018 IoU/DSC | PH² IoU/DSC | Params (M) | GFLOPs |
|---|---|---|---|---|---|
| MAL-UNet | 78.71/88.09 | 79.42/88.53 | 83.83/91.20 | 0.178 | 0.083 |
| EGE-UNet | 78.32/87.84 | 79.45/88.55 | 83.36/90.93 | 0.053 | 0.072 |
| UltraLight VM-U | 77.93/87.59 | 78.93/88.23 | 83.31/90.89 | 0.045 | 0.069 |
| UltraLBM-UNet-T | 78.57/88.00 | 78.82/88.15 | 84.92/91.85 | 0.011 | 0.019 |
| UltraLBM-UNet | 79.82/88.78 | 79.94/88.85 | 84.41/91.54 | 0.034 | 0.060 |
UltraLBM-UNet-T retains over 99% of the teacher's inference speed, requires under 0.02 GFLOPs (≈19 MFLOPs) for a 256×256 input, and exhibits a minimal IoU drop relative to the teacher (≈1.1–1.25 points on ISIC 2017/2018). On PH², the student model surpasses the teacher in IoU (84.92 vs 84.41), attributed to improved generalization from distillation. Compared to EGE-UNet and UltraLight VM-U (sub-0.05M models), UltraLBM-UNet-T offers up to a 1.6-point IoU gain with roughly 3.5–4× fewer FLOPs, underscoring its suitability for low-latency mobile or handheld POC deployments.
5. Practical Implications and Deployment Suitability
UltraLBM-UNet-T's design yields a model that can execute real-time skin lesion segmentation with minimal computational and memory demands, facilitating point-of-care applications on battery-powered mobile dermatoscopes and handheld diagnostic devices. The extreme compactness (<0.02 GFLOPs, <0.012M params) paired with near-SOTA accuracy is achieved through synergistic integration of bidirectional state-space modeling, local–global fusion, and hybrid distillation—enabling deployment where strict latency and memory budgets are dominant constraints.
A plausible implication is the feasibility of robust lesion analysis outside traditional clinical computing environments, potentially expanding access to dermatological AI services in remote or low-resource settings.
6. Context in Lightweight Medical Segmentation Models
UltraLBM-UNet-T extends the landscape of ultra-lightweight neural architectures for medical image analysis. Its approach aligns with trends favoring efficient parameterization and hybrid knowledge transfer to minimize resource footprints while preserving accuracy. The empirical superiority over previously proposed models such as MAL-UNet, EGE-UNet, and UltraLight VM-U (in both IoU/DSC metrics and FLOP consumption) illustrates effective state-space modeling and distillation as central drivers of the efficiency–accuracy trade-off in domain-specific medical segmentation solutions.
This suggests further advancement in lightweight segmentation will continue to explore combinations of state-space architectures, multi-branch fusion, and refined distillation strategies, especially for edge and point-of-care deployments.