
UltraLBM-UNet-T: Ultra-Compact Segmentation

Updated 1 January 2026
  • UltraLBM-UNet-T is an ultra-compact model leveraging bidirectional Mamba state-space modules and multi-branch feature fusion for efficient skin lesion segmentation.
  • It achieves competitive accuracy with only 0.011M parameters and 0.019 GFLOPs by employing a hybrid knowledge distillation protocol to mimic its teacher model.
  • The model is designed for real-time, point-of-care applications, enabling deployment on resource-constrained mobile devices for robust dermatological analysis.

UltraLBM-UNet-T is an ultra-compact deep learning model designed for skin lesion segmentation in dermatological imaging. It is the distilled student variant of UltraLBM-UNet, characterized by its bidirectional Mamba-based state-space modeling, multi-branch local–global feature perception, and hybrid knowledge distillation protocol. With only 0.011 million parameters and 0.019 GFLOPs, UltraLBM-UNet-T maintains competitive segmentation accuracy compared to both its teacher model and other lightweight baselines, making it well-suited for resource-constrained point-of-care deployments (Fan et al., 25 Dec 2025).

1. Architectural Topology

UltraLBM-UNet-T adopts a six-stage encoder–decoder U-Net backbone with multi-branch local–global modules and bidirectional state-space blocks. The input dimension is 256×256×3. Channel widths at each encoder stage are halved compared to the teacher, yielding [4, 8, 12, 16, 24, 32] output dimensions per stage.

  • Encoder:
    • Stages I–III: Standard Conv+MaxPool blocks.
    • Stages IV–VI: Global-Local Multi-Branch Perception (GLMBP) modules; stages IV and V employ 2×2 max-pool for downsampling, stage VI operates at 1/32 spatial resolution.
  • Decoder: Symmetric to the encoder.
    • Stages I–III: GLMBP modules with bilinear upsampling.
    • Stages IV–V: Conv+bilinear upsampling.
    • Stage VI: 1×1 convolution mapping to a single-channel probability output.
  • Skip Connections: Connect encoder/decoder stages at matching spatial resolutions. Fusion is via $X_{\text{dec}}^i \leftarrow X_{\text{dec}}^i + k \cdot X_{\text{enc}}^i$, with $k$ a learnable scalar.

Parameter count and computational cost are reduced by halving channels compared to UltraLBM-UNet, resulting in roughly one-third of the teacher's resource requirements (0.011M parameters and 0.019 GFLOPs for a 256×256 input).
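The learnable-scalar skip fusion can be sketched in PyTorch. Only the fusion rule $X_{\text{dec}}^i + k \cdot X_{\text{enc}}^i$ comes from the source; the class name and the initialization of $k$ are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ScaledSkipFusion(nn.Module):
    """Skip fusion X_dec <- X_dec + k * X_enc with a learnable scalar k.
    The class name and k's initialization (1.0) are illustrative choices."""
    def __init__(self):
        super().__init__()
        self.k = nn.Parameter(torch.ones(1))  # learnable fusion weight

    def forward(self, x_dec, x_enc):
        return x_dec + self.k * x_enc

# Example at a stage with 16 channels (the student's encoder stage IV width)
fuse = ScaledSkipFusion()
x_dec = torch.randn(1, 16, 32, 32)
x_enc = torch.randn(1, 16, 32, 32)
out = fuse(x_dec, x_enc)
print(out.shape)  # torch.Size([1, 16, 32, 32])
```

One scalar per skip connection adds negligible parameter cost while letting the network learn how strongly each encoder stage should influence the decoder.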

2. Multi-Branch Feature Modules

GLMBP Module

Each GLMBP block receives input $X \in \mathbb{R}^{B \times C \times H \times W}$, reshapes it to $B \times N \times C$ (where $N = H \cdot W$), and layer-normalizes along the channel axis. Channels are split evenly into four tensors $[X_1 \Vert X_2 \Vert X_3 \Vert X_4]$ with $X_i \in \mathbb{R}^{B \times N \times C'}$, $C' = C/4$.

  • Global branches ($X_1^G$, $X_2^G$): Each uses bidirectional Mamba.
    • Forward: $M_{i,\text{fwd}} = M(X_i)$
    • Backward: $M_{i,\text{bwd}} = \text{Flip}[M(\text{Flip}(X_i))]$
    • Output: $X_i^G = M_{i,\text{fwd}} + M_{i,\text{bwd}} + \gamma \cdot X_i$ (with $\gamma$ learnable).
  • Local-perception branch ($X_3^L$): Depthwise-separable convolution applied to $X_3$ after reshaping to 2D, followed by residual addition.
  • Identity branch ($X_4^I$): $X_4 + \gamma \cdot X_4$.

Final block output is a channel-wise concatenation: $X_{\text{out}} = [X_1^G \Vert X_2^G \Vert X_3^L \Vert X_4^I]$.
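A structural sketch of the GLMBP block in PyTorch, assuming the shapes described above. The bidirectional global branch accepts any `(B, N, C/4) → (B, N, C/4)` sequence mixer as a stand-in (a real implementation would plug in a Mamba state-space layer), and the use of a separate learnable γ per branch is our simplification:

```python
import torch
import torch.nn as nn

class GLMBP(nn.Module):
    """Structural sketch of a Global-Local Multi-Branch Perception block.
    `mamba` is any (B, N, C/4) -> (B, N, C/4) sequence mixer; a real
    implementation would use a Mamba state-space layer here."""
    def __init__(self, channels: int, mamba: nn.Module):
        super().__init__()
        assert channels % 4 == 0
        c = channels // 4
        self.norm = nn.LayerNorm(channels)
        self.mamba = mamba                         # shared by both global branches
        self.gamma = nn.Parameter(torch.ones(3))   # learnable residual scales
        self.local = nn.Sequential(                # depthwise-separable conv
            nn.Conv2d(c, c, 3, padding=1, groups=c),
            nn.Conv2d(c, c, 1),
        )

    def _bi_mamba(self, x, g):
        fwd = self.mamba(x)                                              # forward scan
        bwd = torch.flip(self.mamba(torch.flip(x, dims=[1])), dims=[1])  # reversed scan
        return fwd + bwd + g * x

    def forward(self, x):
        B, C, H, W = x.shape
        t = self.norm(x.flatten(2).transpose(1, 2))   # (B, N, C), N = H*W
        x1, x2, x3, x4 = t.chunk(4, dim=-1)           # even four-way channel split
        g1 = self._bi_mamba(x1, self.gamma[0])        # global branch 1
        g2 = self._bi_mamba(x2, self.gamma[1])        # global branch 2
        x3_2d = x3.transpose(1, 2).reshape(B, C // 4, H, W)
        l3 = x3 + self.local(x3_2d).flatten(2).transpose(1, 2)  # local branch
        i4 = x4 + self.gamma[2] * x4                  # identity branch
        out = torch.cat([g1, g2, l3, i4], dim=-1)     # channel concatenation
        return out.transpose(1, 2).reshape(B, C, H, W)

# Shape check with an identity stand-in for the Mamba mixer
blk = GLMBP(16, mamba=nn.Identity())
y = blk(torch.randn(2, 16, 8, 8))
print(y.shape)  # torch.Size([2, 16, 8, 8])
```

Because each branch preserves its $C/4$ width, the concatenated output matches the input shape, so the block can be dropped into any encoder or decoder stage.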

LMBP Module (Stage III)

Identical four-way split and residual design as GLMBP, but both global branches are replaced with additional local DwConv branches (kernel size 3), omitting Mamba, yielding $X_{\text{out}} = [X_1^L \Vert X_2^L \Vert X_3^L \Vert X_4^I]$.

Bidirectional Mamba State-Space Modeling

Mamba modules implement a linear recurrent state-space layer:

  • Forward recurrence:

$$x_t = A x_{t-1} + B u_t, \qquad y_t = C x_t + D u_t$$

for an input sequence of length $N$.

  • Backward recurrence via input reversal, with shared parameters $(A, B, C, D)$.
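The shared-parameter bidirectional recurrence can be written directly in NumPy. The dimensions and the contraction $A = 0.5 I$ are illustrative choices, not values from the source:

```python
import numpy as np

def ssm_scan(u, A, B, C, D):
    """One direction of the linear recurrence
    x_t = A x_{t-1} + B u_t,  y_t = C x_t + D u_t  (with x_0 = 0)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                      # u: (N, d_in), scanned over N steps
        x = A @ x + B @ u_t
        ys.append(C @ x + D @ u_t)
    return np.stack(ys)                # (N, d_out)

def bidirectional_ssm(u, A, B, C, D):
    """Backward pass by input reversal, sharing (A, B, C, D) with the forward pass."""
    fwd = ssm_scan(u, A, B, C, D)
    bwd = ssm_scan(u[::-1], A, B, C, D)[::-1]
    return fwd + bwd

rng = np.random.default_rng(0)
N, d_state, d = 6, 4, 3
A = 0.5 * np.eye(d_state)              # contractive state transition (illustrative)
B = rng.standard_normal((d_state, d))
C = rng.standard_normal((d, d_state))
D = rng.standard_normal((d, d))
u = rng.standard_normal((N, d))
print(bidirectional_ssm(u, A, B, C, D).shape)  # (6, 3)
```

Production Mamba layers replace this explicit loop with a hardware-efficient selective scan, but the recurrence being computed is the same.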

3. Knowledge Distillation Protocol

UltraLBM-UNet-T is optimized using a hybrid knowledge distillation framework, transferring representational and predictive knowledge from the UltraLBM-UNet teacher.

  • Teacher: Full UltraLBM-UNet (0.034M params, 0.060 GFLOPs) trained with BCE+Dice loss.
  • Student: UltraLBM-UNet-T (channels halved throughout).

Distillation Losses

Aggregate distillation objective:

$$L_{\text{distill}} = \lambda_h L_{\text{hard}} + \lambda_s L_{\text{DKD}} + \lambda_a L_{\text{AT}} + \lambda_g L_{\text{grad}}$$

with all $\lambda$ hyperparameters set to 1 (per ablation).

  • Hard-label loss: $L_{\text{hard}} = \text{BCE}(\hat{S}, Y) + \text{Dice}(\hat{S}, Y)$
  • Decoupled Knowledge Distillation (DKD): applied to the teacher and student logits $T(p)$, $S(p)$.
  • Attention Transfer (AT): KL divergence between spatial attention maps $A_S$ and $A_T$, computed with softmax at temperature $\tau_a = 1$.
  • Gradient-matching loss: difference in Sobel edge magnitudes between student and teacher predictions.
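A hedged sketch of these loss components in PyTorch. The DKD term is replaced here by a simple soft-target BCE stand-in (true DKD decouples target and non-target logits), and the attention-map definition (channel-wise squared mean) is an assumption:

```python
import torch
import torch.nn.functional as F

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss on probabilities in [0, 1]."""
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3))
    return (1 - (2 * inter + eps) / (union + eps)).mean()

def attention_transfer(feat_s, feat_t, tau=1.0):
    """KL divergence between softmaxed spatial attention maps (assumed to be
    the channel-wise squared mean of features, at temperature tau)."""
    def attn(f):
        a = f.pow(2).mean(dim=1).flatten(1)       # (B, H*W) spatial energy
        return F.log_softmax(a / tau, dim=1)
    return F.kl_div(attn(feat_s), attn(feat_t).exp(), reduction="batchmean")

def sobel_grad_loss(pred_s, pred_t):
    """L1 difference of Sobel edge magnitudes of the two predictions."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    def mag(p):
        gx = F.conv2d(p, kx, padding=1)
        gy = F.conv2d(p, ky, padding=1)
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)
    return F.l1_loss(mag(pred_s), mag(pred_t))

def distill_loss(pred_s, pred_t, feat_s, feat_t, y,
                 lh=1.0, ls=1.0, la=1.0, lg=1.0):
    hard = F.binary_cross_entropy(pred_s, y) + dice_loss(pred_s, y)
    soft = F.binary_cross_entropy(pred_s, pred_t)   # stand-in for DKD on logits
    return (lh * hard + ls * soft
            + la * attention_transfer(feat_s, feat_t)
            + lg * sobel_grad_loss(pred_s, pred_t))

torch.manual_seed(0)
pred_s = torch.sigmoid(torch.randn(2, 1, 16, 16))   # student probabilities
pred_t = torch.sigmoid(torch.randn(2, 1, 16, 16))   # teacher probabilities
feat_s = torch.randn(2, 8, 8, 8)
feat_t = torch.randn(2, 8, 8, 8)
y = (torch.rand(2, 1, 16, 16) > 0.5).float()        # hard labels
loss = distill_loss(pred_s, pred_t, feat_s, feat_t, y)
print(torch.isfinite(loss).item())  # True
```

All four terms are non-negative, so the aggregate objective is as well; with every $\lambda = 1$ the sum matches the equal weighting reported from the ablation.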

Training

AdamW optimizer (learning rate $10^{-3}$, weight decay $10^{-2}$), CosineAnnealingLR schedule ($T_{\max} = 50$, $\eta_{\min} = 10^{-5}$), batch size 8, 300 epochs, 256×256 inputs, with random flips and rotations for augmentation. Segmentation is evaluated using IoU and DSC, averaged over 3 random seeds.
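The optimizer and schedule configuration can be reproduced in PyTorch. The one-layer `model` is a stand-in for the actual network, and the loop below only steps the schedule through one full $T_{\max}$ cycle rather than performing real training:

```python
import torch

# Stand-in model; the real network is UltraLBM-UNet-T (0.011M parameters).
model = torch.nn.Conv2d(3, 1, kernel_size=1)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=1e-5)

for epoch in range(50):          # one full cosine cycle
    # ... one epoch of training with the hard-label / distillation losses ...
    optimizer.step()             # placeholder step (no gradients computed here)
    scheduler.step()

lr = optimizer.param_groups[0]["lr"]
print(f"lr after 50 epochs: {lr:.2e}")  # anneals toward eta_min = 1e-05
```

Note that with 300 training epochs and $T_{\max} = 50$, the cosine schedule continues past its first minimum; the source does not specify whether warm restarts are used.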

4. Quantitative Performance

Empirical results assess segmentation accuracy and model complexity across three benchmark datasets (ISIC 2017, ISIC 2018, PH²), providing direct comparison to the teacher and several recent lightweight architectures.

| Model | ISIC 2017 IoU/DSC | ISIC 2018 IoU/DSC | PH² IoU/DSC | Params (M) | GFLOPs |
|---|---|---|---|---|---|
| MAL-UNet | 78.71/88.09 | 79.42/88.53 | 83.83/91.20 | 0.178 | 0.083 |
| EGE-UNet | 78.32/87.84 | 79.45/88.55 | 83.36/90.93 | 0.053 | 0.072 |
| UltraLight VM-U | 77.93/87.59 | 78.93/88.23 | 83.31/90.89 | 0.045 | 0.069 |
| UltraLBM-UNet-T | 78.57/88.00 | 78.82/88.15 | 84.92/91.85 | 0.011 | 0.019 |
| UltraLBM-UNet | 79.82/88.78 | 79.94/88.85 | 84.41/91.54 | 0.034 | 0.060 |

UltraLBM-UNet-T runs at over 99% of the teacher's inference speed, requires fewer than 20 MFLOPs (0.019 GFLOPs) for a 256×256 input, and exhibits only a minimal IoU drop (≈1.1–1.25 points) on ISIC 2017/2018. On PH², the student surpasses the teacher in IoU (84.92 vs. 84.41), attributed to improved generalization from distillation. Compared with EGE-UNet and UltraLight VM-U (the other sub-0.05M models), UltraLBM-UNet-T offers up to a 1.6-point IoU gain with roughly 3.5× fewer FLOPs, underscoring its suitability for low-latency mobile or handheld point-of-care deployments.

5. Practical Implications and Deployment Suitability

UltraLBM-UNet-T's design yields a model that can execute real-time skin lesion segmentation with minimal computational and memory demands, facilitating point-of-care applications on battery-powered mobile dermatoscopes and handheld diagnostic devices. The extreme compactness (<0.02 GFLOPs, <0.012M params) paired with near-SOTA accuracy is achieved through synergistic integration of bidirectional state-space modeling, local–global fusion, and hybrid distillation—enabling deployment where strict latency and memory budgets are dominant constraints.

A plausible implication is the feasibility of robust lesion analysis outside traditional clinical computing environments, potentially expanding access to dermatological AI services in remote or low-resource settings.

6. Context in Lightweight Medical Segmentation Models

UltraLBM-UNet-T extends the landscape of ultra-lightweight neural architectures for medical image analysis. Its approach aligns with trends favoring efficient parameterization and hybrid knowledge transfer to minimize resource footprints while preserving accuracy. The empirical superiority over previously proposed models such as MAL-UNet, EGE-UNet, and UltraLight VM-U (in both IoU/DSC metrics and FLOP consumption) illustrates effective state-space modeling and distillation as central drivers of the efficiency–accuracy trade-off in domain-specific medical segmentation solutions.

This suggests that further advances in lightweight segmentation will continue to explore combinations of state-space architectures, multi-branch fusion, and refined distillation strategies, especially for edge and point-of-care deployments.
