
TopoLoRA-SAM: Topology-Aware Adaptation

Updated 12 January 2026
  • The paper introduces TopoLoRA-SAM, integrating LoRA and a spatial convolutional adapter to achieve high-performance binary segmentation with only ~5% trainable parameters.
  • The architecture leverages a frozen SAM ViT-B encoder, using low-rank updates and a residual spatial module to capture detailed features of thin structures.
  • Topology-aware supervision via the differentiable clDice loss enhances connectivity preservation, crucial for medical and remote sensing segmentation tasks.

TopoLoRA-SAM is a topology-aware, parameter-efficient adaptation framework designed for binary semantic segmentation tasks with an emphasis on thin structures and cross-domain generalization. The method extends foundation segmentation models—specifically the Segment Anything Model (SAM) with a Vision Transformer (ViT) backbone—by incorporating Low-Rank Adaptation (LoRA), a lightweight spatial convolutional adapter, and optional topology-aware supervision via the differentiable clDice loss. TopoLoRA-SAM enables adaptation of a frozen SAM encoder, significantly reducing the number of trainable parameters required compared to conventional fully fine-tuned segmentation models, while providing state-of-the-art performance on thin-structure and noisy-domain datasets (Khazem, 5 Jan 2026).

1. Motivation and Problem Scope

TopoLoRA-SAM addresses the problem of adapting foundation models such as SAM—which offer strong zero-shot generalization—from their original training domains to new, domain-specific binary segmentation tasks involving structures characterized by fine connectivity and sensitivity to topological errors. Thin structures such as retinal vessels or roads are particularly vulnerable, as a single missed pixel can sever crucial topological connections. Standard region-based loss functions (e.g., Binary Cross-Entropy [BCE], Soft Dice) fail to penalize such connectivity disruptions. Additionally, domains such as synthetic aperture radar (SAR) imagery pose further challenges due to noise and data distribution shifts. Conventional full fine-tuning is often computationally expensive and susceptible to catastrophic forgetting; parameter-efficient adaptation is essential to mitigate such drawbacks (Khazem, 5 Jan 2026).

2. Architecture and Adaptation Mechanisms

The TopoLoRA-SAM architecture leverages a frozen SAM ViT-B encoder (12 transformer blocks, 93.7M parameters) and introduces trainable modules for efficient adaptation:

  • Low-Rank Adaptation (LoRA): Each transformer block's feed-forward network (FFN) linear layer, i.e., “mlp.lin1” and “mlp.lin2,” is adapted via a low-rank update $\Delta W = BA$, where $A \in \mathbb{R}^{r \times d_{in}}$, $B \in \mathbb{R}^{d_{out} \times r}$, and $r \ll \min(d_{in}, d_{out})$. The rank $r = 16$ achieves an optimal trade-off between performance and parameter count. LoRA is zero-initialized and scaled by $\alpha/r$ ($\alpha = 16$) for stability, resulting in approximately 2.4M additional trainable parameters. Only the LoRA parameters are updated during adaptation; the main SAM weights remain frozen.
  • Spatial Convolutional Adapter: To enhance the encoding of high-resolution, thin-structure detail, a residual depthwise-separable convolution block is applied to the frozen feature map $z \in \mathbb{R}^{256 \times H/16 \times W/16}$:

$$z' = z + \mathrm{Conv}_{1 \times 1}(\mathrm{ReLU}(\mathrm{DepthwiseConv}_{3 \times 3}(z)))$$

This lightweight module consists of a 3×3 depthwise convolution (256 filters) and a 1×1 pointwise convolution (256→256), amounting to approximately 66K parameters.

  • Mask Decoder: The SAM mask decoder remains fully trainable with ~2.4M parameters.
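The residual adapter described above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions (plain-loop depthwise convolution, no biases, hypothetical function and argument names), not the paper's code:

```python
import numpy as np

def spatial_adapter(z, dw_kernels, pw_weights):
    """Residual depthwise-separable block: z' = z + Conv1x1(ReLU(DWConv3x3(z))).

    z: (C, H, W) feature map; dw_kernels: (C, 3, 3), one 3x3 filter per
    channel; pw_weights: (C, C) pointwise channel mixing. Names and the
    naive loops are illustrative, not the paper's implementation."""
    C, H, W = z.shape
    padded = np.pad(z, ((0, 0), (1, 1), (1, 1)))        # 'same' padding
    dw = np.zeros_like(z)
    for c in range(C):                                   # depthwise 3x3 conv
        for i in range(H):
            for j in range(W):
                dw[c, i, j] = np.sum(padded[c, i:i+3, j:j+3] * dw_kernels[c])
    act = np.maximum(dw, 0.0)                            # ReLU
    pw = np.einsum('oc,chw->ohw', pw_weights, act)       # 1x1 pointwise conv
    return z + pw                                        # residual connection

rng = np.random.default_rng(0)
z = rng.normal(size=(3, 4, 4))
dw_k = rng.normal(size=(3, 3, 3))
out = spatial_adapter(z, dw_k, np.zeros((3, 3)))  # zero pointwise => identity
```

With $C = 256$, the weight count is $256 \cdot 9 + 256 \cdot 256$, consistent in scale with the ~66K parameters reported.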

The combined trainable parameter budget is ~4.9M (5.2% of SAM), contrasting with 100% trainable weights in comparators such as U-Net and DeepLabV3+. All original encoder weights are strictly frozen (Khazem, 5 Jan 2026).
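The LoRA update above admits a compact sketch. The following NumPy snippet (illustrative names and toy dimensions; the paper applies this to the FFN layers of SAM ViT-B) shows why zero-initializing $B$ leaves the frozen layer's behavior unchanged at the start of training:

```python
import numpy as np

def lora_linear(x, W, A, B, alpha=16.0, r=16):
    """Linear layer with a LoRA update: y = x (W + (alpha/r) * B A)^T.

    W is the frozen (d_out, d_in) weight; A (r, d_in) and B (d_out, r)
    are the only trainable matrices. B starts at zero, so the adapted
    layer initially reproduces the frozen one exactly."""
    delta_W = B @ A                           # rank-r update, (d_out, d_in)
    return x @ (W + (alpha / r) * delta_W).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2                      # toy sizes for illustration
W = rng.normal(size=(d_out, d_in))            # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01         # trainable down-projection
B = np.zeros((d_out, r))                      # zero-init up-projection
x = rng.normal(size=(3, d_in))
y = lora_linear(x, W, A, B, r=r)              # equals x @ W.T at init
```

Because only $A$ and $B$ receive gradients, the per-layer trainable cost is $r \,(d_{in} + d_{out})$ instead of $d_{in} \cdot d_{out}$.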

3. Topology-Aware Supervision

TopoLoRA-SAM introduces topology-awareness via the differentiable clDice loss, supplementing conventional region-based losses. The training objective is a weighted sum:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{BCE}} + \mathcal{L}_{\text{Dice}} + 0.5 \cdot \mathcal{L}_{\text{clDice}}$$

  • BCE Loss: Standard pixel-wise binary cross-entropy.
  • Soft Dice Loss: Overlap-based measure robust to class imbalance.
  • clDice: Measures overlap between the predicted and ground-truth skeletons, thus penalizing connectivity breaks:

$$\text{clDice}(\hat{y}, y) = \frac{2 \cdot T_{\text{prec}} \cdot T_{\text{sens}}}{T_{\text{prec}} + T_{\text{sens}}}$$

where $T_{\text{prec}}$ and $T_{\text{sens}}$ denote skeleton-based precision and sensitivity, respectively.

This encourages the model to preserve topological features, crucial for domains where structural connectivity is semantically meaningful (Khazem, 5 Jan 2026).
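The clDice computation and the weighted objective can be sketched directly from the definitions above. In this minimal NumPy illustration the skeletons are passed in precomputed (the paper uses a differentiable soft skeletonization inside the loss; function names here are illustrative):

```python
import numpy as np

def cl_dice(pred, gt, skel_pred, skel_gt, eps=1e-8):
    """clDice = 2*Tprec*Tsens / (Tprec + Tsens) from binary masks and skeletons."""
    t_prec = (skel_pred * gt).sum() / (skel_pred.sum() + eps)  # skeleton precision
    t_sens = (skel_gt * pred).sum() / (skel_gt.sum() + eps)    # skeleton sensitivity
    return 2.0 * t_prec * t_sens / (t_prec + t_sens + eps)

def total_loss(l_bce, l_dice, cldice_score, lam=0.5):
    """L_total = L_BCE + L_Dice + 0.5 * L_clDice, with L_clDice = 1 - clDice."""
    return l_bce + l_dice + lam * (1.0 - cldice_score)

# Toy example: a one-pixel-wide vertical "vessel"; its skeleton is itself.
gt = np.zeros((7, 7)); gt[1:6, 3] = 1.0
pred_ok = gt.copy()
pred_gap = gt.copy(); pred_gap[3, 3] = 0.0   # one missing pixel breaks connectivity
score_ok = cl_dice(pred_ok, gt, pred_ok, gt)
score_gap = cl_dice(pred_gap, gt, pred_gap, gt)
```

The single-pixel gap lowers the clDice score even though the pixel-wise overlap is nearly perfect, which is exactly the failure mode that region-based losses miss.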

4. Training Protocol and Evaluation Benchmarks

Training and evaluation protocols are standardized for fair comparison:

  • Datasets: Retinal vessel segmentation (DRIVE, STARE, CHASE_DB1), polyp segmentation (Kvasir-SEG), and SAR sea/land segmentation (SL-SSDD).
  • Preprocessing: Images are normalized to ImageNet statistics and resized (retina: 384×384, Kvasir/SL-SSDD: 512×512). Augmentations include flips, scaling, and center cropping with reflection padding; no color jitter is used for SAR.
  • Optimization: AdamW optimizer, learning rate $1 \times 10^{-4}$ with cosine annealing to $1 \times 10^{-6}$, 50 epochs, FP16 mixed-precision. Batch size is harmonized across methods via gradient accumulation.
  • Reproducibility: Results are averaged over 3 random seeds.

The baseline methods include U-Net (ResNet34), DeepLabV3+ (ResNet50), SegFormer (MiT-B0), and Mask2Former (Swin-T), each fully trainable. TopoLoRA-SAM trains only ~5.2% of its parameters (Khazem, 5 Jan 2026).
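The cosine-annealed learning-rate schedule from the protocol above can be written out explicitly. This is a minimal sketch assuming a per-epoch step over the 50 epochs (the paper does not specify per-step vs. per-epoch granularity):

```python
import math

def cosine_lr(step, total_steps, lr_max=1e-4, lr_min=1e-6):
    """Cosine annealing from lr_max down to lr_min over total_steps,
    matching the reported 1e-4 -> 1e-6 range."""
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return lr_min + (lr_max - lr_min) * cos_factor

# One value per epoch boundary, epochs 0..50.
schedule = [cosine_lr(e, 50) for e in range(51)]
```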

5. Empirical Results and Comparative Analysis

Parameter Efficiency

| Model | Total Params (M) | Trainable Params (M) | % Trainable |
|---|---|---|---|
| U-Net (ResNet34) | 24.4 | 24.4 | 100% |
| DeepLabV3+ (ResNet50) | 39.8 | 39.8 | 100% |
| SegFormer (MiT-B0) | 3.7 | 3.7 | 100% |
| Mask2Former (Swin-T) | 47.4 | 47.4 | 100% |
| TopoLoRA-SAM | 93.7 | 4.9 | 5.2% |
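As a quick arithmetic check, the trainable budget in the table decomposes into the components described earlier (LoRA ≈ 2.4M, adapter ≈ 66K, mask decoder ≈ 2.4M):

```python
# Trainable-parameter budget (in millions) from the architecture section.
lora_m, adapter_m, decoder_m, total_m = 2.4, 0.066, 2.4, 93.7
trainable_m = lora_m + adapter_m + decoder_m       # ~4.9M
pct_trainable = 100.0 * trainable_m / total_m      # ~5.2% of SAM ViT-B
```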

Main Dice Scores (mean ± std, 3 seeds)

| Dataset | U-Net | DeepLabV3+ | SegFormer | Mask2Former | TopoLoRA-SAM |
|---|---|---|---|---|---|
| DRIVE | 0.682±.02 | 0.699±.01 | 0.670±.02 | 0.693±.02 | 0.701±.018 |
| STARE | 0.670±.02 | 0.685±.02 | 0.665±.02 | 0.692±.01 | 0.684±.019 |
| CHASE_DB1 | 0.492±.03 | 0.510±.03 | 0.520±.02 | 0.485±.03 | 0.569±.022 |
| retina-avg | 0.615 | 0.631 | 0.618 | 0.623 | 0.595 |
| Kvasir-SEG | 0.882±.01 | 0.889±.01 | 0.871±.01 | 0.893±.01 | 0.900±.012 |
| SL-SSDD | 0.992±.00 | 0.991±.00 | 0.988±.00 | 0.992±.00 | 0.993±.003 |
| overall avg | 0.787 | 0.798 | 0.781 | 0.709 | 0.735 |

Topology and Boundary Metrics

On the retinal vessel datasets, TopoLoRA-SAM achieves the highest clDice and BFScore. On CHASE_DB1, it improves clDice by 6.0 points and BFScore by 8 points over Mask2Former.

Ablation and Sensitivity

  • Adapter and clDice: LoRA alone achieves most of the adaptation benefit. Including the adapter improves boundary metrics, and adding clDice further boosts topology, particularly on thin structures.
  • LoRA Rank: Rank $r = 16$ balances parameter count and accuracy.
  • clDice Weight: Setting $\lambda_{cl} = 0.5$ gives the best trade-off between Dice and topology (clDice).

6. Implementation and Practical Considerations

TopoLoRA-SAM’s practical deployment closely mirrors the frozen SAM inference pipeline, with only negligible overhead from a single depthwise-separable convolution block. The adapter and topology-aware supervision mechanisms notably enhance connectivity preservation in challenging, thin-structure tasks. Code and pretrained weights are publicly available at https://github.com/salimkhazem/Seglab.git. The parameter-efficient modules (LoRA, adapter) and topology-sensitive objective (clDice) enable foundation model adaptation for medical and remote sensing applications where connectivity is critical (Khazem, 5 Jan 2026).

7. Significance and Implications

TopoLoRA-SAM demonstrates that the combination of parameter-efficient adaptation and topology-aware supervision can match or surpass fully fine-tuned specialist architectures, especially in domains typified by fine structures and noisy imaging. A plausible implication is that similar strategies may be effective for other high-resolution and cross-domain segmentation scenarios, providing an alternative to full fine-tuning with reduced risk of catastrophic forgetting and resource consumption (Khazem, 5 Jan 2026).
