TopoLoRA-SAM: Topology-Aware Adaptation
- The paper introduces TopoLoRA-SAM, integrating LoRA and a spatial convolutional adapter to achieve high-performance binary segmentation with only ~5% trainable parameters.
- The architecture leverages a frozen SAM ViT-B encoder, using low-rank updates and a residual spatial module to capture detailed features of thin structures.
- Topology-aware supervision via the differentiable clDice loss enhances connectivity preservation, crucial for medical and remote sensing segmentation tasks.
TopoLoRA-SAM is a topology-aware, parameter-efficient adaptation framework designed for binary semantic segmentation tasks with an emphasis on thin structures and cross-domain generalization. The method extends foundation segmentation models—specifically the Segment Anything Model (SAM) with a Vision Transformer (ViT) backbone—by incorporating Low-Rank Adaptation (LoRA), a lightweight spatial convolutional adapter, and optional topology-aware supervision via the differentiable clDice loss. TopoLoRA-SAM enables adaptation of a frozen SAM encoder, significantly reducing the number of trainable parameters required compared to conventional fully fine-tuned segmentation models, while providing state-of-the-art performance on thin-structure and noisy-domain datasets (Khazem, 5 Jan 2026).
1. Motivation and Problem Scope
TopoLoRA-SAM addresses the problem of adapting foundation models such as SAM—which offer strong zero-shot generalization—from their original training domains to new, domain-specific binary segmentation tasks involving structures characterized by fine connectivity and sensitivity to topological errors. Thin structures such as retinal vessels or roads are particularly vulnerable, as a single missed pixel can sever crucial topological connections. Standard region-based loss functions (e.g., Binary Cross-Entropy [BCE], Soft Dice) fail to penalize such connectivity disruptions. Additionally, domains such as synthetic aperture radar (SAR) imagery pose further challenges due to noise and data distribution shifts. Conventional full fine-tuning is often computationally expensive and susceptible to catastrophic forgetting; parameter-efficient adaptation is essential to mitigate such drawbacks (Khazem, 5 Jan 2026).
2. Architecture and Adaptation Mechanisms
The TopoLoRA-SAM architecture leverages a frozen SAM ViT-B encoder (12 transformer blocks, 93.7M parameters) and introduces trainable modules for efficient adaptation:
- Low-Rank Adaptation (LoRA): Each transformer block's feed-forward network (FFN) linear layer, i.e., “mlp.lin1” and “mlp.lin2,” is adapted via a low-rank update $W' = W + \frac{\alpha}{r} B A$, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and the rank $r \ll \min(d, k)$. The chosen rank achieves an optimal trade-off between performance and parameter count. $B$ is zero-initialized and the update is scaled by $\alpha / r$ for stability, resulting in approximately 2.4M additional trainable parameters. Only the LoRA parameters are updated during adaptation; the main SAM weights remain frozen.
- Spatial Convolutional Adapter: To enhance the encoding of high-resolution, thin-structure detail, a residual depthwise-separable convolution block is applied to the frozen encoder feature map $F$, i.e., $F' = F + \mathrm{PW}(\mathrm{DW}(F))$. This lightweight module consists of a 3×3 depthwise convolution (256 filters) followed by a 1×1 pointwise convolution (256→256), amounting to approximately 66K parameters.
- Mask Decoder: The SAM mask decoder remains fully trainable with ~2.4M parameters.
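The low-rank update described above can be sketched in a few lines of NumPy. The shapes, rank, and scaling values here are illustrative placeholders, not the paper's exact settings; the key property shown is that zero-initializing $B$ makes the adapted layer start out identical to the frozen one.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha):
    """Frozen linear layer W plus a LoRA update scaled by alpha/r.

    x: (batch, d_in); W: (d_out, d_in), frozen;
    A: (r, d_in), B: (d_out, r), trainable. B is zero-initialized,
    so the adapted layer initially matches the frozen one exactly.
    """
    r = A.shape[0]
    base = x @ W.T                           # frozen path
    update = (x @ A.T) @ B.T * (alpha / r)   # low-rank path
    return base + update

# Illustrative shapes (ViT-B-like FFN dims; rank/alpha are placeholders):
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 3072, 4, 8
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                     # zero-init => no-op at start
x = rng.standard_normal((2, d_in))
```

Because only `A` and `B` receive gradients, the trainable count per adapted layer is $r(d + k)$ rather than $dk$.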
The combined trainable parameter budget is ~4.9M (5.2% of SAM), contrasting with 100% trainable weights in comparators such as U-Net and DeepLabV3+. All original encoder weights are strictly frozen (Khazem, 5 Jan 2026).
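The spatial adapter admits a similarly compact sketch. This is a minimal NumPy version under the assumption of a single channels-first feature map; the residual form means a zero-initialized block leaves the frozen features unchanged.

```python
import numpy as np

def depthwise_conv3x3(F, K):
    """Per-channel 3x3 convolution with zero padding.
    F: (C, H, W); K: (C, 3, 3) -- one independent kernel per channel."""
    C, H, W = F.shape
    Fp = np.pad(F, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(F)
    for i in range(3):
        for j in range(3):
            out += K[:, i, j][:, None, None] * Fp[:, i:i+H, j:j+W]
    return out

def spatial_adapter(F, K_dw, W_pw):
    """Residual depthwise-separable block: F + PW(DW(F)).
    K_dw: (C, 3, 3) depthwise kernels; W_pw: (C, C) 1x1 pointwise mix."""
    h = depthwise_conv3x3(F, K_dw)
    h = np.einsum('oc,chw->ohw', W_pw, h)    # 1x1 pointwise convolution
    return F + h                             # residual connection
```

For C=256 this gives 256·9 depthwise plus 256·256 pointwise weights, on the order of the ~66K parameters stated above (exact count depends on bias terms).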
3. Topology-Aware Supervision
TopoLoRA-SAM introduces topology-awareness via the differentiable clDice loss, supplementing conventional region-based losses. The training objective is a weighted sum of three terms, $\mathcal{L}_{\text{total}} = \lambda_{\text{BCE}}\,\mathcal{L}_{\text{BCE}} + \lambda_{\text{Dice}}\,\mathcal{L}_{\text{Dice}} + \lambda_{\text{cl}}\,\mathcal{L}_{\text{clDice}}$:
- BCE Loss: Standard pixel-wise binary cross-entropy.
- Soft Dice Loss: Overlap-based measure robust to class imbalance.
- clDice: Measures overlap between the predicted and ground-truth skeletons, thus penalizing connectivity breaks: $\text{clDice} = \frac{2\, T_{\text{prec}} \cdot T_{\text{sens}}}{T_{\text{prec}} + T_{\text{sens}}}$, where $T_{\text{prec}}$ and $T_{\text{sens}}$ are skeleton-based precision and sensitivity.
This encourages the model to preserve topological features, crucial for domains where structural connectivity is semantically meaningful (Khazem, 5 Jan 2026).
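The metric underlying the loss can be sketched on binary masks with precomputed skeletons. Note this is the hard (non-differentiable) form for illustration; the paper's training loss uses the differentiable soft-skeleton variant, and skeletonization itself is omitted here.

```python
import numpy as np

def cl_dice(pred, gt, skel_pred, skel_gt, eps=1e-8):
    """clDice from binary masks and their skeletons.

    tprec: fraction of the predicted skeleton lying inside the
    ground-truth mask; tsens: fraction of the ground-truth skeleton
    recovered by the prediction. clDice is their harmonic mean."""
    tprec = (skel_pred * gt).sum() / (skel_pred.sum() + eps)
    tsens = (skel_gt * pred).sum() / (skel_gt.sum() + eps)
    return 2 * tprec * tsens / (tprec + tsens + eps)

def cl_dice_loss(pred, gt, skel_pred, skel_gt):
    """Loss form combined with BCE and Soft Dice in the total objective."""
    return 1.0 - cl_dice(pred, gt, skel_pred, skel_gt)
```

A single dropped pixel on a one-pixel-wide vessel barely moves Dice but removes part of the skeleton overlap, so clDice penalizes the break directly.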
4. Training Protocol and Evaluation Benchmarks
Training and evaluation protocols are standardized for fair comparison:
- Datasets: Retinal vessel segmentation (DRIVE, STARE, CHASE_DB1), polyp segmentation (Kvasir-SEG), and SAR sea/land segmentation (SL-SSDD).
- Preprocessing: Images are normalized to ImageNet statistics and resized (retina: 384×384, Kvasir/SL-SSDD: 512×512). Augmentations include flips, scaling, and center cropping with reflection padding; no color jitter is used for SAR.
- Optimization: AdamW optimizer with a cosine-annealed learning rate schedule, 50 epochs, FP16 mixed precision. Batch size is harmonized across methods via gradient accumulation.
- Reproducibility: Results are averaged over 3 random seeds.
The baseline methods include U-Net (ResNet34), DeepLabV3+ (ResNet50), SegFormer (MiT-B0), and Mask2Former (Swin-T), each fully trainable. TopoLoRA-SAM trains only ~5.2% of its parameters (Khazem, 5 Jan 2026).
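The cosine-annealed schedule can be sketched in pure Python; the initial and final learning-rate values below are placeholders, not the paper's exact settings.

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_max, lr_min):
    """Cosine schedule decaying from lr_max to lr_min over training."""
    t = epoch / max(total_epochs - 1, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# Illustrative values over the 50-epoch budget (lr_max/lr_min assumed):
schedule = [cosine_annealed_lr(e, 50, 1e-4, 1e-6) for e in range(50)]
```

The schedule starts at `lr_max`, decays slowly at first, and flattens out near `lr_min` in the final epochs.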
5. Empirical Results and Comparative Analysis
Parameter Efficiency
| Model | Total Params (M) | Trainable Params (M) | % Trainable |
|---|---|---|---|
| U-Net (ResNet34) | 24.4 | 24.4 | 100% |
| DeepLabV3+ (ResNet50) | 39.8 | 39.8 | 100% |
| SegFormer (MiT-B0) | 3.7 | 3.7 | 100% |
| Mask2Former (Swin-T) | 47.4 | 47.4 | 100% |
| TopoLoRA-SAM | 93.7 | 4.9 | 5.2% |
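The 5.2% figure follows directly from the component budgets listed in Section 2; a quick arithmetic check:

```python
# Parameter budgets in millions, as stated in the architecture section.
total_m = 93.7       # frozen SAM ViT-B encoder + decoder total
lora_m = 2.4         # LoRA updates in the FFN layers
adapter_m = 0.066    # spatial convolutional adapter (~66K)
decoder_m = 2.4      # fully trainable mask decoder

trainable_m = lora_m + adapter_m + decoder_m   # ~4.9M trainable
fraction = trainable_m / total_m               # ~0.052, i.e. ~5.2%
```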
Main Dice Scores (mean ± std, 3 seeds)
| Dataset | U-Net | DeepLabV3+ | SegFormer | Mask2Former | TopoLoRA-SAM |
|---|---|---|---|---|---|
| DRIVE | 0.682±.02 | 0.699±.01 | 0.670±.02 | 0.693±.02 | 0.701±.018 |
| STARE | 0.670±.02 | 0.685±.02 | 0.665±.02 | 0.692±.01 | 0.684±.019 |
| CHASE_DB1 | 0.492±.03 | 0.510±.03 | 0.520±.02 | 0.485±.03 | 0.569±.022 |
| retina-avg | 0.615 | 0.631 | 0.618 | 0.623 | 0.651 |
| Kvasir-SEG | 0.882±.01 | 0.889±.01 | 0.871±.01 | 0.893±.01 | 0.900±.012 |
| SL-SSDD | 0.992±.00 | 0.991±.00 | 0.988±.00 | 0.992±.00 | 0.993±.003 |
| overall avg | 0.744 | 0.755 | 0.743 | 0.751 | 0.769 |
Topology and Boundary Metrics
On retinal vessel datasets, TopoLoRA-SAM achieves the highest clDice and BFScore. On CHASE_DB1, clDice improves by 6.0 points and BFScore by 8 over Mask2Former.
Ablation and Sensitivity
- Adapter and clDice: LoRA alone achieves most of the adaptation benefit. Including the adapter improves boundary metrics, and adding clDice further boosts topology, particularly on thin structures.
- LoRA Rank: The selected rank balances parameter count and accuracy.
- clDice Weight: An appropriately chosen clDice weight maximizes the trade-off between region overlap (Dice) and topology (clDice).
6. Implementation and Practical Considerations
TopoLoRA-SAM’s practical deployment closely mirrors the frozen SAM inference pipeline, with only negligible overhead from the single depthwise-separable convolution block. The adapter and topology-aware supervision mechanisms notably enhance connectivity preservation in challenging thin-structure tasks. Code and pretrained weights are publicly available at https://github.com/salimkhazem/Seglab.git. The parameter-efficient modules (LoRA, adapter) and topology-sensitive objective (clDice) enable foundation-model adaptation for medical and remote sensing applications where connectivity is critical (Khazem, 5 Jan 2026).
7. Significance and Implications
TopoLoRA-SAM demonstrates that the combination of parameter-efficient adaptation and topology-aware supervision can match or surpass fully fine-tuned specialist architectures, especially in domains typified by fine structures and noisy imaging. A plausible implication is that similar strategies may be effective for other high-resolution and cross-domain segmentation scenarios, providing an alternative to full fine-tuning with reduced risk of catastrophic forgetting and resource consumption (Khazem, 5 Jan 2026).