TopoLoRA-SAM: Topology-Aware Adaptation
- The paper introduces TopoLoRA-SAM, integrating LoRA and a spatial convolutional adapter to achieve high-performance binary segmentation with only ~5% trainable parameters.
- The architecture leverages a frozen SAM ViT-B encoder, using low-rank updates and a residual spatial module to capture detailed features of thin structures.
- Topology-aware supervision via the differentiable clDice loss enhances connectivity preservation, crucial for medical and remote sensing segmentation tasks.
TopoLoRA-SAM is a topology-aware, parameter-efficient adaptation framework designed for binary semantic segmentation tasks with an emphasis on thin structures and cross-domain generalization. The method extends foundation segmentation models—specifically the Segment Anything Model (SAM) with a Vision Transformer (ViT) backbone—by incorporating Low-Rank Adaptation (LoRA), a lightweight spatial convolutional adapter, and optional topology-aware supervision via the differentiable clDice loss. TopoLoRA-SAM enables adaptation of a frozen SAM encoder, significantly reducing the number of trainable parameters required compared to conventional fully fine-tuned segmentation models, while providing state-of-the-art performance on thin-structure and noisy-domain datasets (Khazem, 5 Jan 2026).
1. Motivation and Problem Scope
TopoLoRA-SAM addresses the problem of adapting foundation models such as SAM—which offer strong zero-shot generalization—from their original training domains to new, domain-specific binary segmentation tasks involving structures characterized by fine connectivity and sensitivity to topological errors. Thin structures such as retinal vessels or roads are particularly vulnerable, as a single missed pixel can sever crucial topological connections. Standard region-based loss functions (e.g., Binary Cross-Entropy [BCE], Soft Dice) fail to penalize such connectivity disruptions. Additionally, domains such as synthetic aperture radar (SAR) imagery pose further challenges due to noise and data distribution shifts. Conventional full fine-tuning is often computationally expensive and susceptible to catastrophic forgetting; parameter-efficient adaptation is essential to mitigate such drawbacks (Khazem, 5 Jan 2026).
2. Architecture and Adaptation Mechanisms
The TopoLoRA-SAM architecture leverages a frozen SAM ViT-B encoder (12 transformer blocks, 93.7M parameters) and introduces trainable modules for efficient adaptation:
- Low-Rank Adaptation (LoRA): Each transformer block's feed-forward network (FFN) linear layer, i.e., “mlp.lin1” and “mlp.lin2,” is adapted via a low-rank update $W' = W + \frac{\alpha}{r} B A$, where $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and the rank $r \ll \min(d, k)$. The chosen rank achieves an optimal trade-off between performance and parameter count. $B$ is zero-initialized and the update is scaled by $\alpha / r$ for stability, resulting in approximately 2.4M additional trainable parameters. Only the LoRA parameters are updated during adaptation; the main SAM weights remain frozen.
- Spatial Convolutional Adapter: To enhance the encoding of high-resolution, thin-structure detail, a residual depthwise-separable convolution block is applied to the frozen encoder feature map $F$, i.e., $F' = F + \mathrm{PW}(\mathrm{DW}(F))$. This lightweight module consists of a 3×3 depthwise convolution (256 filters) followed by a 1×1 pointwise convolution (256→256), amounting to approximately 66K parameters.
- Mask Decoder: The SAM mask decoder remains fully trainable with ~2.4M parameters.
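The low-rank update described above can be sketched in a few lines of NumPy. The shapes, rank, and scaling values here are illustrative placeholders, not the paper's exact settings; the key property shown is that zero-initializing $B$ makes the adapted layer start out identical to the frozen one.

```python
import numpy as np

def lora_linear(x, W, A, B, alpha):
    """Frozen linear layer W plus a LoRA update scaled by alpha/r.

    x: (batch, d_in); W: (d_out, d_in), frozen;
    A: (r, d_in), B: (d_out, r), trainable. B is zero-initialized,
    so the adapted layer initially matches the frozen one exactly.
    """
    r = A.shape[0]
    base = x @ W.T                           # frozen path
    update = (x @ A.T) @ B.T * (alpha / r)   # low-rank path
    return base + update

# Illustrative shapes (ViT-B-like FFN dims; rank/alpha are placeholders):
rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 768, 3072, 4, 8
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in)) * 0.01
B = np.zeros((d_out, r))                     # zero-init => no-op at start
x = rng.standard_normal((2, d_in))
```

Because only `A` and `B` receive gradients, the trainable count per adapted layer is $r(d + k)$ rather than $dk$.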
The combined trainable parameter budget is ~4.9M (5.2% of SAM), contrasting with 100% trainable weights in comparators such as U-Net and DeepLabV3+. All original encoder weights are strictly frozen (Khazem, 5 Jan 2026).
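The spatial adapter admits a similarly compact sketch. This is a minimal NumPy version under the assumption of a single channels-first feature map; the residual form means a zero-initialized block leaves the frozen features unchanged.

```python
import numpy as np

def depthwise_conv3x3(F, K):
    """Per-channel 3x3 convolution with zero padding.
    F: (C, H, W); K: (C, 3, 3) -- one independent kernel per channel."""
    C, H, W = F.shape
    Fp = np.pad(F, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(F)
    for i in range(3):
        for j in range(3):
            out += K[:, i, j][:, None, None] * Fp[:, i:i+H, j:j+W]
    return out

def spatial_adapter(F, K_dw, W_pw):
    """Residual depthwise-separable block: F + PW(DW(F)).
    K_dw: (C, 3, 3) depthwise kernels; W_pw: (C, C) 1x1 pointwise mix."""
    h = depthwise_conv3x3(F, K_dw)
    h = np.einsum('oc,chw->ohw', W_pw, h)    # 1x1 pointwise convolution
    return F + h                             # residual connection
```

For C=256 this gives 256·9 depthwise plus 256·256 pointwise weights, on the order of the ~66K parameters stated above (exact count depends on bias terms).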
3. Topology-Aware Supervision
TopoLoRA-SAM introduces topology-awareness via the differentiable clDice loss, supplementing conventional region-based losses. The training objective is a weighted sum of three terms, $\mathcal{L}_{\text{total}} = \lambda_{\text{BCE}}\,\mathcal{L}_{\text{BCE}} + \lambda_{\text{Dice}}\,\mathcal{L}_{\text{Dice}} + \lambda_{\text{cl}}\,\mathcal{L}_{\text{clDice}}$:
- BCE Loss: Standard pixel-wise binary cross-entropy.
- Soft Dice Loss: Overlap-based measure robust to class imbalance.
- clDice: Measures overlap between the predicted and ground-truth skeletons, thus penalizing connectivity breaks: $\text{clDice} = \frac{2\, T_{\text{prec}} \cdot T_{\text{sens}}}{T_{\text{prec}} + T_{\text{sens}}}$, where $T_{\text{prec}}$ and $T_{\text{sens}}$ are skeleton-based precision and sensitivity.
This encourages the model to preserve topological features, crucial for domains where structural connectivity is semantically meaningful (Khazem, 5 Jan 2026).
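The metric underlying the loss can be sketched on binary masks with precomputed skeletons. Note this is the hard (non-differentiable) form for illustration; the paper's training loss uses the differentiable soft-skeleton variant, and skeletonization itself is omitted here.

```python
import numpy as np

def cl_dice(pred, gt, skel_pred, skel_gt, eps=1e-8):
    """clDice from binary masks and their skeletons.

    tprec: fraction of the predicted skeleton lying inside the
    ground-truth mask; tsens: fraction of the ground-truth skeleton
    recovered by the prediction. clDice is their harmonic mean."""
    tprec = (skel_pred * gt).sum() / (skel_pred.sum() + eps)
    tsens = (skel_gt * pred).sum() / (skel_gt.sum() + eps)
    return 2 * tprec * tsens / (tprec + tsens + eps)

def cl_dice_loss(pred, gt, skel_pred, skel_gt):
    """Loss form combined with BCE and Soft Dice in the total objective."""
    return 1.0 - cl_dice(pred, gt, skel_pred, skel_gt)
```

A single dropped pixel on a one-pixel-wide vessel barely moves Dice but removes part of the skeleton overlap, so clDice penalizes the break directly.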
4. Training Protocol and Evaluation Benchmarks
Training and evaluation protocols are standardized for fair comparison:
- Datasets: Retinal vessel segmentation (DRIVE, STARE, CHASE_DB1), polyp segmentation (Kvasir-SEG), and SAR sea/land segmentation (SL-SSDD).
- Preprocessing: Images are normalized to ImageNet statistics and resized (retina: 384×384, Kvasir/SL-SSDD: 512×512). Augmentations include flips, scaling, and center cropping with reflection padding; no color jitter is used for SAR.
- Optimization: AdamW optimizer with a cosine-annealed learning rate schedule, 50 epochs, FP16 mixed precision. Batch size is harmonized across methods via gradient accumulation.
- Reproducibility: Results are averaged over 3 random seeds.
The baseline methods include U-Net (ResNet34), DeepLabV3+ (ResNet50), SegFormer (MiT-B0), and Mask2Former (Swin-T), each fully trainable. TopoLoRA-SAM trains only ~5.2% of its parameters (Khazem, 5 Jan 2026).
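The cosine-annealed schedule can be sketched in pure Python; the initial and final learning-rate values below are placeholders, not the paper's exact settings.

```python
import math

def cosine_annealed_lr(epoch, total_epochs, lr_max, lr_min):
    """Cosine schedule decaying from lr_max to lr_min over training."""
    t = epoch / max(total_epochs - 1, 1)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))

# Illustrative values over the 50-epoch budget (lr_max/lr_min assumed):
schedule = [cosine_annealed_lr(e, 50, 1e-4, 1e-6) for e in range(50)]
```

The schedule starts at `lr_max`, decays slowly at first, and flattens out near `lr_min` in the final epochs.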
5. Empirical Results and Comparative Analysis
Parameter Efficiency
| Model | Total Params (M) | Trainable Params (M) | % Trainable |
|---|---|---|---|
| U-Net (ResNet34) | 24.4 | 24.4 | 100% |
| DeepLabV3+ (ResNet50) | 39.8 | 39.8 | 100% |
| SegFormer (MiT-B0) | 3.7 | 3.7 | 100% |
| Mask2Former (Swin-T) | 47.4 | 47.4 | 100% |
| TopoLoRA-SAM | 93.7 | 4.9 | 5.2% |
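The 5.2% figure follows directly from the component budgets listed in Section 2; a quick arithmetic check:

```python
# Parameter budgets in millions, as stated in the architecture section.
total_m = 93.7       # frozen SAM ViT-B encoder + decoder total
lora_m = 2.4         # LoRA updates in the FFN layers
adapter_m = 0.066    # spatial convolutional adapter (~66K)
decoder_m = 2.4      # fully trainable mask decoder

trainable_m = lora_m + adapter_m + decoder_m   # ~4.9M trainable
fraction = trainable_m / total_m               # ~0.052, i.e. ~5.2%
```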
Main Dice Scores (mean ± std, 3 seeds)
| Dataset | U-Net | DeepLabV3+ | SegFormer | Mask2Former | TopoLoRA-SAM |
|---|---|---|---|---|---|
| DRIVE | 0.682±.02 | 0.699±.01 | 0.670±.02 | 0.693±.02 | 0.701±.018 |
| STARE | 0.670±.02 | 0.685±.02 | 0.665±.02 | 0.692±.01 | 0.684±.019 |
| CHASE_DB1 | 0.492±.03 | 0.510±.03 | 0.520±.02 | 0.485±.03 | 0.569±.022 |
| retina-avg | 0.615 | 0.631 | 0.618 | 0.623 | 0.651 |
| Kvasir-SEG | 0.882±.01 | 0.889±.01 | 0.871±.01 | 0.893±.01 | 0.900±.012 |
| SL-SSDD | 0.992±.00 | 0.991±.00 | 0.988±.00 | 0.992±.00 | 0.993±.003 |
| overall avg | 0.744 | 0.755 | 0.743 | 0.751 | 0.769 |
Topology and Boundary Metrics
On retinal vessel datasets, TopoLoRA-SAM achieves the highest clDice and BFScore. On CHASE_DB1, clDice improves by 6.0 points and BFScore by 8 over Mask2Former.
Ablation and Sensitivity
- Adapter and clDice: LoRA alone achieves most of the adaptation benefit. Including the adapter improves boundary metrics, and adding clDice further boosts topology, particularly on thin structures.
- LoRA Rank: The selected rank balances parameter count and accuracy.
- clDice Weight: An appropriately chosen clDice weight maximizes the trade-off between region overlap (Dice) and topology (clDice).
6. Implementation and Practical Considerations
TopoLoRA-SAM’s practical deployment closely mirrors the frozen SAM inference pipeline, with only negligible overhead from the single depthwise-separable convolution block. The adapter and topology-aware supervision mechanisms notably enhance connectivity preservation in challenging thin-structure tasks. Code and pretrained weights are publicly available at https://github.com/salimkhazem/Seglab.git. The parameter-efficient modules (LoRA, adapter) and topology-sensitive objective (clDice) enable foundation-model adaptation for medical and remote sensing applications where connectivity is critical (Khazem, 5 Jan 2026).
7. Significance and Implications
TopoLoRA-SAM demonstrates that the combination of parameter-efficient adaptation and topology-aware supervision can match or surpass fully fine-tuned specialist architectures, especially in domains typified by fine structures and noisy imaging. A plausible implication is that similar strategies may be effective for other high-resolution and cross-domain segmentation scenarios, providing an alternative to full fine-tuning with reduced risk of catastrophic forgetting and resource consumption (Khazem, 5 Jan 2026).