
CrackMamba: Topology-Aware Crack Segmentation

Updated 17 February 2026
  • CrackMamba is a neural architecture that uses selective state-space modeling and topology-aware scanning to segment fine-scale, low-contrast cracks.
  • It replaces traditional CNNs and Transformers with linear-complexity modules, enabling efficient global context aggregation and reduced computational cost.
  • Its plug-and-play design with multimodal fusion adapts to diverse benchmarks, offering real-time, edge-deployable performance for structural health monitoring.

CrackMamba refers to a class of Mamba-based neural architectures specifically developed for pixel-level crack segmentation in structural health monitoring of infrastructure such as concrete, asphalt, masonry, and steel. CrackMamba and related networks replace CNNs and Transformers with Selective State-Space Models (SSMs), leveraging linear-complexity global context modeling and topology-aware directional scanning to segment cracks characterized by thin structures, low contrast, and irregular morphology. Several variants have advanced the state of the art in both accuracy and computational efficiency across diverse benchmarks (Chen et al., 2024, He et al., 2024, Liu et al., 3 Mar 2025, Zhu et al., 2024, Liu et al., 30 Jul 2025).

1. Foundations: Mamba State-Space Modeling for Vision

Central to CrackMamba is the Vision Mamba (VMamba) module, which generalizes the S6 state-space model for 2D vision tasks. Instead of local convolutions or quadratic attention, the S6 block applies linear-time recurrences on patch-embedded sequences:

h_k = A h_{k-1} + B x_k, \qquad y_k = C h_k + D x_k

where h_k is the hidden state, x_k a patch embedding, and A, B, C, D are learned matrices. The discrete realization scans along multiple directions (e.g., four major axes), enabling each output token to aggregate information from the entire spatial domain with O(L) compute and memory, where L is the number of patches. This is in contrast to self-attention's O(L^2) and CNNs' limited receptive field (Chen et al., 2024, He et al., 2024).
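The recurrence above can be sketched in a few lines. This is a toy version with fixed matrices; the actual S6 block makes A, B, and C input-dependent ("selective") and uses a hardware-aware parallel scan rather than a Python loop:

```python
import numpy as np

def ssm_scan(x, A, B, C, D):
    """Linear-time state-space recurrence over a patch sequence.

    x: (L, d_in) patch embeddings; A: (d_h, d_h); B: (d_h, d_in);
    C: (d_out, d_h); D: (d_out, d_in).
    Returns y: (L, d_out), where each y_k aggregates all x_1..x_k.
    """
    L = x.shape[0]
    h = np.zeros(A.shape[0])
    ys = []
    for k in range(L):
        h = A @ h + B @ x[k]          # h_k = A h_{k-1} + B x_k
        ys.append(C @ h + D @ x[k])   # y_k = C h_k + D x_k
    return np.stack(ys)
```

Because each step touches the state once, cost and memory grow linearly in the sequence length L, which is the property the architecture exploits.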

The SSM core can be reinterpreted as a form of dynamic convolution or linear attention, establishing a direct connection with established attention blocks and paving the way for hybrid modules that combine gated SSM scanning with domain-specific attention mechanisms (He et al., 2024).
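The dynamic-convolution view can be made concrete by unrolling the recurrence: with D omitted, y_k = sum over j ≤ k of C A^{k-j} B x_j, i.e., a causal convolution with kernel K[t] = C A^t B. A minimal single-channel sketch of that kernel construction (illustrative, not the papers' implementation):

```python
import numpy as np

def ssm_kernel(A, B, C, L):
    """Unroll the SSM recurrence into an explicit causal convolution kernel.

    Since y_k = sum_{j<=k} C A^{k-j} B x_j (ignoring the D skip term),
    the scan equals a causal convolution with kernel K[t] = C A^t B.
    A: (d_h, d_h); B: (d_h, 1); C: (1, d_h). Returns K: (L,).
    """
    K = np.empty(L)
    At = np.eye(A.shape[0])           # A^0
    for t in range(L):
        K[t] = (C @ At @ B).item()    # K[t] = C A^t B
        At = A @ At                   # advance to A^(t+1)
    return K
```

This equivalence is what lets SSM blocks be read as a form of linear attention and combined with standard attention modules.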

2. Architectural Advancements and Topology-Aware Modules

CrackMamba and its successors have introduced several architectural motifs to adapt the generic Vision Mamba backbone to the fine-scale, topologically complex task of crack segmentation:

  • Four-directional or snake-like scanning: Instead of scanning features only in standard raster order, modules such as the Structure-Aware Scanning Strategy (SASS) and Efficient Dynamic Guided Scanning Strategy (EDG-SS) reorder patch sequences along four principal directions (parallel, diagonal, and their reverses) or by crack-masked importance, thus enhancing crack continuity modeling (Liu et al., 3 Mar 2025, Liu et al., 30 Jul 2025).
  • Gated Bottleneck Convolutions (GBC), Multi-kernel Convolutions (LDMK): These blocks reduce parameters by low-rank factorization and add spatial-channel gating that suppresses irrelevant features, enhancing morphology extraction for thin, noisy crack patterns (Liu et al., 3 Mar 2025, Liu et al., 30 Jul 2025).
  • Pixel Attention Fusion and Dual-domain Fusion: Feature maps from multi-directional scans or different sensing modalities (e.g., RGB, IR, depth) are fused via softmax attention or frequency-domain enhancement to optimally integrate spatial, morphological, and textural information (Zhu et al., 2024, Liu et al., 30 Jul 2025).
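The idea behind multi-directional scanning is simply to serialize the 2D patch grid along several routes before running the 1D scan. A minimal sketch of four snake-like (boustrophedon) orderings; the exact SASS/EDG-SS routes in the papers may differ in detail:

```python
import numpy as np

def snake_orders(H, W):
    """Four illustrative snake-like scan routes over an H x W patch grid.

    Returns four index arrays, each a permutation of range(H*W):
    row-wise snake, its reverse, column-wise snake, its reverse.
    """
    idx = np.arange(H * W).reshape(H, W)
    row_snake = np.concatenate(
        [idx[r] if r % 2 == 0 else idx[r][::-1] for r in range(H)])
    col_snake = np.concatenate(
        [idx[:, c] if c % 2 == 0 else idx[:, c][::-1] for c in range(W)])
    return [row_snake, row_snake[::-1], col_snake, col_snake[::-1]]
```

Each route keeps spatially adjacent patches adjacent in the sequence, so a linear scan can propagate state along a crack without the jumps a plain raster order introduces at row boundaries.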

A characteristic recipe, as in SCSegamba, consists of patch embedding, several stacked topology-aware Vision Mamba blocks, multi-scale upsampling, and a lightweight segmentation head. Each block alternates between topology-aware scanning and dynamically gated feature mixing, maintaining both global context and local detail (Liu et al., 3 Mar 2025).

3. Plug-and-Play Integration, Complexity, and Theoretical Properties

CrackMamba modules are structurally compatible with U-net, encoder-decoder, and feature-fusion architectures. The modular design allows selective replacement of CNN or attention blocks with SSM-based blocks—enabling parameter and FLOP reductions without sacrificing accuracy.

  • Parameter and FLOP efficiency: Representative models reduce parameter count by 15–75% and FLOPs by 27–87% compared to Transformer or heavy CNN baselines, while maintaining or improving accuracy (Chen et al., 2024, Liu et al., 3 Mar 2025).
  • Global receptive field: Analysis and empirical visualizations confirm that four-directional SSM scanning ensures every output pixel is influenced by all spatial locations, unlike local convolutions or shallow ERF expansion in CNNs (He et al., 2024).
  • Minimal latency for edge deployment: Linear complexity and compact models lead to inference times as low as 16–32 ms per 512×512 frame, suitable for embedded/robotic applications (Liu et al., 3 Mar 2025).
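The asymptotic gap behind these numbers is easy to make concrete with rough per-layer op counts for the token-mixing step alone (illustrative scaling only, ignoring projections and constants; d_state=16 is a common S6 default, assumed here):

```python
def token_mixing_ops(L, d, d_state=16):
    """Rough multiply-add counts for token mixing only: self-attention
    scales as L^2 * d, an SSM scan as L * d * d_state. Not exact FLOPs."""
    return {"attention": L ** 2 * d, "ssm_scan": L * d * d_state}

# 512x512 image with 8x8 patches -> L = 4096 tokens
ops = token_mixing_ops(L=4096, d=192)
ratio = ops["attention"] / ops["ssm_scan"]   # -> 256.0
```

At this resolution the quadratic attention term already dominates by two orders of magnitude, which is why linear-complexity mixing matters for dense, high-resolution crack maps.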

4. Topology-Awareness, Morphology, and Adaptation to Structural Scenarios

Topology-awareness in CrackMamba is achieved through directed scanning strategies and adaptive fusion. Modules are specifically designed to strengthen continuity across crack branches, handle bifurcations, and retain connectivity for long, slender structures under challenging conditions (multi-material, noisy background):

  • Structure-Aware Scanning: Four snake-like scan routes or crack-sequence reordering ensure that semantic information is propagated efficiently even in highly discontinuous or tortuous cracks (Liu et al., 3 Mar 2025, Liu et al., 30 Jul 2025).
  • Dynamic fusion mechanisms: Dual-branch and multi-domain adaptivity provide robust feature integration from multimodal sources (RGB, IR, depth, polarization), with frequency-domain enhancement selectively amplifying high-frequency edge content (Liu et al., 30 Jul 2025, Zhu et al., 2024).
  • Downstream generalization: State-of-the-art results are obtained on standard crack segmentation benchmarks (Crack500, DeepCrack, MC448, CrackSeg9k, SewerCrack), but also on retinal vessel and other topology-rich biomedical images, demonstrating general adaptation to vascular structure segmentation (Chen et al., 2024, Zhu et al., 2024).
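Pixel-attention fusion of the per-direction feature maps can be sketched as a per-pixel softmax over directions. This is a simplification: the per-direction scores here are just channel means, whereas the papers learn them:

```python
import numpy as np

def pixel_attention_fuse(feats):
    """Fuse per-direction feature maps with per-pixel softmax weights.

    feats: (n_dirs, H, W, C). Scores are taken as the channel-mean
    activation (an assumption for illustration). Returns (H, W, C).
    """
    scores = feats.mean(axis=-1)                        # (n_dirs, H, W)
    e = np.exp(scores - scores.max(axis=0, keepdims=True))
    w = e / e.sum(axis=0, keepdims=True)                # softmax over dirs
    return (w[..., None] * feats).sum(axis=0)           # weighted sum
```

Pixels where one scan direction responds strongly (e.g., along a crack's orientation) receive most of their fused feature from that direction, while ambiguous pixels average across routes.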

5. Empirical Results and Ablation Studies

CrackMamba variants set or approach the benchmark in segmentation quality, efficiency, and generalization:

| Model/Variant | Params (M) | F1 / mIoU (dataset) | FLOPs (G) | Remarks |
| --- | --- | --- | --- | --- |
| SCSegamba (Liu et al., 3 Mar 2025) | 2.8 | 0.8390 / 0.8479 (TUT) | 18.16 | SOTA on multi-scenario cracks |
| LIDAR (Liu et al., 30 Jul 2025) | 5.35 | 0.8204 / 0.8465 (Depth) | 33 | SOTA, multimodal (RGB+Depth/IR) |
| MSCrackMamba (Zhu et al., 2024) | — | mIoU 76.96 (Crack900) | — | IR+RGB, super-res+Mamba fusion |
| VM-UNet (Chen et al., 2024) | 27 | mDS 85.7/79.4 (Ozgenel) | 16 | 15–75% fewer params than CNN/ViT |
| CrackSeU+CrackMamba (He et al., 2024) | 1.83 | mIoU 76.26 (Steelcrack) | 9.97 | Plug-and-play Mamba enhancement |

Ablations consistently confirm that each component (four-direction scan, dynamic gating, dual-domain fusion, pixel-attention fusion) is necessary for optimal results: removing GBC or SASS drops F1/mIoU by roughly 1.5–2 points, directly linking each architectural module to segmentation quality (Liu et al., 3 Mar 2025).

6. Extensions: Multimodal and Super-Resolution Fusion

MSCrackMamba and LIDAR extend the architecture to multimodal crack segmentation. Super-resolving lower-resolution IR or depth channels to match RGB and fusing the modalities at the input or feature level yields improved detection under occlusion, low contrast, or illumination change. Adaptive fusion attention modules and frequency-domain enhancement further boost robustness and accuracy (Zhu et al., 2024, Liu et al., 30 Jul 2025).
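Input-level fusion of this kind can be sketched as upsampling the low-resolution modality to the RGB grid and concatenating along the channel axis. Nearest-neighbor repetition stands in here for the learned super-resolution step the papers use:

```python
import numpy as np

def fuse_rgb_ir(rgb, ir):
    """Upsample a lower-resolution IR map to the RGB grid and concatenate.

    Nearest-neighbor upsampling is a stand-in for learned super-resolution.
    rgb: (H, W, 3); ir: (h, w) with H % h == 0 and W % w == 0.
    Returns a fused (H, W, 4) input tensor.
    """
    H, W, _ = rgb.shape
    ir_up = np.repeat(np.repeat(ir, H // ir.shape[0], axis=0),
                      W // ir.shape[1], axis=1)       # (H, W)
    return np.concatenate([rgb, ir_up[..., None]], axis=-1)
```

Feature-level fusion works analogously, concatenating or attention-weighting encoder features instead of raw channels.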

7. Practical Implementation and Implications

CrackMamba frameworks achieve real-time inference speeds on mainstream GPUs (32 ms per 512×512 image), with recommended patch sizes (e.g., 8×8), 4–6 stacked blocks for best trade-off between accuracy and efficiency, and composite loss functions (Dice+BCE with tuned weights) to handle crack/non-crack class imbalance. Most implementations are available in standard ML frameworks (PyTorch) and provide configuration for edge deployment (ONNX, TensorRT) (Liu et al., 3 Mar 2025).
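A composite Dice+BCE loss of the kind mentioned above can be written compactly. This numpy sketch operates on sigmoid probabilities; the weights shown are illustrative defaults, not the tuned values from the papers:

```python
import numpy as np

def dice_bce_loss(pred, target, w_dice=0.8, w_bce=0.2, eps=1e-6):
    """Composite Dice + BCE loss for binary crack masks.

    pred, target: arrays of the same shape with values in [0, 1].
    Dice handles the crack/background class imbalance (cracks occupy few
    pixels); BCE keeps per-pixel gradients well-behaved.
    """
    p, t = pred.ravel(), target.ravel()
    inter = (p * t).sum()
    dice = 1.0 - (2.0 * inter + eps) / (p.sum() + t.sum() + eps)
    p = np.clip(p, eps, 1.0 - eps)                  # avoid log(0)
    bce = -(t * np.log(p) + (1.0 - t) * np.log(1.0 - p)).mean()
    return w_dice * dice + w_bce * bce
```

In a PyTorch training loop the same structure would typically combine `nn.BCEWithLogitsLoss` with a soft-Dice term on the sigmoid outputs.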

A plausible implication is that topology-aware Mamba architectures represent a scalable paradigm for edge-deployable segmentation of filamentary structures—not limited to cracks but extensible to biomedical vessel segmentation, surface defect inspection, and multimodal sensor fusion domains.

References

  • "Vision Mamba-based autonomous crack segmentation on concrete, asphalt, and masonry surfaces" (Chen et al., 2024)
  • "Mamba meets crack segmentation" (He et al., 2024)
  • "SCSegamba: Lightweight Structure-Aware Vision Mamba for Crack Segmentation in Structures" (Liu et al., 3 Mar 2025)
  • "MSCrackMamba: Leveraging Vision Mamba for Crack Detection in Fused Multispectral Imagery" (Zhu et al., 2024)
  • "LIDAR: Lightweight Adaptive Cue-Aware Fusion Vision Mamba for Multimodal Segmentation of Structural Cracks" (Liu et al., 30 Jul 2025)
