- The paper introduces NeuroMamba, a framework built on multi-perspective feature interaction, combining a boundary discriminative feature extractor (BDFE), a spatial continuous feature extractor (SCFE), and cross feature interaction (CFI) to sharpen neuron boundary delineation and capture both local and global features.
- It employs a boundary discriminative feature extractor and spatial continuous feature extractor to address complex neuron morphology and long-range dependencies, achieving up to a 22.4% relative ARAND reduction on benchmark datasets.
- Experimental results demonstrate NeuroMamba’s superior scalability and computational efficiency, outperforming conventional CNNs and Transformer-based methods while maintaining precise segmentation in dense EM data.
NeuroMamba: Multi-Perspective Feature Interaction with Visual Mamba for Neuron Segmentation
Introduction
High-throughput volumetric electron microscopy (EM) enables the reconstruction of neural circuits at synaptic and cellular resolution, a prerequisite for modern connectomics. Accurate neuron segmentation remains a formidable challenge due to the complex, elongated, and anisotropically organized morphology of neurons in dense neural tissue. Existing CNN-based segmentation pipelines capture local features well but fail to model long-range dependencies, resulting in ambiguous boundary inference. Transformer-based approaches partially improve global context modeling through attention over image patches, but patch partitioning compromises fine-grained voxel-level detail, and their adaptation to the anisotropy of EM data remains sub-optimal.
NeuroMamba is introduced as a framework to address these limitations, leveraging the linear-complexity long-range modeling capabilities of the Mamba state space architecture. By integrating patchless, resolution-aware global feature extraction with strong local morphological modeling and dynamic feature fusion, NeuroMamba establishes a new computational paradigm for 3D neuron segmentation.
Architectural Contributions
Multi-Perspective Feature Interaction (MPFI) Design
At the core of NeuroMamba is the MPFI block, which combines three complementary components:
- Boundary Discriminative Feature Extractor (BDFE): This module employs a channel-gated structure with 3D strip pooling aligned to the anisotropic, elongated geometry of neurons. Channel-wise convolutions and a gating mechanism enable fine localization of boundary features while suppressing the extraneous regional context that conventional square pooling would otherwise introduce.
- Spatial Continuous Feature Extractor (SCFE): SCFE is built upon Visual Mamba, which processes entire volumetric blocks as 1D sequences for efficient, patch-free global dependency modeling. A key architectural advance is the resolution-aware scanning mechanism: independent transverse-first and axial-first cross-scans with dynamically weighted fusion, tuned by the physical resolution prior of the EM data. This adaptively balances transverse and axial features and directly addresses data anisotropy.
- Cross Feature Interaction (CFI): Unlike simple additive or concatenative fusion, CFI employs a cross-modulation scheme whereby global and local representations reciprocally gate one another. This dynamic interaction enables selective enhancement of relevant features for both small-scale and large-scale neuron instances, providing robust representations for downstream affinity prediction.
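To make the strip-pooling idea behind BDFE concrete, here is a minimal NumPy sketch (the paper's implementation is presumably a learned 3D CNN module; the function name, tensor layout, and parameter-free sigmoid gate are illustrative assumptions):

```python
import numpy as np

def strip_pool_3d(x: np.ndarray) -> np.ndarray:
    """Sketch of 3D strip pooling for a (C, D, H, W) feature map.

    Instead of square/cubic pooling windows, each strip averages along one
    spatial axis at a time, matching the elongated geometry of neurites.
    The three pooled maps are broadcast back and summed, then squashed by a
    sigmoid to form a gate applied to the input features.
    """
    pool_d = x.mean(axis=1, keepdims=True)   # (C, 1, H, W): strip along depth
    pool_h = x.mean(axis=2, keepdims=True)   # (C, D, 1, W): strip along height
    pool_w = x.mean(axis=3, keepdims=True)   # (C, D, H, 1): strip along width
    combined = pool_d + pool_h + pool_w      # broadcasts to (C, D, H, W)
    gate = 1.0 / (1.0 + np.exp(-combined))   # sigmoid gate in (0, 1)
    return x * gate                          # gated features, same shape as x

x = np.random.default_rng(0).standard_normal((4, 8, 16, 16))
y = strip_pool_3d(x)
```

Because each strip averages over a full axis, a boundary response anywhere along a neurite influences the gate along that entire strip, which is the property square pooling lacks.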
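The resolution-aware dual scan in SCFE can likewise be sketched in NumPy. This is not the paper's code: the Mamba SSM is replaced by an identity stand-in, and the voxel-size-derived fusion weights are an assumed form of the resolution prior.

```python
import numpy as np

def resolution_aware_scan(vol: np.ndarray,
                          voxel_size=(30.0, 4.0, 4.0)) -> np.ndarray:
    """Sketch of transverse-first vs. axial-first scanning with weighted fusion.

    vol: (D, H, W) volume; voxel_size: physical (z, y, x) size in nm,
    e.g. serial-section EM with 30 nm sections and 4 nm in-plane pixels.
    """
    d, h, w = vol.shape
    # Transverse-first: scan within each slice (H, W fastest), then across D.
    transverse_first = vol.reshape(-1)
    # Axial-first: scan along D fastest, i.e. down each (y, x) column.
    axial_first = vol.transpose(1, 2, 0).reshape(-1)
    # (An SSM would process each sequence here; identity stands in.)
    # Weight the in-plane stream higher when in-plane resolution is finer.
    aniso = voxel_size[0] / voxel_size[1]     # e.g. 30 / 4 = 7.5
    w_t = aniso / (1.0 + aniso)
    w_a = 1.0 - w_t
    # Un-scan the axial-first stream back to transverse order before fusing.
    axial_back = axial_first.reshape(h, w, d).transpose(2, 0, 1).reshape(-1)
    fused = w_t * transverse_first + w_a * axial_back
    return fused.reshape(d, h, w)

v = np.arange(24, dtype=float).reshape(2, 3, 4)
out = resolution_aware_scan(v)
```

With the identity stand-in and weights summing to one, the output equals the input, which makes the scan/un-scan bookkeeping easy to verify.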
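The cross-modulation idea in CFI, as opposed to additive or concatenative fusion, can be sketched as reciprocal sigmoid gating; the exact gating form here is an assumption, not the paper's formulation:

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def cross_feature_interaction(local_feat: np.ndarray,
                              global_feat: np.ndarray) -> np.ndarray:
    """Sketch of cross-modulatory fusion: each stream gates the other.

    The global features produce a gate applied to the local features and
    vice versa, so each representation selectively enhances the other
    rather than being blindly summed or concatenated.
    """
    gate_from_global = sigmoid(global_feat)   # global context selects local detail
    gate_from_local = sigmoid(local_feat)     # local detail selects global context
    return local_feat * gate_from_global + global_feat * gate_from_local

l = np.ones((2, 3))
g = np.zeros((2, 3))
fused = cross_feature_interaction(l, g)
```

In the toy call above, the zero global stream contributes nothing directly but still halves the local stream through its sigmoid gate, illustrating how one stream modulates the other.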
Experimental Results
A comprehensive suite of experiments was conducted on four public EM datasets: AC3/AC4, CREMI, FIB25, and Kasthuri. Models were benchmarked using Variation of Information (VI) and adapted Rand error (ARAND) under two standard post-processing regimes (Waterz, Multicut).
Key results include:
- NeuroMamba outperforms all CNN, Transformer, and prior Mamba-based baselines across all datasets. On CREMI-A, NeuroMamba achieves a relative ARAND reduction of 22.4% (Waterz) and 21.7% (Multicut) over the best prior model.
- On the Kasthuri dataset, NeuroMamba exhibits strong scalability and accurate delineation of fine neurite processes, attesting to robust generalization in large-scale and real-world anatomical contexts.
- Model efficiency is retained: NeuroMamba's parameter count and FLOPs are on par with lightweight CNNs but with superior accuracy and robustness, outperforming significantly larger Transformer and baseline Mamba models.
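For reference, the adapted Rand error (ARAND) reported above can be computed from the contingency table of ground-truth and predicted segment labels. A minimal sketch follows, using the CREMI-style definition (one minus the F1 of Rand precision and recall); background/ignore-label handling is omitted for brevity:

```python
import numpy as np

def adapted_rand_error(seg: np.ndarray, gt: np.ndarray) -> float:
    """Adapted Rand error: 1 - F1 of Rand precision/recall.

    seg, gt: non-negative integer label volumes of the same shape.
    Lower is better; 0 means a perfect match (up to label permutation).
    """
    seg = seg.ravel().astype(np.int64)
    gt = gt.ravel().astype(np.int64)
    # Contingency counts p[i, j]: overlap of gt segment i with seg segment j,
    # encoded as a single flat index per voxel.
    idx = gt * (seg.max() + 1) + seg
    p = np.bincount(idx).astype(np.float64)
    sum_p2 = np.sum(p ** 2)
    sum_a2 = np.sum(np.bincount(gt).astype(np.float64) ** 2)  # gt marginals
    sum_b2 = np.sum(np.bincount(seg).astype(np.float64) ** 2) # seg marginals
    precision = sum_p2 / sum_b2   # penalizes false merges in seg
    recall = sum_p2 / sum_a2      # penalizes false splits of gt
    return 1.0 - 2.0 * precision * recall / (precision + recall)

a = np.array([0, 0, 1, 1])        # ground truth: two segments
b = np.zeros(4, dtype=int)        # prediction: everything merged
```

Merging the two ground-truth segments into one (`b`) lowers precision while recall stays perfect, so the error rises above zero, which is the merge/split sensitivity that makes ARAND informative for proofreading cost.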
Ablation and Analytical Studies
Ablation experiments substantiate each introduced component:
- Joint modeling via BDFE and SCFE is necessary for optimal performance; omitting either leads to substantial accuracy loss.
- Cross-modulation in CFI consistently surpasses additive, multiplicative, or concatenative alternatives for feature fusion.
- The resolution-aware scanning within SCFE provides measurable gains, especially in the presence of anisotropy.
- Strip pooling (rather than square pooling) within BDFE is empirically validated as the optimal choice for neuron-shaped structures.
- Varying block shapes and hyperparameters demonstrates model robustness; NeuroMamba maintains performance superiority across all tested configurations.
- Comparative scanning mechanism analysis confirms the efficacy of the custom transverse-first and axial-first cross-scan strategies.
Theoretical and Practical Implications
NeuroMamba's approach redefines the handling of long-range dependencies in volumetric biological data. By avoiding patch partitioning (and thus the associated loss of local structure), yet preserving globally contextualized representations, the method bridges the CNN–Transformer divide. The integration of explicit data priors (anisotropy-aware scanning) into model architecture further exemplifies the virtue of domain knowledge in deep connectomics pipelines.
Practically, NeuroMamba enables more accurate, efficient, and scalable automated neuron segmentation from dense EM data. The substantial gains in ARAND and VI offer tangible benefits in subsequent analysis steps like connectome graph extraction and synapse detection, reducing the need for manual proofreading and correction.
Future Directions
The design principles validated in NeuroMamba—patch-free global modeling with efficient state space architectures, adaptive feature fusion, and data-aware architectural priors—may have broader translational impact. Potential future directions include:
- Extension to other volumetric segmentation tasks within and beyond neuroscience (e.g., organelle segmentation, biomedical volume analysis).
- Incorporation of unsupervised or self-supervised representation learning, exploiting the extensive unlabelled EM data available.
- Further refinement of resolution- and morphology-aware mechanisms, potentially leveraging learned rather than fixed priors for adaptive spatial modeling.
- Integration with GNNs for end-to-end neuron and connectome reconstruction pipelines.
Conclusion
NeuroMamba provides a comprehensive solution to the dual challenges of local feature preservation and global spatial continuity in neuron segmentation for volumetric EM data. Through the innovative integration of BDFE, a resolution-aware Mamba backbone (SCFE), and dynamic cross-modulatory fusion (CFI), it achieves state-of-the-art accuracy with high computational efficiency and robustness. Its design establishes a strong baseline for future segmentation algorithms in large-scale connectomics and other domains requiring precise, context-aware 3D instance parsing.
Reference: "NeuroMamba: Multi-Perspective Feature Interaction with Visual Mamba for Neuron Segmentation" (2601.15929)