A-QCF-Net: Adaptive Quaternion Cross-Fusion

Updated 1 January 2026

The paper introduces A-QCF-Net, achieving a four-fold parameter reduction using quaternion convolutions for expressive feature extraction.
A-QCF-Net employs an Adaptive Quaternion Cross-Fusion block to dynamically fuse unpaired CT and MRI features, enhancing segmentation performance.
Empirical results show significant improvements over unimodal baselines, demonstrating robust generalization and clinical relevance.

The Adaptive Quaternion Cross-Fusion Network (A-QCF-Net) is a unified deep learning architecture for multimodal medical image segmentation, specifically designed to address the segmentation of liver tumors in CT and MRI volumes from completely unpaired datasets. A-QCF-Net leverages Quaternion Neural Networks for parameter-efficient, expressive feature extraction, and introduces the Adaptive Quaternion Cross-Fusion (A-QCF) block to facilitate bidirectional and scale-wise information exchange between modalities. This enables the network to learn shared and modality-specific priors, allowing effective segmentation even in the absence of paired or spatially aligned imaging data (V et al., 25 Dec 2025).

1. Quaternion Neural Network Foundations

A-QCF-Net is built upon the mathematical framework of quaternions, a four-dimensional extension of the complex numbers. A quaternion is represented as

$q = a + b\,\mathbf{i} + c\,\mathbf{j} + d\,\mathbf{k} \quad (a,b,c,d \in \mathbb{R}),$

where $\mathbf{i}$, $\mathbf{j}$, and $\mathbf{k}$ are imaginary units. The key operations include the Hamilton (quaternion) product, conjugation, and norm. The convolutions in quaternion neural networks exploit the Hamilton product, which intertwines the four subchannels (one real and three imaginary components) and biases the network toward capturing holistic multi-dimensional correlations rather than independent patterns.
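As a minimal sketch of the three operations named above, the following pure-Python functions implement the Hamilton product, conjugation, and norm on quaternions stored as 4-tuples. The tuple layout and the function names are illustrative choices, not taken from the paper.

```python
import math
from typing import Tuple

# A quaternion a + b i + c j + d k stored as (a, b, c, d); layout is an assumption.
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Return the Hamilton product p * q (note: not commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


def conjugate(q: Quaternion) -> Quaternion:
    """Negate the three imaginary components."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def norm(q: Quaternion) -> float:
    """Euclidean norm of the four components."""
    return math.sqrt(sum(x * x for x in q))


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(hamilton_product(i, j))  # (0, 0, 0, 1), i.e. i * j = k
print(hamilton_product(j, i))  # (0, 0, 0, -1), i.e. j * i = -k
```

Every output component mixes all four input components, which is the "intertwining" of subchannels that a quaternion convolution inherits.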

Parameter tying inherent to the Hamilton product reduces redundancy in the network architecture. A real-valued 3D convolution with input/output channels $4C_{\rm in} \rightarrow 4C_{\rm out}$ and kernel size $k^3$ requires $16\,C_{\rm in}C_{\rm out} k^3$ parameters. In contrast, the equivalent quaternion convolution requires only

$\theta_{\mathbb{H}} = 4\,C_{\rm in}C_{\rm out}k^3 = \tfrac{1}{4}\theta_{\mathrm{real}},$

which provides a four-fold reduction in parameters. This mitigates overfitting and enhances learning from heterogeneous and unpaired medical datasets (V et al., 25 Dec 2025).
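The parameter counts above can be checked with a few lines of arithmetic. This sketch ignores bias terms, and the channel and kernel sizes in the example are hypothetical values chosen only for illustration.

```python
def real_conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a real 3D convolution mapping 4*c_in -> 4*c_out channels (no bias)."""
    return (4 * c_in) * (4 * c_out) * k ** 3


def quaternion_conv3d_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a quaternion 3D convolution with c_in -> c_out quaternion channels.

    Each quaternion kernel stores four real components (r, i, j, k) that the
    Hamilton product reuses across all sixteen real-to-real channel pairings.
    """
    return 4 * c_in * c_out * k ** 3


# Hypothetical layer: 3 -> 6 quaternion channels, 3x3x3 kernel.
real = real_conv3d_params(3, 6, 3)
quat = quaternion_conv3d_params(3, 6, 3)
print(real, quat, real // quat)  # 7776 1944 4
```

The ratio is independent of the channel counts and kernel size, since both expressions share the factor $C_{\rm in}C_{\rm out}k^3$.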

2. Overall A-QCF-Net Architecture

A-QCF-Net employs a dual-stream encoder-decoder architecture, optimized for simultaneous processing of unpaired CT and MRI input volumes. The pipeline proceeds as follows:

Unpaired CT and MRI 3D patches are independently passed through parallel encoder branches composed of stacked quaternion convolutional blocks.
At each encoder down-sampling level, an Adaptive Quaternion Cross-Fusion (A-QCF) block performs attentive feature fusion between the modalities.
Features from both encoders merge into a shared bottleneck quaternion representation—enforcing a common, modality-agnostic semantic space.
Two symmetric decoders (one per modality) reconstruct segmentation masks via up-sampling, quaternion convolution, and skip-connections augmented by attention gating.
A final real-valued $1\times1\times1$ convolution projects quaternion features to three-class segmentations for each modality.

Summary of channel widths at each encoder/decoder stage:

Stage	Encoder Channels	Decoder Channels	Bottleneck
Input	12
Down/up-sample	24, 48, 96, 192, 256	192, 96, 48, 24, 12	256

Skip-connection fusion in the decoder utilizes modality-specific attention gates for spatial focus (V et al., 25 Dec 2025).

3. Adaptive Quaternion Cross-Fusion (A-QCF) Block

The A-QCF block is the core mechanism enabling dynamic, bidirectional knowledge transfer between CT and MRI feature streams at each encoder level. It operates as follows:

Quaternion Projections: $1\times1\times1$ quaternion convolutions produce query, key, and value features from each stream.
Channel-wise Attention: Attention weights are computed via the softmax of the channel-wise dot product between the query and key features.
Cross-context Vector: Aggregated cross-modal context is obtained by weighting the value features with the computed attention.
Adaptive Gating: Global average pooling on input and cross-context features, followed by an MLP and sigmoid activation, yields the adaptive gate.
Gated Fusion: The output for CT (similarly for MRI) is obtained via concatenation, gating, and an output quaternion convolution.

The design guarantees a valid unimodal pass if the other stream is absent. Fusion regularizes feature representations, encouraging each stream to incorporate complementary modality priors (e.g., sharp CT boundaries, MRI soft tissue contrast) without paired input requirements. The block ensures network stability via a learned Lipschitz bound (V et al., 25 Dec 2025).
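The fusion steps described in this section can be sketched in a heavily simplified, real-valued form. This is not the paper's implementation: the quaternion projections are omitted (queries come directly from one stream, keys and values from the other), the MLP is reduced to a single linear layer, and the output quaternion convolution is replaced by a per-voxel channel mix. The names `cross_fuse`, `w_gate`, and `w_out` are hypothetical.

```python
import math
from typing import List, Optional

Feature = List[List[float]]  # C channels, each flattened to N voxels


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def pointwise(weights: List[List[float]], feats: Feature) -> Feature:
    """Real-valued stand-in for a 1x1x1 convolution: mix channels at each voxel."""
    n = len(feats[0])
    return [[sum(w * ch[v] for w, ch in zip(row, feats)) for v in range(n)]
            for row in weights]


def cross_fuse(f_self: Feature, f_other: Optional[Feature],
               w_gate: List[List[float]], w_out: List[List[float]]) -> Feature:
    if f_other is None:
        return f_self  # unimodal pass when the other stream is absent
    n = len(f_self[0])
    q, k, v = f_self, f_other, f_other  # assumption: projections omitted
    # Channel-wise attention: one softmax row per query channel.
    attn = [softmax([sum(a * b for a, b in zip(qc, kc)) / math.sqrt(n) for kc in k])
            for qc in q]
    ctx = pointwise(attn, v)  # cross-context features, shape C x N
    # Adaptive gate: global average pooling, one linear layer, sigmoid.
    pooled = [sum(ch) / n for ch in f_self + ctx]
    gate = [1.0 / (1.0 + math.exp(-sum(w * p for w, p in zip(row, pooled))))
            for row in w_gate]
    gated = [[g * x for x in ch] for g, ch in zip(gate, ctx)]
    # Concatenate own features with gated context, then mix channels.
    return pointwise(w_out, f_self + gated)


f_ct = [[1.0, 2.0], [0.0, 1.0]]   # 2 channels, 2 voxels
f_mr = [[0.5, 0.5], [1.0, -1.0]]
w_gate = [[0.0] * 4, [0.0] * 4]   # zero weights give a gate of 0.5 per channel
w_out = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
print(cross_fuse(f_ct, f_mr, w_gate, w_out))
```

Because attention is computed over channels rather than spatial positions, the two patches never need to be spatially aligned, which is consistent with training on unpaired data.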

4. Training Strategy with Unpaired Medical Datasets

Training of A-QCF-Net occurs on wholly unpaired CT and MRI datasets: LiTS for CT and ATLAS for MRI. No patient overlap exists between the cohorts. Key procedural details:

Data Pipeline: Five-fold cross-validation stratified by tumor presence. CT and MRI intensities are each windowed to a modality-specific range. All patches are resampled to 1 mm voxel spacing, with cropping, padding, and random spatial/intensity augmentations applied. Each training batch includes one patch from CT and one from MRI.
Loss Function: Each stream minimizes the sum of Dice and cross-entropy loss:

$\mathcal{L} = \mathcal{L}_{\mathrm{Dice}} + \mathcal{L}_{\mathrm{CE}}$
Optimization: AdamW optimizer with weight decay. ReduceLROnPlateau is employed based on validation Dice, and networks are trained for 200 epochs on large 3D patches ($256\times256\times16$) on an NVIDIA A100 GPU (V et al., 25 Dec 2025).
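A minimal sketch of a per-stream "Dice plus cross-entropy" objective follows, written for flattened voxels in pure Python. The soft Dice formulation, the class averaging, and the smoothing constant `eps` are common choices assumed here; the source does not state which variant or weighting the authors use.

```python
import math
from typing import List


def dice_ce_loss(probs: List[List[float]], labels: List[int], eps: float = 1e-6) -> float:
    """probs[v][c] is the predicted probability of class c at voxel v;
    labels[v] is the ground-truth class index at voxel v."""
    n_classes = len(probs[0])
    # Cross-entropy averaged over voxels (probabilities clamped to avoid log(0)).
    ce = -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)
    # Soft Dice averaged over classes.
    dice = 0.0
    for c in range(n_classes):
        inter = sum(p[c] for p, y in zip(probs, labels) if y == c)
        denom = sum(p[c] for p in probs) + sum(1 for y in labels if y == c)
        dice += (2.0 * inter + eps) / (denom + eps)
    return (1.0 - dice / n_classes) + ce


labels = [0, 1, 2, 0]  # e.g. background, liver, tumor (three classes)
perfect = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
uniform = [[1 / 3, 1 / 3, 1 / 3] for _ in labels]
print(dice_ce_loss(perfect, labels), dice_ce_loss(uniform, labels))
```

The Dice term counteracts class imbalance (tumor voxels are rare), while the cross-entropy term provides well-behaved per-voxel gradients, which is why the two are often combined in segmentation losses.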

5. Empirical Results and Comparative Evaluation

A-QCF-Net demonstrates clear empirical gains over state-of-the-art unimodal baselines (nnU-Net) on segmentation of liver tumors in both CT and MRI, as summarized below.

Metric/Modality	A-QCF-Net	nnU-Net (unimodal)	Margin
Tumor Dice (CT)	76.7 ± 0.8	71.3 ± 0.8	+5.4%
Tumor Dice (MRI)	78.3 ± 0.6	73.6 ± 0.9	+4.7%
Tumor MSD (CT, mm)	3.85 ± 0.75	4.55 ± 0.92	—
Tumor MSD (MRI, mm)	3.20 ± 0.65	4.25 ± 0.90	—

Statistically significant improvements are observed according to Bonferroni-corrected Wilcoxon tests (e.g., for CT tumor DSC). Robust generalization is reported on out-of-distribution datasets: 3DIRCADb (CT) Tumor Dice 69.4 ± 1.5%, LiverHccSeg (MRI) Tumor Dice 85.9 ± 1.1% (V et al., 25 Dec 2025).
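For reference, the margins in the table are absolute differences between the two columns, so the Dice margins are percentage points rather than relative gains. The short sketch below recomputes them from the reported means and also fills in the MSD differences that the table leaves blank; the dictionary keys are labels chosen here for illustration.

```python
# (A-QCF-Net mean, nnU-Net mean) as reported in the table above.
results = {
    "Tumor Dice (CT, %)": (76.7, 71.3),
    "Tumor Dice (MRI, %)": (78.3, 73.6),
    "Tumor MSD (CT, mm)": (3.85, 4.55),
    "Tumor MSD (MRI, mm)": (3.20, 4.25),
}

# Absolute difference; positive is better for Dice, negative is better for MSD.
margins = {name: round(ours - base, 2) for name, (ours, base) in results.items()}

for name, delta in margins.items():
    print(f"{name}: {delta:+.2f}")
```

This gives +5.40 and +4.70 points for Dice, and reductions of 0.70 mm (CT) and 1.05 mm (MRI) in mean surface distance.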

6. Model Explainability and Clinical Utility

Comprehensive analysis with Grad-CAM and Grad-CAM++ is performed to confirm model focus on meaningful anatomical regions.

Saliency Maps: “Tumor” class attention maps are sharply localized to true lesions in both CT and MRI.
Quantitative Alignment: Median saliency IoU of 0.72 ± 0.08. Boundary band (3 mm) coverage of 81% ± 6%. Pointing-game accuracy of 94% ± 4%.
Clinical Evaluation: Dual-radiologist study (N=40) yields high Likert scores (mean 9.1 and 8.7), clinical acceptability (92.5% and 90.0%), with inter-rater ICC of 0.88 and Kappa of 0.83.

These results confirm that the learned representations correspond to clinically relevant features and that segmentation outputs are broadly acceptable for downstream medical use (V et al., 25 Dec 2025).
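Two of the alignment metrics named above, saliency IoU and the pointing game, can be sketched on flattened arrays as follows. The binarization threshold of 0.5 and the function names are assumptions for illustration; the source does not give the exact protocol used in the paper.

```python
from typing import List, Sequence


def saliency_iou(saliency: Sequence[float], mask: Sequence[int], threshold: float = 0.5) -> float:
    """IoU between the thresholded saliency map and the ground-truth mask."""
    hot = [s >= threshold for s in saliency]
    inter = sum(1 for h, m in zip(hot, mask) if h and m)
    union = sum(1 for h, m in zip(hot, mask) if h or m)
    return inter / union if union else 1.0


def pointing_game_hit(saliency: Sequence[float], mask: Sequence[int]) -> bool:
    """True if the single most salient voxel lies inside the mask."""
    peak = max(range(len(saliency)), key=lambda idx: saliency[idx])
    return bool(mask[peak])


saliency: List[float] = [0.1, 0.9, 0.7, 0.2]  # toy Grad-CAM values
tumor_mask: List[int] = [0, 1, 1, 1]          # toy ground-truth lesion
print(saliency_iou(saliency, tumor_mask))      # 2 of 3 voxels in the union overlap
print(pointing_game_hit(saliency, tumor_mask))
```

Pointing-game accuracy over a dataset is then the fraction of cases for which `pointing_game_hit` returns true.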

7. Conclusions and Prospective Extensions

A-QCF-Net is presented as the first unified quaternion-based network capable of joint training on completely unpaired CT and MRI cohorts for organ and tumor segmentation. The Adaptive Quaternion Cross-Fusion block enables robust, bidirectional exchange of modality-specific expertise, promoting superior generalization and clinical acceptability relative to unimodal and naïve unpaired baselines.

Potential future research avenues include:

Extending the dual-stream design to accommodate more than two modalities, such as PET or multiparametric MRI.
Hybrid training schemes: pre-training on large unpaired archives, followed by fine-tuning on limited paired data.
Multi-center, prospective clinical validation to assess impact on operational workflows and clinical decision support at scale (V et al., 25 Dec 2025).


References (1)

A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets (2025)
