MOON++: Multi-Organ Cohesion Network
- MOON++ is a multi-organ modeling framework that employs co-training and cross-organ attention to accurately segment organs in volumetric medical images.
- It leverages a DeepLab-based encoder–decoder and UniFormer backbones to extract high-context features, combining labeled and pseudo-labeled data using EMA teacher models.
- For diagnostic grading, the framework fuses volumetric imaging features with clinical priors, outperforming traditional single-organ approaches in segmentation and disease assessment.
Multi-Organ Cohesion Network++ (MOON++) is a technical framework for multi-organ modeling in medical images, with two distinct lines of work: (1) a model co-training method for multi-organ segmentation from few-organ datasets in volumetric medical imaging (Huang et al., 2020), and (2) a clinically informed, multimodal deep system for NCCT-based esophageal varices grading that fuses imaging, cross-organ reasoning, and knowledge priors (Zhang et al., 22 Dec 2025). MOON++'s innovations enable high-fidelity integration of partial or complementary organ information, maximizing segmentation and diagnostic performance while minimizing inference cost relative to prior single-organ or naive multi-organ approaches.
1. Core Architectures and Methodological Foundations
The fundamental MOON++ segmentation framework employs a DeepLab-style encoder–decoder with a dilated ResNet-50 backbone enhanced by IBN (instance–batch normalization) to improve semantic segmentation accuracy and robustness to domain shift (Huang et al., 2020). Multi-organ clinical diagnosis employs organ-specific UniFormer-Base backbones—hybrid convolution–attention transformers pretrained on Kinetics-400—to extract highly contextualized representations from region-of-interest (ROI) patches of the esophagus, liver, and spleen (Zhang et al., 22 Dec 2025).
For segmentation, the training involves two identical but independently initialized neural networks, F(·∣θ₁) and F(·∣θ₂), each paired with a temporally averaged “teacher” copy via exponential moving average (EMA) updates. In diagnosis, after nnU-Net-based segmentation, each organ’s ROI is mapped to a canonical grid and encoded by its own deep feature backbone, whose outputs are fused in downstream modules.
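The student–teacher pairing can be sketched as a simple EMA parameter update; this is a minimal illustration, and the decay value here is a typical placeholder rather than the papers' reported coefficient:

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.99):
    """Exponential-moving-average update: teacher <- alpha*teacher + (1-alpha)*student.

    `alpha` is an illustrative decay value, not the coefficient used in the papers.
    """
    return {k: alpha * teacher_params[k] + (1 - alpha) * student_params[k]
            for k in teacher_params}

# Two independently initialized students F(.|theta1), F(.|theta2) each maintain
# a temporally averaged teacher copy; one pair is shown here.
student = {"w": np.array([1.0, 2.0])}
teacher = {"w": np.array([0.0, 0.0])}
teacher = ema_update(teacher, student, alpha=0.9)
```

At inference only one network (typically the better EMA teacher) is kept, so the pairing adds no deployment cost.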
2. Data Cohesion, Co-Training, and Organ Interaction
Segmentation via Co-Training Weight-Averaged Models
MOON++ addresses the problem of disparate few-organ training datasets by using a collaborative co-training protocol: each network learns the available hard label for its active organ channel and receives soft labels for un-annotated organ channels from its sibling’s EMA teacher. Specifically, the loss function involves:
- Supervised loss (focal and soft-dice) on annotated pixels only.
- Consistency loss over un-annotated regions, where the soft prediction of the opposing network’s EMA teacher is treated as the pseudo-label, modulated by a spatial region mask to exclude pixels in the known organ area.
- The region mask is 0 for pixels inside the annotated organ region and 1 elsewhere, with the masked-out region dilated so that no consistency constraint is imposed near ground-truth edges.
These strategies allow knowledge sharing between networks for organs lacking annotation in each instance while preventing over-regularization in supervised regions. Confidence weighting for class imbalance is achieved through the focal loss's modulating factor, which down-weights well-classified pixels.
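A minimal sketch of this masked co-training objective, with plain cross-entropy and squared error standing in for the focal/soft-Dice and soft-label consistency terms (a simplification, not the papers' exact losses):

```python
import numpy as np

def cotrain_loss(pred_lab, hard_label, pred_unlab, teacher_soft, region_mask, lam=1.0):
    """Simplified MOON++-style co-training objective for one network.

    pred_lab / hard_label: prediction and annotation for the organ channel that
    IS labeled in this sample; pred_unlab / teacher_soft: prediction and the
    sibling EMA teacher's soft pseudo-label for an unlabeled organ channel;
    region_mask: 1 outside the (dilated) annotated organ region, 0 inside;
    lam: the consistency weight, ramped up early in training.
    """
    eps = 1e-7
    # Supervised term on the annotated organ channel (cross-entropy stand-in).
    sup = -np.mean(hard_label * np.log(pred_lab + eps)
                   + (1 - hard_label) * np.log(1 - pred_lab + eps))
    # Consistency term against the sibling teacher, masked to exclude the
    # known organ area.
    cons = np.sum(region_mask * (pred_unlab - teacher_soft) ** 2) / max(region_mask.sum(), 1)
    return sup + lam * cons
```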
Cross-Organ Attention and Multimodal Fusion for EV Grading
In esophageal varices grading (Zhang et al., 22 Dec 2025), MOON++ fuses the imaging features of the segmented esophagus, liver, and spleen via the Organ Representation Interaction (ORI) module. Over a fixed number of interaction steps, cross-attention is repeatedly computed from liver and spleen to esophagus representations, enabling feature-level modulation that reflects clinical dependencies between these organs. Concatenation and subsequent pooling yield a joint representation.
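The liver/spleen-to-esophagus modulation can be illustrated with a single-head cross-attention step; projection matrices are omitted and the shapes are arbitrary, so this is a structural sketch rather than the paper's exact ORI design:

```python
import numpy as np

def cross_attention(query, context, d):
    """Single-head cross-attention: esophagus tokens (queries) attend to
    liver/spleen tokens (keys/values). Learned projections are omitted."""
    scores = query @ context.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)      # rows sum to 1
    return attn @ context

rng = np.random.default_rng(0)
eso = rng.normal(size=(4, 8))        # esophagus ROI tokens (illustrative shape)
liv_spl = rng.normal(size=(10, 8))   # concatenated liver + spleen tokens
# Residual update; in ORI this step is repeated over several iterations.
out = eso + cross_attention(eso, liv_spl, d=8)
```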
Volumetric priors (absolute and ratio-scaled organ volumes) and liver-to-spleen volume ratio (LSVR) are categorized, encoded with a pre-trained medical CLIP (M3D) text encoder, and projected for joint embedding with image features. This hybridization incorporates both image-derived and clinically inspired cross-organ dependencies.
3. Losses, Training Protocols, and Optimization
Segmentation Losses and Schedule
The multi-organ segmentation protocol employs a compound loss combining the supervised (focal and soft-Dice) terms with the consistency term, whose weight transitions from 0 to 1 during early training. The EMA teachers are maintained with a fixed decay coefficient. Each batch samples randomly from all available single-organ datasets.
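The exact 0-to-1 ramp schedule is not reproduced here; a common choice in mean-teacher-style training, shown purely as an assumption, is the sigmoid-shaped ramp exp(-5(1 - t)^2):

```python
import math

def consistency_ramp(step, ramp_steps):
    """Ramp the consistency weight from ~0 to 1 over early training.

    Assumed schedule: the sigmoid ramp exp(-5 * (1 - t)^2) widely used with
    mean-teacher methods; the papers' actual schedule may differ.
    """
    t = min(step / ramp_steps, 1.0)
    return math.exp(-5.0 * (1.0 - t) ** 2)
```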
The final model is selected as the better of the two EMA teachers on a held-out validation set; at inference, only a single network is deployed, yielding computational cost equivalent to a single DeepLab pass.
Grading Loss, Ordinality and Deep CCA
The MOON++ EV grading protocol applies a hybrid loss whose regression component is the squared error between the model's outputs and the ordinally encoded ground truth for grades G0–G3. Deep CCA regularization enforces compatibility in feature space between the main esophagus branch and the auxiliary liver/spleen branches.
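A small sketch of the ordinal target and its squared-error term, assuming the standard cumulative encoding for G0–G3 (the paper's exact encoding may differ):

```python
import numpy as np

def ordinal_encode(grade, num_grades=4):
    """Cumulative ("ordinal") encoding: one binary target per grade threshold,
    e.g. G0 -> [0,0,0], G2 -> [1,1,0]. A standard scheme for ordinal
    regression, assumed here for illustration."""
    return np.array([1.0 if grade > k else 0.0 for k in range(num_grades - 1)])

def ordinal_mse(outputs, grade):
    """Squared error between model outputs and the ordinally encoded target."""
    return float(np.mean((outputs - ordinal_encode(grade)) ** 2))
```

This encoding preserves grade ordering: predicting G1 for a true G2 costs less than predicting G0, which a plain one-hot cross-entropy would not capture.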
4. Clinical Prior Integration and Knowledge-Guided Embedding
In the diagnostic MOON++ application, volumetric priors—esophagus, liver, and spleen volumes and LSVR—are discretized into categories and rendered as a text prompt to a pre-trained medical CLIP encoder. The resulting text and 3D image embeddings are aligned and fused with organ-ROI features, augmenting the model with explicit clinical context.
The joint embedding employs a simple feedforward combination of the projected text and image features. This fusion module enables the model to reason over both observed image appearance and latent, knowledge-based risk factors relevant to EV pathology.
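The concatenate-and-project fusion can be sketched as follows; the dimensions and weights are illustrative only, not the paper's head:

```python
import numpy as np

def fuse(image_feat, text_feat, w, b):
    """Feedforward fusion of organ-ROI image features with CLIP-encoded
    volumetric-prior text features: concatenate, linearly project, ReLU."""
    z = np.concatenate([image_feat, text_feat])
    return np.maximum(w @ z + b, 0.0)

rng = np.random.default_rng(1)
img, txt = rng.normal(size=256), rng.normal(size=128)   # hypothetical dims
w, b = rng.normal(size=(64, 384)) * 0.05, np.zeros(64)
joint = fuse(img, txt, w, b)   # 64-d joint embedding fed to the grading head
```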
5. Experimental Protocols, Cohorts, and Metrics
Segmentation experiments use public (LiTS, KiTS, Pancreas) and custom (MOBA) single-organ datasets (Huang et al., 2020), with batch size 24 (3 images/GPU on 8 GPUs), an SGD optimizer with weight decay, a cosine-decayed learning rate (initial 0.05), and 10 training epochs.
Diagnostic grading is evaluated on 1,631 NCCT scans for training, 239 for validation, and 289 for testing, with balanced splits across grades G0–G3 (Zhang et al., 22 Dec 2025). Each organ-specific ROI is extracted, resampled to a fixed per-organ grid for the esophagus, liver, and spleen, and processed by UniFormer backbones. Training uses the Adam optimizer with batch size 8 for 100 epochs.
Performance is quantified via Dice similarity coefficient (DSC), average Hausdorff distance (HD) for segmentation, and AUC, multi-class accuracy, and Kendall’s Tau for grading.
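For reference, Kendall's Tau over predicted versus ground-truth grades counts concordant minus discordant pairs; a tie-ignoring (tau-a) version can be computed as below (reported values may use the tie-corrected tau-b variant):

```python
def kendall_tau(pred, truth):
    """Kendall's tau-a between two paired rankings (e.g. predicted vs. true
    EV grades). Ties contribute zero; no tie correction is applied."""
    n = len(pred)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            a = (pred[i] > pred[j]) - (pred[i] < pred[j])   # sign of pair order
            b = (truth[i] > truth[j]) - (truth[i] < truth[j])
            s += a * b
    return s / (n * (n - 1) / 2)
```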
6. Results and Analytical Summary
Segmentation
MOON++ achieves state-of-the-art multi-organ segmentation performance with minimal inference overhead:
| Dataset | Method | DSC (%) | HD (mm) | Inference time (s/test) |
|---|---|---|---|---|
| LiTS + KiTS + Pancreas | Individual single-organ | 89.41 | — | — |
| LiTS + KiTS + Pancreas | MOON++ (CT+WA+RM+IBN) | 90.22 | — | 4.28 |
| MOBA (8-organ) | Individual single-organ | 82.41 | 35.82 | — |
| MOBA (8-organ) | MOON++ | 83.60 | 31.88 | — |
| MOBA (8-organ) | conditionCNN | 62.37 | — | 12.9 |
MOON++ consistently outperforms single-organ baselines, prior multi-organ baselines (conditionCNN), and naive self-training, matching or exceeding fully supervised multi-organ baselines while retaining single-model deployment with no added computational overhead (Huang et al., 2020).
Esophageal Varices Grading
On independent test evaluation, MOON++ substantially outperforms single-organ and conventional multi-organ models:
| Task (Test) | Single-Organ Esophagus | MOON++ Multi-Organ |
|---|---|---|
| AUC (G3 vs <G3) | 0.803 | 0.894 |
| AUC (≥G2 vs <G2) | 0.793 | 0.921 |
| Multi-class Accuracy | 53.3% | 65.3% |
| Kendall's Tau | 60.1% | 74.4% |
Reader study results show MOON++ surpasses board-certified radiologists in multi-class accuracy and AUC across all tasks, with greater accuracy in both low- and high-grade distinctions; permutation testing confirms the significance of the ≥G1 and ≥G2 improvements (Zhang et al., 22 Dec 2025).
7. Contextual Impact and Comparative Assessment
MOON++ frameworks establish new baselines for collaborative learning across organ domains and for multimodal, knowledge-aware disease grading, surpassing prior methods in segmentation and non-contrast CT analysis. These systems demonstrate that cross-organ interaction, guided by either co-training or explicit relation modules (ORI), yields superior generalization, particularly when annotated data are fragmented across organs or when clinical task performance depends on multi-organ contextual reasoning.
A plausible implication is that co-training on few-organ datasets and explicit integration of clinical prior knowledge can mitigate annotation scarcity and domain heterogeneity, supporting broader adoption in multi-organ imaging phenotyping and diagnostic risk stratification (Huang et al., 2020, Zhang et al., 22 Dec 2025).