
Federated Learning for Brain Tumor Localization

Updated 28 January 2026
  • The paper introduces a novel federated learning framework that integrates 3D U-Net architectures with modality fusion to accurately localize brain tumors in MRI scans.
  • It employs privacy-preserving protocols, enhanced aggregation methods like FedAvg and FedBN, and secure differential privacy to protect sensitive patient data.
  • The framework demonstrates competitive Dice scores and robust performance in handling heterogeneous data and missing modalities, validating its clinical viability.

A federated learning framework for brain tumor localization refers to a distributed machine learning paradigm in which multiple clinical institutions collaboratively train a model to automatically determine the spatial extent or bounding box/voxel mask of brain tumors in MRI images, without centralizing patient data. The goal is to exploit multi-institutional data heterogeneity (scanners, populations, acquisition protocols, available modalities) while preserving privacy and meeting regulatory constraints. Such frameworks integrate advances in model architecture, privacy-preserving distributed optimization, domain adaptation to modality and site heterogeneity, and robust evaluation schemes to achieve high tumor localization/segmentation performance in realistic clinical deployments.

1. Model Architectures Enabling Federated Tumor Localization

The dominant architecture for brain tumor localization in federated learning is the 3D U-Net family, often modified to address MRI modality and site heterogeneity (Pati et al., 2021, Wagner et al., 2024). Key architectural elements include:

  • 3D Residual U-Net with Modality Union Inputs: Each model instance contains input channels for the union of all modalities across sites (e.g., T1, T2, FLAIR, T1c/Gd). At each client, missing modalities are zero-padded, maintaining a consistent channel dimensionality (Wagner et al., 2024).
  • Encoder Path: Four-level hierarchical encoding via Conv3D layers (filters double at each level) with residual blocks and feature normalization (BatchNorm, InstanceNorm, GroupNorm, or normalization-free layers; the best variant can depend on the scenario) (Wagner et al., 2024, Stoklasa et al., 2023).
  • Decoder Path: Four upsampling levels using transposed convolutions, skip-connections to corresponding encoder stages, and channel reductions to reconstruct voxelwise probability maps for tumors.
  • Alternative Detection Heads: Tumor localization as bounding-box regression is supported by replacing the segmentation head with a 3D detection head (e.g. 3D YOLO, Retina U-Net, or Faster R-CNN). Losses are adapted accordingly (SmoothL1 for regression, cross-entropy for presence detection) (Pati et al., 2021, Monisha et al., 6 Mar 2025).
  • Hybrid Backbones: Enhanced frameworks employ Vision Transformer bottlenecks for improved global context or supervoxel-level GNNs for spatial/geometric agglomeration (Wakili et al., 19 Dec 2025, Protani et al., 21 Jan 2026).
  • Modality Fusion and Personalization: Federated frameworks may instantiate separate encoders per modality (modality-specific encoders), with server-side multi-modal fusion decoders and client-side personalized decoders/adapters (Dai et al., 2024, Wakili et al., 19 Dec 2025). Cross-modal attention and anchor-based fusion (e.g., LACCA in FedMEMA) enable inference with incomplete modality sets.
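The modality-union input strategy described above can be sketched in a few lines. This is a minimal illustration, not code from any cited framework; the modality list, helper name, and shapes are assumptions chosen for the example.

```python
import numpy as np

# Fixed channel order: the union of MRI modalities seen across all sites.
# This particular list is an illustrative assumption.
ALL_MODALITIES = ["T1", "T1c", "T2", "FLAIR"]

def to_union_input(volumes, shape):
    """Stack a client's available modalities into the fixed channel order,
    zero-padding the channels for modalities this client lacks."""
    channels = [volumes.get(m, np.zeros(shape, dtype=np.float32))
                for m in ALL_MODALITIES]
    return np.stack(channels, axis=0)  # (C, D, H, W)

# A client that only acquires T1 and FLAIR:
shape = (4, 4, 4)
x = to_union_input(
    {"T1": np.ones(shape, np.float32), "FLAIR": np.ones(shape, np.float32)},
    shape,
)
# x has 4 channels; the T1c and T2 channels (indices 1 and 2) are all zeros
```

Because every client produces tensors with the same channel dimensionality, a single global model can be trained and served across sites with differing acquisition protocols.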

2. Federated Aggregation Protocols and Privacy-Preserving Schemes

Federated optimization is implemented chiefly via Federated Averaging (FedAvg), with substantial enhancements to address real-world clinical heterogeneity and privacy (Pati et al., 2021, Isik-Polat et al., 2022, Stoklasa et al., 2023, Wagner et al., 2024):

  • FedAvg: Each client receives the global model, performs local updates (typically stochastic optimization for multiple epochs), and returns model weights to the server. The server aggregates updates via a weighted sum, normalized by client sample size.
  • Personalized/Statistically Robust Variants:
    • FedBN: BatchNorm or similar statistics are kept client-specific; only convolutional weights are aggregated, enhancing performance for clients with distinct intensity/distribution statistics (Wagner et al., 2024, Fiszer et al., 8 Oct 2025).
    • FedNova/FedAvgM: Normalized update aggregation and server-side momentum improve convergence under strong data heterogeneity (Isik-Polat et al., 2022).
    • Decentralized/Gossip Mutual Learning: Peer-to-peer model exchange without a central server via gossip protocols and mutual regionalized loss functions reduces communication and mitigates single-point-of-failure, with convergence guarantees under random peer graphs (Chen et al., 2024).
    • Asynchronous/Decentralized Lifelong Learning (ADFLL): Asynchronous agent-hub communication with distributed memory buffers achieves low distance errors for tumor landmark localization without central coordination (Zheng et al., 2023).
  • Differential Privacy and Secure Aggregation: Server- or client-side aggregation protocols inject noise calibrated for (ε, δ)-DP, bound gradients per client, and optionally implement secure aggregation with encrypted shares to protect model updates (Khan et al., 2023, Monisha et al., 6 Mar 2025, Zhou et al., 2024).
  • Handling of Stragglers and Communication Bottlenecks: Strategies include partial client participation, adaptive local epochs, model update compression (quantization, sparsification), and communication efficient aggregators (Pati et al., 2021, Isik-Polat et al., 2022).
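The baseline FedAvg aggregation step underlying the variants above can be sketched as follows. This is a simplified server-side round over plain weight dictionaries, with illustrative parameter names; real systems operate on framework-specific state dicts.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """One FedAvg round on the server: weight each client's parameters by
    its share of the total sample count, then sum."""
    total = float(sum(client_sizes))
    return {
        name: sum((n / total) * w[name]
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
    }

# Two clients returning one parameter tensor each; client 2 holds 3x the data.
w1 = {"conv.weight": np.full((2, 2), 1.0)}
w2 = {"conv.weight": np.full((2, 2), 3.0)}
global_w = fedavg([w1, w2], client_sizes=[100, 300])
# Every entry is 0.25 * 1.0 + 0.75 * 3.0 = 2.5
```

The sample-size weighting is what makes FedAvg sensitive to data imbalance across hospitals; FedNova-style normalization and server momentum modify exactly this aggregation step.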

3. Loss Functions and Training Strategies Specific to Localization and Segmentation

The loss function is tailored to the target localization regime:

  • Combined Dice and Cross-Entropy Loss: Voxelwise tumor mask segmentation uses a convex combination, typically with higher weight assigned to Dice loss to enforce overlap (Wagner et al., 2024):

\mathcal{L}_c(\theta) = \alpha \mathcal{L}_{Dice} + (1-\alpha) \mathcal{L}_{BCE}, \quad \alpha = 0.8

    • \mathcal{L}_{Dice} = 1 - \frac{2 \sum_i p_i y_i + \epsilon}{\sum_i p_i + \sum_i y_i + \epsilon}
    • \mathcal{L}_{BCE}: the standard per-voxel binary cross-entropy.
  • Localization-specific Losses: For direct bounding-box prediction, 3D detection models use composite loss functions:

L = \lambda_{loc} L_{loc} + \lambda_{obj} L_{obj} + \lambda_{cls} L_{cls}

with a localization loss (SmoothL1), an objectness (presence/absence) term, and a multi-class/multi-label prediction term (Monisha et al., 6 Mar 2025).

  • Personalization and Domain-Shift Mitigation: Some frameworks (e.g. TwinSegNet, clustered personalization via CFFT, FedBN) decouple local and global objectives by fine-tuning global weights on private client data or clustering texture-based representations for cluster-wise fine-tuning (Wakili et al., 19 Dec 2025, Manthe et al., 2023).
  • Random Modality Drop (Modality Dropout): To increase robustness to missing modalities and prevent co-occurrence artifacts, input modalities are randomly zeroed per sample with probability ϕ (e.g., ϕ = 0.3–0.5), which regularizes feature reliance and improves generalization to unseen MRI protocols (Wagner et al., 2024).
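The combined segmentation loss above can be written down directly. This is a minimal NumPy sketch of the α-weighted Dice + BCE objective, not an implementation from the cited work; in practice the same formula is computed on framework tensors inside the training loop.

```python
import numpy as np

def dice_loss(p, y, eps=1e-6):
    """Soft Dice loss over voxelwise probabilities p and binary mask y."""
    inter = np.sum(p * y)
    return 1.0 - (2.0 * inter + eps) / (np.sum(p) + np.sum(y) + eps)

def bce_loss(p, y, eps=1e-7):
    """Per-voxel binary cross-entropy, with probabilities clipped for stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))))

def combined_loss(p, y, alpha=0.8):
    """Convex combination: alpha * Dice + (1 - alpha) * BCE."""
    return alpha * dice_loss(p, y) + (1.0 - alpha) * bce_loss(p, y)

y = np.array([1.0, 1.0, 0.0, 0.0])
good = combined_loss(np.array([0.95, 0.9, 0.05, 0.1]), y)
bad = combined_loss(np.array([0.1, 0.2, 0.9, 0.8]), y)
# good < bad: the loss decreases as overlap with the ground-truth mask improves
```

The high Dice weight (α = 0.8) pushes optimization toward overlap rather than per-voxel accuracy, which matters for small tumor volumes where BCE alone is dominated by background voxels.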

4. Addressing Heterogeneity in Data, Modalities, and Institutions

Federated frameworks explicitly address multiple sources of heterogeneity:

  • Inter- and Intra-Institutional Variability: Clustered federated personalization assigns cases to radiomics-based clusters (across and within institutions) and re-runs FL on cluster-specific data partitions to adapt to scanner/protocol bias (Manthe et al., 2023).
  • Modality Heterogeneity: Modality union input strategies and per-modality encoders enable single-model inference across arbitrary modality subsets, with zero-padding for absent modalities (Wagner et al., 2024, Dai et al., 2024). FedMEMA augments this via anchor-based cross-attention to calibrate missing-modal information (Dai et al., 2024).
  • Preprocessing and Normalization Diversity: GroupNorm, InstanceNorm, or client-specific BatchNorm (FedBN) yield models robust to normalization heterogeneity, enabling Dice scores of 0.92 on 3D tumor masks under widely inconsistent data normalization—a result comparable with centralized training (Fiszer et al., 8 Oct 2025).
  • Domain Adaptation and Augmentation: Intensity, bias-field, and geometric augmentation, as well as clustering/attention-based alignment and personalized adaptation layers, are used both during training and for deployment on out-of-distribution data (Pati et al., 2021, Dai et al., 2024).
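The FedBN idea referenced above, keeping normalization parameters client-local while averaging everything else, amounts to filtering the aggregation by parameter name. A minimal sketch, assuming a simple substring naming convention for normalization layers (an assumption for illustration):

```python
import numpy as np

NORM_SUBSTRINGS = ("bn.", "norm.")  # naming convention is an assumption

def fedbn_round(client_weights, client_sizes):
    """FedBN-style aggregation: average only non-normalization parameters;
    each client keeps its own normalization statistics and affine terms."""
    total = float(sum(client_sizes))
    shared = {
        name: sum((n / total) * w[name]
                  for w, n in zip(client_weights, client_sizes))
        for name in client_weights[0]
        if not any(s in name for s in NORM_SUBSTRINGS)
    }
    # Merge: averaged shared weights overwrite; per-client norm params survive.
    return [{**w, **shared} for w in client_weights]

c1 = {"conv.weight": np.full(3, 2.0), "bn.running_mean": np.full(3, -1.0)}
c2 = {"conv.weight": np.full(3, 4.0), "bn.running_mean": np.full(3, 5.0)}
new1, new2 = fedbn_round([c1, c2], client_sizes=[10, 10])
# conv.weight becomes 3.0 for both clients;
# bn.running_mean stays -1.0 for client 1 and 5.0 for client 2
```

Keeping normalization statistics local is precisely what absorbs per-site intensity and preprocessing differences without contaminating the shared feature extractor.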

5. Quantitative Results and Performance Benchmarks

Federated localization and segmentation frameworks consistently achieve Dice and localization metrics competitive with centralized models across diverse real-world settings:

| Framework | Dice (Segmentation) | Key Setting | Notes |
|---|---|---|---|
| FeTS/FedUniBrain (Wagner et al., 2024) | 91.8% (FedAvg; seen clients) | 3D ResUNet, p-modalities, BN/IN/GN/NF, modality dropout | Zero-shot: 72.1% (NF + drop; new modalities) |
| Screening Tool (Stoklasa et al., 2023) | 0.84 (EaD), 0.89 (GD) | 2D U-Net, InceptionV3, 102 test MRI exams | Outperforms site-only baselines |
| TwinSegNet (Wakili et al., 19 Dec 2025) | 0.90 | Hybrid ViT-U-Net, 9 clients, non-IID distributions | Digital twin: 0.897 Dice |
| FedMEMA (Dai et al., 2024) | 62.0% (client mono-modal) | Modality-specific encoders, LACCA, BraTS2020 | Fusion server mDSC: 82.1% |
| Fed-MUnet (Zhou et al., 2024) | 0.890 (mean, ET/TC/WT) | 2D U-Net + CMM, DP-FedAvg, BraTS2022 | Comparable to nnU-Net/TransBTS/UNETR |
| FedBN vs. Centralized (Fiszer et al., 8 Oct 2025) | 0.92 | 2D U-Net, 6 clients, intensity-normalization heterogeneity | Statistically indistinguishable |
| DP-SimAgg (Khan et al., 2023) | 0.701 (Dice, ET, ε=1.0) | 3D U-Net, SimAgg + Gaussian DP noise, FeTS2022 | Utility degrades gracefully at strong DP |
| GML (Chen et al., 2024) | 0.9104 (site aggregate) | 3D SANet, gossip peer-to-peer FL, BraTS2021, non-IID | FedAvg: 0.9095; centralized: 0.9203 |
| YOLOv11-FL (Monisha et al., 6 Mar 2025) | 0.87 (F1) | 5 clients, cross-modality attention, DP, 2D anchor detector | Centralized F1: 0.85; mAP@0.5 up to 0.91 |
| FedDis (Bercea et al., 2021) | 0.417 (unsup., glioblastoma) | Shape-appearance disentanglement, healthy-only federated training | +43% over vanilla AE, +16% over SiloBN |
| ADFLL (Zheng et al., 2023) | 7.81 mm (mean distance error) | 4 decentralized agents, DQN, experience-replay hubs, BraTS | Lower than all-knowing RL baseline |

A common finding is that federated models reduce performance variance across sites, close the gap between local and centralized training, and in multiple settings outperform single-site models on both segmentation and localization tasks (Wagner et al., 2024, Wakili et al., 19 Dec 2025, Stoklasa et al., 2023). Advanced methods—personalization, cross-modal fusion, robust aggregation, and domain/normalization adaptation—further mitigate the effects of non-IID data distributions and missing modalities.

6. Practical Deployment, Recommendations, and Limitations

Recommendations for real-world multi-hospital deployment:

  • Model Design: Utilize a zero-padded modality-union input model (for all expected MRI modalities), strong feature normalization, and robust modality-drop training to ensure resilience against missing scans (Wagner et al., 2024).
  • Training Protocol: Set local epochs/steps so each institution processes a comparable number of data samples per round; monitor per-site metrics for divergence (Wagner et al., 2024, Pati et al., 2021).
  • Norm Selection: Use FedBN personalization if post-federation fine-tuning on local data is expected; otherwise, normalization-free layers improve out-of-domain generalization (Wagner et al., 2024, Fiszer et al., 8 Oct 2025).
  • Privacy: For maximum protection, combine secure aggregation with differential privacy—either at server-side (global DP noise injection) or per-client (local update clipping + noise) (Khan et al., 2023, Monisha et al., 6 Mar 2025, Zhou et al., 2024).
  • Communication: 3D models are resource-intensive; gradient or weight compression, client sampling, and decentralized protocols can reduce bandwidth requirements (Chen et al., 2024, Isik-Polat et al., 2022).
  • Interpretability: Integration of attention-based explainability reveals which modalities drive localization, supporting clinical plausibility (Protani et al., 21 Jan 2026).
  • Evaluation: Explicit reporting of Dice, Hausdorff, sensitivity/specificity, and AP/mAR for localization is necessary for clinical readiness.
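The per-client privacy recommendation above (local update clipping plus noise) can be sketched concisely. This is an illustrative DP mechanism on plain NumPy weight dictionaries; the clipping bound and noise multiplier are arbitrary example values, not parameters from the cited papers.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """Client-side DP step: clip the update's global L2 norm to clip_norm,
    then add Gaussian noise with std = noise_multiplier * clip_norm."""
    rng = np.random.default_rng() if rng is None else rng
    flat = np.concatenate([v.ravel() for v in update.values()])
    scale = min(1.0, clip_norm / (np.linalg.norm(flat) + 1e-12))
    return {
        name: v * scale + rng.normal(0.0, noise_multiplier * clip_norm, v.shape)
        for name, v in update.items()
    }

# With noise disabled, the clipped update's norm is bounded by clip_norm:
u = {"w": np.full(100, 10.0)}
clipped = privatize_update(u, clip_norm=1.0, noise_multiplier=0.0)
# np.linalg.norm(clipped["w"]) <= 1.0
```

Clipping bounds each client's influence on the aggregate, which is what lets the added Gaussian noise be calibrated to an (ε, δ)-DP guarantee; composing this with secure aggregation additionally hides individual updates from the server.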

Limitations:

  • Generalization to rare/unseen modality combinations and scanner protocols, especially for clients not closely represented in the training distribution, is not theoretically guaranteed and may fail (Wagner et al., 2024).
  • Communication and compute costs are significant for volumetric models. Approaches such as gossip FL or model compression alleviate but do not eliminate this concern (Chen et al., 2024).
  • Absence of explicit boundary/contour error (e.g. Hausdorff) in some studies; these must be independently validated for surgical/resection applications (Wagner et al., 2024).
  • Non-volumetric (2D slicewise) models may underutilize spatial information; adoption of efficient, privacy-preserving 3D models is ongoing (Zhou et al., 2024).

7. Outlook and Future Directions

Federated learning continues to advance towards robust, scalable, privacy-compliant brain tumor localization in distributed clinical environments. Several research directions emerge:

  • Personalization: Clustered and site-adaptive FL, digital twin fine-tuning, and client-specific decoders are improving local performance and handling strong inter-site heterogeneity (Manthe et al., 2023, Wakili et al., 19 Dec 2025, Dai et al., 2024).
  • Advanced Security and Privacy: Hybrid secure aggregation, local+global differential privacy, and cryptographic protocols (e.g., secure enclaves, homomorphic encryption) are being incorporated for regulatory readiness (Monisha et al., 6 Mar 2025, Zhou et al., 2024).
  • Advanced Architectures: Transformers, GNNs, multi-modal anchors, and modality-adaptive attention promise higher sample/computation efficiency and interpretability (Wakili et al., 19 Dec 2025, Protani et al., 21 Jan 2026, Dai et al., 2024).
  • Handling Incomplete/Noisy Labels: Semi-supervised, anomaly-based (e.g., FedDis), and lifelong multi-task FL frameworks are under development to address data incompleteness and label scarcity (Bercea et al., 2021, Zheng et al., 2023).
  • Cross-task Transfer: Methodologies for federated joint detection, segmentation, and downstream clinical outcome prediction are being piloted, with early evidence for cross-task synergistic learning (Pati et al., 2021).

Federated learning is now a feasible and practical solution for multi-institutional brain tumor localization, provided frameworks are architected for heterogeneity and privacy and are evaluated under rigorous, clinically relevant metrics (Wagner et al., 2024, Protani et al., 21 Jan 2026, Wakili et al., 19 Dec 2025).

