
WoundNeRF: Neural 3D Wound Segmentation

Updated 30 January 2026
  • WoundNeRF is an SDF-based neural field system that integrates multi-view RGB images to produce a continuous, multi-view consistent 3D wound segmentation.
  • It leverages dual MLPs—one for geometry and one for appearance—to convert 3D coordinates and view directions into accurate volumetric reconstructions.
  • Robust training combines fine-tuning on noisy 2D annotations with volumetric and semantic losses, delivering superior segmentation accuracy over traditional methods.

WoundNeRF is a signed distance function (SDF)-based neural field system for multi-view consistent 3D wound segmentation from standard RGB images. Developed within the NeRF + SDF framework and augmented by a semantic decoder, WoundNeRF aggregates automatically generated 2D wound annotations into a unified 3D segmentation, addressing longstanding challenges in acquiring robust, view-consistent representations of wound-bed tissue from sparse and noisy clinical imagery (Chierchia et al., 23 Jan 2026).

1. Underlying Framework and Model Architecture

WoundNeRF operates within the Neural Radiance Fields (NeRF) + Signed Distance Function (SDF) paradigm, closely following the architecture of NeuS [wang2021neus], but with additional semantic segmentation capability. The backbone comprises two multilayer perceptrons (MLPs):

  • Geometry MLP: receives a 3D coordinate $x \in \mathbb{R}^3$; outputs the signed distance $s_\theta(x)$ to the nearest surface and a latent feature vector $g(x)$. The SDF is converted to a volumetric density via $\sigma(x) = \alpha \cdot \mathrm{ReLU}(-s_\theta(x))$, where $\alpha$ is a learned scale.
  • Appearance MLP: consumes a 3D position $x$ and view direction $d$; outputs an RGB value $c_\theta(x, d) \in [0,1]^3$.

Color rendering along camera rays $r(t) = o + t d$ is performed via volumetric integration:

$$C(r) = \int_{0}^{\infty} T(t)\,\sigma\left(r(t)\right)\,c_\theta\left(r(t), d\right)\,dt,$$

with accumulated transmittance

$$T(t) = \exp\left(-\int_{0}^{t} \sigma\left(r(u)\right)\,du\right).$$

Through this mechanism, every input image contributes not just pixel information, but constraints on the underlying 3D semantic and geometric field.
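The rendering integral above can be sketched with a simple numerical quadrature. This is an illustrative NumPy implementation, not the paper's code: the toy fields `sphere_sdf` and `red`, and the sampling parameters (`n_samples`, `t_far`, `alpha`), are assumptions made for the example.

```python
import numpy as np

def render_ray(o, d, sdf, color, alpha=10.0, n_samples=64, t_far=4.0):
    """Quadrature approximation of C(r) = integral of T(t) sigma(r(t)) c(r(t), d) dt."""
    t = np.linspace(0.0, t_far, n_samples)
    dt = t[1] - t[0]
    pts = o[None, :] + t[:, None] * d[None, :]        # sample points r(t) = o + t d
    sigma = alpha * np.maximum(-sdf(pts), 0.0)        # sigma(x) = alpha * ReLU(-s_theta(x))
    # Transmittance T(t) = exp(-integral of sigma), discretized as an exclusive cumsum
    T = np.exp(-np.cumsum(sigma * dt) + sigma * dt)
    w = T * (1.0 - np.exp(-sigma * dt))               # per-sample contribution weights
    return (w[:, None] * color(pts, d)).sum(axis=0)   # rendered RGB value C(r)

# Toy fields: a unit-sphere SDF and a constant red appearance
sphere_sdf = lambda p: np.linalg.norm(p, axis=-1) - 1.0
red = lambda p, d: np.tile([1.0, 0.0, 0.0], (len(p), 1))

c = render_ray(np.array([0.0, 0.0, -3.0]), np.array([0.0, 0.0, 1.0]),
               sphere_sdf, red)
```

A ray fired at the sphere accumulates nearly full opacity, so `c` comes out close to pure red; the same weights `w` are reused to render semantic probabilities in Section 2.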

2. Semantic Segmentation Extraction in 3D and 2D

The geometry MLP is extended with a semantic head mapping latent features $g(x)$ to per-class logits $s_\theta^{i}(x)$ across six categories: background and five wound-bed tissue classes (granulation, slough, necrotic, epithelial, unknown).

To facilitate wound-bed prediction, the five tissue-class logits are aggregated using a log-sum-exp operation:

$$s_\theta^{\mathbf{w}}(x) = \log\left(\sum_{i=1}^{5} \exp\left(s_\theta^{i}(x)\right)\right), \qquad s_\theta^{\mathbf{b}}(x) = \text{background logit}.$$

Class probabilities are produced via a softmax at each location.

The model yields pixel-wise segmentations by volume-rendering semantic probabilities along each ray:

$$\hat{S}^c(r) = \int_0^\infty T(t)\,\sigma\left(r(t)\right)\,s_\theta^c\left(r(t)\right)\,dt.$$

3D wound masks are extracted by thresholding the SDF, $s_\theta(x) < 0$, or by applying a probabilistic occupancy threshold

$$P(\text{wound} \mid x) = \mathrm{sigmoid}\left(-s_\theta(x)\right),$$

where the logistic sigmoid is written out to distinguish it from the density $\sigma(x)$. This dual 2D/3D representation enables straightforward projection into acquired viewpoints and robust spatial integration.
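The wound/background aggregation can be illustrated in a few lines of NumPy. The logit values below are made-up numbers for the example; the numerically stable max-subtraction trick is a standard implementation detail, not something the paper specifies.

```python
import numpy as np

def wound_logit(tissue_logits):
    """Aggregate five tissue-class logits via log-sum-exp: s_w = log(sum_i exp(s_i))."""
    m = tissue_logits.max()                      # subtract the max for numerical stability
    return m + np.log(np.exp(tissue_logits - m).sum())

def wound_probability(tissue_logits, background_logit):
    """Softmax over (wound, background) logits at a single point x."""
    z = np.array([wound_logit(tissue_logits), background_logit])
    e = np.exp(z - z.max())
    return (e / e.sum())[0]                      # P(wound | x)

# Illustrative logits: granulation, slough, necrotic, epithelial, unknown
logits = np.array([1.2, -0.3, 0.5, -2.0, 0.1])
p = wound_probability(logits, background_logit=2.0)
```

Because log-sum-exp is a smooth upper bound on the maximum, the wound logit is always at least as large as the strongest single tissue logit, so any confident tissue prediction translates into a confident wound-bed prediction.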

3. Data Annotation and Supervision Protocols

Annotation begins with fine-tuning a SegFormer model on a small expert-annotated subset (1–4 views per patient), which then generates “noisy” 2D masks for all roughly 50 frames per wound video. These masks serve only as supervisory signals: the network learns a single 3D field whose projections must align with all available 2D masks.

Two objectives drive the optimization:

  • Volumetric RGB Reconstruction:

$$\mathcal{L}_{\text{recon}} = \mathbb{E}_r\left\|C(r) - C_{\text{obs}}(r)\right\|_2^2,$$

with $C_{\text{obs}}(r)$ the observed pixel color.

  • Segmentation Consistency:

$$\mathcal{L}_{\text{seg}} = \mathbb{E}_r\left[-w_{y_r}\log \hat{S}^{y_r}(r)\right],$$

a weighted cross-entropy on rendered per-ray probabilities, with class weights $w_y$ mitigating class imbalance.

This protocol ensures that spatial consistency is enforced by grounding all 2D supervisory information within a single, continuous 3D representation.
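A minimal sketch of the two supervision terms, assuming NumPy arrays of rendered per-ray quantities; the batch contents and uniform class weights here are illustrative placeholders, not the paper's configuration.

```python
import numpy as np

def recon_loss(C_pred, C_obs):
    """L_recon: mean squared error between rendered and observed pixel colors."""
    return np.mean(np.sum((C_pred - C_obs) ** 2, axis=-1))

def seg_loss(S_pred, labels, class_weights):
    """L_seg: class-weighted cross-entropy on rendered per-ray probabilities."""
    p = S_pred[np.arange(len(labels)), labels]   # rendered probability of the true class
    w = class_weights[labels]                    # per-ray weight w_{y_r}
    return np.mean(-w * np.log(np.clip(p, 1e-8, 1.0)))

rng = np.random.default_rng(0)
C_pred, C_obs = rng.random((1024, 3)), rng.random((1024, 3))
S_pred = rng.dirichlet(np.ones(6), size=1024)    # 6 classes per ray, rows sum to 1
labels = rng.integers(0, 6, size=1024)
weights = np.ones(6)                             # uniform weights for illustration
total = recon_loss(C_pred, C_obs) + seg_loss(S_pred, labels, weights)
```

Upweighting rare tissue classes in `class_weights` is what counteracts the dominance of background rays in a typical batch.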

4. Optimization, Regularization, and Training Regime

The overall training objective is

$$\mathcal{L}_{\text{total}} = \lambda_1\,\mathcal{L}_{\text{recon}} + \lambda_2\,\mathcal{L}_{\text{seg}} + \lambda_3\,\mathcal{L}_{\text{reg}},$$

where $\mathcal{L}_{\text{reg}}$ is an Eikonal regularizer:

$$\mathcal{L}_{\text{reg}} = \mathbb{E}_x\left(\left\|\nabla s_\theta(x)\right\| - 1\right)^2.$$

Training proceeds in two stages:

  • Geometry MLP alone ($\mathcal{L}_{\text{recon}} + \mathcal{L}_{\text{reg}}$) for 50,000 iterations.
  • Semantic head attached; full objective for an additional 50,000 iterations. The Adam optimizer is used with learning rate $5\times10^{-4}$, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and a batch size of 1024 rays.
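The Eikonal term can be sketched with finite differences; a real implementation would use autograd gradients of the network, so the central-difference scheme and step size here are assumptions for the example.

```python
import numpy as np

def eikonal_loss(sdf, pts, eps=1e-4):
    """L_reg = E_x(||grad s_theta(x)|| - 1)^2, via central differences per axis."""
    grads = []
    for i in range(3):
        h = np.zeros(3); h[i] = eps
        grads.append((sdf(pts + h) - sdf(pts - h)) / (2 * eps))
    grad_norm = np.linalg.norm(np.stack(grads, axis=-1), axis=-1)
    return np.mean((grad_norm - 1.0) ** 2)

# A true SDF (here, a unit sphere) has unit gradient norm, so the penalty is ~0
sphere_sdf = lambda p: np.linalg.norm(p, axis=-1) - 1.0
pts = np.random.default_rng(1).normal(size=(256, 3))
loss = eikonal_loss(sphere_sdf, pts)
```

The penalty vanishes exactly when the field is a valid signed distance function, which is what keeps the $s_\theta(x) < 0$ thresholding in Section 2 geometrically meaningful.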

5. Experimental Dataset, Preprocessing, and Evaluation

WoundNeRF was validated on 73 wound videos from 35 patients, each yielding approximately 50 frames with known camera poses and a 3D mesh reconstructed via structure-from-motion. For ground truth, 1–4 frames per wound were manually annotated.

Preprocessing includes undistortion, cropping centered on the wound, color correction, and pose refinement using COLMAP. Evaluation employs two complementary strategies:

  • 3D Dice & Recall: Voxelization of predicted wound region, compared to a pseudo-ground-truth mesh built from expert masks.
  • 2D Back-projection: Rendering predicted 3D masks back into GT-annotated camera views, enabling computation of 2D Dice (DSC) and Recall.
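Both evaluation strategies reduce to the same mask-overlap metrics. A minimal sketch, assuming NumPy boolean masks (the 8×8 toy masks are made up for illustration); the identical code applies to 3D voxel grids and back-projected 2D renders.

```python
import numpy as np

def dice_and_recall(pred, gt):
    """DSC = 2|P & G| / (|P| + |G|); recall = |P & G| / |G|."""
    inter = np.logical_and(pred, gt).sum()
    dsc = 2.0 * inter / (pred.sum() + gt.sum())
    recall = inter / gt.sum()
    return dsc, recall

# Two 16-pixel squares overlapping in a 3x3 region
pred = np.zeros((8, 8), dtype=bool); pred[2:6, 2:6] = True
gt   = np.zeros((8, 8), dtype=bool); gt[3:7, 3:7] = True
dsc, rec = dice_and_recall(pred, gt)
```

For 3D evaluation, `pred` and `gt` would be voxelizations of the predicted wound region and the pseudo-ground-truth mesh, flattened to boolean arrays.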

6. Quantitative Performance and Robustness Analysis

The performance of WoundNeRF is presented alongside two baselines: SegFormer (2D) and a heuristic fusion of 2D masks on a mesh (3D/2D). Metrics are reported for wound-bed DSC, granulation DSC, and slough DSC, with corresponding Recall.

Method           Wound-bed DSC / Recall   Granulation DSC / Recall   Slough DSC / Recall
2D (SegFormer)   0.851 / 0.819            0.738 / 0.689              0.670 / 0.609
3D/2D            0.855 / 0.840            0.761 / 0.719              0.682 / 0.614
Ours (w/o DO)    0.851 / 0.859            0.767 / 0.764              0.691 / 0.658
Ours             0.857 / 0.893            0.775 / 0.786              0.686 / 0.666

Robustness experiments—boundary jitter, erosion/dilation, reduced frames—show markedly less degradation for WoundNeRF than rasterization-based methods, attributed to implicit spatial regularization. Qualitatively, segmentation boundaries are smoother and more consistent; fewer spurious holes appear compared to 3D/2D mesh fusion, and label flickering seen in 2D methods is eliminated.

7. Model Properties, Limitations, and Future Developments

WoundNeRF's strengths include its true multi-view consistency (a single, continuous SDF + semantic field), implicit spatial regularization via volume rendering, and robustness to annotation noise. Every prediction is rooted in a continuous function, bypassing ad-hoc fusion heuristics and improving wound area measurability.

Limitations remain: model quality depends on the initial automatic 2D annotations, so large supervision errors induce bias; training takes several hours on a single GPU, though inference is fast afterward; and unobserved wound regions are "hallucinated" by the network. Potential avenues for enhancement include confidence-driven active sampling, view-planning extensions, and integrating shape or photometric priors for better generalization in sparse-view regimes.

A plausible implication is that SDF-based neural fields, with semantic decoding and volumetric consistency, will continue to supplant mesh-based and purely 2D segmentation pipelines for complex medical analysis, given their demonstrated superiority in robustness, accuracy, and spatial continuity (Chierchia et al., 23 Jan 2026).
