NeAS: 3D Reconstruction from X-ray Images using Neural Attenuation Surface

Published 10 Mar 2025 in eess.IV and cs.CV | (2503.07491v1)

Abstract: Reconstructing three-dimensional (3D) structures from two-dimensional (2D) X-ray images is a valuable and efficient technique in medical applications that requires less radiation exposure than computed tomography scans. Recent approaches that use implicit neural representations have enabled the synthesis of novel views from sparse X-ray images. However, although image synthesis has improved the accuracy, the accuracy of surface shape estimation remains insufficient. Therefore, we propose a novel approach for reconstructing 3D scenes using a Neural Attenuation Surface (NeAS) that simultaneously captures the surface geometry and attenuation coefficient fields. NeAS incorporates a signed distance function (SDF), which defines the attenuation field and aids in extracting the 3D surface within the scene. We conducted experiments using simulated and authentic X-ray images, and the results demonstrated that NeAS could accurately extract 3D surfaces within a scene using only 2D X-ray images.

Summary

  • The paper introduces NeAS, a neural method that integrates SDF and attenuation fields to reconstruct 3D surfaces from sparse X-ray data.
  • The methodology leverages dual MLPs, a novel Surface Boundary Function, and multi-resolution encoding to enhance surface detail and rendering quality.
  • Experiments demonstrate state-of-the-art view synthesis and surface reconstruction metrics with effective pose refinement and reduced radiation exposure in medical imaging.

This paper introduces NeAS (Neural Attenuation Surface), a novel implicit neural representation method for reconstructing 3D structures from sparse 2D X-ray images. The primary motivation is to overcome the limitations of existing implicit neural representation (INR) methods, such as Neural Attenuation Fields (NAF), which excel at novel view synthesis from sparse X-ray data but struggle to accurately reconstruct the underlying 3D surface geometry. Accurate 3D reconstruction from sparse X-rays is valuable in medical applications as it requires less radiation exposure compared to traditional high-resolution Computed Tomography (CT) scans.

NeAS addresses the surface reconstruction challenge by integrating a Signed Distance Function (SDF), commonly used in surface reconstruction from visible light images (like in NeuS), with the Neural Attenuation Field (NAF). The core idea is that the SDF explicitly defines the boundary of the 3D object, and this boundary information is used to constrain the attenuation field. This allows NeAS to simultaneously learn both the attenuation properties within the object and the precise location of its surface.

The NeAS framework uses two multi-layer perceptrons (MLPs): $\Theta_{\mathrm{sdf}}$ for predicting the signed distance to the surface and $\Theta_{\mathrm{att}}$ for predicting an attenuation parameter. Given a 3D point $\mathbf{x}$ along a ray, $\Theta_{\mathrm{sdf}}$ outputs an intermediate feature vector $\mathbf{f}$ and the signed distance $d$. The feature vector $\mathbf{f}$ is then fed into $\Theta_{\mathrm{att}}$, which outputs a raw attenuation parameter $\bar{\mu}$. A key component is the Surface Boundary Function (SBF), $\Omega(d,s) = \frac{\exp(-sd)}{1+\exp(-sd)}$, where $s$ is a learnable parameter controlling steepness. This sigmoid-like function approximates a step function that is 1 inside the surface ($d<0$) and 0 outside ($d>0$). The final attenuation coefficient at point $\mathbf{x}$ is then $\mu(\mathbf{x}) = \Omega(d,s)\,\bar{\mu}$. The output layer of $\Theta_{\mathrm{att}}$ uses the activation $\alpha\sigma(x)+\beta$ with $\alpha, \beta > 0$ to ensure $\bar{\mu}$ is positive, allowing the SBF to correctly gate attenuation based on the signed distance.
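
The SBF and the gated attenuation can be sketched in a few lines (a minimal illustration; the function names are ours, not the paper's):

```python
import math

def surface_boundary_function(d: float, s: float) -> float:
    """SBF: Omega(d, s) = exp(-s*d) / (1 + exp(-s*d)).

    A sigmoid-like gate that approaches 1 inside the surface (d < 0)
    and 0 outside (d > 0); s controls the steepness of the transition.
    """
    return math.exp(-s * d) / (1.0 + math.exp(-s * d))

def attenuation_coefficient(d: float, s: float, mu_bar: float) -> float:
    """Final attenuation mu(x): the raw (positive) mu_bar gated by the SBF."""
    return surface_boundary_function(d, s) * mu_bar
```

As $s$ grows during training, the gate sharpens toward a hard step, concentrating all attenuation strictly inside the zero level set of the SDF.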

For scene representation, NeAS employs positional encoding. The paper explores both frequency encoding (as in standard NeRF) and multiresolution hash encoding (similar to Instant-NGP). Hash encoding offers faster training and inference speeds while demonstrating a better capability to capture high-frequency details of the surface geometry, which is reflected in improved Chamfer distances compared to frequency encoding, particularly for complex shapes like a skull. However, frequency encoding sometimes yields smoother surfaces and better 2D synthesis results, especially with real-world data where noise can affect hash encoding.
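
For reference, the frequency-encoding branch follows the standard NeRF formulation; a minimal sketch for one scalar coordinate (hash encoding is omitted here since it requires a learned lookup table):

```python
import math

def frequency_encode(x: float, num_bands: int) -> list[float]:
    """NeRF-style frequency encoding of one coordinate:
    [sin(2^0 * pi * x), cos(2^0 * pi * x), ...,
     sin(2^(L-1) * pi * x), cos(2^(L-1) * pi * x)].
    """
    feats = []
    for k in range(num_bands):
        f = (2.0 ** k) * math.pi * x
        feats.extend([math.sin(f), math.cos(f)])
    return feats
```

Each additional band doubles the spatial frequency, which is why truncating high bands (as in the coarse-to-fine schedule below) acts as a low-pass filter on the learned geometry.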

Volume rendering in NeAS is based on the Lambert-Beer law, which models X-ray attenuation. The pixel intensity $\hat{I}(\mathbf{r})$ for a ray $\mathbf{r}$ is computed by approximating the attenuation integral along the ray with a quadrature rule: $\hat{I}(\mathbf{r}) = \exp\left(-\sum_{j=1}^{N}\mu(\mathbf{x}_j)\,\delta_j\right)$, where $\mathbf{x}_j$ are sampled points, $\mu(\mathbf{x}_j)$ are the attenuation coefficients, and $\delta_j$ are segment lengths. The model is trained by minimizing the mean squared error between the rendered and ground-truth pixel intensities ($\mathcal{L}_{\mathrm{int}}$), combined with an Eikonal regularization term on the SDF gradient ($\mathcal{L}_{\mathrm{reg}}$) that encourages the SDF to be a valid distance function.
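
The quadrature rendering and the intensity loss reduce to a few lines per ray (a simplified sketch with our own function names; the real implementation batches rays on the GPU):

```python
import math

def render_pixel(mus: list[float], deltas: list[float]) -> float:
    """Quadrature approximation of the Lambert-Beer integral:
    I(r) = exp(-sum_j mu(x_j) * delta_j) for samples along one ray.
    """
    return math.exp(-sum(mu * d for mu, d in zip(mus, deltas)))

def intensity_loss(pred: list[float], target: list[float]) -> float:
    """Mean squared error between rendered and ground-truth intensities."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

Note that, unlike visible-light NeRF rendering, there is no per-sample color or alpha compositing: attenuation simply accumulates multiplicatively along the ray, so zero attenuation yields intensity 1 and dense material drives it toward 0.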

A significant practical challenge in real-world X-ray imaging is the potential inaccuracy of camera (X-ray source and detector) poses. NeAS incorporates a pose refinement mechanism to optimize the extrinsic (translation) and intrinsic (principal point) parameters during training. To stabilize this optimization, especially with sparse views, a coarse-to-fine strategy is used, incorporating frequency regularization with a weight mask that emphasizes lower frequencies early in training and gradually includes higher frequencies. This prevents early overfitting to potentially noisy high-frequency details influenced by pose errors.
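
The paper does not spell out the exact mask schedule here; one common choice for this kind of frequency regularization (as in FreeNeRF-style coarse-to-fine training) is a linear ramp that unlocks one band at a time:

```python
def frequency_weight_mask(num_bands: int, step: int, total_steps: int) -> list[float]:
    """Per-band weights in [0, 1]: low-frequency bands are enabled first,
    and higher bands are linearly ramped in as training progresses.
    This linear schedule is an assumption, not the paper's exact formula.
    """
    progress = num_bands * step / total_steps
    return [min(max(progress - k, 0.0), 1.0) for k in range(num_bands)]
```

Multiplying the encoded features by this mask suppresses high-frequency components early on, so pose parameters are first fit against coarse structure before fine detail can lock in calibration errors.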

For reconstructing scenes with multiple materials having distinct attenuation properties (e.g., bone and muscle within tissue), NeAS is extended to 2M-NeAS. This architecture uses a single $\Theta_{\mathrm{sdf}}$ to predict multiple signed distances ($d_1, d_2, \dots$) corresponding to different material boundaries. Separate $\Theta_{\mathrm{att}}$ MLPs ($\Theta_{\mathrm{att1}}, \Theta_{\mathrm{att2}}, \dots$) are trained for the attenuation fields of each material. The final attenuation coefficient at a point is determined by a selection function $\Lambda(d_2, \mu_1, \mu_2)$ based on which side of the internal surface (defined by $d_2$) the point lies on. For instance, for a bone-muscle boundary, if $d_2 < 0$ (inside bone), $\mu_2$ is selected; otherwise, $\mu_1$ (the muscle attenuation, bounded by the outer surface) is used. Critically, the $\alpha, \beta$ parameters of each material's $\Theta_{\mathrm{att}}$ must be set manually from known or estimated attenuation ranges, so that the materials' ranges do not overlap and each boundary is represented by a distinct SDF.
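
The selection logic described above amounts to a simple branch on the inner signed distance (an illustrative sketch; the name `select_attenuation` is ours):

```python
def select_attenuation(d2: float, mu1: float, mu2: float) -> float:
    """Selection function Lambda(d2, mu1, mu2): inside the internal
    surface (d2 < 0) use the inner material's attenuation mu2;
    otherwise use the outer material's attenuation mu1.
    """
    return mu2 if d2 < 0 else mu1
```

Because the two attenuation MLPs are constrained to non-overlapping output ranges via their $\alpha, \beta$ activations, the branch assigns each point's attenuation unambiguously to one material.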

Experiments on simulated and real X-ray data of phantoms (knee bone, skull, leg, head) demonstrate NeAS's effectiveness. Quantitatively, NeAS achieves state-of-the-art performance in both novel view synthesis (higher PSNR/SSIM, lower LPIPS) and surface reconstruction (lower Chamfer distance) compared to methods such as NAF, NeAT, SAX-NeRF, and R$^2$-Gaussian. Pose refinement proves crucial for accurate reconstruction from real data with imperfect calibration. While hash encoding is faster and captures more detail, frequency encoding is more robust to noise in real data, yielding smoother surfaces. 2M-NeAS successfully reconstructs both the outer (skin-air) and inner (bone-muscle) surfaces.

Implementation considerations include selecting the appropriate encoding (hash encoding for speed and detail when the data are clean, frequency encoding for robustness and smoothness), tuning hyperparameters such as the Eikonal loss weight $\lambda$ and the SBF steepness $s$, and manually setting $\alpha$ and $\beta$ for the attenuation ranges, especially in the multi-material case. The authors suggest that $\alpha$ and $\beta$ could be estimated by analyzing attenuation distributions or using known material properties. All experiments were run on a single NVIDIA RTX 3090 GPU, indicating feasible computational requirements for training.

The main practical limitation is the manual determination of the attenuation-range parameters $\alpha$ and $\beta$, which requires prior knowledge or a preliminary estimation step. Future work could explore adaptive methods for learning these parameters and extending the framework to handle more than two materials.
