NeAS: 3D Reconstruction from X-ray Images using Neural Attenuation Surface

Published 10 Mar 2025 in eess.IV and cs.CV | (2503.07491v1)

Abstract: Reconstructing three-dimensional (3D) structures from two-dimensional (2D) X-ray images is a valuable and efficient technique in medical applications that requires less radiation exposure than computed tomography scans. Recent approaches that use implicit neural representations have enabled the synthesis of novel views from sparse X-ray images. However, although image synthesis has improved the accuracy, the accuracy of surface shape estimation remains insufficient. Therefore, we propose a novel approach for reconstructing 3D scenes using a Neural Attenuation Surface (NeAS) that simultaneously captures the surface geometry and attenuation coefficient fields. NeAS incorporates a signed distance function (SDF), which defines the attenuation field and aids in extracting the 3D surface within the scene. We conducted experiments using simulated and authentic X-ray images, and the results demonstrated that NeAS could accurately extract 3D surfaces within a scene using only 2D X-ray images.

Summary

  • The paper introduces NeAS, a neural method that integrates SDF and attenuation fields to reconstruct 3D surfaces from sparse X-ray data.
  • The methodology leverages dual MLPs, a novel Surface Boundary Function, and multi-resolution encoding to enhance surface detail and rendering quality.
  • Experiments demonstrate state-of-the-art view synthesis and surface reconstruction metrics with effective pose refinement and reduced radiation exposure in medical imaging.

This paper introduces NeAS (Neural Attenuation Surface), a novel implicit neural representation method for reconstructing 3D structures from sparse 2D X-ray images. The primary motivation is to overcome the limitations of existing implicit neural representation (INR) methods, such as Neural Attenuation Fields (NAF), which excel at novel view synthesis from sparse X-ray data but struggle to accurately reconstruct the underlying 3D surface geometry. Accurate 3D reconstruction from sparse X-rays is valuable in medical applications as it requires less radiation exposure compared to traditional high-resolution Computed Tomography (CT) scans.

NeAS addresses the surface reconstruction challenge by integrating a Signed Distance Function (SDF), commonly used in surface reconstruction from visible light images (like in NeuS), with the Neural Attenuation Field (NAF). The core idea is that the SDF explicitly defines the boundary of the 3D object, and this boundary information is used to constrain the attenuation field. This allows NeAS to simultaneously learn both the attenuation properties within the object and the precise location of its surface.

The NeAS framework uses two multi-layer perceptrons (MLPs): $\Theta_{\mathrm{sdf}}$ for predicting the signed distance to the surface and $\Theta_{\mathrm{att}}$ for predicting an attenuation parameter. Given a 3D point $\mathbf{x}$ along a ray, $\Theta_{\mathrm{sdf}}$ outputs an intermediate feature vector $\mathbf{f}$ and the signed distance $d$. The feature vector $\mathbf{f}$ is then fed into $\Theta_{\mathrm{att}}$, which outputs a raw attenuation parameter $\bar{\mu}$. A key component is the Surface Boundary Function (SBF), $\Omega(d,s) = \frac{\exp(-sd)}{1+\exp(-sd)}$, where $s$ is a learnable parameter controlling steepness. This sigmoid-like function approximates a step function that is 1 inside the surface ($d<0$) and 0 outside ($d>0$). The final attenuation coefficient at point $\mathbf{x}$ is then $\mu(\mathbf{x}) = \Omega(d,s)\,\bar{\mu}$. The output layer of $\Theta_{\mathrm{att}}$ uses the activation $\alpha\sigma(x)+\beta$ with $\alpha, \beta > 0$ to ensure $\bar{\mu}$ is positive, allowing the SBF to correctly gate attenuation based on the signed distance.
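
The SBF and the gated attenuation can be sketched in a few lines (a minimal illustration; the function names are ours, not the paper's):

```python
import math

def surface_boundary_function(d: float, s: float) -> float:
    """SBF: Omega(d, s) = exp(-s*d) / (1 + exp(-s*d)).

    A sigmoid-like gate that approaches 1 inside the surface (d < 0)
    and 0 outside (d > 0); s controls the steepness of the transition.
    """
    return math.exp(-s * d) / (1.0 + math.exp(-s * d))

def attenuation_coefficient(d: float, s: float, mu_bar: float) -> float:
    """Final attenuation mu(x): the raw (positive) mu_bar gated by the SBF."""
    return surface_boundary_function(d, s) * mu_bar
```

As $s$ grows during training, the gate sharpens toward a hard step, concentrating all attenuation strictly inside the zero level set of the SDF.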

For scene representation, NeAS employs positional encoding. The paper explores both frequency encoding (as in standard NeRF) and multiresolution hash encoding (similar to Instant-NGP). Hash encoding offers faster training and inference speeds while demonstrating a better capability to capture high-frequency details of the surface geometry, which is reflected in improved Chamfer distances compared to frequency encoding, particularly for complex shapes like a skull. However, frequency encoding sometimes yields smoother surfaces and better 2D synthesis results, especially with real-world data where noise can affect hash encoding.
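
For reference, the frequency-encoding branch follows the standard NeRF formulation; a minimal sketch for one scalar coordinate (hash encoding is omitted here since it requires a learned lookup table):

```python
import math

def frequency_encode(x: float, num_bands: int) -> list[float]:
    """NeRF-style frequency encoding of one coordinate:
    [sin(2^0 * pi * x), cos(2^0 * pi * x), ...,
     sin(2^(L-1) * pi * x), cos(2^(L-1) * pi * x)].
    """
    feats = []
    for k in range(num_bands):
        f = (2.0 ** k) * math.pi * x
        feats.extend([math.sin(f), math.cos(f)])
    return feats
```

Each additional band doubles the spatial frequency, which is why truncating high bands (as in the coarse-to-fine schedule below) acts as a low-pass filter on the learned geometry.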

Volume rendering in NeAS is based on the Lambert-Beer law, which models X-ray attenuation. The pixel intensity $\hat{I}(\mathbf{r})$ for a ray $\mathbf{r}$ is computed by approximating the attenuation integral along the ray with a quadrature rule: $\hat{I}(\mathbf{r}) = \exp\left(-\sum_{j=1}^{N}\mu(\mathbf{x}_j)\,\delta_j\right)$, where $\mathbf{x}_j$ are sampled points, $\mu(\mathbf{x}_j)$ are the attenuation coefficients, and $\delta_j$ are segment lengths. The model is trained by minimizing the mean squared error between the rendered and ground-truth pixel intensities ($\mathcal{L}_{\mathrm{int}}$), combined with an Eikonal regularization term on the SDF gradient ($\mathcal{L}_{\mathrm{reg}}$) that encourages the SDF to be a valid distance function.
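
The quadrature rendering and the intensity loss reduce to a few lines per ray (a simplified sketch with our own function names; the real implementation batches rays on the GPU):

```python
import math

def render_pixel(mus: list[float], deltas: list[float]) -> float:
    """Quadrature approximation of the Lambert-Beer integral:
    I(r) = exp(-sum_j mu(x_j) * delta_j) for samples along one ray.
    """
    return math.exp(-sum(mu * d for mu, d in zip(mus, deltas)))

def intensity_loss(pred: list[float], target: list[float]) -> float:
    """Mean squared error between rendered and ground-truth intensities."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

Note that, unlike visible-light NeRF rendering, there is no per-sample color or alpha compositing: attenuation simply accumulates multiplicatively along the ray, so zero attenuation yields intensity 1 and dense material drives it toward 0.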

A significant practical challenge in real-world X-ray imaging is the potential inaccuracy of camera (X-ray source and detector) poses. NeAS incorporates a pose refinement mechanism to optimize the extrinsic (translation) and intrinsic (principal point) parameters during training. To stabilize this optimization, especially with sparse views, a coarse-to-fine strategy is used, incorporating frequency regularization with a weight mask that emphasizes lower frequencies early in training and gradually includes higher frequencies. This prevents early overfitting to potentially noisy high-frequency details influenced by pose errors.
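
The paper does not spell out the exact mask schedule here; one common choice for this kind of frequency regularization (as in FreeNeRF-style coarse-to-fine training) is a linear ramp that unlocks one band at a time:

```python
def frequency_weight_mask(num_bands: int, step: int, total_steps: int) -> list[float]:
    """Per-band weights in [0, 1]: low-frequency bands are enabled first,
    and higher bands are linearly ramped in as training progresses.
    This linear schedule is an assumption, not the paper's exact formula.
    """
    progress = num_bands * step / total_steps
    return [min(max(progress - k, 0.0), 1.0) for k in range(num_bands)]
```

Multiplying the encoded features by this mask suppresses high-frequency components early on, so pose parameters are first fit against coarse structure before fine detail can lock in calibration errors.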

For reconstructing scenes with multiple materials having distinct attenuation properties (e.g., bone and muscle within tissue), NeAS is extended to 2M-NeAS. This architecture uses a single $\Theta_{\mathrm{sdf}}$ to predict multiple signed distances ($d_1, d_2, \dots$) corresponding to different material boundaries. Separate $\Theta_{\mathrm{att}}$ MLPs ($\Theta_{\mathrm{att1}}, \Theta_{\mathrm{att2}}, \dots$) are trained for the attenuation fields of each material. The final attenuation coefficient at a point is determined by a selection function $\Lambda(d_2, \mu_1, \mu_2)$ based on which side of the internal surface (defined by $d_2$) the point lies on. For instance, for a bone-muscle boundary, if $d_2 < 0$ (inside bone), $\mu_2$ is selected; otherwise, $\mu_1$ (the muscle attenuation, bounded by the outer surface) is used. Critically, the $\alpha, \beta$ parameters of each material's $\Theta_{\mathrm{att}}$ must be set manually from known or estimated attenuation ranges, so that the materials' ranges do not overlap and each boundary is represented by a distinct SDF.
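
The selection logic described above amounts to a simple branch on the inner signed distance (an illustrative sketch; the name `select_attenuation` is ours):

```python
def select_attenuation(d2: float, mu1: float, mu2: float) -> float:
    """Selection function Lambda(d2, mu1, mu2): inside the internal
    surface (d2 < 0) use the inner material's attenuation mu2;
    otherwise use the outer material's attenuation mu1.
    """
    return mu2 if d2 < 0 else mu1
```

Because the two attenuation MLPs are constrained to non-overlapping output ranges via their $\alpha, \beta$ activations, the branch assigns each point's attenuation unambiguously to one material.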

Experiments on simulated and real X-ray data of phantoms (knee bone, skull, leg, head) demonstrate NeAS's effectiveness. Quantitatively, NeAS achieves state-of-the-art performance in both novel view synthesis (higher PSNR/SSIM, lower LPIPS) and surface reconstruction (lower Chamfer distance) compared to methods such as NAF, NeAT, SAX-NeRF, and R$^2$-Gaussian. Pose refinement proves crucial for accurate reconstruction from real data with imperfect calibration. While hash encoding is faster and captures more detail, frequency encoding is more robust to noise in real data, yielding smoother surfaces. 2M-NeAS successfully reconstructs both the outer (skin-air) and inner (bone-muscle) surfaces.

Implementation considerations include selecting the appropriate encoding (hash encoding for speed and detail when the data are clean, frequency encoding for robustness and smoothness), tuning hyperparameters such as the Eikonal loss weight $\lambda$ and the SBF steepness $s$, and manually setting $\alpha$ and $\beta$ for the attenuation ranges, especially in the multi-material case. The authors suggest that $\alpha$ and $\beta$ could be estimated by analyzing attenuation distributions or using known material properties. All experiments were run on a single NVIDIA RTX 3090 GPU, indicating feasible computational requirements for training.

The main practical limitation is the manual determination of the attenuation-range parameters $\alpha$ and $\beta$, which requires prior knowledge or a preliminary estimation step. Future work could explore adaptive methods for learning these parameters and extending the framework to handle more than two materials.
