
Deep Local Shapes Reconstruction

Updated 29 December 2025
  • DeepLS is a deep shape representation method that partitions 3D scenes into local voxels, each encoded by an independent latent code with a shared MLP decoder.
  • The approach achieves high-fidelity surface reconstructions from partial data, outperforming global latent methods like DeepSDF in both efficiency and accuracy.
  • DeepLS enables rapid scene encoding and scalable optimization, with quantitative results showing significant improvements in metrics such as Chamfer Distance and RMSE.

Deep Local Shapes (DeepLS) is a deep shape representation approach for high-fidelity 3D surface reconstruction that encodes local signed distance functions (SDFs) in a memory-efficient manner, enabling detailed reconstructions of complex scenes and objects. Unlike methods such as DeepSDF, which rely on a single global latent code per object, DeepLS partitions the scene into local regions, each represented by an independent latent code, and employs a shared multilayer perceptron (MLP) decoder. This decomposition enables scalable and efficient learning of local SDF priors for dense 3D reconstruction from partial observations and limited training data (Chabra et al., 2020).

1. Local SDF Representation

DeepLS models the SDF of a scene as a set of locally defined, continuous SDFs, each parameterized by a local latent code. Formally, let $f_\theta: \mathbb{R}^3 \times \mathbb{R}^d \to \mathbb{R}$ denote a shared MLP (decoder) with weights $\theta$. For a voxel (local region) $V_i$ of side length $\ell$ centered at $c_i \in \mathbb{R}^3$, the local latent code $z_i \in \mathbb{R}^d$ encodes the shape of the surface inside $V_i$. Given a query point $x \in \mathbb{R}^3$, DeepLS maps $x$ to the local coordinate frame:

$$T_i(x) = \frac{x - c_i}{\ell}.$$

The local SDF in voxel $i$ is computed as

$$\phi_i(x) = f_\theta(T_i(x), z_i).$$

The global SDF field is assembled by aggregating the local SDFs over all voxels covering $x$:

$$\phi(x) = \sum_{\{i \,:\, x \in V_i\}} \phi_i(x).$$

The reconstructed surface $S$ is the zero level set:

$$S = \left\{ x \in \mathbb{R}^3 \mid \phi(x) = 0 \right\}.$$
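The representation above can be sketched in a few lines. The stand-in decoder, voxel size, and code layout in this sketch are illustrative assumptions, not the trained network:

```python
import numpy as np

ELL = 0.08  # voxel side length ell (assumed: 8 cm, in the paper's typical range)

def to_local(x, c, ell=ELL):
    """T_i(x): map a world-space query point into voxel i's local frame."""
    return (x - c) / ell

def global_sdf(f_theta, x, centers, codes, ell=ELL):
    """phi(x): sum the local SDFs phi_i(x) of every voxel V_i containing x."""
    phi = 0.0
    for c, z in zip(centers, codes):
        if np.max(np.abs(x - c)) <= ell / 2:  # x lies inside voxel V_i
            phi += f_theta(to_local(x, c, ell), z)
    return phi
```

With a trained decoder, the zero level set of `global_sdf` would then be extracted (e.g. by Marching Cubes) to obtain the surface $S$.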

2. Network Architecture

The DeepLS decoder $f_\theta$ is a fully connected MLP with four layers, each hidden layer having 128 units and LeakyReLU activations. The input layer concatenates the 3D local point $T_i(x) \in \mathbb{R}^3$ and the local shape code $z_i \in \mathbb{R}^{128}$, giving 131 input dimensions. The final output passes through a $\tanh$ nonlinearity and is scaled to fit the SDF truncation range. In practice, the latent code is linearly embedded or concatenated at the network's first layer.
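A minimal NumPy sketch of such a decoder follows. The initialization scheme and the truncation bound `TRUNC` are assumptions for illustration; a real implementation would use a trained autodecoder (e.g. in PyTorch):

```python
import numpy as np

LATENT_DIM, HIDDEN, TRUNC = 128, 128, 0.1  # TRUNC: assumed SDF truncation bound

def leaky_relu(a, slope=0.01):
    return np.where(a > 0, a, slope * a)

def init_decoder(rng, in_dim=3 + LATENT_DIM):
    """Random weights for the 4-layer MLP: 131 -> 128 -> 128 -> 128 -> 1."""
    sizes = [in_dim, HIDDEN, HIDDEN, HIDDEN, 1]
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def decode(params, local_x, z):
    """f_theta: concatenate T_i(x) with z_i, run the MLP, scale the tanh output."""
    h = np.concatenate([local_x, z])          # 3 + 128 = 131 input dimensions
    for W, b in params[:-1]:
        h = leaky_relu(h @ W + b)
    W, b = params[-1]
    return TRUNC * np.tanh((h @ W + b)[0])    # SDF value in (-TRUNC, TRUNC)
```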

3. Scene Decomposition and Local Regions

DeepLS partitions the scene space into a regular, sparse grid of voxels (typically $\ell \approx 5$–$8$ cm). Latent codes are allocated only to voxels near the observed surface, determined via depth map rasterization or occupancy grid techniques. Each code $z_i$ is responsible for all sample points within an $L_\infty$ distance $r = 1.5\,\ell$ of $c_i$, effectively extending the voxel's receptive field to ensure border consistency between adjacent local SDFs. This design enables spatial overlap and local shape sharing, simplifying the learning task for the decoder.
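The sparse allocation and receptive-field test can be sketched as follows; the voxel size and the alignment of the grid to the world origin are illustrative assumptions:

```python
import numpy as np

ELL = 0.08  # assumed voxel side length

def allocate_voxels(surface_points, ell=ELL):
    """Allocate latent codes only for voxels whose cube contains a surface sample."""
    idx = np.unique(np.floor(surface_points / ell).astype(int), axis=0)
    return (idx + 0.5) * ell  # voxel centers c_i

def in_receptive_field(x, c, ell=ELL, r_factor=1.5):
    """Voxel at c is responsible for x iff ||x - c||_inf < 1.5 * ell."""
    return float(np.max(np.abs(x - c))) < r_factor * ell
```

Because the receptive radius exceeds half the voxel size, neighboring codes see shared samples near their common border, which is what keeps adjacent local SDFs consistent.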

4. Training Objective

Given a dataset of training pairs $\{(x_j, s_j)\}$, with $s_j$ the ground-truth signed distance at $x_j$, each $x_j$ is associated with all receptive fields that contain it. For voxel $i$, let $\mathcal{X}_i$ denote the subset of points within its receptive field. The training loss is

$$\mathcal{L}(\theta, \{z_i\}) = \sum_{i} \sum_{x_j \in \mathcal{X}_i} \left| f_\theta(T_i(x_j), z_i) - s_j \right| + \frac{1}{\sigma^2} \sum_i \| z_i \|_2^2,$$

an $\ell_1$ SDF regression term plus a Gaussian prior regularization on the codes ($\sigma$ typically set to 0.01).
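Under the definitions above, the objective can be written directly. Here `f_theta` is any callable decoder, and the nested loops are a readability-first sketch rather than the batched GPU computation:

```python
import numpy as np

def deepls_loss(f_theta, points, sdf_vals, centers, codes, ell=0.08, sigma=0.01):
    """L1 SDF regression over each voxel's receptive field + code regularizer."""
    loss = 0.0
    for c, z in zip(centers, codes):
        in_rf = np.max(np.abs(points - c), axis=1) < 1.5 * ell  # X_i membership
        for x, s in zip(points[in_rf], sdf_vals[in_rf]):
            loss += abs(f_theta((x - c) / ell, z) - s)          # |f_theta - s_j|
    loss += sum(np.dot(z, z) for z in codes) / sigma**2         # Gaussian prior
    return loss
```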

5. Inference and Scene Encoding

To encode new observations, DeepLS fixes the learned decoder weights $\theta$ and optimizes only the local codes $z_i$:

$$\hat{z}_i = \arg\min_{z_i} \sum_{x_j \in \mathcal{X}_i} \left| f_\theta(T_i(x_j), z_i) - s_j \right| + \frac{1}{\sigma^2} \|z_i\|_2^2.$$

The optimization for each $z_i$ is independent and highly parallelizable. After convergence, the global SDF $\phi(x)$ is evaluated by summing the local decoders, and the surface is extracted using the Marching Cubes algorithm in a narrow band near observed points.
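Per-voxel encoding can be sketched as independent descent on each code. The finite-difference gradient and the step-size/iteration defaults below are toy stand-ins for the autodiff-based Adam optimization used in practice:

```python
import numpy as np

def encode_voxel(f_theta, local_pts, sdf_vals, z0, sigma=0.01,
                 lr=1e-3, steps=50, eps=1e-4):
    """Optimize one code z_i with the decoder weights theta held fixed."""
    def objective(z):
        data = sum(abs(f_theta(p, z) - s) for p, s in zip(local_pts, sdf_vals))
        return data + np.dot(z, z) / sigma**2
    z = z0.astype(float).copy()
    for _ in range(steps):
        g = np.zeros_like(z)
        for k in range(z.size):              # central finite differences
            dz = np.zeros_like(z)
            dz[k] = eps
            g[k] = (objective(z + dz) - objective(z - dz)) / (2 * eps)
        z -= lr * g
    return z
```

Because each $\hat{z}_i$ solves its own small problem, all voxel codes of a scene can be optimized in parallel.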

6. Quantitative Evaluation

DeepLS achieves significant improvements in both efficiency and reconstruction fidelity compared to alternative methods. The following summarizes key results:

| Dataset / Task | Metric / Value | Reference |
| --- | --- | --- |
| 3D Warehouse (object-level) | Chamfer Distance: DeepLS $\approx$ 0.03, DeepSDF 0.20 | (Chabra et al., 2020) |
| Stanford Bunny (efficiency) | Full detail in $\sim$1 min (RMSE 0.03%); DeepSDF $\sim$8 days for the same accuracy | (Chabra et al., 2020) |
| ICL-NUIM (synthetic scene) | Asymmetric Chamfer: TSDF fusion $\approx$5.42 mm, DeepLS $\approx$4.92 mm (higher completeness at fixed accuracy) | (Chabra et al., 2020) |
| 3D Scene Dataset (real scans) | Completion (error $<$ 7 mm): TSDF 84–91%, DeepLS 88–99%; error: TSDF 10–14 mm, DeepLS 6–10 mm | (Chabra et al., 2020) |

DeepLS uses about 0.05 million decoder parameters and about 312,000 total latent-code dimensions for the 3D Warehouse experiments. At inference, a shape can be encoded in approximately one minute for $\sim$10,000 local codes using parallel Adam optimization.

7. Implementation and Memory Aspects

Meshes are preprocessed by sampling points near the surface according to $L_\infty$ uniformity (following the DeepSDF convention). Point sets from depth scans are augmented by sampling along estimated normals (positive/negative SDF) and free-space points along camera rays, weighted by inverse depth. DeepLS fits comfortably on modern GPUs: for 50,000 voxels with 128-dimensional codes, memory usage is roughly 25 MB. Training for 1,000 shapes requires about 12 hours on a single GPU. At test time, all local codes for an entire scene are typically optimized within one minute, leveraging parallel code inference. The extended receptive field ($r = 1.5\,\ell$ under $L_\infty$) ensures border consistency between overlapping voxels, eliminating the need for explicit blending mechanisms.
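The parameter and memory figures quoted above are easy to verify; float32 code storage is an assumption:

```python
# Decoder parameters for the 4-layer MLP (131 -> 128 -> 128 -> 128 -> 1):
sizes = [131, 128, 128, 128, 1]
n_params = sum(m * n + n for m, n in zip(sizes[:-1], sizes[1:]))
print(n_params)  # 50049 -- about 0.05 million, as quoted

# Latent-code memory: 50,000 voxel codes x 128 dims x 4 bytes (float32, assumed)
n_codes, code_dim, bytes_per_dim = 50_000, 128, 4
code_mb = n_codes * code_dim * bytes_per_dim / 1e6
print(f"latent codes: {code_mb:.1f} MB")  # 25.6 MB, matching "roughly 25 MB"
```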

By balancing a small shared decoder with a large set of independent local latent codes, DeepLS exhibits high reconstruction fidelity, rapid scene encoding, and broad generalization, combining the advantages of DeepSDF’s learned priors with the scalability and efficiency benefits of sparse local representations (Chabra et al., 2020).

References

Chabra, R., Lenssen, J. E., Ilg, E., Schmidt, T., Straub, J., Lovegrove, S., & Newcombe, R. (2020). Deep Local Shapes: Learning Local SDF Priors for Detailed 3D Reconstruction. ECCV 2020.
