
Hash Encoding & Proposal Sampler Strategy

Updated 1 February 2026
  • The paper demonstrates that integrating multi-resolution hash encoding with a two-stage proposal sampler significantly reduces training time and memory usage.
  • Hash encoding replaces dense grids with a compact hash table, achieving up to 20× memory reduction and rapid feature lookup via trilinear interpolation.
  • The proposal sampler targets high-density regions along rays, halving fine network evaluations and enhancing robotic grasp performance.

Hash encoding and proposal sampler strategy refer to two algorithmic innovations in neural volumetric scene reconstruction, as implemented in the RGBGrasp framework for image-based robotic grasping using neural radiance fields (NeRF). Hash encoding is a multi-resolution spatial hashing mechanism for compact, trainable feature storage and efficient lookup, replacing traditional dense voxel grids. The proposal sampler is a two-stage ray sampling protocol designed to focus expensive evaluations of the rendering network onto regions of high density along a ray, reducing computation. Together, these strategies yield substantial improvements in reconstruction speed, memory footprint, and grasping performance from limited RGB views, as validated empirically in the RGBGrasp pipeline (Liu et al., 2023).

1. Multi-Resolution Hash Encoding: Architecture and Mechanism

RGBGrasp adopts the multi-resolution hash encoding framework of Müller et al. (Instant NGP). The method replaces classical Fourier-feature positional encoding of $(x, y, z)$ in NeRF pipelines with a hierarchical, trainable hash-table structure. Specifically, $L$ independent grid levels are defined, each with side resolution $N_\ell$ (increasing geometrically, e.g., $N_\ell = \lfloor N_0 \cdot 2^{\ell / L} \rfloor$ for $\ell = 0, \ldots, L-1$). For each level, rather than allocating a dense voxel grid of size $N_\ell^3$, a hash table of size $T$ (with $T \ll N_\ell^3$) is used, mapping 3D grid coordinates to indices via a "three-prime XOR" hash:

$H_\ell(p) = (p_x \oplus p_y \cdot p_1 \oplus p_z \cdot p_2) \bmod T$

where $p = (p_x, p_y, p_z) = \lfloor x \cdot s_\ell \rfloor$ (with scaling $s_\ell = N_\ell / \text{scene size}$) and $p_1$, $p_2$ are fixed large primes. For any 3D point $x \in \mathbb{R}^3$, the hash encoding retrieves $F$-dimensional embeddings from the trainable table $\theta_\ell \in \mathbb{R}^{T \times F}$ at the eight corners $\delta \in \{0,1\}^3$ of the grid cell containing $x$:

  • For each $\delta$, set $i_\ell(\delta) = H_\ell(p + \delta)$ and $v_\ell(\delta) = \theta_\ell[i_\ell(\delta)]$,
  • Compute the trilinear weight $w_\ell(\delta) = \prod_{d \in \{x, y, z\}} [\delta_d u_d + (1 - \delta_d)(1 - u_d)]$, where $u = \operatorname{frac}(x \cdot s_\ell)$,
  • Accumulate $f_\ell(x) = \sum_{\delta} w_\ell(\delta) \, v_\ell(\delta)$,
  • Concatenate over $L$ levels: $\operatorname{enc}(x) = [f_0(x); f_1(x); \ldots; f_{L-1}(x)] \in \mathbb{R}^{L \cdot F}$.

This encoding provides a high-resolution, memory-efficient feature representation, serving as input to the NeRF MLP for volumetric rendering.
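The per-level lookup and concatenation described above can be sketched in NumPy. Table size, level count, base resolution, and test point are illustrative assumptions; the two primes are those commonly used in Instant NGP implementations:

```python
import numpy as np

L, T, F = 4, 2**14, 2                      # levels, table size, features/level
N0 = 16                                    # base side resolution (assumed)
P1, P2 = 2654435761, 805459861             # fixed large primes for the hash
rng = np.random.default_rng(0)
theta = [rng.normal(scale=1e-4, size=(T, F)) for _ in range(L)]  # trainable tables

def hash3(p):
    """Three-coordinate XOR hash: H(p) = (p_x ^ p_y*P1 ^ p_z*P2) mod T."""
    return (p[0] ^ (p[1] * P1) ^ (p[2] * P2)) % T

def encode(x):
    """Concatenated multi-resolution hash encoding of a point x in [0, 1)^3."""
    feats = []
    for l in range(L):
        s = np.floor(N0 * 2 ** (l / L))           # side resolution N_l
        p = np.floor(x * s).astype(np.int64)      # lower cell corner
        u = x * s - p                             # fractional offset in the cell
        f = np.zeros(F)
        for corner in range(8):                   # eight corners delta in {0,1}^3
            d = np.array([(corner >> k) & 1 for k in range(3)], dtype=np.int64)
            w = np.prod(np.where(d == 1, u, 1.0 - u))   # trilinear weight
            f += w * theta[l][hash3(p + d)]
        feats.append(f)
    return np.concatenate(feats)                  # shape (L*F,)

enc = encode(np.array([0.3, 0.7, 0.5]))
print(enc.shape)                                  # (8,) for L=4, F=2
```

In training, the `theta` tables would receive gradients through the trilinear weights, which is what makes the encoding itself learnable.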

2. Hash Encoding: Performance and Efficiency

Hash encoding yields distinctive advantages over dense grid and Fourier-based encodings:

  • Memory use is reduced from $O(F \cdot \sum_\ell N_\ell^3)$ (hundreds of MB) for dense grids to $L \cdot T \cdot F$ floats ($\approx 4$ MB for typical $T \approx 2^{18}$, $L = 16$, $F = 2$).
  • Point lookups involve $8L$ table accesses with trilinear interpolation; these random-access loads are handled efficiently by modern GPUs, yielding substantial speedups.
  • Empirical results in RGBGrasp demonstrate a $\sim 20\times$ reduction in memory and a $5$–$10\times$ acceleration in encoding lookup, bringing NeRF training time to roughly a minute for 12 RGB views.
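The memory argument can be made concrete with a back-of-the-envelope parameter count. The level resolutions spanning 16 to 512 below are an illustrative assumption, not values from the paper:

```python
# A dense grid stores F * N_l^3 features per level; a hashed level caps
# this at T * F entries. Resolutions 16..512 are an assumed example range.
L, T, F = 16, 2**18, 2
b = (512 / 16) ** (1 / (L - 1))                  # per-level geometric growth
Ns = [int(16 * b**l) for l in range(L)]          # side resolutions, ~16 to ~512

dense_params = sum(F * n**3 for n in Ns)         # O(F * sum_l N_l^3)
hash_params = sum(F * min(n**3, T) for n in Ns)  # each level capped at T entries

print(f"dense: {dense_params / 1e6:.1f} M params")
print(f"hash:  {hash_params / 1e6:.1f} M params")
print(f"ratio: {dense_params / hash_params:.0f}x")
```

The cap at $T$ only binds at the finest levels, which is exactly where a dense grid would be most wasteful; coarse levels are stored losslessly.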

3. Proposal Sampler Strategy: Two-Stage Ray Sampling

RGBGrasp integrates the two-stage proposal sampler design from Barron et al. (Mip-NeRF 360). The methodology uses a lightweight ProposalMLP to estimate rough density fields, allowing subsequent fine-grained samples to be concentrated in volumetric regions likely to contribute most to rendering. The protocol for each ray is as follows:

  • Stage 1: Uniformly sample $N_0$ depths $t_i^0$; evaluate the ProposalMLP ($\sim 0.7\,\mu\text{s}$/query) at each sample to obtain densities $\sigma_i^0$; compute weights $w_i^0 \propto \alpha_i^0 \prod_{j < i} (1 - \alpha_j^0)$ with opacity $\alpha_i^0 = 1 - \exp(-\sigma_i^0 \delta_i^0)$, where $\delta_i^0$ is the sample spacing.
  • Form a discrete PDF for resampling, $p_i \propto w_i^0 + \epsilon$, mixed with a uniform distribution ($\lambda \approx 0.1$) to retain coverage of low-density regions.
  • Stage 2: Resample $N_1$ fine depths $t_j^1$ from $p_i$; evaluate the FineMLP (with hash encoding) at these locations to obtain color and density outputs.

This strategy roughly halves the number of FineMLP evaluations per ray by concentrating them in high-density intervals.
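A minimal sketch of the two-stage protocol, with a toy analytic density bump standing in for the trained ProposalMLP (an assumption made so the example is self-contained):

```python
import numpy as np

rng = np.random.default_rng(1)
N0, N1, lam, eps = 32, 32, 0.1, 1e-3   # coarse/fine counts, uniform mix, PDF floor

def proposal_density(t):
    """Toy stand-in for the ProposalMLP: a single density bump near t = 0.6."""
    return 50.0 * np.exp(-((t - 0.6) / 0.05) ** 2)

# Stage 1: uniform coarse depths and volume-rendering weights.
t0 = np.linspace(0.0, 1.0, N0, endpoint=False)   # coarse depths t_i^0
delta = np.full(N0, 1.0 / N0)                    # interval lengths delta_i^0
sigma = proposal_density(t0)
alpha = 1.0 - np.exp(-sigma * delta)             # opacities alpha_i^0
trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha]))[:-1]
w = alpha * trans                 # w_i^0 = alpha_i^0 * prod_{j<i}(1 - alpha_j^0)

# Discrete PDF, mixed with a uniform floor to keep coverage of empty space.
p = (1 - lam) * (w + eps) / (w + eps).sum() + lam / N0

# Stage 2: inverse-CDF resampling of N1 fine depths from p.
cdf = np.cumsum(p)
u = rng.uniform(size=N1)
idx = np.minimum(np.searchsorted(cdf, u), N0 - 1)
t1 = t0[idx] + rng.uniform(size=N1) * delta[idx]  # jitter within chosen bins

# Most fine samples should now land near the density bump.
print(np.mean(np.abs(t1 - 0.6) < 0.15))
```

The uniform mixing term ($\lambda$) is what prevents the sampler from permanently starving regions the proposal network initially scores as empty.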

4. Optimization and Training Protocol

During iterative NeRF training in RGBGrasp, three components are alternately optimized:

  • Hash-encoding tables $\theta_\ell$,
  • ProposalMLP (single hidden layer, 64 units, scalar output for density),
  • FineMLP (accepts hash encoding, outputs radiance and density).

Annealing strategies are used: after the initial iterations (of 1200 total), $N_1$ is increased with iteration $t$ as $N_1(t) = N_1^{\min} + (N_1^{\max} - N_1^{\min}) \, (t/T)^{\gamma}$ with $\gamma \approx 0.5$, concentrating samples early and subsequently refining the allocation.
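The schedule itself is a one-liner; the bounds $N_1^{\min} = 8$ and $N_1^{\max} = 32$ below are illustrative assumptions, not values reported in the paper:

```python
def n1_schedule(t, T=1200, n_min=8, n_max=32, gamma=0.5):
    """N_1(t) = N_1^min + (N_1^max - N_1^min) * (t / T)^gamma."""
    return int(n_min + (n_max - n_min) * (t / T) ** gamma)

print([n1_schedule(t) for t in (0, 300, 1200)])  # [8, 20, 32]
```

Because $\gamma \approx 0.5$ makes the curve concave, most of the extra fine samples are already allocated midway through training, matching the "concentrate early, refine later" intent.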

5. Quantitative Ablation: Timing, Memory, and Accuracy

A comprehensive ablation quantifies the impact on training time, memory, and accuracy. RGBGrasp was trained on an NVIDIA RTX 3090 GPU with 12 images and 8192 rays/step for 1200 steps, comparing:

  • A: Full RGBGrasp (Hash + Proposal),
  • B: Hash only (single-stage, $N = 64$),
  • C: Dense grid (no hash, no proposal, $N = 64$).

| Variant | Train Time (min) | GPU Mem (GB) | RMSE (L2 units) | Samples |
| --- | --- | --- | --- | --- |
| A: Hash+Prop | 1.1 | 4.0 | 0.023 | 32+32 |
| B: Hash only | 1.6 | 4.5 | 0.024 | 64 |
| C: Dense | 5.2 | 15.8 | 0.025 | 64 |

Hash encoding alone confers $\sim 3\times$ reductions in both training time and memory versus the dense grid; the proposal sampler further reduces training time by $\sim 30\%$ at negligible RMSE change.

6. Downstream Grasp Performance and Qualitative Outcomes

RGBGrasp ablation on 200 simulated cluttered scenes (mixed materials) shows direct improvements in robotic grasp metrics:

  • Grasp Success Rate (SR) and Declutter Rate (DR):
| Variant | SR (%) | DR (%) | Time (min) | RMSE |
| --- | --- | --- | --- | --- |
| A: Hash+Prop | 84.5 | 79.0 | 1.1 | 0.023 |
| B: Hash only | 82.0 | 76.8 | 1.6 | 0.024 |
| C: Dense | 79.3 | 73.5 | 5.2 | 0.025 |

Both hash encoding and proposal sampling individually improve grasp success relative to the dense-grid baseline. Qualitatively, reconstructions produced with these strategies display sharper object edges and fewer floating-density artifacts.

7. Context, Significance, and Integration

The fusion of multi-resolution hash tables, as per Instant NGP, with proposal-based two-stage sampling, as per Mip-NeRF 360, allows RGBGrasp to achieve order-of-magnitude reductions in memory and runtime for neural 3D reconstruction from limited RGB views. This supports robust 6-DoF grasp planning in complex, cluttered scenes, including transparent and specular objects, and yields both photometric and geometric fidelity. A plausible implication is that such architectural advances make volumetric learning tractable for real-time manipulation applications where sensor and computational resources are constrained (Liu et al., 2023).

