Hash Encoding & Proposal Sampler Strategy
- The paper demonstrates that integrating multi-resolution hash encoding with a two-stage proposal sampler significantly reduces training time and memory usage.
- Hash encoding replaces dense grids with a compact hash table, achieving up to 20× memory reduction and rapid feature lookup via trilinear interpolation.
- The proposal sampler targets high-density regions along rays, halving fine network evaluations and enhancing robotic grasp performance.
Hash encoding and proposal sampler strategy refer to two algorithmic innovations in neural volumetric scene reconstruction, as implemented in the RGBGrasp framework for image-based robotic grasping using neural radiance fields (NeRF). Hash encoding is a multi-resolution spatial hashing mechanism for compact, trainable feature storage and efficient lookup, replacing traditional dense voxel-grid feature storage. The proposal sampler is a two-stage ray sampling protocol designed to focus expensive evaluations of the rendering network onto regions of high density along a ray, reducing computation. Together, these strategies yield substantial improvements in reconstruction speed, memory footprint, and grasping performance from limited RGB views, as validated empirically in the RGBGrasp pipeline (Liu et al., 2023).
1. Multi-Resolution Hash Encoding: Architecture and Mechanism
RGBGrasp adopts the multi-resolution hash encoding framework of Müller et al. (Instant NGP). The method replaces classical Fourier-feature positional encoding in NeRF pipelines with a hierarchical, trainable hash-table structure. Specifically, $L$ independent grid levels are defined, each with side resolution $N_\ell$ increasing geometrically from $N_{\min}$ to $N_{\max}$ (e.g., $N_\ell = \lfloor N_{\min} \cdot b^{\ell} \rfloor$ for a growth factor $b > 1$). For each level, rather than allocating a dense voxel grid of size $N_\ell^3$, a hash table of size $T$ (with $T \ll N_\ell^3$) is used, mapping 3D grid coordinates $\mathbf{v} = (v_1, v_2, v_3)$ to table indices via a "3-prime XOR" hash:

$$h(\mathbf{v}) = \Big( \bigoplus_{i=1}^{3} v_i \pi_i \Big) \bmod T,$$

where $\mathbf{v} = \lfloor \mathbf{x} \cdot N_\ell \rfloor$ (with scaling by the level resolution $N_\ell$), $\oplus$ denotes bitwise XOR, and $\pi_1 = 1$, $\pi_2$, $\pi_3$ are fixed large primes. For any 3D point $\mathbf{x}$, the hash encoding retrieves $F$-dimensional embeddings at the eight corners of the grid cell containing $\mathbf{x}$:
- For each level $\ell$, set $\mathbf{c}_0 = \lfloor \mathbf{x} \cdot N_\ell \rfloor$, $\mathbf{c}_1 = \mathbf{c}_0 + \mathbf{1}$,
- Compute the trilinear weight $w_{\mathbf{c}} = \prod_{i=1}^{3} \big( c'_i f_i + (1 - c'_i)(1 - f_i) \big)$ for each corner, where $\mathbf{f} = \mathbf{x} \cdot N_\ell - \mathbf{c}_0$ and $c'_i \in \{0, 1\}$ selects the corner offset along axis $i$,
- Accumulate $\mathbf{e}_\ell = \sum_{\mathbf{c}} w_{\mathbf{c}} \, H_\ell[h(\mathbf{c})]$,
- Concatenate over levels: $\mathbf{e}(\mathbf{x}) = (\mathbf{e}_1, \ldots, \mathbf{e}_L) \in \mathbb{R}^{LF}$.
This encoding provides a high-resolution, memory-efficient feature representation, serving as input to the NeRF MLP for volumetric rendering.
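The per-level lookup described above can be sketched in NumPy. The level count, table size, feature width, and resolutions below are illustrative assumptions rather than RGBGrasp's actual settings; the primes are those commonly used in Instant-NGP-style implementations.

```python
import numpy as np

# Sketch of a multi-resolution hash-encoding lookup (assumed parameters).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)  # 3-prime XOR hash

def hash_coords(coords, T):
    """XOR the integer grid coordinates with per-axis primes, then mod the table size."""
    c = coords.astype(np.uint64)
    h = c[..., 0] * PRIMES[0] ^ c[..., 1] * PRIMES[1] ^ c[..., 2] * PRIMES[2]
    return (h % np.uint64(T)).astype(np.int64)

def encode(x, tables, resolutions, T):
    """Trilinearly interpolated feature lookup at a point x in [0,1]^3, concatenated over levels."""
    feats = []
    for table, N in zip(tables, resolutions):
        xl = x * N                       # scale the point to this level's grid
        lo = np.floor(xl).astype(np.int64)
        frac = xl - lo                   # trilinear weights come from the fractional part
        e = np.zeros(table.shape[1])
        for corner in range(8):          # 8 corners of the enclosing cell
            offset = np.array([(corner >> d) & 1 for d in range(3)])
            w = np.prod(np.where(offset, frac, 1.0 - frac))
            e += w * table[hash_coords(lo + offset, T)]
        feats.append(e)
    return np.concatenate(feats)         # length L * F

# Usage: L=4 levels, table size T=2**14, F=2 features, geometric resolutions.
L, T, F = 4, 2**14, 2
resolutions = [16 * 2**l for l in range(L)]
tables = [np.random.default_rng(l).normal(size=(T, F)) for l in range(L)]
emb = encode(np.array([0.3, 0.7, 0.5]), tables, resolutions, T)
print(emb.shape)  # (8,) = L * F
```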
2. Hash Encoding: Performance and Efficiency
Hash encoding yields distinctive advantages over dense grid and Fourier-based encodings:
- Memory use is reduced from $O(N^3 F)$ floats (hundreds of MB) for dense grids to $O(LTF)$ floats ($\approx 4$ MB for typical $L = 16$, $T = 2^{16}$, $F = 2$ at half precision).
- Point lookups involve $8L$ table accesses with trilinear interpolation; these random-access loads are efficiently handled by modern GPUs, causing substantial speedup.
- Empirical results in RGBGrasp demonstrate a marked reduction in memory and at least a $5\times$ acceleration in encoding lookup, bringing NeRF training time to roughly one minute for 12 RGB views.
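The memory arithmetic behind this comparison can be checked directly. The dense resolution, level count, and table size used here are illustrative assumptions consistent with the orders of magnitude quoted above.

```python
# Back-of-the-envelope memory comparison (illustrative parameters, fp16 entries).
N, F = 512, 2                 # dense grid side resolution and feature width (assumed)
L, T = 16, 2**16              # hash levels and per-level table size (assumed)
BYTES = 2                     # half-precision storage

dense_mb = N**3 * F * BYTES / 2**20   # one dense grid at the finest resolution
hash_mb = L * T * F * BYTES / 2**20   # all L hash tables together

print(f"dense: {dense_mb:.0f} MB, hash: {hash_mb:.0f} MB")  # dense: 512 MB, hash: 4 MB
```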
3. Proposal Sampler Strategy: Two-Stage Ray Sampling
RGBGrasp integrates the two-stage proposal sampler design from Barron et al. (Mip-NeRF 360). The methodology uses a lightweight ProposalMLP to estimate rough density fields, allowing subsequent fine-grained samples to be concentrated in volumetric regions likely to contribute most to rendering. The protocol for each ray is as follows:
- Stage 1: Uniformly sample $N_p$ depths $\{t_i\}$ along the ray; evaluate the lightweight ProposalMLP at each sample to obtain density $\sigma_i$; compute weights $w_i = T_i \alpha_i$ with opacity $\alpha_i = 1 - \exp(-\sigma_i \delta_i)$, where $\delta_i = t_{i+1} - t_i$ and transmittance $T_i = \prod_{j<i} (1 - \alpha_j)$.
- Form a discrete PDF for resampling, $p_i = w_i / \sum_j w_j$, mixed with a uniform floor ($\tilde{p}_i = (1 - \epsilon) p_i + \epsilon / N_p$) to retain coverage of low-density regions.
- Stage 2: Resample $N_f$ fine depths from $\tilde{p}$ via inverse-CDF sampling; evaluate FineMLP (with hash encoding) at these locations for color and density outputs.
This approach halves the number of FineMLP evaluations while targeting the high-density intervals that dominate rendering.
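The two-stage protocol above can be sketched as follows, with a toy Gaussian density standing in for the ProposalMLP; the mixing coefficient, sample counts, and density profile are assumptions for illustration.

```python
import numpy as np

def resample(t, weights, n_fine, eps=0.01, rng=None):
    """Draw n_fine depths from the weight-derived PDF, mixed with a uniform floor."""
    rng = rng or np.random.default_rng(0)
    p = weights / weights.sum()
    p = (1.0 - eps) * p + eps / len(p)                # keep coverage of low-density bins
    cdf = np.concatenate([[0.0], np.cumsum(p)])
    u = rng.random(n_fine)
    idx = np.clip(np.searchsorted(cdf, u, side="right") - 1, 0, len(p) - 1)  # invert the CDF
    left, right = t[idx], t[idx + 1]                  # chosen coarse intervals
    return left + (right - left) * rng.random(n_fine) # uniform within each interval

# Stage 1: 32 uniform coarse depths; toy density peaked near t = 2.0
t = np.linspace(0.0, 4.0, 33)                    # 33 edges -> 32 intervals
mids = 0.5 * (t[:-1] + t[1:])
sigma = np.exp(-((mids - 2.0) ** 2) / 0.1)       # stand-in for ProposalMLP output
delta = np.diff(t)
alpha = 1.0 - np.exp(-sigma * delta)             # opacity per interval
trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
w = trans * alpha                                # NeRF compositing weights

# Stage 2: 32 fine depths concentrate near the density peak
t_fine = resample(t, w, n_fine=32)
print(t_fine.min(), t_fine.max())
```

Note the uniform floor `eps`: without it, intervals whose proposal weight collapses to zero early in training could never be revisited.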
4. Optimization and Training Protocol
During iterative NeRF training in RGBGrasp, three components are alternately optimized:
- Hash-encoding tables $\{H_\ell\}_{\ell=1}^{L}$,
- ProposalMLP (single hidden layer, 64 units, scalar output for density),
- FineMLP (accepts hash encoding, outputs radiance and density).
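A minimal NumPy sketch of the ProposalMLP's shape (one hidden layer of 64 units, scalar density output); the input width, random initialization, and softplus output activation are assumptions, not the paper's specification.

```python
import numpy as np

# Minimal sketch of a one-hidden-layer density MLP (assumed details).
rng = np.random.default_rng(0)
D_IN = 32                                  # encoded-position width (assumed)
W1, b1 = rng.normal(size=(D_IN, 64)) * 0.1, np.zeros(64)
W2, b2 = rng.normal(size=(64, 1)) * 0.1, np.zeros(1)

def proposal_mlp(x):
    """Map encoded sample positions to scalar, non-negative densities."""
    h = np.maximum(x @ W1 + b1, 0.0)       # ReLU hidden layer, 64 units
    return np.log1p(np.exp(h @ W2 + b2))   # softplus keeps density >= 0

sigma = proposal_mlp(rng.normal(size=(5, D_IN)))
print(sigma.shape)  # (5, 1)
```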
An annealing strategy is used: after the initial warm-up iterations (of 1200 total), an annealing coefficient is increased per iteration so that sampling shifts from broad ray coverage toward concentration in high-weight intervals, first coarsely allocating and subsequently refining samples.
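One way such a schedule can be realized is by decaying the uniform-mixing coefficient $\epsilon$ after a warm-up phase; the warm-up length, initial value, and linear decay below are assumptions for illustration, not the paper's exact settings.

```python
# Illustrative annealing schedule for the uniform-mixing coefficient epsilon
# (assumed warm-up length, initial value, and linear decay).
def eps_schedule(step, total=1200, warmup=200, eps0=0.1):
    """Hold epsilon at eps0 during warm-up, then decay it linearly to zero."""
    if step < warmup:
        return eps0
    return eps0 * max(0.0, 1.0 - (step - warmup) / (total - warmup))

print(eps_schedule(0), eps_schedule(200), eps_schedule(1200))  # 0.1 0.1 0.0
```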
5. Quantitative Ablation: Timing, Memory, and Accuracy
Comprehensive ablation demonstrates the impact on training time, memory, and accuracy. RGBGrasp was trained on an NVIDIA RTX 3090 with 12 images and 8192 rays/step for 1200 steps, comparing:
- A: Full RGBGrasp (Hash + Proposal),
- B: Hash only (single-stage, 64 uniform samples per ray),
- C: Dense grid (no hash, no proposal, 64 uniform samples per ray).
| Variant | Train Time (min) | GPU Mem (GB) | RMSE (L2 u.) | Samples |
|---|---|---|---|---|
| A: Hash+Prop | 1.1 | 4.0 | 0.023 | 32+32 |
| B: Hash only | 1.6 | 4.5 | 0.024 | 64 |
| C: Dense | 5.2 | 15.8 | 0.025 | 64 |
Hash encoding alone confers large reductions in both training time ($5.2 \to 1.6$ min) and memory ($15.8 \to 4.5$ GB) versus the dense grid; the proposal sampler further reduces training time by roughly $30\%$ ($1.6 \to 1.1$ min) at a negligible RMSE change.
6. Downstream Grasp Performance and Qualitative Outcomes
RGBGrasp ablation on 200 simulated cluttered scenes (mixed materials) shows direct improvements in robotic grasp metrics:
- Grasp Success Rate (SR) and Declutter Rate (DR):
| Variant | SR (%) | DR (%) | Time (min) | RMSE |
|---|---|---|---|---|
| A: Hash+Prop | 84.5 | 79.0 | 1.1 | 0.023 |
| B: Hash only | 82.0 | 76.8 | 1.6 | 0.024 |
| C: Dense | 79.3 | 73.5 | 5.2 | 0.025 |
Both hash encoding and proposal sampling individually improve grasp success relative to the baseline. Qualitatively, reconstructions produced with these strategies display sharper object edges and reduced floating-density artifacts.
7. Context, Significance, and Integration
The fusion of multi-resolution hash tables, as per Instant NGP, with proposal-based two-stage sampling, as per Mip-NeRF 360, allows RGBGrasp to achieve order-of-magnitude reductions in memory and runtime for neural 3D reconstruction from limited RGB views. This supports robust 6-DoF grasp planning in complex, cluttered scenes, including transparent and specular objects, and yields both photometric and geometric fidelity. A plausible implication is that such architectural advances make volumetric learning tractable for real-time manipulation applications where sensor and computational resources are constrained (Liu et al., 2023).