
MS-Mip-NeRF 360: Multi-Space Rendering

Updated 20 January 2026
  • The paper introduces a novel multi-space feature field decomposition technique that enhances view consistency for scenes with mirror and transparent surfaces.
  • MS-Mip-NeRF 360 integrates into Mip-NeRF 360 by modifying only the output head, preserving cone-based encoding and hierarchical sampling while incorporating gated sub-space blending.
  • Empirical results show significant improvements in PSNR, SSIM, and LPIPS with minimal computational overhead, demonstrating its effectiveness for complex light transport rendering.

MS-Mip-NeRF 360 is an integration of the Multi-Space Neural Radiance Field (MS-NeRF) framework into Mip-NeRF 360, targeting robust photorealistic 360° novel-view synthesis in environments containing reflective and refractive objects. Traditional NeRF and Mip-NeRF 360 architectures assume a single continuous Euclidean radiance field, which breaks multi-view consistency when mirror-like or transparent surfaces generate virtual images. MS-Mip-NeRF 360 remedies this by decomposing the scene into multiple parallel sub-space feature fields, each individually multi-view consistent and capturing direct, reflected, or refracted image content. The integration yields significant quantitative improvements with minimal architectural or computational overhead, extending Mip-NeRF 360's capabilities to general scenes with complex light transport phenomena (Yin et al., 2023, Barron et al., 2021).

1. Multi-Space Feature Field Decomposition

The MS-NeRF multi-space scheme formulates the scene as $K$ parallel radiance or feature fields ("sub-spaces"), with each sub-space responsible for encoding a physically plausible view-consistent region of the scene, such as direct geometry or a reflection/refraction behind a mirror or glass. For each cast camera ray, a learned gating network selects and blends colors from these sub-spaces based on the viewpoint and path through the environment, assigning appropriate contributions per sub-space. Unlike conventional methods, which explain reflections as conflicting textures within one field, this decomposition natively models virtual images, permitting faithful rendering of complex light paths in novel views.

2. Architectural Integration with Mip-NeRF 360

MS-Mip-NeRF 360 modifies only the output head of the original architecture, maintaining core mechanisms from Mip-NeRF 360 such as cone-based positional encoding, hierarchical sampling, and unbounded scene contraction. At each sampled point along a rendering ray, the backbone MLP emits $K$ scalar densities $\{\sigma^k_i\}_{k=1}^K$ and $K$ feature vectors $\{\mathbf f^k_i\}_{k=1}^K$. These are accumulated per sub-space via a volumetric integration scheme consistent with the original network but applied in parallel for each sub-space. A pair of small decoder MLPs transforms the aggregated features into per-subspace RGB values and gating weights, which are then combined using a softmax blend across sub-spaces for the final pixel color. Optionally, grid-based multi-space variants substitute the MLP-based output heads with $K$ small feature grids, each grid modeling a local feature/density for its sub-space, trading off inference speed for memory efficiency.
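The shape bookkeeping of such a modified output head can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the sizes K, D, H and the single linear layer standing in for the output head are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K sub-spaces, D-dimensional features, H backbone width.
K, D, H = 6, 48, 256

# Stand-in for the backbone MLP activation at one sampled point.
h = rng.standard_normal(H)

# Multi-space output head: one linear layer emits K densities and K features.
W = rng.standard_normal((K * (1 + D), H)) * 0.01
out = W @ h                                # shape (K * (1 + D),)
sigmas = np.logaddexp(0.0, out[:K])        # softplus keeps densities non-negative
feats = out[K:].reshape(K, D)              # one feature vector per sub-space
```

The point of the sketch is that the multi-space change is purely a widening of the head's output, which is why the parameter count grows only marginally.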

3. Mathematical Framework and Rendering Pipeline

Let a camera ray $\mathbf r(t) = \mathbf o + t\,\mathbf d$ be discretized into samples $t_1 < \ldots < t_N$ obtained via cone-frustum-aware sampling. For each sample, the network outputs $\{\sigma^k_i, \mathbf f^k_i\}_{k=1}^K$. The integration for each sub-space $k$ uses:

$$T^k_i = \exp\left(-\sum_{j<i} \sigma^k_j \delta_j \right), \qquad \delta_i = t_i - t_{i-1}$$

$$\hat{\mathcal F}^k(\mathbf r) = \sum_{i=1}^N T^k_i \left(1 - e^{-\sigma^k_i \delta_i}\right) \mathbf f^k_i$$

Decoded values are obtained as:

$$\mathbf C^k = \mathrm{DecoderMLP}(\hat{\mathcal F}^k), \qquad w^k = \mathrm{GateMLP}(\hat{\mathcal F}^k)$$

Final color for the ray:

$$\hat{\mathbf C}(\mathbf r) = \sum_{k=1}^K \frac{e^{w^k}}{\sum_{m=1}^K e^{w^m}}\, \mathbf C^k$$

This formulation keeps inference efficient, as the densities $\sigma^k_i$ modulate integration per sub-space within a single pass rather than invoking redundant volumetric renderings.
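The per-subspace integration and gated blend above can be sketched in NumPy. This is a simplified illustration under stated assumptions: `decoder` and `gate` are hypothetical callables standing in for the paper's small DecoderMLP and GateMLP.

```python
import numpy as np

def render_ray_multispace(sigmas, feats, deltas, decoder, gate):
    """Per-subspace volumetric integration followed by a gated softmax blend.

    sigmas: (N, K) densities per sample and sub-space
    feats:  (N, K, D) feature vectors per sample and sub-space
    deltas: (N,) inter-sample distances, delta_i = t_i - t_{i-1}
    decoder, gate: callables mapping a (D,) feature to RGB / a scalar logit
    """
    N, K = sigmas.shape
    alpha = 1.0 - np.exp(-sigmas * deltas[:, None])               # (N, K)
    # Transmittance T_i^k = exp(-sum_{j<i} sigma_j^k delta_j): exclusive cumsum.
    accum = np.cumsum(sigmas * deltas[:, None], axis=0)
    T = np.exp(-np.concatenate([np.zeros((1, K)), accum[:-1]]))   # (N, K)
    weights = T * alpha                                           # (N, K)
    F_hat = np.einsum('nk,nkd->kd', weights, feats)               # per-subspace feature
    colors = np.stack([decoder(F_hat[k]) for k in range(K)])      # (K, 3)
    logits = np.array([gate(F_hat[k]) for k in range(K)])         # (K,)
    blend = np.exp(logits - logits.max())
    blend /= blend.sum()                                          # softmax over sub-spaces
    return blend @ colors                                         # final pixel color
```

All K sub-spaces share the sample positions and transmittance recurrence, so the extra cost over single-space rendering is a widened accumulation plus two tiny decoder evaluations.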

4. Training Protocol and Loss Functions

MS-Mip-NeRF 360 employs the standard mean-squared error between predicted and ground-truth pixel colors over all sampled rays:

$$\mathcal L = \frac{1}{|\mathcal R|} \sum_{\mathbf r \in \mathcal R} \left\| \hat{\mathbf C}(\mathbf r) - \mathbf C_\mathrm{gt}(\mathbf r) \right\|_2^2$$

No additional loss terms or reflection masks are used. Training hyperparameters mirror those of Mip-NeRF 360: 200,000 iterations, batch size of 1024 rays, coarse-to-fine hierarchical sampling, and identical learning rate schedules. The approach retains compatibility with Mip-NeRF 360’s contraction, anti-aliasing, and regularization components (Barron et al., 2021).
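A minimal sketch of this photometric loss, together with the standard PSNR formula used in the evaluation (assuming colors normalized to [0, 1]):

```python
import numpy as np

def mse_loss(pred, gt):
    """Photometric MSE over a batch of rays, matching the loss L above.
    pred, gt: (R, 3) predicted / ground-truth RGB per ray."""
    return float(np.mean(np.sum((pred - gt) ** 2, axis=-1)))

def psnr(pred, gt):
    """PSNR in dB for colors in [0, 1]; ties reported gains back to MSE."""
    return float(-10.0 * np.log10(np.mean((pred - gt) ** 2)))
```

Since PSNR is a logarithmic function of MSE, the reported +3.46 dB gain corresponds to a roughly 2.2x reduction in per-pixel squared error.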

5. Empirical Performance and Quantitative Analysis

On a synthetic benchmark of 25 test scenes, MS-Mip-NeRF 360 achieves PSNR 35.04 dB, SSIM 0.906, and LPIPS 0.130, representing improvements of +3.46 dB, +0.011 SSIM, and −0.015 LPIPS over the baseline (Mip-NeRF 360: PSNR 31.58, SSIM 0.895, LPIPS 0.145). Model size increases marginally from 9.007M to 9.052M parameters (+0.5%). On real-capture scenes, the MS integration yields PSNR gains of +1.44 dB with comparable SSIM and LPIPS. Inference and memory overheads remain negligible (<5% slower).

Ablations demonstrate that naïve multi-space averaging ("MS-Avg") yields only marginal improvement (+0.6 dB) and introduces blur, whereas the MS-NeRF feature plus gated blending delivers +1.6 dB over NeRF. Performance saturates at a sub-space count of $K = 6$, and increasing the feature dimension $d$ beyond 48 shows diminishing gains.

6. Comparisons with Baseline and State-of-the-Art

MS-Mip-NeRF 360 substantially outperforms standard NeRF, Mip-NeRF, and Mip-NeRF 360 in rendering mirror and refractive regions, avoiding blurred or missing virtual images. The model also surpasses Ref-NeRF—which targets glossy BRDFs—by more than +1 dB PSNR on specular regions, since Ref-NeRF handles reflections as view-dependent textures in a unified field and fails at mirror-perfect surfaces. On the RFFR reflection dataset, MS-NeRF_T attains PSNR 35.93 dB compared to NeRFReN's 35.26 dB, even without reflection masks. Qualitatively, MS-Mip-NeRF 360 produces crisp, correctly located mirror images, whereas baselines smear or omit such features.

7. Dataset, Evaluation, and Insights

MS-Mip-NeRF 360 is evaluated on a curated dataset of 33 scenes, including 25 synthetic and 7 real captures with complex reflective and refractive properties. Synthetic scenes contain 120 images captured on a 360° circle and split 100/10/10 for train/val/test. Real captures employ LLFF pose conventions and are split with ∼20% for test and validation. Metrics include PSNR, SSIM, and LPIPS (VGG) computed on test images unseen during training.

The key insight is that decomposing the radiance field into multiple sub-spaces resolves the inherent training conflicts in vanilla NeRF backbones when modeling both physical and virtual image content. Sub-space gating enables dynamic per-ray selection of the relevant "camera path." Only the output head is replaced, requiring minor computational resources: two 1-hidden-layer MLPs and KK-way features, leaving all cone-based sampling and scene contraction intact. This design generalizes Mip-NeRF 360 into an all-purpose 360° renderer for scenes with perfect reflections and refractions, requiring no manual annotation, mask, or geometric prior (Yin et al., 2023).
