
SpectralNeRF: Neural Spectral Rendering

Updated 29 January 2026
  • SpectralNeRF is an end-to-end neural rendering architecture that decomposes scene radiance into discrete spectral bands to capture complex optical phenomena.
  • It employs a SpectralMLP for per-band radiance estimation and a Spectrum Attention UNet for fusing these maps into a high-quality RGB image.
  • Experimental results show improvements of 1–3 dB in PSNR and enhanced SSIM and LPIPS scores, demonstrating its superior physical realism over conventional NeRFs.

SpectralNeRF is an end-to-end neural rendering architecture that integrates physically based spectral rendering into the Neural Radiance Field (NeRF) paradigm. The method is designed to overcome the limitations of three-channel RGB NeRFs by decomposing volumetric scene radiance into discrete spectral bands, enabling physically grounded rendering and improved performance in scenarios governed by complex spectral phenomena such as dispersion, colored shadows, and non-white illumination (Li et al., 2023).

1. Motivation and Limitations of Conventional NeRFs

Classical NeRF represents each 3D point and viewing direction with an RGB vector and a scalar density, implicitly folding all wavelength-dependent effects into a three-channel chromatic approximation. This design cannot reproduce spectral phenomena such as dispersion, wavelength-dependent highlights, or detailed material spectral reflectance. Moreover, under non-white or narrow-band illumination, conventional NeRF degrades because its output is tied to a white-light RGB interpretation.

SpectralNeRF addresses these deficiencies by adopting a spectral decomposition for the radiance field. This allows for more physically based rendering workflows, better generalization to arbitrary spectral power distributions, and post-hoc relighting.

2. Theoretical Framework

SpectralNeRF extends the volumetric radiance field to capture continuous spectral information:

  • The model learns $L(x, d, \lambda) = F_\theta(x, d, \lambda)$ for 3D position $x \in \mathbb{R}^3$, view direction $d \in S^2$, and wavelength $\lambda \in [\lambda_{min}, \lambda_{max}]$.
  • In practice, the visible spectrum is discretized (e.g., $N = 11$ bands across $380{-}780\,\text{nm}$).
  • For a ray $r(t) = o + t d$, the backbone SpectralMLP produces both the density $\sigma(r(t))$ and a set of per-band "spectrum maps" $s_{\lambda_i}(r(t), d)$.
  • Standard volumetric rendering is performed independently for each spectral band:

$$\hat{S}_{\lambda_i}(r) = \int_{t_n}^{t_f} T(t)\, \sigma(r(t))\, s_{\lambda_i}(r(t), d)\, dt$$

where $T(t) = \exp\left(-\int_{t_n}^{t} \sigma(r(p))\, dp\right)$.

  • The final RGB output is produced by integrating the rendered spectrum maps using CIE color-matching conversion:

$$C_{rgb}(r) = \int_{\lambda_{min}}^{\lambda_{max}} \bar{c}(\lambda)\, L(r, \lambda)\, d\lambda$$

or discretely, $C_{rgb}(r) \approx \sum_{i=1}^{N} w_i \hat{S}_{\lambda_i}(r)$, with $w_i$ as precomputed quadrature weights incorporating the band width $\Delta\lambda$ and the CIE color-matching functions $\bar{c}(\lambda)$.
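The per-band quadrature and the discrete spectrum-to-RGB conversion above can be sketched numerically. This is a minimal NumPy illustration with toy values, not the paper's implementation: each band carries a scalar radiance here (the paper's spectrum maps are 3-channel), and the conversion matrix is a random placeholder rather than real CIE color-matching data.

```python
import numpy as np

def render_band(sigma, s_band, deltas):
    # Quadrature form of the per-band rendering integral:
    # S_hat = sum_k T_k * (1 - exp(-sigma_k * delta_k)) * s_k,
    # where T_k is the transmittance accumulated before sample k.
    alpha = 1.0 - np.exp(-sigma * deltas)                          # per-sample opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha)))[:-1]  # T_k
    return float(np.sum(trans * alpha * s_band))

rng = np.random.default_rng(0)
n_samples, n_bands = 64, 11
sigma = rng.uniform(0.0, 2.0, n_samples)             # densities along one ray (toy)
deltas = np.full(n_samples, 0.02)                    # sample spacing
bands = rng.uniform(0.0, 1.0, (n_bands, n_samples))  # per-band radiance samples (toy)

# One rendered spectrum value per band for this ray.
S_hat = np.array([render_band(sigma, b, deltas) for b in bands])

# Discrete conversion C_rgb ≈ sum_i w_i * S_hat_i. The 3 x N matrix stands in
# for quadrature weights built from band width and CIE color-matching
# functions (random placeholder, not real CIE data).
w = rng.uniform(0.0, 0.1, (3, n_bands))
C_rgb = w @ S_hat
```

Because the per-sample opacities satisfy $\sum_k T_k \alpha_k \le 1$, each rendered band value stays within the range of its radiance samples.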

3. Network Architecture

SpectralMLP

  • Input: $\gamma(x), \gamma(d)$ (high-frequency positional encoding).
  • Backbone: 8 fully connected layers (256 units, ReLU).
  • Output heads:
    • Density $\sigma \in \mathbb{R}$ (shared across $\lambda$).
    • Radiance for all $N$ bands: $[s_{\lambda_1}, ..., s_{\lambda_N}]$, where each $s_{\lambda_i} \in \mathbb{R}^3$.

Spectrum Attention UNet (SAUNet)

  • Purpose: fuse the $N$ spectrum maps into a high-quality RGB image.
  • Structure: 3-level encoder-decoder U-Net with skip connections and Attention Gates.
  • Spectrum Attention module:
    • 1×1 convolution layers for band reweighting.
    • Squeeze-and-Excitation channel-attention for modeling inter-band dependencies.
    • Aggregated features are concatenated and downsampled.
  • Final output: 1×1 convolution in the decoder yields the 3-channel RGB.

Pipeline: $(o, d)$ → SpectralMLP → per-band radiance + $\sigma$ → volume rendering → $N$ spectrum maps → SAUNet → RGB image.
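The Spectrum Attention module's channel reweighting can be sketched with a Squeeze-and-Excitation-style computation. This is an illustrative NumPy version with random weights standing in for what a trained SAUNet would learn; the bottleneck size and initialization are assumptions, not values from the paper.

```python
import numpy as np

def spectrum_attention(maps, rng):
    # Squeeze-and-Excitation-style band reweighting over N spectrum maps of
    # shape (N, H, W): global-average-pool each band ("squeeze"), run a small
    # bottleneck MLP ("excitation"), then rescale each band by a sigmoid gate
    # in (0, 1). Weights are random stand-ins for learned parameters.
    n_bands = maps.shape[0]
    z = maps.mean(axis=(1, 2))                              # squeeze: (N,)
    W1 = rng.standard_normal((n_bands // 2, n_bands)) * 0.1  # bottleneck (assumed size)
    W2 = rng.standard_normal((n_bands, n_bands // 2)) * 0.1
    gate = 1.0 / (1.0 + np.exp(-(W2 @ np.maximum(W1 @ z, 0.0))))
    return maps * gate[:, None, None]                       # per-band rescaling

rng = np.random.default_rng(1)
maps = np.abs(rng.standard_normal((11, 8, 8)))  # 11 toy nonnegative spectrum maps
out = spectrum_attention(maps, rng)             # same shape, bands reweighted
```

In the full SAUNet this reweighting sits inside a 3-level encoder-decoder, and the gated features are concatenated and downsampled rather than returned directly.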

4. Training Methodology

Two primary loss terms are used:

  • Weighted Spectrum Map Reconstruction:

$$L_{spectral} = \sum_{i=1}^{N} w_s(\lambda_i) \sum_{r \in R(P)} \left[ \| \hat{S}^c_{\lambda_i}(r) - S_{\lambda_i}(r) \|_2^2 + \| \hat{S}^f_{\lambda_i}(r) - S_{\lambda_i}(r) \|_2^2 \right]$$

where $w_s(\lambda_i) = 2^{P_{max}/P_{\lambda_i}}$ adjusts the importance of each band, and $\hat{S}^c$, $\hat{S}^f$ denote the coarse and fine predictions.

  • RGB Reconstruction:

$$L_{RGB} = \sum_{r} \| \hat{C}(r) - C_{gt}(r) \|_2^2$$

Total loss: $L = L_{spectral} + \lambda_{RGB} L_{RGB}$ with $\lambda_{RGB} = 1.1$.

Optimization: Adam optimizer (SpectralMLP learning rate $5 \times 10^{-4}$, SAUNet learning rate $10^{-3}$), with 64 coarse and 128 fine points sampled per ray.
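The weighted spectral loss and total objective can be sketched as follows. This is an illustrative NumPy version: the per-band statistic $P$ and the batch values are toy placeholders, and each band's rendering is treated as a scalar per ray rather than the paper's 3-channel maps.

```python
import numpy as np

def spectral_loss(S_c, S_f, S_gt, P):
    # Weighted spectrum-map reconstruction: squared error of the coarse and
    # fine predictions per band, scaled by w_s(lambda_i) = 2^(P_max / P_i).
    # P is the per-band weighting statistic from the paper (illustrative here).
    w_s = 2.0 ** (P.max() / P)
    err_c = ((S_c - S_gt) ** 2).sum(axis=1)   # coarse term, summed over rays
    err_f = ((S_f - S_gt) ** 2).sum(axis=1)   # fine term, summed over rays
    return float((w_s * (err_c + err_f)).sum())

rng = np.random.default_rng(2)
n_bands, n_rays = 11, 32
S_gt = rng.uniform(0.0, 1.0, (n_bands, n_rays))   # toy ground-truth spectrum maps
P = rng.uniform(0.5, 1.0, n_bands)                # toy per-band statistic

L_spec = spectral_loss(S_gt + 0.01, S_gt - 0.01, S_gt, P)

# Total objective as stated above: L = L_spectral + lambda_RGB * L_RGB.
L_rgb = 0.05                                      # placeholder RGB reconstruction error
L_total = L_spec + 1.1 * L_rgb
```

Note that $w_s \ge 2$ for every band, so poorly weighted bands still contribute; the weighting only amplifies bands with small $P_{\lambda_i}$.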

5. Experimental Validation

Quantitative Results:

| Dataset / Scene | Metric | SpectralNeRF vs. RGB NeRF |
| --- | --- | --- |
| Synthetic, average over 8 scenes | PSNR | +1–2 dB |
| | SSIM | +0.01–0.02 |
| | LPIPS | −0.01 to −0.02 (lower is better) |
| | L1 | −0.3 to −0.5 (lower is better) |
| Real "Projector" | PSNR | +3 dB |
| Real "Dog Doll" | PSNR | +2.6 dB |

Qualitative outcomes:

  • SpectralNeRF recovers sharper highlights and textures, accurate dispersion effects, and avoids fogging under colored lighting.
  • Ablations: removing the spectral decomposition ($N = 0$) reduces PSNR by 1.8 dB; SAUNet delivers a further 2 dB over naive fusion; the spectral weighting $w_s$ adds +0.1 dB.

6. Significance and Extensions

SpectralNeRF enables physically correct rendering effects such as chromatic dispersion and spectral shadowing. The spectral approach simplifies scene modeling, improves NeRF performance in difficult domains, and supports relighting under arbitrary spectral illumination conditions. The pipeline is compatible with novel applications, including post-hoc scene relighting and spectral super-resolution.

Limitations include increased inference cost (about $2\times$ for $N \approx 11$ versus an RGB NeRF), restriction to static scenes, and current reliance on uniform band sampling. Future work includes learning sparse bands, integrating spectral basis functions, dynamic illumination modeling, and per-voxel BRDF integration for time-varying spectral effects (Li et al., 2023).

SpectralNeRF shares conceptual ground with other multispectral and hyperspectral NeRF variants:

  • Cross-Spectral NeRFs (X-NeRF, (Poggi et al., 2022)) target joint modeling using heterogeneous camera inputs (RGB/MS/IR), using shared density and modality-specific radiance vectors.
  • Spec-NeRF (Li et al., 2023), UnMix-NeRF (Perez et al., 27 Jun 2025), Multispectral-NeRF (Zhang et al., 14 Sep 2025), and Hyperspectral NeRF (Chen et al., 2024) each extend NeRF to richer spectral representations and address material property recovery, sensor simulation, and spectral segmentation.
  • SpectralNeRF is distinguished by its explicit physically-based rendering pipeline, attention-based spectral fusion, and weighted spectral loss objectives.

This suggests that spectral extensions of NeRF architectures constitute a robust paradigm for high-fidelity, physically grounded rendering and analysis of multispectral and hyperspectral scenes.
