
4D Spherindrical Harmonics

Updated 27 January 2026
  • 4D spherindrical harmonics are an orthonormal function basis that jointly encode view-dependent appearance and temporal evolution for dynamic scene synthesis.
  • They utilize tensor products of spherical harmonics and Fourier bases to decompose radiance fields, enabling efficient, real-time rendering over spatio-temporal volumes.
  • Practical applications include AR/VR and surgical imaging, achieving high frame rates (e.g., >100 FPS on RTX4090) with compact, high-fidelity scene reconstructions.

4D spherindrical harmonics are an orthonormal function basis designed for the joint representation of view-dependent appearance and temporal evolution in spatio-temporal volumetric models. These bases are foundational to the 4D Gaussian Splatting (4DGS) framework for dynamic scene modeling, where each 4D primitive carries a compact decomposition of radiance over the product space of spherical view directions and periodic time. By generalizing both standard spherical and cylindrical (Fourier) harmonics, 4D spherindrical harmonics enable real-time, photorealistic rendering and efficient learning for dynamic scene synthesis, applicable across domains from AR/VR to surgical imaging (Yang et al., 2024, Yang et al., 2023, Li et al., 2024).

1. Mathematical Structure and Properties

Let $S^2$ denote the unit sphere (parameterizing view directions with spherical angles $(\theta, \phi)$), and $S^1$ the unit circle in time, $t \in [0,T]$. The domain is $D = S^2 \times S^1$, referred to as "spherindrical" coordinates. The 4D spherindrical basis functions, denoted $Z_{n\ell}^{m}(t, \theta, \phi)$, are defined as tensor products of (real or complex) spherical harmonics and a Fourier basis in time:

  • For $n=0$:

$$Z_{0\ell}^{m}(t,\theta,\phi) = \frac{1}{\sqrt{T}}\, Y_\ell^m(\theta,\phi)$$

  • For $n>0$:

$$Z_{n\ell}^{m}(t,\theta,\phi) = \sqrt{\frac{2}{T}} \cos\!\left(\frac{2\pi n t}{T}\right) Y_\ell^m(\theta,\phi)$$

$$Z_{-n\ell}^{m}(t,\theta,\phi) = \sqrt{\frac{2}{T}} \sin\!\left(\frac{2\pi n t}{T}\right) Y_\ell^m(\theta,\phi)$$

where $Y_\ell^m$ are the spherical harmonics of degree $\ell$ and order $m$. The basis is orthonormal under the product measure $dt\, d\Omega$:

$$\int_0^T \int_{S^2} Z_{p}(t, \theta, \phi)\, Z_{q}(t, \theta, \phi)\, d\Omega\, dt = \delta_{pq}$$

where $p, q$ index triples $(\pm n, \ell, m)$ (Yang et al., 2024, Yang et al., 2023).
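As a quick sanity check, the orthonormality above can be verified numerically. The sketch below is illustrative only: it assumes real spherical harmonics (hand-coded up to $\ell = 1$), $T = 1$, and a simple quadrature grid over $D = S^2 \times [0, T]$; all function names are ad hoc.

```python
import numpy as np

def real_sph_harm(l, m, theta, phi):
    """Real spherical harmonics, hand-coded up to l = 1 for illustration."""
    if (l, m) == (0, 0):
        return np.full_like(theta, 0.5 * np.sqrt(1.0 / np.pi))
    if (l, m) == (1, -1):
        return np.sqrt(3.0 / (4.0 * np.pi)) * np.sin(theta) * np.sin(phi)
    if (l, m) == (1, 0):
        return np.sqrt(3.0 / (4.0 * np.pi)) * np.cos(theta)
    if (l, m) == (1, 1):
        return np.sqrt(3.0 / (4.0 * np.pi)) * np.sin(theta) * np.cos(phi)
    raise ValueError("only l <= 1 implemented in this sketch")

def Z(n, l, m, t, theta, phi, T=1.0):
    """4D spherindrical basis: Fourier factor in time times a spherical harmonic."""
    Y = real_sph_harm(l, m, theta, phi)
    if n == 0:
        return Y / np.sqrt(T)
    if n > 0:
        return np.sqrt(2.0 / T) * np.cos(2.0 * np.pi * n * t / T) * Y
    return np.sqrt(2.0 / T) * np.sin(2.0 * np.pi * (-n) * t / T) * Y

# Quadrature grid over t in [0, T), theta in [0, pi], phi in [0, 2*pi)
T = 1.0
t = np.linspace(0.0, T, 64, endpoint=False)
th = np.linspace(0.0, np.pi, 128)
ph = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
tt, thth, phph = np.meshgrid(t, th, ph, indexing="ij")
# Integration weight: sin(theta) dtheta dphi dt
w = np.sin(thth) * (T / 64) * (np.pi / 127) * (2.0 * np.pi / 128)

def inner(p, q):
    """<Z_p, Z_q> over the spherindrical domain; p, q are (n, l, m) triples."""
    return np.sum(Z(*p, tt, thth, phph, T) * Z(*q, tt, thth, phph, T) * w)
```

With this grid, diagonal inner products such as `inner((0, 0, 0), (0, 0, 0))` and `inner((1, 1, 0), (1, 1, 0))` come out close to 1, while mixed pairs such as `inner((1, 0, 0), (-1, 0, 0))` are close to 0.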

This basis generalizes classical expansions:

  • With only $n=0$, the basis recovers the 3D spherical harmonics.
  • With only $\ell=0$, $m=0$, it recovers the 1D Fourier (cylindrical) harmonics in time.

Spherical harmonics $Y_\ell^m(\theta,\phi)$ themselves satisfy standard recurrence relations and are normalized as:

$$\int_{S^2} Y_\ell^m\, Y_{\ell'}^{m'}\, d\Omega = \delta_{\ell\ell'}\, \delta_{mm'}$$

where $d\Omega = \sin\theta\, d\theta\, d\phi$ (Yang et al., 2024, Yang et al., 2023, Li et al., 2024).

2. Basis Construction and Expansion of Appearance

Given a spatial point $x$, view direction $v = (\theta, \phi)$, and time $t$, the radiance field can be approximated as a sum of per-primitive contributions:

$$L(x, t, v) \approx \sum_{i=1}^N p_i(x, t)\, \alpha_i\, c_i(v, t)$$

where $p_i(x, t)$ is the spatio-temporal weight (often factored as $p_i(t)\, p_i(x \mid t)$), $\alpha_i$ is the opacity, and $c_i(v, t)$ is the local appearance.

For each 4D Gaussian primitive $i$, the (view, time)-dependent color expansion is:

$$c_i(v, t) \approx \sum_{n=-N}^{N} \sum_{\ell=0}^{L} \sum_{m=-\ell}^{\ell} a_{i, n, \ell, m}\, Z_{n\ell}^m(\Delta t, \theta, \phi)$$

where $\Delta t = t - \mu_{t,i}$ (with $\mu_{t,i}$ the temporal center of the Gaussian) and $a_{i, n, \ell, m}$ are learned coefficients attached per primitive to each basis function (Yang et al., 2024, Li et al., 2024). Some formulations use only nonnegative $n$ with cosine terms; others include the sine terms for completeness.

The basis is truncated (e.g., $L \leq 3$, $N \leq 4$) to control expressivity and computational cost.
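The coefficient budget implied by a truncation is straightforward to tabulate. A minimal sketch, assuming the sine-and-cosine variant ($2N+1$ temporal terms; cosine-only formulations would use $N+1$) and per-color-channel coefficients:

```python
def num_coeffs(L, N, channels=3):
    # (L+1)^2 angular basis functions times (2N+1) temporal terms
    # (1 constant + N cosines + N sines), per color channel.
    return (L + 1) ** 2 * (2 * N + 1) * channels

# e.g. L = 3, N = 4: 16 angular x 9 temporal x 3 channels = 432
print(num_coeffs(3, 4))
```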

3. Coefficient Fitting and Learning

Coefficients $a_{i, n, \ell, m}$ are optimized jointly with the 4D Gaussian parameters using photometric and geometric losses. In 4DGS, fitting is performed end-to-end:

  • The rendered image is computed using the spherindrical harmonic expansion for each primitive.
  • The loss

$$\mathcal{L} = \sum_{\text{pixels}} \left\| I_\text{obs}(u, v, t) - I_\text{rendered}(u, v, t; \{\mu_i, \Sigma_i, \alpha_i, a_i\}) \right\|^2 + \text{regularizers}$$

is minimized using stochastic gradient methods (Yang et al., 2024, Li et al., 2024).

Alternatively, a per-Gaussian least-squares fit is possible:

$$(B^\top W B + \lambda I)\, a_i = B^\top W r$$

where $B$ is the matrix of basis samples, $W$ contains the weights $p_i(x_s, t_s)$, and $r$ holds the observed color residuals. This formulation is conceptually illustrative but not used in practice, since fitting is integrated with the rasterization pipeline (Yang et al., 2024).
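The normal equations above have a direct NumPy transcription. This is an illustrative sketch only: `B`, `W`, and `r` follow the names in the text, but the random data standing in for real basis samples and residuals is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
S, K = 200, 12  # number of samples, number of basis functions

B = rng.normal(size=(S, K))             # basis evaluated at samples (t_s, theta_s, phi_s)
W = np.diag(rng.uniform(0.1, 1.0, S))   # diagonal weights p_i(x_s, t_s)
r = rng.normal(size=S)                  # observed color residuals (one channel)
lam = 1e-3                              # ridge regularizer lambda

# Solve (B^T W B + lambda I) a_i = B^T W r
a_i = np.linalg.solve(B.T @ W @ B + lam * np.eye(K), B.T @ W @ r)
```

The ridge term $\lambda I$ keeps the system well conditioned when a primitive is observed from few viewpoints or time samples.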

In medical imaging settings such as ST-Endo4DGS, the coefficients are optimized alongside phases of the sinusoids to accurately track nonstationary lighting and specular effects. Additional regularization (e.g., normal alignment loss) may be introduced to align geometry and the basis-induced appearance (Li et al., 2024).

4. Integration with 4D Gaussian Splatting Primitives

Each 4D Gaussian is parameterized as a distribution on $[x, t] \in \mathbb{R}^3 \times \mathbb{R}$ with mean $\mu_i$ and full 4D covariance $\Sigma_i = R S S^\top R^\top$ (where $R$ represents a combination of 4D rotations and $S$ diagonal scales) (Yang et al., 2024, Yang et al., 2023). At render time:

  • The 4D Gaussian is marginalized in time to give $p_i(t)$, and conditioned on time to yield a 3D spatial Gaussian.
  • The spatial Gaussian is projected into image space, forming a 2D "splat" $p_i(u, v \mid t)$.
  • The spherindrical harmonic coefficients modulate the appearance contribution via the basis functions evaluated at the query $(t, \theta, \phi)$.

The covariance $\Sigma_i$ determines spatial and temporal locality: the support in $t$ controls when each primitive is active in time, and the off-diagonal $\Sigma_{x,t}$ components describe spatio-temporal motion (Yang et al., 2024). Thus, as an object undergoes deformation or translation, the same coefficient vector modulates its appearance along its trajectory through spacetime.
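Marginalizing in time and conditioning on time are standard multivariate-Gaussian identities. A minimal sketch in plain NumPy, assuming the last coordinate of the 4-vector is time (the function name is ad hoc):

```python
import numpy as np

def condition_on_time(mu, Sigma, t):
    """Split a 4D Gaussian N(mu, Sigma) over [x, t] at a query time t.

    Returns the temporal marginal density p_i(t) and the conditional
    3D spatial Gaussian N(mu_cond, S_cond).
    """
    mu_x, mu_t = mu[:3], mu[3]
    S_xx = Sigma[:3, :3]   # spatial block
    S_xt = Sigma[:3, 3]    # space-time coupling (drives motion)
    s_tt = Sigma[3, 3]     # temporal variance (controls when primitive is active)
    # 1D temporal marginal p_i(t)
    p_t = np.exp(-0.5 * (t - mu_t) ** 2 / s_tt) / np.sqrt(2.0 * np.pi * s_tt)
    # Gaussian conditioning: mean shifts along Sigma_{x,t} as t moves
    mu_cond = mu_x + S_xt * (t - mu_t) / s_tt
    S_cond = S_xx - np.outer(S_xt, S_xt) / s_tt
    return p_t, mu_cond, S_cond
```

With zero space-time coupling the conditional mean stays fixed; nonzero $\Sigma_{x,t}$ makes the splat center translate linearly with $t$, which is how a static coefficient vector follows a moving primitive.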

Rendering is performed by rasterizing all active Gaussians for the given frame, evaluating the basis at each pixel and compositing the result via weighted blending. Computational optimizations include:

  • Culling primitives with negligible $p_i(t)$.
  • Evaluating bases via recurrences and lookups for fast, real-valued computation (no complex arithmetic or deep neural networks in the basis-evaluation loop) (Yang et al., 2023, Li et al., 2024).
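The weighted blending step follows standard front-to-back alpha compositing over depth-sorted splats. A minimal per-pixel sketch, assuming the effective opacities $\alpha_i\, p_i(u, v \mid t)$ have already been computed (the function name and calling convention are illustrative, not the papers' implementation):

```python
import numpy as np

def composite(colors, alphas):
    """Front-to-back alpha compositing of sorted splat contributions at a pixel.

    colors: (K, 3) per-splat colors c_i(v, t), sorted near to far.
    alphas: (K,) effective opacities alpha_i * p_i(u, v | t).
    """
    out = np.zeros(3)
    transmittance = 1.0  # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):
        out += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
    return out
```

Because transmittance shrinks monotonically, the loop can terminate early once it falls below a threshold, which pairs naturally with the temporal culling above.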

5. Computational Aspects, Efficiency, and Model Compression

The separability and analyticity of the 4D spherindrical basis enable efficient implementation. Key computational considerations:

  • Truncating $\ell$ (angular order) and $n$ (temporal order) controls the number of coefficients and the computational load. With $\ell \leq 3$ and $n \leq 4$, each Gaussian requires $O(L^2 N)$ coefficients, several orders of magnitude fewer than storing per-frame view-dependent color (Yang et al., 2023, Yang et al., 2024).
  • GPU implementations precompute and cache $Y_\ell^m(\theta, \phi)$ and evaluate temporal sines/cosines by lookup or fast trigonometric recurrence (Li et al., 2024).
  • Vector quantization and Huffman encoding are applied to compress coefficient vectors, reducing memory by up to 95% while preserving rendering quality; adaptive masking prunes primitives with negligible contribution across all frames (Yang et al., 2024).
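The vector-quantization step can be sketched as a plain k-means codebook over per-Gaussian coefficient vectors. This is only a conceptual illustration: the actual 4DGS compression pipeline is more elaborate, and the `codebook_size` and initialization here are arbitrary choices.

```python
import numpy as np

def vq_compress(coeffs, codebook_size=16, iters=20):
    """Plain k-means vector quantization of appearance coefficient vectors.

    coeffs: (num_gaussians, dim) array. Returns (indices, codebook) such that
    codebook[indices] approximates coeffs; small integer indices plus one
    shared codebook replace the full per-Gaussian float vectors.
    """
    rng = np.random.default_rng(0)
    # Initialize the codebook from randomly chosen coefficient vectors
    codebook = coeffs[rng.choice(len(coeffs), codebook_size, replace=False)].copy()
    for _ in range(iters):
        # Assign each vector to its nearest codeword
        d = np.linalg.norm(coeffs[:, None, :] - codebook[None, :, :], axis=-1)
        idx = d.argmin(axis=1)
        # Move each codeword to the mean of its assigned vectors
        for k in range(codebook_size):
            members = coeffs[idx == k]
            if len(members):
                codebook[k] = members.mean(axis=0)
    return idx.astype(np.uint16), codebook
```

The resulting index array (and codebook) would then be entropy coded, e.g. with Huffman coding, for further savings.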

The computational pipeline supports real-time rendering speeds (e.g., >100 FPS on RTX4090-class GPUs) even in settings with highly dynamic geometry, as demonstrated in endoscopic scene reconstruction (Li et al., 2024).

6. Theory, Limitations, and Extensions

The principal theoretical advantages of 4D spherindrical harmonics in dynamic scene modeling include:

  • Full separability enables analytic orthogonality, rapid evaluation, and stable learning.
  • The basis compactly encodes both view- and time-dependence, outperforming frame-wise or non-separable methods in memory and speed (Yang et al., 2024, Yang et al., 2023).

Limitations arise from:

  • The finite support of each Gaussian restricts local expressivity; rapidly changing nonperiodic or high-frequency temporal phenomena may require higher $N$, more primitives, or alternative temporal bases.
  • Using only cosine terms (for $n>0$) biases models toward even temporal symmetry; including sine terms addresses this at a slight cost in complexity (Yang et al., 2024).
  • For scenes with highly non-periodic or transient effects, Fourier time bases may be suboptimal; polynomials or wavelets offer alternatives (Yang et al., 2024).
  • Splatting-based rasterization imposes practical constraints on the spatial overlap and number of active primitives per frame (Li et al., 2024).

Extensions include learned nonlinear warps of the temporal offset $\Delta t$, low-rank priors on the coefficient tensors to regularize overparameterization, and end-to-end sharing of attributes across groups of Gaussians via neural predictors ("anchor-based" structured variants) (Yang et al., 2024).

7. Applications and Comparative Benefits

The adoption of 4D spherindrical harmonics in dynamic scene synthesis enables real-time, photorealistic rendering, temporally coherent animation, and efficient parameter learning across a variety of challenging scene types:

  • Large-scale scene modeling for AR/VR and metaverse applications, as in 4DGS (Yang et al., 2024).
  • Real-time photorealistic novel view generation over arbitrary time in both synthetic and real-world datasets (Yang et al., 2023).
  • Dynamic medical imaging, e.g., endoscopic surgical scene reconstruction, with robust handling of deformable and view-dependent appearance under complex lighting (Li et al., 2024).

Empirical results demonstrate superiority over static-appearance and per-frame neural field baselines in visual quality, computational efficiency, and storage compactness. The analytic, lightweight basis supports deployment in real-world systems demanding high frame rates and scalability (Yang et al., 2024, Yang et al., 2023, Li et al., 2024).
