4D Spherindrical Harmonics
- 4D spherindrical harmonics are an orthonormal function basis that jointly encode view-dependent appearance and temporal evolution for dynamic scene synthesis.
- They utilize tensor products of spherical harmonics and Fourier bases to decompose radiance fields, enabling efficient, real-time rendering over spatio-temporal volumes.
- Practical applications include AR/VR and surgical imaging, achieving high frame rates (e.g., >100 FPS on an RTX 4090) with compact, high-fidelity scene reconstructions.
4D spherindrical harmonics are an orthonormal function basis designed for the joint representation of view-dependent appearance and temporal evolution in spatio-temporal volumetric models. These bases are foundational to the 4D Gaussian Splatting (4DGS) framework for dynamic scene modeling, where each 4D primitive carries a compact decomposition of radiance over the product space of spherical view directions and periodic time. By generalizing both standard spherical and cylindrical (Fourier) harmonics, 4D spherindrical harmonics enable real-time, photorealistic rendering and efficient learning for dynamic scene synthesis, applicable across domains from AR/VR to surgical imaging (Yang et al., 2024, Yang et al., 2023, Li et al., 2024).
1. Mathematical Structure and Properties
Let $S^2$ denote the unit sphere (parameterizing view directions with spherical angles $(\theta, \phi)$), and $S^1$ the unit circle in time, parameterized by $t \in [0, T)$. The domain $S^2 \times S^1$ is referred to as "spherindrical" coordinates. The 4D spherindrical basis functions, denoted $Z_{nl}^{m}$, are defined as tensor products of (real or complex) spherical harmonics and a Fourier basis in time:
- For $n \geq 0$: $Z_{nl}^{m}(t, \theta, \phi) = \cos(2\pi n t / T) \, Y_l^m(\theta, \phi)$
- For $n < 0$: $Z_{nl}^{m}(t, \theta, \phi) = \sin(2\pi |n| t / T) \, Y_l^m(\theta, \phi)$
where $Y_l^m$ are the spherical harmonics of degree $l$ and order $m$. With suitably normalized temporal factors, the basis is orthonormal under the product measure $d\mu = dt \, d\Omega$:
$$\int_0^T \int_{S^2} Z_{nl}^{m} \, Z_{n'l'}^{m'} \, d\Omega \, dt = \delta_{nn'} \, \delta_{ll'} \, \delta_{mm'},$$
where the indices satisfy $l \geq 0$ and $|m| \leq l$ (Yang et al., 2024, Yang et al., 2023).
This basis generalizes classical expansions:
- With only $n = 0$, the basis recovers the 3D spherical harmonics.
- With only $l = m = 0$, it recovers the 1D Fourier (cylindrical) harmonics in time.
Spherical harmonics themselves satisfy standard recurrence relations and are normalized as:
$$\int_{S^2} Y_l^m \, \overline{Y_{l'}^{m'}} \, d\Omega = \delta_{ll'} \, \delta_{mm'},$$
where $0 \leq l$ and $-l \leq m \leq l$ (Yang et al., 2024, Yang et al., 2023, Li et al., 2024).
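The tensor-product construction can be sketched numerically. The following NumPy snippet is an illustrative sketch, not code from the cited papers: it hard-codes the real spherical harmonics up to degree $l = 1$ in closed form, adopts one common index convention (cosines for $n \geq 0$, sines for $n < 0$) with temporal factors normalized on $[0, T)$, and checks orthonormality of the product basis by midpoint quadrature.

```python
import numpy as np

# Real spherical harmonics up to degree l = 1, in closed form.
def real_sh(l, m, theta, phi):
    x = np.sin(theta) * np.cos(phi)
    y = np.sin(theta) * np.sin(phi)
    z = np.cos(theta)
    table = {
        (0, 0): 0.2820947918 * np.ones_like(theta),   # 1 / (2 sqrt(pi))
        (1, -1): 0.4886025119 * y,                    # sqrt(3 / (4 pi)) * y
        (1, 0): 0.4886025119 * z,
        (1, 1): 0.4886025119 * x,
    }
    return table[(l, m)]

# Orthonormal temporal Fourier factor on [0, T): cosines for n >= 0, sines for n < 0.
def temporal(n, t, T=1.0):
    if n == 0:
        return np.full_like(t, np.sqrt(1.0 / T))
    trig = np.cos if n > 0 else np.sin
    return np.sqrt(2.0 / T) * trig(2.0 * np.pi * abs(n) * t / T)

# 4D spherindrical basis function Z_{nl}^m as a tensor product.
def Z(n, l, m, t, theta, phi, T=1.0):
    return temporal(n, t, T) * real_sh(l, m, theta, phi)

# Midpoint-rule quadrature grid over S^2 x [0, 1).
N = 100
theta = (np.arange(N) + 0.5) * np.pi / N       # polar angle
phi = (np.arange(2 * N) + 0.5) * np.pi / N     # azimuth, step pi/N over [0, 2 pi)
t = (np.arange(N) + 0.5) / N
TH, PH, TT = np.meshgrid(theta, phi, t, indexing="ij")
dmu = np.sin(TH) * (np.pi / N) * (np.pi / N) * (1.0 / N)

ip_same = np.sum(Z(1, 1, 0, TT, TH, PH) ** 2 * dmu)                         # ~ 1
ip_cross = np.sum(Z(1, 1, 0, TT, TH, PH) * Z(-1, 1, 1, TT, TH, PH) * dmu)  # ~ 0
print(ip_same, ip_cross)
```

Higher degrees would be evaluated via the standard spherical-harmonic recurrences rather than closed forms.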
2. Basis Construction and Expansion of Appearance
Given a spatial point $\mathbf{x}$, view direction $\mathbf{d} = (\theta, \phi)$, and time $t$, the radiance field can be approximated as a sum of per-primitive contributions:
$$c(\mathbf{x}, \mathbf{d}, t) \approx \sum_i w_i(\mathbf{x}, t) \, \alpha_i \, c_i(\mathbf{d}, t),$$
where $w_i(\mathbf{x}, t)$ is the spatio-temporal weight (often factored as a spatial term times a temporal term), $\alpha_i$ is the opacity, and $c_i(\mathbf{d}, t)$ is the local appearance.
For each 4D Gaussian primitive $i$, the (view, time)-dependent color expansion is:
$$c_i(\mathbf{d}, t) = \sum_{n, l, m} k_{i,nlm} \, Z_{nl}^{m}(\Delta t, \theta, \phi),$$
where $\Delta t = t - \mu_{t,i}$ (with $\mu_{t,i}$ the temporal center of the Gaussian), and the $k_{i,nlm}$ are the learned coefficients associated per-primitive with each basis function (Yang et al., 2024, Li et al., 2024). In some formulations, only nonnegative $n$ and cosine terms are used; in others, sines are included for completeness.
The basis is truncated (e.g., at a maximum angular degree $l_{\max}$ and temporal order $n_{\max}$) to control expressivity and computational cost.
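A truncated per-primitive color evaluation might look as follows. This is a hedged sketch, not the exact pipeline of the cited papers: it assumes cosine-only temporal terms, $l_{\max} = 1$, and the shift-and-clamp color convention common in splatting implementations.

```python
import numpy as np

# Real SH basis vector up to l = 1 for a unit view direction d = (x, y, z).
def sh_basis(d):
    x, y, z = d
    return np.array([0.2820947918,            # Y_0^0
                     0.4886025119 * y,        # Y_1^{-1}
                     0.4886025119 * z,        # Y_1^0
                     0.4886025119 * x])       # Y_1^1

# Cosine-only temporal factors for orders n = 0..n_max at offset dt = t - mu_t.
def time_basis(dt, n_max, T=1.0):
    return np.cos(2.0 * np.pi * np.arange(n_max + 1) * dt / T)

# c_i(d, t) = sum over (n, l, m) of k[n, lm, channel] * cos(2 pi n dt / T) * Y_l^m(d)
def eval_color(k, d, dt):
    basis = np.outer(time_basis(dt, k.shape[0] - 1), sh_basis(d)).ravel()
    rgb = basis @ k.reshape(-1, 3)
    return np.clip(rgb + 0.5, 0.0, 1.0)      # shift-and-clamp convention

rng = np.random.default_rng(0)
k = 0.1 * rng.standard_normal((3, 4, 3))     # n_max = 2, (l_max + 1)^2 = 4, RGB
d = np.array([0.0, 0.0, 1.0])                # viewing along +z
print(eval_color(k, d, dt=0.25))             # an RGB triple in [0, 1]
```

The coefficient tensor `k` and its layout are hypothetical; real implementations flatten the basis the same way but fix their own storage order.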
3. Coefficient Fitting and Learning
Coefficients are optimized jointly with the 4D Gaussian parameters using photometric and geometric losses. In 4DGS, fitting is performed end-to-end:
- The rendered image is computed using the spherindrical harmonic expansion for each primitive.
- The loss
$$\mathcal{L} = (1 - \lambda) \, \mathcal{L}_1 + \lambda \, \mathcal{L}_{\text{D-SSIM}}$$
between rendered and ground-truth images is minimized using stochastic gradient methods (Yang et al., 2024, Li et al., 2024).
Alternatively, a per-Gaussian least-squares fit is possible:
$$\mathbf{k}_i = (\mathbf{B}^\top \mathbf{W} \mathbf{B})^{-1} \mathbf{B}^\top \mathbf{W} \mathbf{c},$$
where $\mathbf{B}$ is the matrix of basis samples, $\mathbf{W}$ contains the weights $w_i$, and $\mathbf{c}$ are the observed color residuals. This is conceptually illustrative, but not used in practice due to integration with the rasterization pipeline (Yang et al., 2024).
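As an illustration of this normal-equations form on synthetic data (not tied to any real rasterizer), the weighted solve agrees with a square-root-weighted least-squares formulation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic setting: 6 basis functions sampled at 200 (view, time) observations.
B = rng.standard_normal((200, 6))                   # matrix of basis samples
k_true = rng.standard_normal(6)                     # ground-truth coefficients
c = B @ k_true + 0.01 * rng.standard_normal(200)    # noisy observed colors
w = rng.uniform(0.5, 1.0, 200)                      # per-sample blending weights

# Weighted normal equations: k = (B^T W B)^{-1} B^T W c
W = np.diag(w)
k_hat = np.linalg.solve(B.T @ W @ B, B.T @ W @ c)

# Equivalent, better-conditioned form: scale rows by sqrt(w), then solve lstsq.
sw = np.sqrt(w)
k_lstsq, *_ = np.linalg.lstsq(B * sw[:, None], c * sw, rcond=None)

print(np.max(np.abs(k_hat - k_lstsq)))              # the two solutions agree
```

Forming the normal equations explicitly squares the condition number, which is why the square-root-weighted `lstsq` form is usually preferred in practice.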
In medical imaging settings such as ST-Endo4DGS, the coefficients are optimized alongside phases of the sinusoids to accurately track nonstationary lighting and specular effects. Additional regularization (e.g., normal alignment loss) may be introduced to align geometry and the basis-induced appearance (Li et al., 2024).
4. Integration with 4D Gaussian Splatting Primitives
Each 4D Gaussian is parameterized as a distribution on spacetime $\mathbb{R}^4$ with mean $\boldsymbol{\mu} = (\boldsymbol{\mu}_x, \mu_t)$ and full 4D covariance $\boldsymbol{\Sigma} = \mathbf{R} \mathbf{S} \mathbf{S}^\top \mathbf{R}^\top$ (where $\mathbf{R}$ represents a combination of 4D rotations and $\mathbf{S}$ diagonal scales) (Yang et al., 2024, Yang et al., 2023). At render time:
- The 4D Gaussian is marginalized in time to give a temporal weight $p_i(t) = \mathcal{N}(t; \mu_t, \Sigma_{tt})$, and conditioned on time to yield a 3D spatial Gaussian.
- The spatial Gaussian is projected into image space, forming a 2D "splat".
- The spherindrical harmonic coefficients modulate the appearance contribution via the basis functions evaluated at the query $(\mathbf{d}, t)$.
The covariance $\boldsymbol{\Sigma}$ determines spatial and temporal locality: the support in $t$ affects when each primitive is active in time, and the off-diagonal (space-time) components describe spatio-temporal motion (Yang et al., 2024). Thus, as an object undergoes deformation or translation, the same coefficient vector modulates its appearance along its trajectory through spacetime.
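The marginalization and conditioning steps can be written out explicitly. The sketch below uses hypothetical numbers; the closed forms are the standard Gaussian marginal and conditional formulas applied to a 4D (space, time) covariance:

```python
import numpy as np

# Hypothetical 4D Gaussian over (x, y, z, t): nonzero x-t covariance encodes motion.
mu = np.array([0.0, 0.0, 0.0, 0.5])
Sigma = np.array([
    [0.04, 0.00, 0.00, 0.01],
    [0.00, 0.04, 0.00, 0.00],
    [0.00, 0.00, 0.04, 0.00],
    [0.01, 0.00, 0.00, 0.01],
])

mu_x, mu_t = mu[:3], mu[3]
S_xx, S_xt, S_tt = Sigma[:3, :3], Sigma[:3, 3:], Sigma[3, 3]

# Temporal marginal p(t) = N(t; mu_t, S_tt): controls when the primitive is active.
def temporal_weight(t):
    return np.exp(-0.5 * (t - mu_t) ** 2 / S_tt) / np.sqrt(2.0 * np.pi * S_tt)

# Conditioning on time t gives the 3D Gaussian that is actually splatted:
#   mean_{x|t} = mu_x + S_xt S_tt^{-1} (t - mu_t)
#   cov_{x|t}  = S_xx - S_xt S_tt^{-1} S_tx
def condition_on_time(t):
    gain = S_xt / S_tt                      # (3, 1) regression of position on time
    mean = mu_x + (gain * (t - mu_t)).ravel()
    cov = S_xx - gain @ S_xt.T
    return mean, cov

mean, cov = condition_on_time(0.7)
print(mean)                  # the x-coordinate drifts with t
print(temporal_weight(0.7))  # how strongly this primitive contributes at t = 0.7
```

With these numbers, conditioning at $t = 0.7$ shifts the mean along $x$ by $0.01 / 0.01 \cdot 0.2 = 0.2$ and shrinks the $x$ variance from $0.04$ to $0.03$, illustrating how the off-diagonal terms realize motion.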
Rendering is performed by rasterizing all active Gaussians for the given frame, evaluating the basis at each pixel and compositing the result via weighted blending. Computational optimizations include:
- Culling primitives whose temporal marginal weight at the current frame time is negligible.
- Evaluating bases via recurrences and lookups for fast, real-valued computation (no complex arithmetic or deep neural networks in the basis evaluation loop) (Yang et al., 2023, Li et al., 2024).
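The per-pixel compositing step is ordinary front-to-back alpha blending. A minimal sketch, with made-up per-primitive colors and opacities:

```python
import numpy as np

# Front-to-back alpha compositing at a single pixel: primitives are assumed
# sorted near-to-far; T tracks the accumulated transmittance.
def composite(colors, alphas):
    C = np.zeros(3)
    T = 1.0
    for c, a in zip(colors, alphas):
        C += T * a * np.asarray(c, dtype=float)   # weight by remaining transmittance
        T *= (1.0 - a)                            # attenuate for primitives behind
    return C, T

# Two hypothetical primitives: a half-opaque white one in front of a black one.
C, T = composite([np.ones(3), np.zeros(3)], [0.5, 0.5])
print(C, T)  # C = [0.5 0.5 0.5], remaining transmittance T = 0.25
```

In the full pipeline, each `c` in the loop would come from the spherindrical-harmonic color expansion evaluated at the pixel's view direction and frame time.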
5. Computational Aspects, Efficiency, and Model Compression
The separability and analyticity of the 4D spherindrical basis enable efficient implementation. Key computational considerations:
- Truncating the angular degree $l_{\max}$ and temporal order $n_{\max}$ controls the number of coefficients and computational load. A cosine-only truncation requires $(l_{\max}+1)^2 (n_{\max}+1)$ coefficients per color channel (e.g., 48 for $l_{\max} = 3$, $n_{\max} = 2$), which is several orders of magnitude smaller than storing a per-frame view-dependent color (Yang et al., 2023, Yang et al., 2024).
- GPU implementations precompute and cache the $Y_l^m$ and evaluate temporal sines/cosines by lookup or fast trigonometric recurrence (Li et al., 2024).
- Vector quantization and Huffman encoding are applied to compress coefficient vectors, reducing memory by up to 95% while preserving rendering quality. Adaptive masking prunes primitives with negligible contribution across all frames (Yang et al., 2024).
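A toy version of the vector-quantization stage can be written with plain Lloyd iterations on synthetic coefficient vectors; real systems would add Huffman coding of the indices, which is omitted here, and all sizes below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical coefficient vectors: 2000 Gaussians x 48 coefficients, drawn
# around a few cluster centers so that they are genuinely compressible.
centers = rng.standard_normal((8, 48))
coeffs = centers[rng.integers(0, 8, 2000)] + 0.05 * rng.standard_normal((2000, 48))

# Plain Lloyd (k-means) iterations for a 32-entry codebook: the VQ stand-in.
K = 32
codebook = coeffs[rng.choice(len(coeffs), K, replace=False)].copy()
for _ in range(10):
    d2 = ((coeffs[:, None, :] - codebook[None]) ** 2).sum(-1)  # squared distances
    assign = d2.argmin(1)                                      # nearest codeword
    for j in range(K):
        members = coeffs[assign == j]
        if len(members):
            codebook[j] = members.mean(0)                      # recentre codeword

# Each Gaussian now stores a 1-byte index instead of 48 floats.
ratio = coeffs.nbytes / (assign.astype(np.uint8).nbytes + codebook.nbytes)
mse = np.mean((coeffs - codebook[assign]) ** 2)
print(ratio, mse)  # compression ratio and reconstruction error
```

Storing one index per Gaussian plus a shared codebook is what makes the headline memory reductions possible; entropy coding of the index stream then squeezes out the remaining redundancy.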
The computational pipeline supports real-time rendering speeds (e.g., >100 FPS on RTX 4090-class GPUs) even in settings with highly dynamic geometry, as demonstrated in endoscopic scene reconstruction (Li et al., 2024).
6. Theory, Limitations, and Extensions
The principal theoretical advantages of 4D spherindrical harmonics in dynamic scene modeling include:
- Full separability enables analytic orthogonality, rapid evaluation, and stable learning.
- The basis compactly encodes both view- and time-dependence, outperforming frame-wise or non-separable methods in memory and speed (Yang et al., 2024, Yang et al., 2023).
Limitations arise from:
- The finite support of each Gaussian restricts local expressivity; rapidly changing nonperiodic or high-frequency temporal phenomena may require a higher temporal order $n_{\max}$, more primitives, or alternative temporal bases.
- Using only cosine terms (for $n \geq 0$) biases models toward even temporal symmetry about each primitive's temporal center; inclusion of sine terms addresses this, at a slight cost in complexity (Yang et al., 2024).
- For scenes with highly non-periodic or transient effects, Fourier time bases may be suboptimal; polynomials or wavelets offer alternatives (Yang et al., 2024).
- Splatting-based rasterization imposes practical constraints on the spatial overlap and number of active primitives per frame (Li et al., 2024).
Extensions include learned nonlinear warps of the temporal offset $\Delta t$, low-rank priors on the coefficient tensors to regularize overparameterization, and end-to-end sharing of attributes across groups of Gaussians via neural predictors ("anchor-based" structured variants) (Yang et al., 2024).
7. Applications and Comparative Benefits
The adoption of 4D spherindrical harmonics in dynamic scene synthesis enables real-time, photorealistic rendering, temporally coherent animation, and efficient parameter learning across a variety of challenging scene types:
- Large-scale scene modeling for AR/VR and metaverse applications, as in 4DGS (Yang et al., 2024).
- Real-time photorealistic novel view generation over arbitrary time in both synthetic and real-world datasets (Yang et al., 2023).
- Dynamic medical imaging, e.g., endoscopic surgical scene reconstruction, with robust handling of deformable and view-dependent appearance under complex lighting (Li et al., 2024).
Empirical results demonstrate superiority over static-appearance and per-frame neural field baselines in visual quality, computational efficiency, and storage compactness. The analytic, lightweight basis supports deployment in real-world systems demanding high frame rates and scalability (Yang et al., 2024, Yang et al., 2023, Li et al., 2024).