Relightable Holoported Character (RHC)
- Relightable Holoported Characters are digital avatars that are fully animatable, relightable, and capable of dynamic full-body rendering from minimal RGB camera inputs.
- They employ advanced neural representations, 3D Gaussian splatting, and transformer-based relighting architectures to deliver high-fidelity shading, specular highlights, and geometric accuracy.
- Practical systems integrate scalable capture setups and real-time body tracking with physics-informed loss functions to enable interactive telepresence and adaptive lighting estimation.
A Relightable Holoported Character (RHC) is a person-specific, animatable, relightable digital avatar capable of full-body dynamic rendering under arbitrary viewpoints and novel lighting conditions, suitable for real-time telepresence. RHC systems employ advanced neural representations, physically based reflectance models, and sparse-view capture pipelines to achieve high-fidelity lighting reproduction and geometric accuracy from limited RGB camera input. The technology builds on key advances in neural rendering, 3D Gaussian splatting, articulated mesh priors, and transformer-based relighting architectures, resulting in avatars that exhibit realistic specular highlights, subsurface scattering, dynamic self-shadowing, and cloth deformation—without the need for laborious one-light-at-a-time (OLAT) light-stage acquisition.
1. Capture Setup and Data Acquisition
RHC systems are designed to function under practical capture constraints, eschewing intensive OLAT protocols in favor of scalable, sparse multi-view setups. Modern approaches utilize programmable lightstages with 40 synchronized high-resolution cameras and 331 independently controlled LEDs, alternating between "random environment map" illumination (simulating 1,015 real HDR environment maps) and uniformly lit sequences for robust mesh tracking (Singh et al., 29 Nov 2025). Such data collection enables the simultaneous learning of subject motion, surface normals, and appearance under diverse lighting at each time frame.
Earlier methods required monocular or multi-view camera arrays (8–16 channels) with natural or static scene illumination and relied on SMPL/SMPL-X or FLAME shape-pose models for initial mesh estimation (Chen et al., 2022, Zhang et al., 11 Mar 2025). Preprocessing includes background removal, camera pose calibration (e.g., via COLMAP), and per-frame body fitting, enabling downstream neural field queries or Gaussian initialization.
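To illustrate how calibrated cameras and foreground masks seed the downstream representation, the following toy visual-hull carving routine keeps voxels whose projections land inside every foreground mask. This is a hypothetical simplification for illustration only; the cited pipelines use COLMAP-calibrated poses and SMPL/SMPL-X fitting rather than voxel carving.

```python
import numpy as np

def carve_visual_hull(masks, projections, grid_res=32, bounds=1.0):
    """Toy visual-hull carving for initializing a neural field or Gaussian set.

    masks: list of (H, W) boolean foreground masks (after background removal).
    projections: list of (3, 4) camera projection matrices (from calibration).
    Returns the 3D centers of voxels visible as foreground in all views.
    """
    axis = np.linspace(-bounds, bounds, grid_res)
    X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
    pts = np.stack([X, Y, Z, np.ones_like(X)], axis=-1).reshape(-1, 4)  # homogeneous
    keep = np.ones(len(pts), dtype=bool)
    for mask, P in zip(masks, projections):
        H, W = mask.shape
        uvw = pts @ P.T                                   # project into the image plane
        u = (uvw[:, 0] / uvw[:, 2]).round().astype(int)   # pixel column
        v = (uvw[:, 1] / uvw[:, 2]).round().astype(int)   # pixel row
        inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
        hit = np.zeros(len(pts), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]
        keep &= hit                                       # carve away non-foreground voxels
    return pts[keep, :3]
```

The surviving point set can then be used to place initial 3D Gaussians or to bound neural field queries.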
2. Model Representations and Neural Architectures
Mesh and Latent Conditioning
All RHC variants leverage coarse mesh proxies (6,890–10,475 vertices for SMPL/SMPL-X or 5,143 for FLAME), augmented by per-vertex (body) or per-point (head) latent codes that encode dynamic appearance attributes. These latent feature volumes are interpolated using trilinear schemes and passed as input to downstream network modules (Chen et al., 2022, Iqbal et al., 2022, Zhang et al., 11 Mar 2025).
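The trilinear lookup of such a latent feature volume can be sketched as below. This is a minimal single-point NumPy sketch (real systems run this batched on GPU, typically via `grid_sample`-style kernels); the volume shape and normalized-coordinate convention are illustrative assumptions.

```python
import numpy as np

def trilinear_interp(volume, p):
    """Trilinearly interpolate a latent feature volume at one query point.

    volume: (D, H, W, C) array of per-cell latent features.
    p: query point in normalized [0, 1]^3 coordinates, ordered (z, y, x).
    Returns the C-dimensional interpolated latent code.
    """
    D, H, W, C = volume.shape
    # Map normalized coordinates onto the voxel grid.
    z, y, x = p[0] * (D - 1), p[1] * (H - 1), p[2] * (W - 1)
    z0, y0, x0 = int(np.floor(z)), int(np.floor(y)), int(np.floor(x))
    z1, y1, x1 = min(z0 + 1, D - 1), min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    dz, dy, dx = z - z0, y - y0, x - x0
    # Blend the 8 surrounding cells with separable linear weights.
    out = np.zeros(C)
    for zi, wz in ((z0, 1 - dz), (z1, dz)):
        for yi, wy in ((y0, 1 - dy), (y1, dy)):
            for xi, wx in ((x0, 1 - dx), (x1, dx)):
                out += wz * wy * wx * volume[zi, yi, xi]
    return out
```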
Neural Fields and Gaussian Splatting
Fully relightable full-body avatars are implemented as neural fields—MLPs parameterize density, normal, occlusion, diffuse albedo, and specular lobe maps—with 4–5 layers of 256 channels each, optionally conditioned on position, viewing direction, and latent features (Chen et al., 2022). Head avatars utilize explicit 3D Gaussian splatting representations, where each Gaussian carries learnable blendshape, skinning, position, scale, orientation, opacity, and color attributes. HRAvatar and RelightAnyone refine this paradigm by inferring physical appearance maps (albedo, normal, roughness, reflectance) and employing multi-stage decoders for lighting code disentanglement (Zhang et al., 11 Mar 2025, Xu et al., 6 Jan 2026).
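A field head of the kind described (a few 256-channel layers mapping position plus a latent code to density, normal, albedo, and roughness) can be sketched as follows. The layer count, output layout, and activations here are illustrative assumptions, not the exact architecture of any cited system.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(dims):
    """Initialize a small MLP as a list of (weight, bias) pairs."""
    return [(rng.normal(0, 0.1, (i, o)), np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def mlp_forward(params, x):
    """ReLU MLP; the last layer is linear."""
    for k, (W, b) in enumerate(params):
        x = x @ W + b
        if k < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hypothetical head: position (3) + latent (16) -> density(1), normal(3), albedo(3), roughness(1)
field = make_mlp([3 + 16, 256, 256, 256, 256, 8])

def query_field(p, latent):
    out = mlp_forward(field, np.concatenate([p, latent]))
    density = np.log1p(np.exp(out[0]))           # softplus keeps density non-negative
    normal = out[1:4] / (np.linalg.norm(out[1:4]) + 1e-8)
    albedo = 1.0 / (1.0 + np.exp(-out[4:7]))     # sigmoid keeps albedo in (0, 1)
    roughness = 1.0 / (1.0 + np.exp(-out[7]))
    return density, normal, albedo, roughness
```

In practice such heads are additionally conditioned on viewing direction and trained jointly with the latent volumes.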
Relighting Networks
Recent models such as RelightNet (Singh et al., 29 Nov 2025) employ U-Net backbones with self- and cross-attention mechanisms, consuming physics-informed feature stacks (mesh normals, high-frequency image normals, position maps, refined albedo, pre-integrated shading, view encodings) and HDR environment maps embedded via sinusoidal positional encodings. Output is rendered as per-texel 3D Gaussian splats, posed into world space and efficiently composited via sorted alpha blending.
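The final sorted alpha-blending step can be sketched per pixel as below. This is a minimal CPU sketch of front-to-back compositing as used in 3D Gaussian splatting rasterizers; the actual pipelines run this as a tiled GPU kernel over projected splat footprints.

```python
import numpy as np

def composite_splats(depths, alphas, colors):
    """Front-to-back alpha compositing of projected splats covering one pixel.

    depths: (N,) per-splat depth; alphas: (N,) opacity in [0, 1];
    colors: (N, 3) RGB. Returns the composited color and residual transmittance.
    """
    order = np.argsort(depths)            # nearest splat first
    color = np.zeros(3)
    transmittance = 1.0
    for i in order:
        color += transmittance * alphas[i] * colors[i]
        transmittance *= (1.0 - alphas[i])
        if transmittance < 1e-4:          # early termination once the pixel saturates
            break
    return color, transmittance
```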
3. Reflectance Modeling and Rendering Equations
RHC frameworks are grounded in the rendering equation:

$$L_o(\mathbf{x}, \omega_o) = \int_{\Omega} f_r(\mathbf{x}, \omega_i, \omega_o)\, L_i(\mathbf{x}, \omega_i)\, (\mathbf{n} \cdot \omega_i)\, d\omega_i$$

where $f_r$ is the BRDF, $L_i$ is the incident radiance, $\mathbf{n}$ is the surface normal, and $(\mathbf{n} \cdot \omega_i)$ is the foreshortening term (Singh et al., 29 Nov 2025, Chen et al., 2022). Practical systems approximate this integral via per-pixel discrete summation or spherical harmonics. RHC architectures incorporate:
- Microfacet BRDFs with GGX/Beckmann normal distributions, Schlick Fresnel term, and Smith shadow-masking (full-body) (Chen et al., 2022, Zhang et al., 11 Mar 2025).
- Low-order spherical harmonic lighting models (typically an SH basis up to second order, i.e., 9 coefficients per channel) for fast, low-frequency relighting (Iqbal et al., 2022).
- Spherical Gaussian lobes for specular components and SH transfer for diffuse layers (Xu et al., 6 Jan 2026).
- Physically inspired input maps reflecting geometry, albedo, shading, and view (Singh et al., 29 Nov 2025).
RelightNet leverages transformer cross-attention between feature tokens and environment map embeddings to implicitly reproduce the high-dimensional illumination integral per texel (Singh et al., 29 Nov 2025).
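The per-pixel discrete summation mentioned above can be made concrete as follows. This sketch assumes a purely Lambertian BRDF ($f_r = \text{albedo}/\pi$) over a direction-sampled environment map; the microfacet terms listed above would replace `f_r` with a view-dependent GGX/Beckmann evaluation.

```python
import numpy as np

def render_outgoing_radiance(normal, albedo, env_dirs, env_radiance, solid_angles):
    """Discrete rendering-equation evaluation for one Lambertian surface point.

    L_o = sum_i f_r * L_i(w_i) * max(0, n . w_i) * dw_i, with f_r = albedo / pi.
    env_dirs: (N, 3) unit direction per environment-map texel.
    env_radiance: (N, 3) HDR radiance per texel.
    solid_angles: (N,) solid angle subtended by each texel.
    """
    cos_theta = np.clip(env_dirs @ normal, 0.0, None)  # foreshortening, clamped to the upper hemisphere
    f_r = albedo / np.pi                               # Lambertian BRDF
    return (f_r[None, :] * env_radiance * (cos_theta * solid_angles)[:, None]).sum(axis=0)
```

A useful sanity check: under unit uniform illumination with albedo 1, the integral of $\cos\theta$ over the hemisphere is $\pi$, so $L_o$ should come out to 1.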
4. Learning Strategy, Losses, and Optimization
RHC training is multi-stage and subject-specific. Losses include:
- Photometric reconstruction: per-pixel L1/L2 terms combined with SSIM/LPIPS (Singh et al., 29 Nov 2025, Chen et al., 2022, Zhang et al., 11 Mar 2025, Xu et al., 6 Jan 2026).
- Geometry priors: surface-normal consistency, smoothness, and temporal coherence (Chen et al., 2022).
- Albedo smoothness and a minimum-entropy prior on the albedo distribution, estimated via kernel density estimation (KDE) (Chen et al., 2022).
- Gaussian parameter regularizers and residual constraints (Singh et al., 29 Nov 2025).
- Adversarial (GAN), VGG perceptual losses, body masking, and albedo/normal consistency terms (Iqbal et al., 2022, Zhang et al., 11 Mar 2025).
Typical optimization uses Adam with progressively decayed learning rates, 260 K–360 K iterations, batch sizes of 4–16, and GPU acceleration (e.g., H100 or V100) (Singh et al., 29 Nov 2025, Chen et al., 2022). Synthetic pretraining on large human datasets accelerates personalizable adaptation and regularizes texture/lighting disentanglement (Iqbal et al., 2022).
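A weighted combination of such terms might look like the sketch below. The specific weights and the finite-difference smoothness/temporal forms are illustrative assumptions, not the exact losses of any cited method.

```python
import numpy as np

def rhc_loss(rendered, captured, albedo_map, normals_t, normals_t1,
             w_smooth=0.01, w_temporal=0.1):
    """Hypothetical weighted sum of RHC-style training terms.

    rendered/captured: (H, W, 3) images; albedo_map: (H, W, 3) predicted albedo;
    normals_t/normals_t1: (H, W, 3) normals at consecutive frames.
    """
    # Photometric reconstruction: per-pixel L1.
    photometric = np.abs(rendered - captured).mean()
    # Albedo smoothness: finite differences along both image axes.
    smooth = (np.abs(np.diff(albedo_map, axis=0)).mean()
              + np.abs(np.diff(albedo_map, axis=1)).mean())
    # Temporal coherence: penalize frame-to-frame normal changes.
    temporal = np.square(normals_t - normals_t1).mean()
    return photometric + w_smooth * smooth + w_temporal * temporal
```

Perceptual (LPIPS/VGG) and adversarial terms would be added on top via pretrained networks.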
5. Experimental Evaluation and Performance Metrics
Comprehensive benchmarks across synthetic (BlenderHuman, RenderPeople, INSTA, HDTF) and real (People-Snapshot, ZJU-Mocap, Ava-256, SDFM) datasets demonstrate the superiority of RHC approaches over prior methods. Key metrics include PSNR, SSIM, LPIPS (image/albedo/normal), and angular error in degrees. Table 1 below summarizes full-body and head avatar results.
| Method | PSNR (Relight) | SSIM | LPIPS | Normal Err (°) | PSNR (Albedo) |
|---|---|---|---|---|---|
| Relighting4D | 26.15 | 0.912 | 0.164 | 32.18 | 28.95 |
| RHC (RelightNet) | ~32.00 | >0.92 | ~0.05 | — | — |
| HRAvatar | 30.36 | 0.948 | 0.0569 | — | — |
| RelightAnyone | 30.06 | 0.87 | 0.2358 | — | — |
| RANA | 22.34 | 0.842 | 0.173 | 62.82 | 24.72 |
RHC and RelightNet achieve qualitative and quantitative improvements in realistic shading (cloth wrinkles, skin specularities, self-shadowing) and generalize to unseen environment maps and non-Lambertian materials (Singh et al., 29 Nov 2025, Chen et al., 2022). HRAvatar and RelightAnyone demonstrate high-fidelity relighting from minimal (even single image) input (Zhang et al., 11 Mar 2025, Xu et al., 6 Jan 2026).
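For reference, the PSNR values reported above follow the standard definition, computable from mean squared error as below (assuming images normalized to [0, 1]):

```python
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val]."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```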
6. Real-Time Holoportation Pipeline Adaptation
Deploying RHC for live telepresence requires adaptation for instant inference and dynamic lighting estimation. Key enhancements for holoportation include:
- Real-time body tracking replaces offline SMPL/FLAME mesh extraction (e.g., Kinect + real-time pose estimation) (Chen et al., 2022).
- Continuous latent code update per frame and sliding window adaptation (Chen et al., 2022).
- Implementation of field pre-baking, distillation into tiny MLP trees (e.g., KiloNeRF), or lightweight U-Net architectures for fast rendering (Chen et al., 2022, Singh et al., 29 Nov 2025).
- GPU kernel-based volume integration and fast Gaussian splatting pipelines yield rendering rates of roughly 2 FPS for full-body systems and 150+ FPS for head-only systems (Singh et al., 29 Nov 2025, Zhang et al., 11 Mar 2025).
- AR/VR integration via device-provided environment maps or user-selected HDRIs, allowing instantaneous lighting updates (Chen et al., 2022).
- Live lighting estimation: light probe inference in under 1 second, or SH estimation using external networks (Chen et al., 2022).
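The SH-based lighting estimation in the list above amounts to projecting a direction-sampled environment map onto the 9-coefficient real SH basis (order 2). The Monte Carlo-style solid-angle weighting below is an illustrative discretization; production systems typically integrate over equirectangular texels.

```python
import numpy as np

def sh_basis(d):
    """Real spherical harmonic basis up to order 2 (9 values) at unit direction d."""
    x, y, z = d
    return np.array([
        0.282095,                                   # l=0
        0.488603 * y, 0.488603 * z, 0.488603 * x,   # l=1
        1.092548 * x * y, 1.092548 * y * z,         # l=2
        0.315392 * (3 * z * z - 1),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ])

def project_env_to_sh(dirs, radiance, solid_angles):
    """Project a sampled HDR environment map onto SH-9.

    dirs: (N, 3) unit directions; radiance: (N, 3) RGB; solid_angles: (N,).
    Returns (9, 3) coefficients, one RGB triple per basis function.
    """
    B = np.stack([sh_basis(d) for d in dirs])        # (N, 9) basis evaluations
    return B.T @ (radiance * solid_angles[:, None])  # weighted projection integral
```

The resulting 27 numbers are cheap enough to re-estimate every frame from a device-provided light probe.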
7. Limitations, Ablations, and Research Directions
Current RHC technologies display several constraints:
- Identity specificity: models are trained per subject; cross-identity generalization requires large-scale generative priors (Singh et al., 29 Nov 2025, Xu et al., 6 Jan 2026).
- Clothing topology: coarse meshes cannot capture abrupt changes (jackets, translucent materials, glasses), resulting in localization or rendering artifacts (Singh et al., 29 Nov 2025, Xu et al., 6 Jan 2026, Iqbal et al., 2022).
- Hair and accessories: head models struggle with loose hair, hats, or occlusions due to UV and prior limitations (Zhang et al., 11 Mar 2025, Xu et al., 6 Jan 2026).
- Expression dynamism: RelightAnyone is currently limited to neutral expressions—future work should address blendshape and emotion coding (Xu et al., 6 Jan 2026).
- Rendering speed: full-body relighting attains ~2 FPS; optimizations using CUDA ray tracing or neural distillation are proposed for real-time deployment (Singh et al., 29 Nov 2025).
- Relighting physics: some methods assume Lambertian surface reflectance and omit cast shadows or high-frequency light phenomena (Iqbal et al., 2022).
Ablation studies confirm that each physics-informed input feature (geometry, albedo, shading, view, cross-attention) is essential: removing any of them correlates with marked drops in PSNR and increases in LPIPS (Singh et al., 29 Nov 2025). Proposed future directions include universal subject-agnostic pretraining, explicit layered clothing, translucent material priors, and higher-frequency BRDF modeling (Singh et al., 29 Nov 2025, Xu et al., 6 Jan 2026). Ethical deployment, particularly concerning identity protection in telepresence, remains a key concern (Zhang et al., 11 Mar 2025).