
UV-Guided Avatar Modeling

Updated 19 January 2026
  • UV-guided avatar modeling is a technique that maps digital humans into a 2D UV space, unifying geometry, appearance, and deformation for efficient reconstruction.
  • It applies methods like Gaussian splatting, latent feature decoding, and mesh deformation to represent and animate avatars with high fidelity.
  • The framework enables precise relighting, semantic editing, and integration with 2D CNNs, though it faces challenges with template registration and complex garments.

UV-guided avatar modeling denotes a set of computational methodologies for reconstructing, representing, and rendering animatable digital human avatars in which the canonical parameterization or primary flow of information passes through a two-dimensional UV space. Here, “UV” refers to the flattened surface coordinates of a template mesh, typically derived via a parameterization of the neutral-pose human body or head surface. This representation unifies geometry, appearance, and deformation factors in a manner that is both memory-efficient and conducive to leveraging 2D convolutional architectures and physically based, texture-driven workflows. UV-guided modeling underpins current state-of-the-art avatar systems based on Gaussian splatting, neural point-based volumetrics, and generative diffusion models, and it has enabled real-time photorealistic rendering, relightability, and semantically meaningful editing. Key research examples include TeGA (Li et al., 8 May 2025), UV Gaussians (Jiang et al., 2024), E³Gen (Zhang et al., 2024), GUAVA (Zhang et al., 6 May 2025), GTAvatar (Baert et al., 9 Dec 2025), and PGHM (Peng et al., 7 Jun 2025).

1. UV Parameterization and Canonical Embedding

In UV-guided avatar modeling, a neutral-pose template mesh (e.g., FLAME, SMPL-X, EHM) is unwrapped into a 2D UV domain U×V ⊂ [0,1]². Each vertex or texel in this space corresponds to a well-defined 3D location on the canonical template mesh. UV parameterization ensures that multi-view observations, appearance textures, and geometric attributes from diverse input modalities are referenced and sampled consistently regardless of the pose, viewpoint, or subject.

For instance, in TeGA (Li et al., 8 May 2025), all Gaussians' canonical mean parameters are defined in a continuous UVD space, where U and V index the UV coordinates and D corresponds to displacement along the local template normal. Similarly, in UV Gaussians (Jiang et al., 2024), a high-resolution UV atlas provides a one-to-one mapping from (u, v) to 3D world coordinates, enabling each Gaussian or neural point to be indexed or emitted from UV space. GUAVA (Zhang et al., 6 May 2025) uses barycentric interpolation within each template mesh triangle to associate every UV pixel with a 3D canonical location.

This UV-centric definition yields a compact representation, provides straightforward correspondence across views, and forms the basis for learning high-fidelity geometry and texture embeddings through 2D convolutional networks.
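The barycentric UV-to-canonical mapping described above can be sketched in a few lines. The following is a minimal NumPy illustration, not any paper's implementation: `uv_to_canonical` is a hypothetical helper, and real systems also handle triangle lookup across the whole atlas and out-of-chart texels.

```python
import numpy as np

def uv_to_canonical(uv, tri_uv, tri_xyz):
    """Map a UV-space point to its 3D canonical location via barycentric
    interpolation within one template-mesh triangle (sketch only)."""
    a, b, c = [np.asarray(p, dtype=float) for p in tri_uv]
    # Solve uv = a + w1*(b - a) + w2*(c - a) for the barycentric weights.
    m = np.column_stack([b - a, c - a])
    w12 = np.linalg.solve(m, np.asarray(uv, dtype=float) - a)
    w = np.array([1.0 - w12.sum(), w12[0], w12[1]])
    # Weighted combination of the triangle's 3D canonical vertices.
    return w @ np.asarray(tri_xyz, dtype=float)
```

A texel at (0.5, 0.5) inside a triangle with UV corners (0,0), (1,0), (0,1) lands exactly halfway along the edge between the second and third 3D vertices, as expected from the barycentric weights (0, 0.5, 0.5).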

2. Geometry and Appearance Encoding

Geometry and appearance attributes are encoded in UV space and subsequently lifted to 3D via learned or deterministic mappings. Several distinct schemes are used across the literature:

  • Gaussian Splatting: Most modern systems parameterize avatar geometry as a dense set of 3D Gaussian primitives whose means, covariances, and colors are predicted from UV coordinates themselves (e.g., via a UV-guided U-Net or decoder). TeGA (Li et al., 8 May 2025) stores all canonical Gaussian parameters in UVD space; UV Gaussians (Jiang et al., 2024) predicts residual offsets, per-Gaussian color, scale, and orientation from the output of a Gaussian U-Net operating in the UV atlas.
  • Latent Feature or Texture Planes: E³Gen (Zhang et al., 2024) encodes the full avatar as latent feature planes in UV space, partitioned into geometry and appearance slices, which are then decoded to Gaussian attributes using CNNs. PGHM (Peng et al., 7 Jun 2025) uses a learnable UV-aligned identity tensor per subject, from which per-Gaussian identity codes are bilinearly sampled and fused with pose and view encodings.
  • Materials and Relightability: GTAvatar (Baert et al., 9 Dec 2025) replaces per-Gaussian color with continuous UV-space material maps for albedo, roughness, specularity, and normal, enabling physically-based rendering and robust edits. At render time, each Gaussian samples these UV-embedded textures based on its precise local patch parameters.

UV-space encoding facilitates the exploitation of 2D convolutional architectures, speeding training and inference while preserving high-frequency detail and enabling feature sharing across semantically corresponding regions.
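The bilinear sampling step that pulls per-Gaussian codes out of a UV-aligned feature tensor (as in the PGHM-style scheme above) can be sketched as follows; this is an illustrative CPU version, whereas real systems batch it on the GPU inside the training graph.

```python
import numpy as np

def bilinear_sample(uv_map, u, v):
    """Bilinearly sample an (H, W, C) UV-aligned feature map at continuous
    coordinates (u, v) in [0, 1]^2 (minimal sketch, no batching)."""
    H, W, _ = uv_map.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    fx, fy = x - x0, y - y0
    # Interpolate horizontally on the two rows, then vertically between them.
    top = (1 - fx) * uv_map[y0, x0] + fx * uv_map[y0, x1]
    bot = (1 - fx) * uv_map[y1, x0] + fx * uv_map[y1, x1]
    return (1 - fy) * top + fy * bot
```

Because every Gaussian keeps a fixed (u, v) address, the same sampling call retrieves identity codes, material values, or latent features from whichever UV-space map a given method stores.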

3. Animation and Deformation

Avatar motion, expression, and animation are handled by mapping attributes from the canonical UV (or UVD) domain to the posed geometry through explicit deformation fields or skeletal skinning:

  • Template-driven Deformation: Linear Blend Skinning (LBS) applied to the underlying parametric mesh propagates UV-based attribute changes to the posed surface in both UV Gaussians (Jiang et al., 2024) and E³Gen (Zhang et al., 2024). In TeGA (Li et al., 8 May 2025), a learned UVD residual deformation field predicts per-Gaussian world-space displacements, capturing fine-scale wrinkles and furrows that the base mesh cannot represent.
  • Part-aware Warping: E³Gen (Zhang et al., 2024) employs a part-aware deformation module, partitioning Gaussians by body region (face, hands, body) via a fixed segmentation in UV space, and applies different skinning or deformation networks for each region to enhance expressiveness.
  • Mesh Deformation Networks: In UV Gaussians (Jiang et al., 2024), an independent 2D Mesh U-Net predicts per-texel mesh deformations in UV, improving pose generalization and supporting robust articulation without sacrificing correspondence.
  • Inverse Texture Mapping and Retargeting: GUAVA (Zhang et al., 6 May 2025) utilizes “inverse texture mapping”: screen-space features are pulled back into the UV atlas via 3D geometry and projection matrices, so that appearance predictions remain consistent under changing pose and viewpoint.

These deformation strategies maintain the UV-to-surface mapping's semantic consistency, allowing robust animation while preserving appearance and geometric detail.
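The LBS step underlying the template-driven strategy above can be sketched as blending per-joint rigid transforms with skinning weights before posing each canonical Gaussian mean. This is a minimal single-point sketch under the standard LBS formulation; production systems vectorize it over all Gaussians and add the learned residual deformations on top.

```python
import numpy as np

def lbs_transform(x_canon, weights, joint_transforms):
    """Linear Blend Skinning: pose one canonical 3D point by blending
    per-joint 4x4 rigid transforms with skinning weights (sketch)."""
    # Weighted sum over J transforms: (J,) x (J, 4, 4) -> (4, 4).
    T = np.tensordot(np.asarray(weights, float), np.asarray(joint_transforms, float), axes=1)
    xh = np.append(np.asarray(x_canon, float), 1.0)  # homogeneous coordinates
    return (T @ xh)[:3]
```

A Gaussian's UV address stays fixed under this transform: only its world-space mean (and, in full systems, its covariance orientation) moves with the skeleton, which is what preserves UV-to-surface correspondence during animation.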

4. Rendering and Compositing

Rendering in UV-guided Gaussian avatar frameworks generally consists of projecting 3D Gaussians or neural points into the camera view, compositing them via alpha blending, and/or reconstructing deferred buffers for subsequent physically-based shading:

  • 2D Splatting and Volume Rendering: Each Gaussian primitive is projected into screen space, where its parameters and color (potentially view-dependent or sampled from UV-space material maps) contribute to pixel intensity by elliptical kernel convolution and over-compositing (Li et al., 8 May 2025, Jiang et al., 2024, Baert et al., 9 Dec 2025).
  • Physically Based Shading: GTAvatar (Baert et al., 9 Dec 2025) applies deferred Cook–Torrance microfacet BRDF shading to G-buffers accumulated from splatted Gaussian contributions, leveraging the UV albedo, roughness, and normal atlases.
  • Feature Space Splatting + Refinement: GUAVA (Zhang et al., 6 May 2025) not only applies 3DGS to output a coarse RGB image but also generates a feature map splatted from both template and UV Gaussians. This intermediate is passed to a neural refiner (StyleUNet) to boost fidelity on facial and hand regions.
  • View-Conditioned Rendering: PGHM (Peng et al., 7 Jun 2025) and related works facilitate view-dependent color shifts by fusing view encodings with UV-mapped features. Attribute decoders are conditioned to predict view-augmented color offsets, supporting realistic specular and lighting effects.

Alpha compositing, G-buffer accumulation, and UV-based deferred shading ensure real-time performance (often 20–100+ FPS on commodity GPUs), continuous appearance across overlapping splats, and robust integration with standard graphics pipelines.
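The over-compositing that combines projected splat contributions along a ray can be sketched per pixel as follows. This is a scalar, front-to-back sketch of the standard "over" operator; real rasterizers work on depth-sorted elliptical footprints with per-pixel Gaussian falloff, which is omitted here.

```python
def composite_over(colors, alphas):
    """Front-to-back 'over' compositing of per-splat colors and opacities
    along one ray (sketch of the blending used after 2D projection)."""
    out, transmittance = 0.0, 1.0
    for c, a in zip(colors, alphas):      # splats assumed depth-sorted
        out += transmittance * a * c      # contribution attenuated by what's in front
        transmittance *= (1.0 - a)        # remaining light reaching splats behind
    return out
```

In the deferred variants described above, the same accumulation runs on feature or G-buffer channels (albedo, normal, roughness) instead of final color, with shading applied afterwards.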

5. Training Objectives, Data, and Generalization

UV-guided models employ multi-term, end-to-end differentiable objectives designed to balance geometry, appearance, material consistency, and photometric fidelity:

  • Photometric and Perceptual Losses: Per-pixel L1, SSIM, and perceptual losses (e.g., VGG-based LPIPS) drive color faithfulness to ground truth imagery (Li et al., 8 May 2025, Jiang et al., 2024, Baert et al., 9 Dec 2025).
  • Mesh and Deformation Supervision: Losses enforce the proximity of predicted mesh and/or Gaussian centers to high-fidelity registered scans (Jiang et al., 2024, Zhang et al., 6 May 2025).
  • UV and Material Regularization: GTAvatar (Baert et al., 9 Dec 2025) optimizes for low UV distortion (Eq. 11), smoothness and physical plausibility of PBR materials, and normal/tangent consistency across the surface.
  • Diffusion and Generative Losses: E³Gen (Zhang et al., 2024) combines diffusion denoising objectives on the generative UV latent plane with image-space rendering losses.
  • Ablation Insights: Across works, removal of UV-guided structure leads to decreased texture fidelity, blurry high-frequency details, or instability in articulation (Jiang et al., 2024, Li et al., 8 May 2025, Zhang et al., 6 May 2025). The incorporation of mesh or UV priors is consistently shown to improve rendering quality and generalization to novel poses or views.

UV guidance improves data efficiency and supports rapid subject adaptation (e.g., ~0.1 s in GUAVA (Zhang et al., 6 May 2025), ~20 min in PGHM (Peng et al., 7 Jun 2025)) versus the hours or days required for purely volumetric optimization methods.

6. Relightability, Editability, and Semantic Control

UV-guided representations enable editability and advanced physical simulation capabilities not achievable by purely unstructured Gaussian or neural point clouds:

  • Editable UV Atlases: GTAvatar (Baert et al., 9 Dec 2025) enables intuitive surface edits: decals, local color changes, or material swaps can be performed directly on the UV textures, instantly propagating to all relevant Gaussians.
  • Relighting with Material Maps: Through continuous UV-space albedo, roughness, and normal maps, avatars can be relit in arbitrary environments with physically correct shading and shadows, as demonstrated in GTAvatar (Baert et al., 9 Dec 2025).
  • Semantic Region Editing and Transfer: E³Gen (Zhang et al., 2024) enables semantic edits via local adjustment of UV-plane attributes, affecting only the targeted anatomical region (face, hand, body) thanks to the shared UV parameterization.

A critical advantage of the UV-guided paradigm is that it imports the mature tooling, semantics, and established workflows of mesh-based graphics into high-fidelity volumetric avatars, enabling both programmatic and artist-guided manipulation.
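The edit-propagation property described above (paint the UV texture once, and every Gaussian sampling that region updates) can be sketched as follows. `apply_uv_edit` is a hypothetical helper using nearest-texel lookup for brevity; the systems cited use bilinear sampling and richer material channels.

```python
import numpy as np

def apply_uv_edit(albedo, uvs, edit_box, new_color):
    """Paint a rectangular region of an (H, W, 3) UV albedo map, then
    re-sample each Gaussian's color at its fixed UV address (sketch)."""
    H, W, _ = albedo.shape
    (u0, v0), (u1, v1) = edit_box
    albedo = albedo.copy()
    # Overwrite the edited rectangle in texel space.
    albedo[int(v0 * H):int(v1 * H), int(u0 * W):int(u1 * W)] = new_color
    # Nearest-texel lookup at each Gaussian's fixed (u, v) coordinate.
    idx = np.round(np.asarray(uvs) * [W - 1, H - 1]).astype(int)
    return albedo, albedo[idx[:, 1], idx[:, 0]]
```

Only Gaussians whose UV addresses fall inside the edited rectangle change color; the rest are untouched, which is exactly the locality that unstructured point clouds lack.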

7. Limitations and Future Directions

Despite significant advances, several practical and methodological limitations persist:

  • Dependence on Template Registration: Most methods require accurate fitting of the template mesh, with scan-based supervision for optimal results. Errors in UV-mesh registration degrade geometry and texture reproduction (Jiang et al., 2024, Zhang et al., 6 May 2025).
  • Handling Loose Garments / Topological Changes: Avatars with complex clothing, extreme articulation, or topological changes (unbuttoned shirts, open mouths) challenge UV-based continuity. Current models struggle with fully unsupervised mesh refinement and generalization to non-registered, in-the-wild data (Jiang et al., 2024).
  • Memory and Performance Scalability: Achieving extreme-resolution fidelity often demands millions of Gaussians (TeGA (Li et al., 8 May 2025)) and careful memory management for real-time rendering at 4K resolutions.
  • Unexplored Territories: Future research directions include unsupervised mesh adaptation, integration of richer material reflectance models, and data-efficient universal UV-to-Gaussian mapping across highly variable capture rigs (Jiang et al., 2024, Baert et al., 9 Dec 2025).

UV-guided avatar modeling has established a robust framework underlying the latest photo-realistic, animatable, and editable digital humans, and continues to drive frontiers in telepresence, virtual production, and digital embodiment.
