FLAME-based 2D Gaussian Splatting for Head Avatars
- The technique anchors 2D Gaussian functions to the FLAME mesh, ensuring precise geometric fidelity and seamless deformation with facial expressions.
- It employs a differentiable rendering pipeline and a progressive, hybrid 2D–3D training strategy to optimize both appearance and surface adherence.
- Quantitative benchmarks demonstrate improved PSNR, SSIM, and reduced point-to-surface errors, validating its efficacy in real-time head avatar synthesis.
FLAME-based 2D Gaussian Splatting is a technique for constructing high-fidelity, geometrically consistent head avatars by parameterizing 2D Gaussian functions directly on the surface mesh defined by the FLAME (Faces Learned with an Articulated Model and Expressions) model. Relative to volumetric approaches, this methodology delivers significantly better geometric accuracy while remaining fully compatible with deformation and animation driven by FLAME's low-dimensional parameter space. The approach is foundational in hybrid surface/volumetric methods such as MixedGaussianAvatar, which combine 2D Gaussian splatting for geometry with 3D splats for enhanced photorealism and appearance detail (Chen et al., 2024).
1. Conceptual Foundations
FLAME-based 2D Gaussian Splatting sits at the intersection of surface-based avatar modeling and differentiable graphics. The central idea is to attach 2D Gaussian splats to the triangles of a deformable surface mesh derived from the FLAME statistical head model. This keeps appearance primitives locked to a consistent, physically plausible geometry under the pose, shape, and expression changes parameterized by FLAME. In contrast to 3D Gaussian Splatting (3DGS), which positions volumetric Gaussians freely in space but may suffer from multi-view inconsistency, 2D splatting offers strict geometric fidelity by enforcing per-splat adherence to the known surface topology (Chen et al., 2024).
This methodology emerged in response to the limitations of both 3DGS (surface fuzziness, geometric inconsistency) and pure surface-based colorization methods (limited rendering fidelity). It leverages advances in differentiable rasterization and mesh deformation to maintain compatibility with real-time animation and dense view synthesis tasks.
2. Mathematical Formulation of 2D Gaussian Splatting on FLAME Meshes
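Each triangle of the FLAME mesh, specified in rest pose, is endowed with a set of 2D Gaussian splats parameterized in the triangle's local UV coordinate chart. A single 2D Gaussian $i$ is defined by its mean $\boldsymbol{\mu}_i \in \mathbb{R}^2$, covariance $\Sigma_i \in \mathbb{R}^{2\times 2}$, color vector $\mathbf{c}_i$ (commonly encoded with spherical harmonics), and opacity $\alpha_i$. The radiance contributed by the splat at a UV position $\mathbf{u}$ is
$$L_i(\mathbf{u}) = \alpha_i\,\mathbf{c}_i\,G_i(\mathbf{u}), \qquad G_i(\mathbf{u}) = \exp\!\left(-\tfrac{1}{2}(\mathbf{u}-\boldsymbol{\mu}_i)^{\top}\Sigma_i^{-1}(\mathbf{u}-\boldsymbol{\mu}_i)\right).$$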
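As a minimal sketch of the splat definition above, the NumPy snippet below evaluates the Gaussian falloff and the opacity-weighted radiance of a single splat in UV space. The rotation-and-scale covariance parameterization (`covariance_2d`) is an assumption borrowed from common Gaussian splatting practice rather than something the source specifies, and all function names are hypothetical.

```python
import numpy as np


def covariance_2d(angle, scale_u, scale_v):
    """Hypothetical parameterization: Sigma = R diag(s_u^2, s_v^2) R^T (always positive definite)."""
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    return rot @ np.diag([scale_u**2, scale_v**2]) @ rot.T


def gaussian_2d(u, mu, cov):
    """Unnormalized falloff G(u) = exp(-0.5 (u - mu)^T Sigma^{-1} (u - mu))."""
    d = np.asarray(u, dtype=float) - np.asarray(mu, dtype=float)
    # Solve a linear system instead of explicitly inverting the covariance.
    mahalanobis_sq = d @ np.linalg.solve(cov, d)
    return float(np.exp(-0.5 * mahalanobis_sq))


def splat_radiance(u, mu, cov, color, opacity):
    """Opacity-weighted color alpha * c * G(u) contributed by one splat at UV position u."""
    return opacity * gaussian_2d(u, mu, cov) * np.asarray(color, dtype=float)
```

The falloff equals 1 at the splat mean and decays with Mahalanobis distance, so the opacity and color set the peak contribution while the covariance sets the footprint on the triangle.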
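Mesh anchoring is realized as follows. When the FLAME model undergoes shape, pose, or expression deformations, each triangle $k$ is mapped from its canonical UV chart to world space via an affine transformation. The mean and covariance of every 2D Gaussian on that triangle are updated correspondingly:
$$\boldsymbol{\mu}_i' = s\,\mathbf{R}_k\,\boldsymbol{\mu}_i + \mathbf{t}_k + \Delta\boldsymbol{\mu}_i, \qquad \Sigma_i' = s^2\,\mathbf{R}_k\,\Sigma_i\,\mathbf{R}_k^{\top},$$
where $\mathbf{R}_k$ and $\mathbf{t}_k$, the linear and translational parts of the triangle's affine map, come from FLAME blend skinning, $s$ is a global scale, and $\Delta\boldsymbol{\mu}_i$ is a small per-splat correction (Chen et al., 2024).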
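A minimal NumPy sketch of the anchoring step follows. It assumes each triangle's UV-to-world affine map is recovered from the triangle's three UV coordinates and its three posed vertex positions (for example, the output of a FLAME forward pass, which is not reproduced here), and that the update takes the form mean' = s·R·mean + t + offset, covariance' = s²·R·Σ·Rᵀ. Function names are hypothetical.

```python
import numpy as np


def triangle_affine_map(uv, xyz):
    """Return (R, t) such that R @ u + t maps the triangle's UV chart into world space.

    uv:  (3, 2) UV coordinates of the triangle's vertices
    xyz: (3, 3) posed world-space vertex positions (e.g. from a FLAME forward pass)
    """
    e_uv = np.stack([uv[1] - uv[0], uv[2] - uv[0]], axis=1)       # (2, 2) UV edge vectors
    e_xyz = np.stack([xyz[1] - xyz[0], xyz[2] - xyz[0]], axis=1)  # (3, 2) world edge vectors
    R = e_xyz @ np.linalg.inv(e_uv)  # (3, 2) linear part of the affine map
    t = xyz[0] - R @ uv[0]           # (3,) translation
    return R, t


def anchor_splat(mu_uv, cov_uv, R, t, scale=1.0, offset=None):
    """World-space mean s R mu + t + delta and covariance s^2 R Sigma R^T of one splat."""
    delta = np.zeros(3) if offset is None else np.asarray(offset, dtype=float)
    mu_w = scale * R @ mu_uv + t + delta
    # (3, 3) covariance of rank 2: the splat stays flat in the triangle's plane
    cov_w = scale**2 * R @ cov_uv @ R.T
    return mu_w, cov_w
```

Because R and t are recomputed from the posed vertices every frame, a splat keeps its position within its triangle and moves with the surface, which is why animating the avatar requires no per-frame optimization.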
3. Differentiable Rendering Pipeline
Rendering consists of a differentiable rasterization process that, for each image pixel $\mathbf{p}$, computes ray–triangle intersections to recover the corresponding UV position $\mathbf{u}_i(\mathbf{p})$ on each supporting triangle. The per-pixel color from all overlapping 2D Gaussians is composited using alpha blending:
$$C(\mathbf{p}) = \sum_{i=1}^{N} \mathbf{c}_i\,\alpha_i\,G_i(\mathbf{u}_i(\mathbf{p}))\,T_i, \qquad T_i = \prod_{j=1}^{i-1}\left(1 - \alpha_j\,G_j(\mathbf{u}_j(\mathbf{p}))\right),$$
where $G_i$ is the spatial falloff of splat $i$ and $T_i$ denotes the cumulative transmittance up to splat $i$. Splats are sorted front to back by depth, so occlusion and visibility are resolved naturally.
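The front-to-back compositing rule can be sketched for a single pixel as follows. This is an illustrative CPU loop, not the tiled GPU rasterizer a real implementation would use; it assumes each overlapping splat's falloff has already been evaluated at the pixel's UV position, and the function name is hypothetical.

```python
import numpy as np


def composite_pixel(depths, colors, alphas, falloffs, bg=None):
    """Front-to-back alpha compositing of the splats overlapping one pixel.

    depths:   (N,) depth of each splat's ray intersection
    colors:   (N, 3) splat colors c_i
    alphas:   (N,) splat opacities alpha_i
    falloffs: (N,) falloff G_i evaluated at the pixel's UV position on splat i
    Returns the composited color and the transmittance left after the last splat.
    """
    depths = np.asarray(depths, dtype=float)
    colors = np.asarray(colors, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    falloffs = np.asarray(falloffs, dtype=float)

    color = np.zeros(3)
    T = 1.0  # transmittance accumulated from the splats in front
    for i in np.argsort(depths):  # nearest splat first
        a = alphas[i] * falloffs[i]  # effective opacity at this pixel
        color += T * a * colors[i]
        T *= 1.0 - a
    if bg is not None:
        color += T * np.asarray(bg, dtype=float)  # remaining light from the background
    return color, T
```

Every step is a product or sum of splat parameters, so automatic differentiation frameworks can propagate photometric gradients through the same computation.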
This process is fully differentiable, facilitating gradient-based optimization of all splat parameters with respect to photometric losses between rendered images and multiview ground truth (Chen et al., 2024).
4. Progressive Training and Hybrid 2D–3D Representation
Training proceeds in a two-stage progressive manner. First, only 2D Gaussian splats are optimized for image reconstruction fidelity and surface adherence, via objectives including photometric loss, D-SSIM, geometric regularization, depth distortion, and normal consistency. Typical hyperparameters include a learning rate of and approximately 50,000 splats for dense coverage.
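The source does not give the exact form of the stage-1 regularizers. The sketch below assumes the per-pixel depth-distortion and normal-consistency terms popularized by the original 2D Gaussian Splatting work (a weighted pairwise spread of intersection depths, and agreement between splat normals and a normal derived from rendered depth), plus a photometric term that mixes L1 and D-SSIM. The loss weights are placeholders, not values from Chen et al. (2024), and the geometric regularization term is omitted.

```python
import numpy as np


def depth_distortion(weights, depths):
    """Pairwise term sum_{i,j} w_i w_j |z_i - z_j| for one pixel.

    weights: (N,) blending weights w_i = alpha_i * G_i * T_i of the splats hit by the ray
    depths:  (N,) ray-splat intersection depths z_i
    """
    w = np.asarray(weights, dtype=float)
    z = np.asarray(depths, dtype=float)
    return float(np.sum(np.outer(w, w) * np.abs(z[:, None] - z[None, :])))


def normal_consistency(weights, splat_normals, depth_normal):
    """sum_i w_i (1 - n_i . N): splat normals should agree with the depth-derived normal N."""
    w = np.asarray(weights, dtype=float)
    cos = np.asarray(splat_normals, dtype=float) @ np.asarray(depth_normal, dtype=float)
    return float(np.sum(w * (1.0 - cos)))


def stage1_loss(l1, d_ssim, l_dist, l_normal,
                lam_ssim=0.2, lam_dist=0.1, lam_normal=0.05):
    """Weighted sum of the stage-1 objectives; all weights are hypothetical placeholders."""
    photometric = (1.0 - lam_ssim) * l1 + lam_ssim * d_ssim
    return photometric + lam_dist * l_dist + lam_normal * l_normal
```

Both regularizers vanish when all splats along a ray intersect at one depth with normals matching the rendered surface, which is the "surface adherence" the stage-1 objectives aim for.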
Subsequently, in regions where 2DGS fails to achieve adequate rendering fidelity (such as specularities or high-frequency creases), a tile-wise MSE analysis is performed to identify problematic areas. For each detected region, a co-located 3D Gaussian is spawned that inherits the local surface parameters and adds a nominal scale along the depth axis. This produces a mixed 2D–3D representation. During the mixed fine-tuning stage, the geometric parameters of the 2D splats are held fixed, and the 3D splats are optimized with an additional proximity regularization term. The mixed loss is given by
$$\mathcal{L}_{\text{mixed}} = \mathcal{L}_{\text{photo}} + \lambda_{\text{D-SSIM}}\,\mathcal{L}_{\text{D-SSIM}} + \lambda_{\text{prox}}\,\mathcal{L}_{\text{prox}},$$
where the weights $\lambda$ are hyperparameters chosen to ensure spatial coherence (Chen et al., 2024).
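The refinement step can be sketched in three pieces: flagging image tiles whose MSE exceeds a threshold, lifting a 2D splat into a co-located 3D Gaussian by adding a small extent along the surface normal, and a proximity penalty that ties each 3D splat to the 2D splat it was spawned from. The tile size, threshold, depth scale, and squared-distance form of the proximity term are hypothetical choices for illustration, not values from the source.

```python
import numpy as np


def flag_tiles(rendered, target, tile=16, threshold=1e-3):
    """Return (row, col) indices of tiles whose MSE exceeds `threshold`.

    rendered, target: (H, W, 3) images in [0, 1]; partial tiles at the border are skipped.
    """
    nh, nw = target.shape[0] // tile, target.shape[1] // tile
    err = (rendered[:nh * tile, :nw * tile] - target[:nh * tile, :nw * tile]) ** 2
    # View as (nh, tile, nw, tile, C) and average inside each tile
    per_tile = err.reshape(nh, tile, nw, tile, -1).mean(axis=(1, 3, 4))
    return np.argwhere(per_tile > threshold)


def lift_to_3d(center, tangent_u, tangent_v, scale_u, scale_v, depth_scale=1e-3):
    """Co-located 3D Gaussian that inherits a surface splat's tangent frame and scales.

    tangent_u and tangent_v are assumed orthonormal; depth_scale is the nominal
    extent added along the normal so the primitive becomes volumetric.
    """
    tu = np.asarray(tangent_u, dtype=float)
    tv = np.asarray(tangent_v, dtype=float)
    frame = np.stack([tu, tv, np.cross(tu, tv)], axis=1)  # columns: u axis, v axis, normal
    cov = frame @ np.diag([scale_u**2, scale_v**2, depth_scale**2]) @ frame.T
    return np.asarray(center, dtype=float), cov


def proximity_loss(centers_3d, anchor_centers):
    """Mean squared distance between each 3D splat and its anchoring 2D splat."""
    d = np.asarray(centers_3d, dtype=float) - np.asarray(anchor_centers, dtype=float)
    return float(np.mean(np.sum(d**2, axis=1)))
```

Since the 2D geometry is frozen during this stage, only the 3D splats' parameters would receive gradients from the proximity penalty, which keeps them near the surface they were spawned to correct.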
5. Animation and Real-Time Synthesis
FLAME parameter-driven deformation directly controls the placement of 2D splats, so that both the surface geometry and the attached appearance faithfully track changes in head shape ($\boldsymbol{\beta}$), expression ($\boldsymbol{\psi}$), and global pose ($\boldsymbol{\theta}$). At test time, new head configurations are animated by updating all splat transformations according to the FLAME mapping equations, with no additional optimization required. Color consistency is maintained by the spherical harmonics encoding of the splat colors. A small Jacobian damping term on the learnable offsets prevents spurious global displacements.
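To illustrate how a spherical-harmonics color is turned into RGB, the sketch below decodes degree-1 SH coefficients for a given viewing direction using the constants and sign convention of the reference 3D Gaussian Splatting rasterizer, including its +0.5 offset and clamping. The source does not state the SH degree or decoding convention, so degree 1 and this convention are assumptions made for brevity.

```python
import numpy as np

SH_C0 = 0.28209479177387814  # 1 / (2 sqrt(pi)), degree-0 basis constant
SH_C1 = 0.4886025119029199   # sqrt(3) / (2 sqrt(pi)), degree-1 basis constant


def sh_to_rgb(sh, view_dir):
    """Decode degree-1 SH coefficients sh (4, 3) into RGB for a viewing direction."""
    x, y, z = np.asarray(view_dir, dtype=float) / np.linalg.norm(view_dir)
    rgb = (SH_C0 * sh[0]
           - SH_C1 * y * sh[1]
           + SH_C1 * z * sh[2]
           - SH_C1 * x * sh[3])
    # Colors are stored offset by 0.5; negative values are clamped to zero.
    return np.maximum(rgb + 0.5, 0.0)
```

The degree-0 coefficient gives a view-independent base color, while the degree-1 coefficients add a smooth directional variation such as a soft highlight.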
This architecture enables real-time animation and rendering performance exceeding 60 FPS, with reported training times of 30 minutes for the 2D stage and 45 minutes for the mixed stage using an NVIDIA A100 (Chen et al., 2024).
6. Quantitative Results and Benchmark Evaluation
On the NeRSemble benchmark, which comprises 16 camera views per subject, the MixedGaussianAvatar framework achieves a PSNR of 31.8 dB (0.4 dB above the best pure 3DGS baseline) and an SSIM of 0.953. Notably, the average point-to-surface geometric error drops from 0.82 mm for 3DGS to 0.35 mm for the mixed approach. On the INSTA self-reenactment benchmark, MixedGaussianAvatar attains a PSNR of 30.4 dB and an SSIM of 0.962, outperforming both the 2DGS-only (27.8 dB) and 3DGS-only (29.7 dB) baselines (Chen et al., 2024). These results support the effectiveness of 2D splatting on FLAME, and of hybrid approaches built on it, for high-fidelity geometry and appearance.
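For reading the numbers above: PSNR is 10·log10(MAX²/MSE), so for a fixed image a PSNR gain of Δ dB corresponds to multiplying the MSE by 10^(−Δ/10). The small helper below (names hypothetical) makes that arithmetic concrete.

```python
import numpy as np


def psnr(rendered, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB: 10 log10(max_val^2 / MSE)."""
    diff = np.asarray(rendered, dtype=float) - np.asarray(target, dtype=float)
    mse = np.mean(diff**2)
    return float(10.0 * np.log10(max_val**2 / mse))


def mse_ratio(delta_db):
    """Factor by which MSE shrinks for a PSNR gain of delta_db decibels."""
    return 10.0 ** (-delta_db / 10.0)
```

For example, `mse_ratio(0.4)` is about 0.912, so the 0.4 dB NeRSemble gain corresponds to roughly 9% lower MSE, and the 0.7 dB INSTA gain over the 3DGS-only baseline to roughly 15% lower MSE. These are per-image conversions; benchmark PSNR is usually averaged over frames, so the ratios are indicative rather than exact. By comparison, the drop in point-to-surface error from 0.82 mm to 0.35 mm is a reduction of about 57%.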
7. Significance and Limitations
Anchoring 2D Gaussians to the FLAME mesh enforces cross-view geometric consistency and keeps the avatar compatible with FLAME's low-dimensional identity, expression, and pose controls. The hybrid method restores photorealism in regions where pure 2D splatting is insufficient, combining efficiency with high quality in both surface geometry and appearance. Limitations include the need for a well-calibrated FLAME mesh, potential challenges in capturing fine volumetric effects off the surface, and increased complexity in regions that demand dense allocation of mixed splats. The empirical results demonstrate both real-time viability and performance benefits for virtual avatar and telepresence applications (Chen et al., 2024).
A plausible implication is that these techniques can be generalized to other articulated or deformable surface models and to domains (e.g., full-body avatars) where template meshes and surface-anchored appearance models are available. However, surface adherence remains essential for geometric fidelity, while careful hybridization with volumetric splats is required to capture realistic appearance phenomena that surface models alone cannot represent.