FLAME-based 2D Gaussian Splatting for Head Avatars

Updated 26 January 2026

The technique anchors 2D Gaussian functions to the FLAME mesh, ensuring precise geometric fidelity and seamless deformation with facial expressions.
It employs a differentiable rendering pipeline and a progressive, hybrid 2D–3D training strategy to optimize both appearance and surface adherence.
Quantitative benchmarks show improved PSNR and SSIM and reduced point-to-surface error, supporting its efficacy for real-time head avatar synthesis.

FLAME-based 2D Gaussian Splatting is a technique for constructing high-fidelity, geometrically consistent head avatars by parameterizing 2D Gaussian functions directly on the surface mesh defined by the FLAME (Faces Learned with an Articulated Model and Expressions) model. It offers significant advantages in geometric accuracy over volumetric approaches while remaining fully compatible with deformation and animation driven by FLAME's low-dimensional parameter space. The approach is foundational to hybrid surface/volumetric methods such as MixedGaussianAvatar, which combine 2D Gaussian splatting for geometry with 3D splats for enhanced photorealism and appearance detail (Chen et al., 2024).

1. Conceptual Foundations

FLAME-based 2D Gaussian Splatting sits at the intersection of surface-based avatar modeling and differentiable graphics. The central idea is to attach 2D Gaussian splats to the triangles of a deformable surface mesh derived from the FLAME statistical head model. This keeps appearance primitives locked to a consistent, physically plausible geometry under the pose, shape, and expression changes parameterized by FLAME. In contrast to 3D Gaussian Splatting (3DGS), which positions volumetric Gaussians freely in space but can suffer from multi-view inconsistency, 2D splatting enforces geometric fidelity by requiring each splat to adhere to the known surface topology (Chen et al., 2024).

This methodology emerged in response to the limitations of both 3DGS (surface fuzziness, geometric inconsistency) and pure surface-based colorization methods (limited rendering fidelity). It leverages advances in differentiable rasterization and mesh deformation to maintain compatibility with real-time animation and dense view synthesis tasks.

2. Mathematical Formulation of 2D Gaussian Splatting on FLAME Meshes

Each triangle $T$ of the FLAME mesh, specified in the rest pose, carries a set of 2D Gaussian splats parameterized in the triangle's local UV coordinate chart. A single 2D Gaussian $g$ is defined by its mean $\mu \in \mathbb{R}^2$, symmetric positive-definite covariance $\Sigma \in \mathbb{R}^{2 \times 2}$, color $\mathbf{c} \in \mathbb{R}^3$ (commonly obtained from spherical-harmonic coefficients), and opacity $\alpha \in [0,1]$. The radiance function of a splat at a UV position $u$ is

$f(u) = \alpha\ \exp\left(-\frac{1}{2}(u-\mu)^\mathsf{T} \Sigma^{-1} (u-\mu)\right)\, \mathbf{c}.$
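To make the radiance function concrete, here is a minimal Python sketch that evaluates $f(u)$ for one splat. It assumes the color $\mathbf{c}$ has already been evaluated from its spherical-harmonic coefficients for the current view. It also builds $\Sigma$ from a rotation angle and two scales ($\Sigma = R S S^\mathsf{T} R^\mathsf{T}$), the parameterization commonly used in Gaussian splatting to keep the covariance positive definite during optimization; whether this exact parameterization is used here is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix2 = List[List[float]]


def covariance_from_params(theta: float, s1: float, s2: float) -> Matrix2:
    """Build Sigma = R S S^T R^T from a rotation angle (radians) and two
    positive scales; the result is symmetric positive definite by construction."""
    c, s = math.cos(theta), math.sin(theta)
    # M = R S, with R a 2x2 rotation and S = diag(s1, s2)
    m = [[c * s1, -s * s2],
         [s * s1, c * s2]]
    # Sigma = M M^T
    return [[m[i][0] * m[j][0] + m[i][1] * m[j][1] for j in range(2)]
            for i in range(2)]


def splat_radiance(u: Sequence[float], mu: Sequence[float], sigma: Matrix2,
                   alpha: float, color: Sequence[float]) -> Tuple[float, ...]:
    """f(u) = alpha * exp(-0.5 (u - mu)^T Sigma^{-1} (u - mu)) * c."""
    dx, dy = u[0] - mu[0], u[1] - mu[1]
    (a, b), (c, d) = sigma
    det = a * d - b * c
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    # Quadratic form via the closed-form 2x2 inverse (1/det) [[d, -b], [-c, a]]
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    weight = alpha * math.exp(-0.5 * q)
    return tuple(weight * ch for ch in color)
```

At $u = \mu$ the exponent vanishes, so the splat contributes exactly $\alpha\,\mathbf{c}$; away from the mean the contribution decays with the Mahalanobis distance defined by $\Sigma$.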

Mesh anchoring works as follows. When the FLAME model undergoes shape, pose, or expression deformations, each triangle is mapped from its canonical UV chart to world space via an affine transformation, and the mean and covariance of each attached 2D Gaussian are updated accordingly:

$\mu_g^{2D} = R(\lambda \mu) + T + p_\theta^{2D}, \qquad \Sigma_g^{2D} = R (\lambda^2 \Sigma) R^\mathsf{T},$

where $R$ and $T$ are the rotation and translation obtained from FLAME blend skinning (here $T$ denotes a translation, not a triangle), $\lambda$ is a global scale, and $p_\theta^{2D}$ is a small learnable per-splat correction (Chen et al., 2024).
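The anchoring update can be sketched directly from these two equations. Because the formula applies $R$ to the 2D mean, the example treats $R$ as a 2×2 rotation; in a full pipeline $R$ would typically be the 3×3 rotation of the deformed triangle's local frame, and the function below is written to work for any $n \times n$ rotation. The name `anchor_splat` and its argument layout are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def anchor_splat(mu: Sequence[float], sigma: Matrix, rot: Matrix,
                 trans: Sequence[float], lam: float,
                 offset: Sequence[float]) -> Tuple[List[float], Matrix]:
    """Deform one splat together with its triangle:
    mu' = R (lam * mu) + T + p,   Sigma' = R (lam^2 Sigma) R^T."""
    n = len(mu)
    scaled_mu = [lam * m for m in mu]
    rotated = [sum(rot[i][k] * scaled_mu[k] for k in range(n)) for i in range(n)]
    new_mu = [rotated[i] + trans[i] + offset[i] for i in range(n)]
    # Scaling the mean by lam scales the covariance by lam^2
    scaled_sigma = [[lam * lam * v for v in row] for row in sigma]
    new_sigma = matmul(matmul(rot, scaled_sigma), transpose(rot))
    return new_mu, new_sigma
```

For example, a 90° rotation swaps the principal axes of an axis-aligned covariance, so an elongated splat stays elongated along the rotated surface direction rather than being distorted.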

3. Differentiable Rendering Pipeline

Rendering consists of a differentiable rasterization process that, for each image pixel $x$, computes ray–triangle intersections to recover the corresponding UV position $u(x)$ on each supporting triangle. The per-pixel color from all overlapping 2D Gaussians is composited by alpha blending:

$T_i = \prod_{j<i} \left[ 1 - \alpha_j G^{2D}_j(x) \right],$

$\mathbf{c}^{2D}(x) = \sum_{i=1}^N T_i\, \alpha_i G^{2D}_i(x)\, \mathbf{c}_i,$

where $G^{2D}_i(x)$ is the spatial falloff of splat $i$ at pixel $x$, and $T_i$ denotes the cumulative transmittance in front of splat $i$. Splats are sorted front-to-back by depth, so occlusion and visibility are resolved naturally.

This process is fully differentiable, facilitating gradient-based optimization of all splat parameters with respect to photometric losses between rendered images and multiview ground truth (Chen et al., 2024).
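The compositing equations translate into a short front-to-back loop for a single pixel. The sketch assumes each splat's falloff $G^{2D}_i(x)$ has already been evaluated at the pixel (for instance with the radiance formula in Section 2); the `PixelSplat` record is a hypothetical container, not part of any particular renderer.

```python
from typing import List, NamedTuple, Sequence, Tuple


class PixelSplat(NamedTuple):
    """One splat's contribution at a fixed pixel x (illustrative fields)."""
    depth: float                       # view-space depth used for sorting
    alpha: float                       # opacity alpha_i
    falloff: float                     # spatial falloff G_i^{2D}(x), in [0, 1]
    color: Tuple[float, float, float]  # RGB color c_i


def composite(splats: Sequence[PixelSplat]) -> Tuple[List[float], float]:
    """Front-to-back alpha blending:
    c(x) = sum_i T_i * alpha_i G_i(x) * c_i,  T_i = prod_{j<i} (1 - alpha_j G_j(x)).
    Returns the pixel color and the transmittance left after the last splat."""
    transmittance = 1.0                           # T_1 = 1 (empty product)
    rgb = [0.0, 0.0, 0.0]
    for sp in sorted(splats, key=lambda p: p.depth):  # nearest splat first
        w = sp.alpha * sp.falloff
        for ch in range(3):
            rgb[ch] += transmittance * w * sp.color[ch]
        transmittance *= 1.0 - w                  # becomes T_{i+1}
    return rgb, transmittance
```

The leftover transmittance is the fraction of the background still visible; rasterizers commonly stop the loop early once it falls below a small threshold, since further splats barely change the pixel.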

4. Progressive Training and Hybrid 2D–3D Representation

Training proceeds in a two-stage progressive manner. First, only 2D Gaussian splats are optimized for image reconstruction fidelity and surface adherence, via objectives including $L_1$ photometric loss, D-SSIM, geometric regularization, depth distortion, and normal consistency. Typical hyperparameters include a learning rate of $1 \times 10^{-2}$ and approximately 50,000 splats for dense coverage.
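The depth-distortion and normal-consistency terms are not spelled out above. The sketch below follows their per-ray definitions in the original 2D Gaussian Splatting formulation, $L_d = \sum_{i,j} \omega_i \omega_j |z_i - z_j|$ and $L_n = \sum_i \omega_i (1 - \mathbf{n}_i^\mathsf{T} \mathbf{N})$, where $\omega_i = T_i\,\alpha_i G^{2D}_i(x)$ is the blending weight of splat $i$ along the ray, $z_i$ its intersection depth, $\mathbf{n}_i$ its unit normal, and $\mathbf{N}$ a unit normal estimated from the rendered depth map. That MixedGaussianAvatar uses exactly these forms is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence


def blend_weights(alpha_falloff: Sequence[float]) -> List[float]:
    """omega_i = T_i * alpha_i G_i for splats already sorted front-to-back."""
    weights, transmittance = [], 1.0
    for a in alpha_falloff:
        weights.append(transmittance * a)
        transmittance *= 1.0 - a
    return weights


def depth_distortion(weights: Sequence[float], depths: Sequence[float]) -> float:
    """L_d = sum_{i,j} w_i w_j |z_i - z_j|: small when weight concentrates at one depth."""
    return sum(wi * wj * abs(zi - zj)
               for wi, zi in zip(weights, depths)
               for wj, zj in zip(weights, depths))


def normal_consistency(weights: Sequence[float],
                       normals: Sequence[Sequence[float]],
                       depth_normal: Sequence[float]) -> float:
    """L_n = sum_i w_i (1 - n_i . N): zero when splat normals match the depth normal."""
    return sum(w * (1.0 - sum(a * b for a, b in zip(n, depth_normal)))
               for w, n in zip(weights, normals))
```

Both terms are zero for a ray that hits a single, correctly oriented surface splat, which is exactly the surface-adherence behavior this stage is meant to encourage.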

Subsequently, in regions where 2DGS fails to achieve adequate rendering fidelity (such as specularities or high-frequency creases), a tile-wise MSE analysis between rendered and ground-truth images identifies the problematic areas. For each detected region, a colocated 3D Gaussian is spawned that inherits the local surface parameters plus an additional nominal depth scale. This produces a mixed 2D–3D representation. During the mixed fine-tuning stage, the geometric parameters of the 2D splats are held fixed, and the 3D splats are optimized with an additional proximity regularization term. The mixed loss is given by

$L_{\text{mixed}} = L_{\text{rgb}} + \lambda_5 L_{\text{dis}},$

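where $L_{\text{rgb}}$ is the photometric reconstruction loss, $L_{\text{dis}}$ is the proximity regularizer that keeps the 3D splats near the surface, and the weight $\lambda_5$ balances the two to preserve spatial coherence (Chen et al., 2024).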
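Both steps of this refinement stage can be sketched together. The tile-wise analysis splits rendered and ground-truth images into square tiles, computes per-tile MSE, and flags tiles above a threshold as candidates for a spawned 3D Gaussian; the mixed objective then adds $\lambda_5 L_{\text{dis}}$ to a photometric term. The following are illustrative assumptions rather than the paper's definitions: the fixed-threshold rule, the tile size, the small normal-direction scale given to a spawned 3D Gaussian, $L_{\text{rgb}}$ as a plain L1 error, and $L_{\text{dis}}$ as the mean squared distance between each 3D splat center and the frozen center of the 2D splat it was spawned from. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = Sequence[Sequence[Sequence[float]]]  # H x W x 3 nested sequences
Tile = Tuple[int, int]                       # (tile_row, tile_col)


def tile_mse(rendered: Image, target: Image, tile: int) -> Dict[Tile, float]:
    """Mean squared error of each square tile (edge tiles may be smaller)."""
    height, width = len(rendered), len(rendered[0])
    scores: Dict[Tile, float] = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            sq_err, count = 0.0, 0
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    for r, t in zip(rendered[y][x], target[y][x]):
                        sq_err += (r - t) ** 2
                        count += 1
            scores[(ty // tile, tx // tile)] = sq_err / count
    return scores


def select_tiles(scores: Dict[Tile, float], threshold: float) -> List[Tile]:
    """Tiles whose MSE exceeds a (hypothetical) fixed threshold."""
    return sorted(key for key, err in scores.items() if err > threshold)


def spawn_3d_from_2d(center: Sequence[float], s_u: float, s_v: float,
                     normal_scale: float = 1e-3) -> dict:
    """Colocated 3D Gaussian: same center, tangent scales inherited from the
    2D splat, plus a small (hypothetical) scale along the surface normal."""
    return {"center": tuple(center), "scales": (s_u, s_v, normal_scale)}


def l1_loss(rendered: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute photometric error over flattened pixel values."""
    return sum(abs(r - t) for r, t in zip(rendered, target)) / len(rendered)


def proximity_loss(centers_3d: Sequence[Sequence[float]],
                   anchors_2d: Sequence[Sequence[float]]) -> float:
    """Mean squared distance from each 3D splat center to its frozen 2D anchor."""
    total = 0.0
    for c, a in zip(centers_3d, anchors_2d):
        total += sum((ci - ai) ** 2 for ci, ai in zip(c, a))
    return total / len(centers_3d)


def mixed_loss(rendered, target, centers_3d, anchors_2d, lambda_5: float) -> float:
    """L_mixed = L_rgb + lambda_5 * L_dis."""
    return l1_loss(rendered, target) + lambda_5 * proximity_loss(centers_3d, anchors_2d)
```

The 2D anchors enter `proximity_loss` only as constants, mirroring the frozen 2D geometry; in an autodiff framework, gradients from $L_{\text{dis}}$ would then flow only into the 3D splat centers.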

5. Animation and Real-Time Synthesis

FLAME parameter-driven deformation directly controls the placement of 2D splats, so that both surface geometry and attached appearance faithfully track changes in head shape ($\beta$), expression ($\theta$), and global pose $(R, t)$. At test time, new head configurations are animated by updating all splat transformations according to the FLAME mapping equations, with no additional optimization required. Splat colors are encoded with spherical harmonics, which keeps appearance consistent across viewpoints. A small Jacobian damping term on the learnable offsets $p_\theta$ prevents spurious global displacements.

This architecture supports real-time animation and rendering at more than 60 FPS, with reported training times of 30 minutes for the 2D stage and 45 minutes for the mixed stage on an NVIDIA A100 (Chen et al., 2024).

6. Quantitative Results and Benchmark Evaluation

On the NeRSemble benchmark, which comprises 16 camera views per subject, the MixedGaussianAvatar framework achieves a PSNR of 31.8 dB (0.4 dB above the best pure 3DGS baseline) and an SSIM of 0.953. The average point-to-surface geometric error drops from 0.82 mm for 3DGS to 0.35 mm for the mixed approach. On the INSTA self-reenactment benchmark, MixedGaussianAvatar attains a PSNR of 30.4 dB and an SSIM of 0.962, outperforming both the 2DGS-only (27.8 dB) and 3DGS-only (29.7 dB) baselines (Chen et al., 2024). These results support the effectiveness of 2D splatting on FLAME, and of its hybrid extension, for achieving both geometric accuracy and high-fidelity appearance.
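To put the reported gaps in perspective, the helpers below convert a PSNR gain into the implied MSE ratio and compute the relative reduction in geometric error. Since $\text{PSNR} = 10 \log_{10}(\text{MAX}^2 / \text{MSE})$, a gain of $\Delta$ dB multiplies the MSE by $10^{-\Delta/10}$. The conversion is exact for a single image but only indicative for PSNR values averaged over a test set; the helper names are hypothetical.

```python
def mse_ratio_from_psnr_gain(delta_db: float) -> float:
    """MSE_new / MSE_old implied by a PSNR gain of delta_db decibels."""
    return 10.0 ** (-delta_db / 10.0)


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction going from `before` to `after`."""
    return (before - after) / before


# Figures reported in the text above
nersemble_mse_ratio = mse_ratio_from_psnr_gain(0.4)           # vs. best 3DGS
geometric_reduction = relative_reduction(0.82, 0.35)          # point-to-surface, mm
insta_vs_3dgs = mse_ratio_from_psnr_gain(30.4 - 29.7)
insta_vs_2dgs = mse_ratio_from_psnr_gain(30.4 - 27.8)

for name, value in [("NeRSemble MSE ratio", nersemble_mse_ratio),
                    ("geometric error reduction", geometric_reduction),
                    ("INSTA MSE ratio vs 3DGS-only", insta_vs_3dgs),
                    ("INSTA MSE ratio vs 2DGS-only", insta_vs_2dgs)]:
    print(f"{name}: {value:.3f}")
```

With the reported numbers, the 0.4 dB NeRSemble gain corresponds to roughly 9% lower MSE, the 0.7 dB and 2.6 dB INSTA gains to roughly 15% and 45% lower MSE, and the point-to-surface error falls by about 57%.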

7. Significance and Limitations

Anchoring 2D Gaussians to the FLAME mesh enforces cross-view geometric accuracy and keeps the representation compatible with FLAME's low-dimensional identity, expression, and pose controls. The hybrid method restores photorealism in regions where pure 2D splatting is insufficient, combining efficiency with high surface and appearance quality. Limitations include the need for a well-calibrated FLAME mesh, potential difficulty capturing fine volumetric effects off the surface, and added complexity in regions that require dense mixed splat allocation. The empirical results demonstrate both real-time viability and performance benefits for virtual avatar and telepresence applications (Chen et al., 2024).

A plausible implication is that these techniques can be generalized to other articulated or deformable surface models and to domains (e.g., full-body avatars) where template meshes and surface-anchored appearance models are available. In such settings, surface adherence would remain essential for geometric fidelity, while careful hybridization with volumetric splats would be needed to capture realistic appearance phenomena that surface models alone cannot represent.

References

Chen et al. (2024). MixedGaussianAvatar: Realistically and Geometrically Accurate Head Avatar via Mixed 2D-3D Gaussian Splatting.
