
FLAME 3D Morphable Model (3DMM)

Updated 28 January 2026
  • FLAME 3DMM is a parametric model that represents the full human head using distinct shape, expression, and pose parameters.
  • It employs PCA-based bases and a learned skinning operator to generate anatomically plausible, poseable meshes for robust facial reconstruction.
  • Pipelines built on FLAME support applications such as neural volumetric rendering and facial expression inference, achieving state-of-the-art performance on several benchmarks.

The FLAME (Faces Learned with an Articulated Model and Expressions) 3D Morphable Model (3DMM) is a parametric model specifically designed to represent the full human head, including facial identity, expressions, and articulated pose, in a way that is both compact and highly disentangled. FLAME’s architecture allows for robust statistical modeling of facial geometry and animation-ready mesh deformations, supporting applications including facial reconstruction from images, neural volumetric rendering, animation, and facial expression inference.

1. Mathematical Formulation and Parameterization

FLAME models the human head mesh via a low-dimensional parameter space, enabling the generation of anatomically plausible, poseable 3D head shapes. The formulation consists of three main sets of parameters:

  • Shape coefficients $\alpha \in \mathbb{R}^{n_s}$ (typically $n_s = 100$), encoding subject-specific identity variation.
  • Expression coefficients $\delta \in \mathbb{R}^{n_e}$ (typically $n_e = 50$), parameterizing facial expressions.
  • Pose parameters $\theta \in \mathbb{R}^{n_p}$ (e.g., $n_p = 6$, covering global rotation and jaw articulation).

The mean template mesh $\bar S$ is combined with principal component analysis (PCA) bases for shape $B_s$ and expression $B_e$, and a pose-dependent corrective basis $B_p$. The unposed mesh is

$$T(\alpha, \delta, \theta) = \bar S + B_s\,\alpha + B_e\,\delta + B_p(\theta)$$

A learned linear blend skinning (LBS) operator $W$ applies pose-dependent articulation about the joints $J(\alpha)$. The posed mesh is then

$$M(\alpha, \delta, \theta) = W\big(T(\alpha, \delta, \theta),\, J(\alpha),\, \theta\big)$$

Optionally, a global scale $s$ and translation $t$ are applied:

$$M'(\alpha, \delta, \theta) = s\,M(\alpha, \delta, \theta) + t$$

This structure ensures a disentangled, interpretable, and differentiable mapping between the parameter vector $(\alpha, \delta, \theta)$ and the mesh vertices $V \in \mathbb{R}^{N \times 3}$, whose stacked dimension $3N$ is typically 10,000–20,000.
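The generative model above can be sketched numerically. The following is a minimal NumPy version under stated assumptions: the bases are random stand-ins rather than the real learned FLAME assets, pose correctives are omitted, and the skinning step uses a single identity joint.

```python
import numpy as np

# Illustrative dimensions (FLAME's released model has N = 5023 vertices);
# the bases here are random stand-ins, not the real FLAME assets.
N, n_s, n_e = 5023, 100, 50               # vertices, shape dim, expression dim
rng = np.random.default_rng(0)
S_bar = np.zeros((N, 3))                  # mean template mesh
B_s = rng.normal(size=(N, 3, n_s)) * 1e-3 # PCA shape basis
B_e = rng.normal(size=(N, 3, n_e)) * 1e-3 # PCA expression basis

def unposed_mesh(alpha, delta):
    """T(alpha, delta) = S_bar + B_s alpha + B_e delta (pose correctives omitted)."""
    return S_bar + B_s @ alpha + B_e @ delta

def lbs(T, joints_R, weights, joint_locs):
    """Minimal linear blend skinning: rotate vertices about each joint and
    blend with per-vertex weights. joints_R: (J,3,3), weights: (N,J)."""
    posed = np.zeros_like(T)
    for j in range(joints_R.shape[0]):
        rotated = (T - joint_locs[j]) @ joints_R[j].T + joint_locs[j]
        posed += weights[:, j:j + 1] * rotated
    return posed

alpha, delta = np.zeros(n_s), np.zeros(n_e)
T = unposed_mesh(alpha, delta)
# Identity pose: one joint at origin, identity rotation -> mesh unchanged.
M = lbs(T, np.eye(3)[None], np.ones((N, 1)), np.zeros((1, 3)))
assert np.allclose(M, T)
```

With non-zero $\alpha$ or $\delta$ the mesh deforms linearly along the corresponding basis directions, which is what makes gradient-based fitting of the low-dimensional codes straightforward.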

2. FLAME-Based Reconstruction Pipelines

Modern 3D face reconstruction pipelines leverage FLAME as the target parameter space for fitting 3D head geometry and appearance from monocular or multi-view images. KaoLRM (Zhu et al., 19 Jan 2026) exemplifies this by projecting features from a pretrained Large Reconstruction Model (LRM) into FLAME parameters through a gating and regression scheme. Specifically:

  • LRM produces triplane features, which are flattened into a sequence of tokens $\{t_i\}$.
  • A self-gating MLP predicts per-token gates $g_i = \sigma(\mathrm{MLP}(t_i))$, yielding gated tokens $\tilde t_i = g_i \odot t_i$.
  • A regressor maps the gated tokens to predicted FLAME parameters $(\hat\alpha, \hat\delta, \hat\theta)$.

Supervision can include a landmark loss comparing projected model landmarks to detected 2D landmarks, as well as $\ell_2$ regularizers on the shape and expression coefficients, e.g. $\mathcal{L}_{\mathrm{reg}} = \lambda_\alpha \lVert \alpha \rVert_2^2 + \lambda_\delta \lVert \delta \rVert_2^2$.
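The gating-and-regression scheme and the accompanying losses can be sketched in NumPy. All sizes, weight matrices, and the linear regressor below are illustrative stand-ins, not KaoLRM's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: K triplane tokens of dimension D, regressed to FLAME params.
K, D, n_s, n_e, n_p = 256, 64, 100, 50, 6
tokens = rng.normal(size=(K, D))

# Self-gating: per-token scalar gate g_i = sigmoid(w . t_i), then t~_i = g_i * t_i.
w_gate = rng.normal(size=D)
gates = sigmoid(tokens @ w_gate)             # (K,)
gated = gates[:, None] * tokens              # (K, D)

# Stand-in regressor: pooled gated tokens -> (alpha, delta, theta).
W_reg = rng.normal(size=(D, n_s + n_e + n_p)) * 0.01
params = gated.mean(axis=0) @ W_reg
alpha, delta, theta = np.split(params, [n_s, n_s + n_e])

# Landmark loss on projected 2D landmarks, plus L2 priors on shape/expression.
lmk_pred = rng.normal(size=(68, 2))
lmk_gt = lmk_pred.copy()                     # perfect prediction for the demo
L_lmk = np.mean(np.sum((lmk_pred - lmk_gt) ** 2, axis=-1))
L_reg = 1e-4 * (alpha @ alpha) + 1e-4 * (delta @ delta)
```

The gates let the network suppress tokens that carry no head-related information before regression, which is the intuition behind the self-gating step described above.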

For appearance modeling, KaoLRM applies FLAME-based 2D Gaussian splatting. Points are densely sampled on the reconstructed mesh surface; each is transformed into a 2D Gaussian primitive in image space, and appearance is rendered by weighted splatting. Rendering and binding losses (relying on depth and normals from both mesh and Gaussian splats) enable end-to-end optimization.
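The dense surface sampling step can be illustrated with area-weighted barycentric sampling. This is a generic sketch, not KaoLRM's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_on_mesh(verts, faces, n):
    """Uniformly sample n surface points: pick faces with probability
    proportional to area, then draw barycentric coordinates per sample."""
    v0, v1, v2 = (verts[faces[:, i]] for i in range(3))
    areas = 0.5 * np.linalg.norm(np.cross(v1 - v0, v2 - v0), axis=1)
    fidx = rng.choice(len(faces), size=n, p=areas / areas.sum())
    u, v = rng.random(n), rng.random(n)
    flip = u + v > 1                     # reflect into the triangle
    u[flip], v[flip] = 1 - u[flip], 1 - v[flip]
    w = 1 - u - v
    return (w[:, None] * verts[faces[fidx, 0]]
            + u[:, None] * verts[faces[fidx, 1]]
            + v[:, None] * verts[faces[fidx, 2]])

# One unit triangle in the z = 0 plane: all samples must stay inside it.
verts = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
pts = sample_on_mesh(verts, np.array([[0, 1, 2]]), 1000)
assert np.all(pts[:, 2] == 0) and np.all(pts.sum(axis=1) <= 1 + 1e-9)
```

Each sampled point would then seed one Gaussian primitive whose position stays bound to the mesh, so FLAME parameter changes move the splats consistently.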

Multi-stage training is common: landmark and regularization losses enable coarse alignment, then photometric and geometric binding losses refine fine structure and appearance. KaoLRM demonstrates this yields state-of-the-art reconstruction accuracy and cross-view consistency on benchmarks such as FaceVerse and NoW (Zhu et al., 19 Jan 2026).

3. Integration with Neural Volumetric Rendering

Several frameworks combine FLAME's explicit mesh structure with implicit neural representations like NeRF to obtain both photorealistic rendering and full expression/pose control.

NeRFlame (Zając et al., 2023) and FLAME-in-NeRF (Athar et al., 2021) both incorporate FLAME in radiance field pipelines:

  • The FLAME mesh generates a dense 3D surface; the volumetric density is defined to be nonzero only near the mesh surface, e.g. $\sigma(x) = \max(0, 1 - d(x)/\varepsilon)$, with $d(x)$ the minimum distance from $x$ to the mesh.
  • For color, a NeRF MLP $F_\Theta$ receives positional encodings of position and view direction and predicts RGB: $c = F_\Theta(\gamma(x), \gamma(d))$.
  • Control over expression and pose is achieved by manipulating FLAME parameters, which induce deformations of both the mesh and the NeRF density support.
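A minimal sketch of the distance-based density support described above, approximating $d(x)$ by the distance to the nearest of a set of mesh sample points (real systems use point-to-triangle distances and tuned $\varepsilon$):

```python
import numpy as np

def density(x, mesh_pts, eps=0.01):
    """Density supported only near the mesh: sigma(x) = max(0, 1 - d(x)/eps),
    with d(x) approximated as the distance to the nearest mesh sample point."""
    d = np.min(np.linalg.norm(mesh_pts - x, axis=1))
    return max(0.0, 1.0 - d / eps)

mesh_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(density(np.array([0.0, 0.0, 0.0]), mesh_pts))  # on the surface -> 1.0
print(density(np.array([0.5, 0.5, 0.5]), mesh_pts))  # far from mesh -> 0.0
```

Because the density support moves rigidly with the mesh, editing FLAME's expression or pose codes relocates where the radiance field can be non-empty, which is what gives these hybrids their controllability.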

FLAME-in-NeRF further conditions the NeRF MLP on the FLAME expression code, concatenated to input layers, and uses a spatial prior (occupancy mask) to enforce that only facial regions respond to expression changes. Training employs combined losses: photometric, regularization on parameters, and novel disentanglement and spatial priors (Athar et al., 2021).

Joint optimization over neural rendering weights and FLAME parameters, often in multiple training phases, produces models with high-fidelity reconstructions that are directly controllable via FLAME's low-dimensional latent space (Zając et al., 2023).

4. Data-Driven FLAME Fitting from Images

FLAME parameter extraction is commonly performed via deep regression models trained to predict FLAME codes from monocular imagery under a battery of self-supervised and supervised objectives.

Anisetty et al. (Anisetty et al., 2022) develop an unsupervised encoder for in-the-wild images that outputs FLAME coefficients regulating both facial and full-head shape, even under severe hair occlusion. Core components include:

  • Dice consistency loss aligning the silhouette of rendered mesh (post-hair-inpainting) to observed skin.
  • Scale consistency loss ensuring shape invariance across varying crop levels (tight/loose framing).
  • Landmark detection for extended 71-point topology to constrain upper-head reconstruction.
  • Encoder consistency and regularization to stabilize predicted parameters.
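The Dice consistency term in the list above can be written as a soft Dice loss between the rendered silhouette and the observed skin mask. A sketch with illustrative binary masks:

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss between two (soft or binary) masks:
    1 - 2|A ∩ B| / (|A| + |B|), with eps for numerical stability."""
    inter = np.sum(pred * target)
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0
print(dice_loss(mask, mask))        # identical masks -> 0.0
print(dice_loss(mask, 1.0 - mask))  # disjoint masks  -> ~1.0
```

Unlike a pixelwise loss, Dice is insensitive to the large empty background, so it keeps pressure on silhouette overlap even when the head occupies few pixels.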

The system yields competitive performance on face (NoW) and full-head (CoMA, LYHM) evaluation datasets, confirming FLAME's utility for unsupervised, accurate geometry recovery from unconstrained images (Anisetty et al., 2022).

5. Applications in Facial Expression Inference and Recognition

FLAME-derived representations encode rich information on both facial identity and expression, and recent work has incorporated these 3D parameters as feature spaces for facial expression inference (FEI) tasks.

Ig3D (Dong et al., 2024) conducts a systematic study, evaluating both “short” (only the expression-related parameters) and “full” (all regressed FLAME parameters) embeddings extracted via EMOCA or SMIRK regressors. Two fusion strategies are analyzed:

  • Intermediate fusion: 3DMM parameters are projected and concatenated with late-stage 2D CNN features, then passed jointly through final MLPs.
  • Late fusion: 2D and 3D-based classifiers/regressors produce predictions which are fused at the score level (max, mean, weighted).
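The score-level (late) fusion variants can be sketched as follows; the 3-class probability vectors are hypothetical stand-ins for the 2D-branch and FLAME-branch outputs:

```python
import numpy as np

def late_fuse(p2d, p3d, mode="mean", w=0.5):
    """Score-level fusion of 2D-CNN and FLAME-branch class probabilities."""
    if mode == "max":
        return np.maximum(p2d, p3d)
    if mode == "weighted":
        return w * p2d + (1 - w) * p3d
    return 0.5 * (p2d + p3d)          # mean

p2d = np.array([0.6, 0.3, 0.1])      # hypothetical 3-class scores, 2D branch
p3d = np.array([0.2, 0.7, 0.1])      # hypothetical 3-class scores, 3D branch
print(late_fuse(p2d, p3d, "mean"))          # -> [0.4 0.5 0.1]
print(late_fuse(p2d, p3d, "max").argmax())  # -> 1
```

Late fusion keeps the two branches independent until the final scores, so each can be trained (and ablated) separately, unlike intermediate fusion.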

On AffectNet and RAF-DB, fusing FLAME embeddings, especially via late fusion, delivers consistent improvements over 2D-only baselines for both discrete expression classification (RAF-DB) and valence-arousal regression (AffectNet VA), thereby validating the complementary power of 3DMM-based features (Dong et al., 2024).

6. Supervision, Losses, and Optimization Strategies

FLAME-based models are typically supervised using a mix of geometric, photometric, and perceptual losses, as well as statistical priors on latent codes:

  • Landmark alignment (commonly an $\ell_2$ reprojection loss on detected 2D landmarks): improves geometric consistency across views.
  • Photometric and perceptual losses (e.g., pixelwise, VGG feature, D-SSIM).
  • Regularization (e.g., $\ell_2$ losses on $\alpha$, $\delta$, $\theta$) to prevent parameter drift.
  • Specialized terms such as dice loss for head silhouette, scale consistency for invariance to image crop, encoder consistency, and geometric binding in renderer-fitted pipelines.
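The multi-stage strategy can be sketched as a stage-dependent weighting of the individual terms. The weights below are illustrative, not values from any of the cited papers:

```python
def total_loss(losses, stage):
    """Hypothetical staged weighting: geometric cues dominate the coarse
    stage; photometric/binding terms are enabled in the fine stage."""
    weights = {
        "coarse": {"landmark": 1.0, "reg": 1e-4, "photo": 0.0, "binding": 0.0},
        "fine":   {"landmark": 0.1, "reg": 1e-4, "photo": 1.0, "binding": 0.5},
    }[stage]
    return sum(weights[k] * losses[k] for k in losses)

losses = {"landmark": 2.0, "reg": 10.0, "photo": 0.4, "binding": 0.2}
print(total_loss(losses, "coarse"))  # 1.0*2.0 + 1e-4*10 = 2.001
print(total_loss(losses, "fine"))    # 0.1*2.0 + 1e-4*10 + 0.4 + 0.5*0.2 = 0.701
```

Zeroing the appearance terms early prevents photometric gradients from fighting the landmark fit before the mesh is roughly aligned.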

Optimization can proceed in multi-stage fashion: coarse alignment is established using geometric cues, followed by photometric/appearance refinement. In end-to-end neural pipelines (e.g., NeRFlame), mesh and NeRF weights are co-optimized, sometimes with staged schedules to balance mesh rigidity and appearance flexibility (Zając et al., 2023, Zhu et al., 19 Jan 2026, Anisetty et al., 2022, Dong et al., 2024).

7. Quantitative Performance and Impact

FLAME-based 3DMM pipelines consistently deliver state-of-the-art results across multiple 3D face benchmarks:

  • KaoLRM: lower mean Chamfer distance on FaceVerse than DECA, EMOCA, and SMIRK, and a lower mean error on the NoW challenge split than DECA (Zhu et al., 19 Jan 2026).
  • Occlusion-robustness: Dice and scale invariant losses yield accurate full-head reconstructions even with occluding hair (Anisetty et al., 2022).
  • Expression transfer and recognition: FLAME-based features, when fused with 2D CNN or transformer pipelines, provide significant accuracy and robustness boosts for emotion recognition and valence-arousal estimation (Dong et al., 2024).

These results underline the model’s adaptability for controlled facial synthesis, neural rendering, and affective computing.


Relevant References:

KaoLRM (Zhu et al., 19 Jan 2026), NeRFlame (Zając et al., 2023), FLAME-in-NeRF (Athar et al., 2021), Full-head regulation (Anisetty et al., 2022), Ig3D (FEI fusion) (Dong et al., 2024).
