Layered Appearance Coherence Difference (LACD)
- LACD is a metric that quantifies garment layering fidelity in virtual try-on images by comparing RGB differences in boundary and interior regions.
- It decomposes image errors into per-layer measures and up-weights boundary pixels using a factor (λ₁) to capture occlusion and transition effects.
- Empirical evaluations show LACD's sensitivity to occlusion modeling improvements and its advantage over traditional global similarity metrics.
The Layered Appearance Coherence Difference (LACD) is a quantitative metric designed for evaluating the coherence of appearance in multi-layer virtual try-on (VTON) images. It addresses the task of assessing how faithfully a generative model renders the visual properties of individual garment layers and, crucially, the transitions—the occlusion relationships—between overlapping garments. LACD was introduced in the context of the GO-MLVTON framework and is specifically constructed to overcome the limitations of previous perceptual and global similarity measures in the context of multi-layer garment synthesis (Yu et al., 20 Jan 2026).
1. Formal Definition and Mathematical Formulation
LACD is defined for images containing $N$ garment layers. For each layer $i \in \{1, \dots, N\}$, let $P_i$ be the set of pixel locations comprising the $i$-th garment (obtained from segmentation masks). The set $C_i$ defines the "connecting region": pixels lying at the interface between the $i$-th and $(i+1)$-th garment layers, i.e., boundaries or overlap areas. The remaining "interior region" is given by $I_i = P_i \setminus C_i$. For each pixel $p$, $x_i^{gt}(p)$ and $x_i^{gen}(p)$ are the RGB color values at location $p$ in the ground-truth and generated images, respectively, of layer $i$.
The per-layer coherence difference for layer $i$ is defined as

$$D_i = \frac{1}{|P_i|} \left( \sum_{p \in I_i} \left\| x_i^{gt}(p) - x_i^{gen}(p) \right\|_2 + \lambda_1 \sum_{p \in C_i} \left\| x_i^{gt}(p) - x_i^{gen}(p) \right\|_2 \right),$$

where $\lambda_1 > 1$ (with $\lambda_1 = 3$ in the original formulation) gives increased weight to errors at the garment boundaries.
The overall Layered Appearance Coherence Difference is the average across layers:

$$\mathrm{LACD} = \frac{1}{N} \sum_{i=1}^{N} D_i.$$
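As a concrete illustration, the per-layer term and the final average can be computed with NumPy, given per-layer interior and boundary masks. This is a minimal sketch following the formulation above, not the reference implementation; the function names and the per-layer normalization by $|P_i| = |I_i| + |C_i|$ are assumptions of this sketch.

```python
import numpy as np

def layer_coherence_difference(gt, gen, interior_mask, boundary_mask, lam1=3.0):
    """Per-layer coherence difference D_i (sketch).

    gt, gen: (H, W, 3) float RGB images (ground truth and generated).
    interior_mask, boundary_mask: (H, W) boolean masks for I_i and C_i.
    lam1: boundary up-weighting factor (3 in the original formulation).
    """
    # Per-pixel Euclidean distance in RGB space.
    diff = np.linalg.norm(gt.astype(np.float64) - gen.astype(np.float64), axis=-1)
    n_pixels = int(interior_mask.sum()) + int(boundary_mask.sum())  # |P_i|
    if n_pixels == 0:
        return 0.0
    interior_err = diff[interior_mask].sum()
    boundary_err = diff[boundary_mask].sum()
    return (interior_err + lam1 * boundary_err) / n_pixels

def lacd(gt, gen, interior_masks, boundary_masks, lam1=3.0):
    """Average the per-layer differences D_i over all N layers."""
    terms = [layer_coherence_difference(gt, gen, im, bm, lam1)
             for im, bm in zip(interior_masks, boundary_masks)]
    return float(np.mean(terms))
```

Because each $D_i$ is normalized by its own layer size before averaging, a small inner garment contributes as much to the final score as a large outer one.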
2. Key Variables and Implementation Details
| Symbol/Term | Meaning | Notes |
|---|---|---|
| $N$ | Number of garment layers | Typically $N = 2$ (inner/outer) in most benchmarks |
| $P_i$ | Set of pixel indices of the $i$-th garment | From segmentation masks |
| $C_i$ | Boundary/connection pixels to adjacent layer | Pixels near $P_i \cap P_{i+1}$; empty for outermost layer |
| $I_i$ | Interior pixels of layer $i$ ($P_i \setminus C_i$) | Excludes boundary region |
| $x_i^{gt}(p)$, $x_i^{gen}(p)$ | RGB vectors for GT and generated images at $p$, layer $i$ | Measured in RGB color space |
| $\|\cdot\|_2$ | Euclidean norm in RGB space | Standard 2-norm |
| $\lambda_1$ | Weight for boundary pixel errors | Set to 3 |
Implementing LACD involves precise segmentation of layer masks (using datasets such as MLG, garment parsing via SCHP, and boundary detection via morphological operations) followed by per-pixel error aggregation over the region-disjoint sets $I_i$ and $C_i$ (Yu et al., 20 Jan 2026).
3. Motivation and Rationale
Existing VTON evaluation metrics such as SSIM, LPIPS, FID, and KID offer global or perceptual image similarity assessments but do not distinguish errors arising from specific garment layers or, critically, at transition zones between overlapping layers. This is a significant limitation, as the visual fidelity of multi-layer VTON depends not only on the accurate synthesis of textures and shapes for individual garments but also on the realistic depiction of layer occlusions, overlaps, and the avoidance of artifacts such as “bleeding” or boundary misalignments.
By explicitly decomposing the image error over per-layer interior and boundary regions—with boundary regions up-weighted by $\lambda_1$—LACD provides targeted penalization for visually salient errors near garment boundaries, reflecting the layered structure of multi-garment try-on. This allows LACD to evaluate not merely perceptual similarity but the structural plausibility of garment layering and occlusion, which standard metrics cannot directly capture (Yu et al., 20 Jan 2026).
4. Practical Computation and Evaluation Protocol
LACD calculation requires:
- Segmentation: Extracting ground-truth and predicted garment regions using parsing models (e.g., SCHP for full-body parsing; SAM for garment segmentation) to produce the masks $P_i$. Boundary regions are identified by locating the overlap or adjacency of $P_i$ and $P_{i+1}$, often using morphological dilation/intersection operations.
- Pixel Grouping: Classifying pixels into $I_i$ and $C_i$ through spatial and mask-based computation.
- Error Aggregation: Summing RGB differences within $I_i$ and $C_i$ per layer, scaling errors within $C_i$ by $\lambda_1$.
- Averaging: Computing the final LACD as the mean of the per-layer differences across all garment layers in the scene.
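The boundary-extraction step above can be sketched with binary morphology: approximate the connecting region $C_i$ as the pixels of $P_i$ lying within a small dilation radius of the adjacent layer's mask, and take the interior as the remainder. The dilation radius and the use of `scipy.ndimage` are illustrative assumptions; the paper prescribes only "morphological dilation/intersection operations".

```python
import numpy as np
from scipy.ndimage import binary_dilation

def split_regions(mask_i, mask_adjacent, radius=3):
    """Split layer mask P_i into boundary C_i and interior I_i (sketch).

    mask_i: boolean (H, W) mask of the i-th garment layer.
    mask_adjacent: boolean mask of the (i+1)-th layer, or None for the
        outermost layer (in which case C_i is empty).
    radius: dilation radius defining "near the interface" (assumed value).
    """
    if mask_adjacent is None:
        boundary = np.zeros_like(mask_i, dtype=bool)
    else:
        # Grow the adjacent layer's mask, then intersect with this layer:
        # pixels of P_i within `radius` steps of the neighbor form C_i.
        near = binary_dilation(mask_adjacent, iterations=radius)
        boundary = mask_i & near
    interior = mask_i & ~boundary  # I_i = P_i minus C_i
    return interior, boundary
```

The two returned masks are disjoint by construction and together cover $P_i$, matching the region-disjoint aggregation described above.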
The metric assumes, in the case of generated samples, that segmentation masks remain valid due to the network’s conditioning on the underlying garment structures.
5. Empirical Effectiveness and Sensitivity
Empirical evaluation on the MLG dataset demonstrates LACD’s discriminative capability. GO-MLVTON achieves an LACD of 0.623 on the test split of 755 samples, outperforming CAT-DM (0.719), MV-VTON (0.973), and closely matching CATVTON (0.626). Lower LACD scores are found to correlate strongly with crisper, more visually plausible layer boundaries and a reduction in occlusion-related artifacts.
Ablation experiments evidence LACD’s sensitivity to improvements in occlusion modeling:
- Baseline (no Garment Occlusion Learning [GOL], no explicit occlusion loss): 0.625
- With GOL only (no occlusion loss): 0.868, worse than the baseline, indicating that GOL degrades boundary quality unless paired with supervised boundary handling via the occlusion loss
- With GOL and the occlusion loss: 0.623 (best)
This indicates that LACD is responsive to targeted architectural or loss-driven enhancements affecting garment boundary realism and occlusion relationships (Yu et al., 20 Jan 2026).
6. Comparison with Existing Image Similarity Metrics
LACD differs from conventional metrics as presented in the following table:
| Metric | Pixel Weighting | Layer/Boundary Awareness |
|---|---|---|
| SSIM | Uniform | None |
| LPIPS | Uniform/perceptual | None |
| FID/KID | Distributional (global) | None |
| LACD | Weighted (boundary-emphasized) | Explicit per-layer and boundary |
Unlike SSIM or LPIPS, which treat all pixels equally and lack garment layering context, LACD explicitly computes and up-weights errors by region, directly measuring boundary and interior coherence within and across layers. Distribution-level metrics (FID, KID) account for overall realism but are agnostic to layered structural relationships.
By averaging per-layer scores, LACD avoids biasing results toward larger garments and enables interpretable, layer-localized error analysis. This tailored approach establishes LACD as the first metric specifically designed to quantify “layered coherence” in multi-layer VTON, with demonstrated correlation to visual quality in overlap and occlusion regions (Yu et al., 20 Jan 2026).