
Layered Appearance Coherence Difference (LACD)

Updated 27 January 2026
  • LACD is a metric that quantifies garment layering fidelity in virtual try-on images by comparing RGB differences in boundary and interior regions.
  • It decomposes image errors into per-layer measures and up-weights boundary pixels using a factor (λ₁) to capture occlusion and transition effects.
  • Empirical evaluations show LACD's sensitivity to occlusion modeling improvements and its advantage over traditional global similarity metrics.

The Layered Appearance Coherence Difference (LACD) is a quantitative metric designed for evaluating the coherence of appearance in multi-layer virtual try-on (VTON) images. It addresses the task of assessing how faithfully a generative model renders the visual properties of individual garment layers and, crucially, the transitions—the occlusion relationships—between overlapping garments. LACD was introduced in the context of the GO-MLVTON framework and is specifically constructed to overcome the limitations of previous perceptual and global similarity measures in the context of multi-layer garment synthesis (Yu et al., 20 Jan 2026).

1. Formal Definition and Mathematical Formulation

LACD is defined for images containing N garment layers. For each layer i (1 ≤ i ≤ N), let A_i be the set of pixel locations comprising the i-th garment (obtained from segmentation masks). The subset B_i ⊂ A_i defines the "connecting region": pixels lying at the interface between the i-th and (i+1)-th garment layers, i.e., boundaries or overlap areas. The remaining "interior region" is C_i = A_i ∖ B_i. For each pixel p, x_gt^(i,p) ∈ ℝ³ and x_gen^(i,p) ∈ ℝ³ denote the RGB color values at location p of layer i in the ground-truth and generated images, respectively.

The per-layer coherence difference is defined as

\mathrm{lacd}_i = \lambda_1 \sum_{p \in B_i} \left\| x_{gt}^{(i,p)} - x_{gen}^{(i,p)} \right\|_2 + \sum_{p \in C_i} \left\| x_{gt}^{(i,p)} - x_{gen}^{(i,p)} \right\|_2,

where λ₁ > 1 (with λ₁ = 3 in the original formulation) gives increased weight to errors at the garment boundaries.

The overall Layered Appearance Coherence Difference is the average over layers:

\mathrm{LACD} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{lacd}_i.
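As a concreteness check, the per-layer sums and the final layer average can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not the authors' code; the boundary and interior masks are assumed to be precomputed:

```python
import numpy as np

def lacd(gt, gen, boundary_masks, interior_masks, lam1=3.0):
    """Layered Appearance Coherence Difference (illustrative sketch).

    gt, gen        : (H, W, 3) arrays, ground-truth / generated RGB images
    boundary_masks : list of (H, W) bool arrays, one B_i per layer
    interior_masks : list of (H, W) bool arrays, one C_i per layer
    lam1           : boundary up-weighting factor (lambda_1 = 3 in the paper)
    """
    # Per-pixel l2 distance in RGB space.
    diff = np.linalg.norm(gt.astype(np.float64) - gen.astype(np.float64), axis=-1)
    # lacd_i = lam1 * sum over B_i + sum over C_i, then average over layers.
    per_layer = [
        lam1 * diff[B].sum() + diff[C].sum()
        for B, C in zip(boundary_masks, interior_masks)
    ]
    return float(np.mean(per_layer))
```

Boolean indexing keeps the boundary and interior sums region-disjoint by construction, mirroring C_i = A_i ∖ B_i.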

2. Key Variables and Implementation Details

| Symbol/Term | Meaning | Notes |
| --- | --- | --- |
| N | Number of garment layers | Typically N = 2 (inner/outer) in most benchmarks |
| A_i | Set of pixel indices of the i-th garment | From segmentation masks |
| B_i | Boundary/connection pixels to the adjacent layer | Pixels near A_i ∩ A_{i+1}; empty for the outermost layer |
| C_i | Interior pixels of layer i (A_i ∖ B_i) | Excludes the boundary region |
| x_gt^(i,p), x_gen^(i,p) | RGB vectors of the GT and generated images at p, layer i | Values in ℝ³, typically [0, 255]³ |
| ‖·‖₂ | Euclidean norm in RGB space | Standard 2-norm |
| λ₁ | Weight for boundary pixel errors | Set to 3 |

Implementing LACD involves precise segmentation of layer masks (using datasets such as MLG, garment parsing via SCHP, and boundary detection via morphological operations), followed by per-pixel error aggregation over the region-disjoint sets B_i and C_i (Yu et al., 20 Jan 2026).
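The morphological boundary detection can be sketched with NumPy alone. This is a hypothetical reconstruction: `dilate4` stands in for library morphology (e.g., scipy.ndimage.binary_dilation), the band `width` is an illustrative parameter rather than a value from the paper, and the binary layer masks are assumed to come from the SCHP/SAM parsing stage:

```python
import numpy as np

def dilate4(mask, iters=1):
    """Binary dilation with a 4-connected structuring element
    (stand-in for scipy.ndimage.binary_dilation)."""
    m = mask.astype(bool)
    for _ in range(iters):
        up    = np.pad(m, ((1, 0), (0, 0)))[:-1, :]   # m[i-1, j]
        down  = np.pad(m, ((0, 1), (0, 0)))[1:, :]    # m[i+1, j]
        left  = np.pad(m, ((0, 0), (1, 0)))[:, :-1]   # m[i, j-1]
        right = np.pad(m, ((0, 0), (0, 1)))[:, 1:]    # m[i, j+1]
        m = m | up | down | left | right
    return m

def split_regions(A_i, A_next, width=3):
    """Split layer mask A_i into a boundary band B_i (near the overlap or
    adjacency with layer i+1) and the interior C_i = A_i minus B_i."""
    # Dilating both masks makes them intersect even when the layers only
    # touch, yielding a band around the overlap/adjacency zone.
    contact = dilate4(A_i, width) & dilate4(A_next, width)
    B_i = A_i & contact
    C_i = A_i & ~B_i
    return B_i, C_i
```

For the outermost layer, which has no adjacent layer above it, B_i is simply empty and the whole mask is interior.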

3. Motivation and Rationale

Existing VTON evaluation metrics such as SSIM, LPIPS, FID, and KID offer global or perceptual image similarity assessments but do not distinguish errors arising from specific garment layers or, critically, at transition zones between overlapping layers. This is a significant limitation, as the visual fidelity of multi-layer VTON depends not only on the accurate synthesis of textures and shapes for individual garments but also on the realistic depiction of layer occlusions, overlaps, and the avoidance of artifacts such as “bleeding” or boundary misalignments.

By explicitly decomposing the image error over per-layer interior and boundary regions, with boundary regions up-weighted by λ₁, LACD provides targeted penalization for visually salient errors near garment boundaries, reflecting the layered structure of multi-garment try-on. This allows LACD to evaluate not merely perceptual similarity but the structural plausibility of garment layering and occlusion, which standard metrics cannot directly capture (Yu et al., 20 Jan 2026).

4. Practical Computation and Evaluation Protocol

LACD calculation requires:

  • Segmentation: Extracting ground-truth and predicted garment regions using parsing models (e.g., SCHP for full-body parsing; SAM for garment segmentation) to produce the A_i masks. Boundary regions B_i are identified by locating the overlap or adjacency of A_i and A_{i+1}, often via morphological dilation/intersection operations.
  • Pixel Grouping: Classifying A_i pixels into B_i and C_i through spatial and mask-based computation.
  • Error Aggregation: Summing ℓ₂ RGB differences within B_i and C_i per layer, scaling the errors within B_i by λ₁.
  • Averaging: Computing the final LACD across all garment layers in the scene.

For generated samples, the metric assumes that the segmentation masks remain valid, since the network is conditioned on the underlying garment structures.

5. Empirical Effectiveness and Sensitivity

Empirical evaluation on the MLG dataset demonstrates LACD’s discriminative capability. GO-MLVTON achieves an LACD of 0.623 on the test split of 755 samples, outperforming CAT-DM (0.719), MV-VTON (0.973), and closely matching CATVTON (0.626). Lower LACD scores are found to correlate strongly with crisper, more visually plausible layer boundaries and a reduction in occlusion-related artifacts.

Ablation experiments evidence LACD’s sensitivity to improvements in occlusion modeling:

  • Baseline (no Garment Occlusion Learning [GOL], no explicit occlusion loss): 0.625
  • With GOL only (no occlusion loss): 0.868, worse than the baseline, indicating that GOL without explicit occlusion supervision degrades boundary fidelity
  • With GOL and the occlusion loss \mathcal{L}_{OCC}: 0.623 (best)

This indicates that LACD is responsive to targeted architectural or loss-driven enhancements affecting garment boundary realism and occlusion relationships (Yu et al., 20 Jan 2026).

6. Comparison with Existing Image Similarity Metrics

LACD differs from conventional metrics as presented in the following table:

| Metric | Pixel Weighting | Layer/Boundary Awareness |
| --- | --- | --- |
| SSIM | Uniform | None |
| LPIPS | Uniform/perceptual | None |
| FID/KID | Distributional (global) | None |
| LACD | Weighted (boundary-emphatic) | Per-layer and boundary explicit |

Unlike SSIM or LPIPS, which treat all pixels equally and lack garment layering context, LACD explicitly computes and up-weights errors by region, directly measuring boundary and interior coherence within and across layers. Distribution-level metrics (FID, KID) account for overall realism but are agnostic to layered structural relationships.

By averaging per-layer scores, LACD avoids biasing results toward larger garments and enables interpretable, layer-localized error analysis. This tailored approach establishes LACD as the first metric specifically designed to quantify “layered coherence” in multi-layer VTON, with demonstrated correlation to visual quality in overlap and occlusion regions (Yu et al., 20 Jan 2026).
