Layered Appearance Coherence Difference (LACD)
- LACD is a metric that quantifies garment layering fidelity in virtual try-on images by comparing RGB differences in boundary and interior regions.
- It decomposes image errors into per-layer measures and up-weights boundary pixels using a factor (λ₁) to capture occlusion and transition effects.
- Empirical evaluations show LACD's sensitivity to occlusion modeling improvements and its advantage over traditional global similarity metrics.
The Layered Appearance Coherence Difference (LACD) is a quantitative metric designed for evaluating the coherence of appearance in multi-layer virtual try-on (VTON) images. It addresses the task of assessing how faithfully a generative model renders the visual properties of individual garment layers and, crucially, the transitions—the occlusion relationships—between overlapping garments. LACD was introduced in the context of the GO-MLVTON framework and is specifically constructed to overcome the limitations of previous perceptual and global similarity measures in the context of multi-layer garment synthesis (Yu et al., 20 Jan 2026).
1. Formal Definition and Mathematical Formulation
LACD is defined for images containing $N$ garment layers. For each layer $i \in \{1, \dots, N\}$, let $P_i$ be the set of pixel locations comprising the $i$-th garment (obtained from segmentation masks). The set $C_i$ defines the "connecting region": pixels lying at the interface between the $i$-th and $(i+1)$-th garment layers, i.e., boundaries or overlap areas. The remaining "interior region" is given by $I_i = P_i \setminus C_i$. For each pixel $p$, $x_i^{gt}(p)$ and $x_i^{gen}(p)$ are the RGB color values at location $p$ in the ground-truth and generated images, respectively, of layer $i$.
The per-layer coherence difference for layer $i$ is defined as

$$D_i = \frac{1}{|P_i|} \left( \sum_{p \in I_i} \left\| x_i^{gt}(p) - x_i^{gen}(p) \right\|_2 + \lambda_1 \sum_{p \in C_i} \left\| x_i^{gt}(p) - x_i^{gen}(p) \right\|_2 \right),$$

where $\lambda_1 > 1$ (with $\lambda_1 = 3$ in the original formulation) gives increased weight to errors at the garment boundaries.
The overall Layered Appearance Coherence Difference is the average across layers:

$$\mathrm{LACD} = \frac{1}{N} \sum_{i=1}^{N} D_i.$$
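As a concrete illustration, the per-layer term and the final average can be computed with NumPy, given per-layer interior and boundary masks. This is a minimal sketch following the formulation above, not the reference implementation; the function names and the per-layer normalization by $|P_i| = |I_i| + |C_i|$ are assumptions of this sketch.

```python
import numpy as np

def layer_coherence_difference(gt, gen, interior_mask, boundary_mask, lam1=3.0):
    """Per-layer coherence difference D_i (sketch).

    gt, gen: (H, W, 3) float RGB images (ground truth and generated).
    interior_mask, boundary_mask: (H, W) boolean masks for I_i and C_i.
    lam1: boundary up-weighting factor (3 in the original formulation).
    """
    # Per-pixel Euclidean distance in RGB space.
    diff = np.linalg.norm(gt.astype(np.float64) - gen.astype(np.float64), axis=-1)
    n_pixels = int(interior_mask.sum()) + int(boundary_mask.sum())  # |P_i|
    if n_pixels == 0:
        return 0.0
    interior_err = diff[interior_mask].sum()
    boundary_err = diff[boundary_mask].sum()
    return (interior_err + lam1 * boundary_err) / n_pixels

def lacd(gt, gen, interior_masks, boundary_masks, lam1=3.0):
    """Average the per-layer differences D_i over all N layers."""
    terms = [layer_coherence_difference(gt, gen, im, bm, lam1)
             for im, bm in zip(interior_masks, boundary_masks)]
    return float(np.mean(terms))
```

Because each $D_i$ is normalized by its own layer size before averaging, a small inner garment contributes as much to the final score as a large outer one.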
2. Key Variables and Implementation Details
| Symbol/Term | Meaning | Notes |
|---|---|---|
| $N$ | Number of garment layers | Typically $N = 2$ (inner/outer) in most benchmarks |
| $P_i$ | Set of pixel indices of the $i$-th garment | From segmentation masks |
| $C_i$ | Boundary/connection pixels to adjacent layer | Pixels near $P_i \cap P_{i+1}$; empty for outermost layer |
| $I_i$ | Interior pixels of layer $i$ ($P_i \setminus C_i$) | Excludes boundary region |
| $x_i^{gt}(p)$, $x_i^{gen}(p)$ | RGB vectors for GT and generated images at $p$, layer $i$ | Measured in RGB color space |
| $\|\cdot\|_2$ | Euclidean norm in RGB space | Standard 2-norm |
| $\lambda_1$ | Weight for boundary pixel errors | Set to 3 |
Implementing LACD involves precise segmentation of layer masks (using datasets such as MLG, garment parsing via SCHP, and boundary detection via morphological operations) followed by per-pixel error aggregation over the region-disjoint sets $I_i$ and $C_i$ (Yu et al., 20 Jan 2026).
3. Motivation and Rationale
Existing VTON evaluation metrics such as SSIM, LPIPS, FID, and KID offer global or perceptual image similarity assessments but do not distinguish errors arising from specific garment layers or, critically, at transition zones between overlapping layers. This is a significant limitation, as the visual fidelity of multi-layer VTON depends not only on the accurate synthesis of textures and shapes for individual garments but also on the realistic depiction of layer occlusions, overlaps, and the avoidance of artifacts such as “bleeding” or boundary misalignments.
By explicitly decomposing the image error over per-layer interior and boundary regions—with boundary regions up-weighted by $\lambda_1$—LACD provides targeted penalization for visually salient errors near garment boundaries, reflecting the layered structure of multi-garment try-on. This allows LACD to evaluate not merely perceptual similarity but the structural plausibility of garment layering and occlusion, which standard metrics cannot directly capture (Yu et al., 20 Jan 2026).
4. Practical Computation and Evaluation Protocol
LACD calculation requires:
- Segmentation: Extracting ground-truth and predicted garment regions using parsing models (e.g., SCHP for full-body parsing; SAM for garment segmentation) to produce the masks $P_i$. Boundary regions are identified by locating the overlap or adjacency of $P_i$ and $P_{i+1}$, often using morphological dilation/intersection operations.
- Pixel Grouping: Classifying pixels into $I_i$ and $C_i$ through spatial and mask-based computation.
- Error Aggregation: Summing RGB differences within $I_i$ and $C_i$ per layer, scaling errors within $C_i$ by $\lambda_1$.
- Averaging: Computing the final LACD as the mean of the per-layer differences across all garment layers in the scene.
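The boundary-extraction step above can be sketched with binary morphology: approximate the connecting region $C_i$ as the pixels of $P_i$ lying within a small dilation radius of the adjacent layer's mask, and take the interior as the remainder. The dilation radius and the use of `scipy.ndimage` are illustrative assumptions; the paper prescribes only "morphological dilation/intersection operations".

```python
import numpy as np
from scipy.ndimage import binary_dilation

def split_regions(mask_i, mask_adjacent, radius=3):
    """Split layer mask P_i into boundary C_i and interior I_i (sketch).

    mask_i: boolean (H, W) mask of the i-th garment layer.
    mask_adjacent: boolean mask of the (i+1)-th layer, or None for the
        outermost layer (in which case C_i is empty).
    radius: dilation radius defining "near the interface" (assumed value).
    """
    if mask_adjacent is None:
        boundary = np.zeros_like(mask_i, dtype=bool)
    else:
        # Grow the adjacent layer's mask, then intersect with this layer:
        # pixels of P_i within `radius` steps of the neighbor form C_i.
        near = binary_dilation(mask_adjacent, iterations=radius)
        boundary = mask_i & near
    interior = mask_i & ~boundary  # I_i = P_i minus C_i
    return interior, boundary
```

The two returned masks are disjoint by construction and together cover $P_i$, matching the region-disjoint aggregation described above.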
The metric assumes, in the case of generated samples, that segmentation masks remain valid due to the network’s conditioning on the underlying garment structures.
5. Empirical Effectiveness and Sensitivity
Empirical evaluation on the MLG dataset demonstrates LACD’s discriminative capability. GO-MLVTON achieves an LACD of 0.623 on the test split of 755 samples, outperforming CAT-DM (0.719), MV-VTON (0.973), and closely matching CATVTON (0.626). Lower LACD scores are found to correlate strongly with crisper, more visually plausible layer boundaries and a reduction in occlusion-related artifacts.
Ablation experiments evidence LACD’s sensitivity to improvements in occlusion modeling:
- Baseline (no Garment Occlusion Learning [GOL], no explicit occlusion loss): 0.625
- With GOL only (no occlusion loss): 0.868, worse than the baseline, indicating that GOL degrades boundary quality unless paired with supervised boundary handling via the occlusion loss
- With GOL and the occlusion loss: 0.623 (best)
This indicates that LACD is responsive to targeted architectural or loss-driven enhancements affecting garment boundary realism and occlusion relationships (Yu et al., 20 Jan 2026).
6. Comparison with Existing Image Similarity Metrics
LACD differs from conventional metrics as presented in the following table:
| Metric | Pixel Weighting | Layer/Boundary Awareness |
|---|---|---|
| SSIM | Uniform | None |
| LPIPS | Uniform/perceptual | None |
| FID/KID | Distributional (global) | None |
| LACD | Weighted (boundary-emphasized) | Explicit per-layer and boundary |
Unlike SSIM or LPIPS, which treat all pixels equally and lack garment layering context, LACD explicitly computes and up-weights errors by region, directly measuring boundary and interior coherence within and across layers. Distribution-level metrics (FID, KID) account for overall realism but are agnostic to layered structural relationships.
By averaging per-layer scores, LACD avoids biasing results toward larger garments and enables interpretable, layer-localized error analysis. This tailored approach establishes LACD as the first metric specifically designed to quantify “layered coherence” in multi-layer VTON, with demonstrated correlation to visual quality in overlap and occlusion regions (Yu et al., 20 Jan 2026).