Virtual Label-distribution-aware Learning (VILL)
- Virtual Label-distribution-aware Learning (VILL) is an unsupervised domain adaptation module that enhances category fairness by targeting poor performance in underrepresented classes.
- It employs an adaptive re-weighting mechanism that amplifies the influence of minority classes based on a virtual label distribution derived from pseudo-labels.
- A KL-divergence-based re-balancing strategy on target predictions promotes a uniform class distribution, yielding 3–7 percentage point gains in worst-class accuracies.
Virtual Label-distribution-aware Learning (VILL) is an architecture-agnostic module for Unsupervised Domain Adaptation (UDA) designed to address category fairness, i.e., performance disparities across classes when transferring models between domains. Traditional UDA approaches optimize for global accuracy but frequently overlook the challenge of maintaining high accuracy for "hard" categories, leading to significant variance in per-class performance. VILL augments arbitrary UDA methods with plug-and-play components that combine adaptive re-weighting and KL-divergence-based re-balancing, emphasizing worst-class improvements without sacrificing overall performance (Zhang et al., 26 Jan 2026).
1. Motivation and Definition: Category Fairness in UDA
Standard UDA training optimizes the joint loss

$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{DA}}\,\mathcal{L}_{\mathrm{DA}},$

where $\mathcal{L}_{\mathrm{CE}}$ is the supervised cross-entropy over the source domain $\mathcal{D}_s$, and $\mathcal{L}_{\mathrm{DA}}$ enhances domain invariance. Empirical analysis demonstrates that UDA classifiers often exhibit disproportionate accuracy across classes; "easy" classes (those with good domain alignment) attain high accuracies, while "hard" classes lag, a phenomenon quantifiable via Worst-N accuracy (the mean accuracy of the N lowest-performing target classes). For example, "Worst-5" or "Worst-10" accuracy can fall well below the average, highlighting a lack of fairness in standard UDA deployments.
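Worst-N accuracy is simple to compute from per-class accuracies; a minimal sketch (the helper name is illustrative, not from the paper):

```python
import numpy as np

def worst_n_accuracy(per_class_acc, n):
    """Mean accuracy over the n lowest-performing classes."""
    acc = np.sort(np.asarray(per_class_acc, dtype=float))  # ascending order
    return acc[:n].mean()

# Example: a 6-class problem with two lagging "hard" classes.
accs = [0.95, 0.92, 0.88, 0.85, 0.40, 0.30]
print(worst_n_accuracy(accs, 2))  # 0.35, far below the overall mean
```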
VILL directly targets this discrepancy: its adaptive architecture elevates the model’s sensitivity to underperforming classes, combining (1) source loss re-weighting to emphasize hard categories and (2) target output re-balancing that nudges predictions toward uniformity. These components are entirely unsupervised with respect to the target domain and require no modification of underlying network architectures.
2. Adaptive Re-weighting via Virtual Label Distribution
At every epoch, pseudo-labels $\hat y_j^t = \arg\max_i\, p_i(x_j^t)$ are constructed for the $N_t$ target samples over $C$ classes. The "virtual label distribution" is then defined as

$v_i = \frac{N_i}{N_t}, \qquad N_i = \sum_{j=1}^{N_t} \mathbbm{1}(\hat y_j^t = i),$

which is scaled as $E_i = v_i \cdot C$ for stabilization. The adaptive category weights use a smoothed negative-exponential transform:

$\omega_i = \frac{1 + \alpha\, e^{-E_i}}{\sum_{k=1}^{C} \left(1 + \alpha\, e^{-E_k}\right)},$

with typical $\alpha = 5$. Low-frequency (minority) classes receive amplified weights. The re-weighted source cross-entropy loss for a source minibatch $\mathcal{B}_s$ becomes

$\mathcal{L}_{\mathrm{RW}} = \sum_{i=1}^{C} \omega_i \sum_{(x,\,y) \in \mathcal{B}_s^{(i)}} \ell_{\mathrm{CE}}\big(F(G(x)),\, y\big),$

where $\ell_{\mathrm{CE}}$ is the cross-entropy and $\mathcal{B}_s^{(i)}$ is the set of source examples in the batch with label $i$. This mechanism ensures that categories under-represented in the pseudo-label set receive greater influence during network updates.
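Under the definitions above, the virtual label distribution and adaptive weights can be sketched as follows (a NumPy illustration; the function name is mine, not from the paper):

```python
import numpy as np

def adaptive_weights(pseudo_labels, num_classes, alpha=5.0):
    """Adaptive class weights from target pseudo-labels:
    v_i = N_i / N_t, E_i = v_i * C, ω_i ∝ 1 + α·exp(−E_i)."""
    counts = np.bincount(pseudo_labels, minlength=num_classes)  # N_i
    v = counts / counts.sum()                                   # virtual label distribution
    E = v * num_classes                                         # scaled frequencies
    w = 1.0 + alpha * np.exp(-E)                                # minorities get larger weights
    return w / w.sum()                                          # normalize to a distribution

# A 4-class example where class 3 is rare among the pseudo-labels.
labels = np.array([0] * 40 + [1] * 30 + [2] * 25 + [3] * 5)
w = adaptive_weights(labels, num_classes=4)
print(w.argmax())  # 3 — the rarest class receives the largest weight
```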
3. KL-divergence-based Re-balancing on Target Predictions
Complementing re-weighting, VILL introduces a loss on target predictions that promotes distributional uniformity. For a target minibatch $\mathcal{B}_t$, predicted probabilities are computed as $p(x) = \mathrm{softmax}(F(G(x)))$, and their batch average as

$\bar p = \frac{1}{|\mathcal{B}_t|} \sum_{x \in \mathcal{B}_t} p(x).$

The KL divergence from $\bar p$ to the desired uniform distribution $u = (1/C, \dots, 1/C)$ is calculated as

$D_{\mathrm{KL}}(\bar p \,\|\, u) = \sum_{i=1}^{C} \bar p_i \log \frac{\bar p_i}{1/C} = \sum_{i=1}^{C} \bar p_i \log \bar p_i + \log C;$

minimizing it penalizes excessive average confidence in majority classes by maximizing the entropy of the mean prediction, countering output collapse. The induced loss is

$\mathcal{L}_{\mathrm{RB}} = D_{\mathrm{KL}}(\bar p \,\|\, u),$

which acts only on target batches and requires no target labels.
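The re-balancing loss can be sketched on a batch of target logits (a NumPy stand-in for the framework tensors; names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def rebalancing_loss(target_logits):
    """L_RB = D_KL(batch-average prediction || uniform)."""
    p_bar = softmax(target_logits).mean(axis=0)       # batch-average prediction
    C = p_bar.shape[0]
    return float(np.sum(p_bar * np.log(p_bar * C)))   # Σ p̄_i · log(p̄_i / (1/C))

# Predictions collapsed onto class 0 incur a high loss;
# a class-balanced batch drives the loss toward zero.
collapsed = np.array([[9.0, 0.0, 0.0], [8.0, 0.0, 0.0]])
balanced  = np.array([[9.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 9.0]])
print(rebalancing_loss(collapsed) > rebalancing_loss(balanced))  # True
```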
4. Combined Objective and Implementation Details
VILL generalizes arbitrary base UDA methods, substituting $\mathcal{L}_{\mathrm{CE}}$ with $\mathcal{L}_{\mathrm{RW}}$ and adding the KL re-balancing term:

$\mathcal{L} = \mathcal{L}_{\mathrm{RW}} + \lambda_{\mathrm{DA}}\, \mathcal{L}_{\mathrm{DA}} + \lambda_{\mathrm{KL}}\, \mathcal{L}_{\mathrm{RB}},$

with $\lambda_{\mathrm{DA}}$ inherited from the base method and $\lambda_{\mathrm{KL}}$ (typically $0.01$–$0.1$) regulating the trade-off.
The training loop consists of:
- Forward pass on target samples, generating pseudo-labels.
- Re-computation of $E_i$ and $\omega_i$ per class.
- Minibatch processing:
  - Source: compute $\mathcal{L}_{\mathrm{RW}}$;
  - Domain alignment: compute $\mathcal{L}_{\mathrm{DA}}$ (on both domains);
  - Target: compute $\mathcal{L}_{\mathrm{RB}}$;
  - Update parameters via gradients of $\mathcal{L} = \mathcal{L}_{\mathrm{RW}} + \lambda_{\mathrm{DA}} \mathcal{L}_{\mathrm{DA}} + \lambda_{\mathrm{KL}} \mathcal{L}_{\mathrm{RB}}$.
- End-of-epoch target relabeling for refreshed $\omega_i$.
A pseudocode realization follows:
```
initialize G, F                      # feature extractor G, classifier F
initialize ω_i = 1/C for all i
for epoch = 1…max_epochs:
    # 1) update pseudo-labels and ω
    for each x in D_t:
        p = softmax(F(G(x)))
        pseudo-label ŷ = argmax p
    compute counts N_i and E_i = (N_i/N_t)·C
    ω_i ← (1 + α·e^{−E_i}) / Σ_k (1 + α·e^{−E_k})
    # 2) train with VILL
    for each minibatch (X^s, Y^s), X^t:
        L_RW = reweighted_CE(X^s, Y^s; ω)
        L_DA = domain_alignment_loss(X^s, X^t)
        p_t  = softmax(F(G(X^t)))
        L_RB = D_KL(mean(p_t) || u)  # u = uniform distribution (1/C, …, 1/C)
        L    = L_RW + λ_DA·L_DA + λ_KL·L_RB
        backpropagate L, update G, F
```
5. Theoretical Foundations and Hyperparameter Choices
The adaptive weight vector $\omega$ approximates the inverse class frequency derived from noisy target pseudo-labels, connecting VILL to established long-tailed learning fairness strategies. The KL-divergence component minimizes an $f$-divergence between the network's average predictive distribution and a balanced target, compelling decision boundaries to expand into under-represented classes. The procedure omits formal convergence proofs, but all loss terms are continuous, differentiable, and bounded, so training remains stable under common stochastic gradient updates.
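Minimizing the KL divergence to the uniform distribution is equivalent to maximizing the entropy of the averaged prediction, since $D_{\mathrm{KL}}(p \,\|\, u) = -H(p) + \log C$; a quick numerical sanity check:

```python
import numpy as np

# Verify D_KL(p || u) = −H(p) + log C for a random distribution p, uniform u.
rng = np.random.default_rng(0)
p = rng.random(10)
p /= p.sum()                                   # a random probability vector
C = p.size

kl = float(np.sum(p * np.log(p * C)))          # D_KL(p || u), with u_i = 1/C
neg_entropy_plus_logC = float(np.sum(p * np.log(p)) + np.log(C))
print(np.isclose(kl, neg_entropy_plus_logC))   # True
```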
Typical hyperparameter ranges are:
- $\alpha$ (disparity strength): 1.0–10.0, default 5.0 (balances fairness vs. stability).
- $\lambda_{\mathrm{KL}}$ (re-balancing weight): 0.01–0.1, default 0.05 (controls fairness impact).
- $\lambda_{\mathrm{DA}}$: set as in the original UDA baseline. Warm-up strategies exist, in which $\mathcal{L}_{\mathrm{RB}}$ is activated only after pseudo-label maturation (usually 1–2 epochs).
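The warm-up can be sketched as a simple gating of $\lambda_{\mathrm{KL}}$ (the function name and the two-epoch threshold are illustrative choices, not prescribed by the paper):

```python
def kl_weight(epoch, lam_kl=0.05, warmup_epochs=2):
    """Gate the re-balancing weight: zero until pseudo-labels mature."""
    return lam_kl if epoch >= warmup_epochs else 0.0

print([kl_weight(e) for e in range(4)])  # [0.0, 0.0, 0.05, 0.05]
```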
6. Empirical Performance and Ablation Studies
Experiments on OfficeHome (65 classes, 4 domains) and Office-31 (31 classes, 3 domains) confirm VILL's effectiveness:
- Integration into CDAN yields a Worst-5 accuracy increase from 20.3% to 26.8% (OfficeHome), with mean accuracy stable around 68%.
- Across baselines (MDD, ATDOC, CLIP-based PDA), VILL raises Worst-5/10 accuracy by 3–7 pts, sometimes improving overall accuracy by ≈0.5 pts.
- On Office-31, Worst-3 accuracy gains 4–6 pts.
Ablation (OfficeHome, CDAN backbone):
| Method | Worst-5 (%) | Worst-10 (%) | Avg (%) |
|---|---|---|---|
| Baseline | 20.3 | 28.7 | 68.0 |
| +Re-weighting | 22.4 | 31.2 | 68.7 |
| +Re-balancing | 26.0 | 33.7 | 68.0 |
| VILL (full) | 26.8 | 34.5 | 67.9 |
The KL re-balancing confers the greatest single boost, but maximal improvement comes from their combination, indicating both mechanisms contribute orthogonally to fairness enhancement.
7. Significance and Integration
VILL stands out for its ease of integration—no architectural changes, minimal computational overhead, and applicability to any UDA baseline. The explicit focus on category fairness distinguishes it from prior UDA protocols prioritizing only mean accuracy. A plausible implication is that future domain adaptation research might standardize worst-class metrics to report the effectiveness of fairness strategies such as VILL (Zhang et al., 26 Jan 2026).