UI-Styler: Ultrasound Image Style Transfer

Updated 28 November 2025
  • UI-Styler is a specialized framework for ultrasound image style transfer that employs dual-level stylization and class-aware prompting to address cross-device diagnostic variances.
  • It integrates a three-stage pipeline using Vision Transformer encoders, pattern-matching modules, and prompt-based supervision to ensure semantic consistency.
  • Experimental results demonstrate improved classification and segmentation performance, outperforming conventional unpaired image translation methods.

UI-Styler is a specialized framework for ultrasound image style transfer designed to address cross-device domain shifts in medical imaging diagnostics, with a distinct focus on class-aware semantic alignment under a black-box inference network constraint (Do-Tran et al., 21 Nov 2025). Its architecture introduces novel dual-level stylization and prompt-based supervision, delivering substantive improvements over prior unpaired image translation (UIT) approaches in both distributional alignment and downstream diagnostic accuracy.

1. Problem Setting and Motivation

Diagnostic ultrasound images vary substantially in appearance across devices because of differences in hardware, acquisition protocols, and preprocessing. Such domain shifts frequently degrade the performance of machine learning models, particularly when these models are treated as frozen black boxes, i.e., their parameters are inaccessible and only final predictions can be observed. Standard UIT techniques match appearance distributions across domains but do not account for semantic consistency with downstream tasks, which often leads to misaligned class-content mappings. UI-Styler addresses this by enforcing class-specific semantic alignment during image translation, using class-aware prompts derived from pseudo labels produced by the black-box inference network (Do-Tran et al., 21 Nov 2025).

2. Core Architecture and Pipeline

The UI-Styler framework is structured as a three-stage pipeline:

  • Feature Extraction: Parallel Vision Transformer (ViT) encoders $E_s$ and $E_t$ map source and target images $x_s, x_t \in \mathbb{R}^{H \times W \times 3}$ to patch embeddings $F_s, F_t \in \mathbb{R}^{L \times d}$, with $L = (H/P) \cdot (W/P)$ and embedding dimension $d$.
  • Dual-Level Stylization:
    • Pattern-Matching (PM) Module: Applies cross-attention for domain-level style injection, aligning patchwise source and target statistics. Each attention head computes a query-key-value transformation between $F_s$ and $F_t$, outputting stylized embeddings $\widetilde{F}_{s\to t}$.
    • Class-Aware Prompting (CP) Module: Introduces a prompt bank $P \in \mathbb{R}^{C \times L \times d}$, where each prototype $P_c$ encodes the visual characteristics associated with category $c$. Features $\widetilde{F}_{s\to t}$ are shifted toward class-aligned prototypes via inner-product scoring and prompt addition, resulting in class-semantic stylized features $\widetilde{F}_{s\to t}^+$.
  • Image Reconstruction: A lightweight decoder $D$ upsamples and maps $\widetilde{F}_{s\to t}^+$ back to the image domain, yielding the stylized source $\tilde{x}_s = D(\widetilde{F}_{s\to t}^+)$ (Do-Tran et al., 21 Nov 2025).
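
As a rough illustration of the first two stages, the sketch below computes the token count $L$ and applies single-head scaled dot-product cross-attention in plain Python. The 256 × 256 input size, the identity projections, and the choice of source tokens as queries with target tokens as keys and values are assumptions made for illustration; the learned query-key-value transformations of the PM module are not specified in this summary.

```python
import math


def num_patches(height, width, patch):
    """Token count L = (H/P) * (W/P) for non-overlapping P x P patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(f_s, f_t):
    """Single-head cross-attention with identity projections (assumption).

    f_s: source tokens used as queries, a list of vectors of length d.
    f_t: target tokens used as keys and values, a list of vectors of length d.
    Returns one stylized token per source token; each is a convex
    combination of the target tokens.
    """
    d = len(f_s[0])
    scale = math.sqrt(d)
    stylized = []
    for query in f_s:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in f_t]
        weights = softmax(scores)
        stylized.append(
            [sum(w * value[i] for w, value in zip(weights, f_t)) for i in range(d)]
        )
    return stylized


# Hypothetical 256 x 256 input with the reported patch size P = 8.
print(num_patches(256, 256, 8))  # 1024
```

Because every output token is a weighted average of target tokens, the stylized embeddings inherit target-domain statistics while the attention weights are still driven by the source content.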

3. Class-Aware Prompting and Supervision

Class-aware prompting operates as follows:

  • Pseudo-Label Assignment: Each target image $x_t$ is pseudo-labeled using the frozen black-box network (BDM), yielding a hard class assignment $\hat{y}_t \in \{1, \dots, C\}$.
  • Prompt Assignment: During prompt learning, each $F_t$ is associated with its corresponding $P_{\hat{y}_t}$. For stylized source features, the prompt scoring enforces alignment toward the same prototype.
  • Prompt Losses:

    • Direction Loss: Encourages the prompt correlation vector $a = \sigma(E_f(Z) E_p(P)^\top)$ to align with the one-hot encoding of $\hat{y}_t$:

    $$\mathcal{L}_{\mathrm{dir}} = -\frac{1}{C}\sum_{c=1}^C [\hat{y}_c \log a_c + (1-\hat{y}_c)\log(1-a_c)]$$

    • Supervised Prompt Loss: After prompt addition and a classifier head $H$:

    $$p = \mathrm{softmax}(H(F_t + P_{\hat{y}_t})), \qquad \mathcal{L}_{\mathrm{sup}} = -\sum_{c=1}^C \hat{y}_c \log p_c$$

These operate in tandem to steer features toward the semantic boundary of the BDM (Do-Tran et al., 21 Nov 2025).
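
The two prompt losses are a binary cross-entropy over the correlation vector and a standard cross-entropy over classifier outputs. The following sketch evaluates both for a single sample; it assumes the correlation vector $a$ and the classifier logits have already been computed, uses 0-based class indices instead of the $1, \dots, C$ convention above, and adds a small clamp `eps` that is not part of the stated formulas.

```python
import math


def direction_loss(a, label, eps=1e-12):
    """Mean binary cross-entropy between a (length C) and one-hot(label).

    a: sigmoid-activated prompt correlation scores, one per class.
    label: 0-based pseudo-label index.
    eps: clamp to avoid log(0); an implementation detail assumed here.
    """
    total = 0.0
    for c, a_c in enumerate(a):
        a_c = min(max(a_c, eps), 1.0 - eps)
        y_c = 1.0 if c == label else 0.0
        total += y_c * math.log(a_c) + (1.0 - y_c) * math.log(1.0 - a_c)
    return -total / len(a)


def supervised_loss(logits, label):
    """Cross-entropy of softmax(logits) against one-hot(label).

    Uses the log-sum-exp form, which equals -log p_label.
    """
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_z - logits[label]


# Binary case: uninformative scores give log(2) for both losses.
print(direction_loss([0.5, 0.5], label=0))
print(supervised_loss([0.0, 0.0], label=1))
```

Both calls print approximately 0.6931, i.e. $\log 2$, the value expected when the scores carry no class information.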

4. Objective Functions and Training Protocol

The overall loss for joint training is:

$$\mathcal{L}_{\text{total}} = \lambda_{\text{dir}}\mathcal{L}_{\text{dir}} + \lambda_{\text{sup}}\mathcal{L}_{\text{sup}} + \lambda_{c}\mathcal{L}_c + \lambda_{s}\mathcal{L}_s$$

with all $\lambda$ weights set to 1 in reported experiments. Key loss components include:

  • Content Loss $\mathcal{L}_c$: Ensures source structure preservation by minimizing the $\ell_2$ distance between $E_s(x_s)$ and $E_s(\tilde{x}_s)$.
  • Style Loss $\mathcal{L}_s$: Matches Gram matrices (texture) at decoder layers for $\tilde{x}_s$ and a randomly sampled $F_t$.
  • Prompt Losses $\mathcal{L}_{\text{dir}}$, $\mathcal{L}_{\text{sup}}$: Enforce semantic alignment as described above.

Network details: patch size $P=8$, token dimension $d=512$, three ViT blocks per encoder, Adam optimizer with learning rate $5\times 10^{-4}$, batch size 8, and 50K training iterations. Decoder architecture comprises three upsampling plus convolution layers (Do-Tran et al., 21 Nov 2025).
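
To make the loss composition concrete, here is a minimal sketch of the content term, a Gram-matrix style term, and the weighted sum. It operates on a single token matrix per image and uses mean squared differences; the per-layer aggregation and the exact normalization used in the paper are not given here, so those choices are assumptions.

```python
def content_loss(feat_a, feat_b):
    """Mean squared difference between two equally shaped matrices."""
    diffs = [
        (a - b) ** 2
        for row_a, row_b in zip(feat_a, feat_b)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


def gram(feats):
    """d x d Gram matrix of an L x d token matrix, normalized by L (assumption)."""
    n, d = len(feats), len(feats[0])
    return [
        [sum(row[i] * row[j] for row in feats) / n for j in range(d)]
        for i in range(d)
    ]


def style_loss(feat_a, feat_b):
    """Mean squared difference between the Gram matrices of two token sets."""
    return content_loss(gram(feat_a), gram(feat_b))


def total_loss(l_dir, l_sup, l_c, l_s,
               lam_dir=1.0, lam_sup=1.0, lam_c=1.0, lam_s=1.0):
    """Weighted sum of the four terms; all weights default to 1 as reported."""
    return lam_dir * l_dir + lam_sup * l_sup + lam_c * l_c + lam_s * l_s


tokens = [[1.0, 0.0], [0.0, 1.0]]
print(gram(tokens))  # [[0.5, 0.0], [0.0, 0.5]]
```

The Gram matrix discards token order, which is why it captures texture statistics rather than spatial layout; the content term, by contrast, compares tokens position by position.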

5. Black-Box Inference Constraint

A defining feature of UI-Styler is its strict integration with a frozen downstream black-box model:

  • No Gradient or Logit Access: Prompt targets are derived exclusively from BDM predictions; no end-to-end gradient information or confidence scores are available.
  • Semantic Regularization: The only constraint to maintain diagnostic accuracy is encoded in the prompt mechanism, which guides the stylized features toward the correct BDM decision boundary.
  • Inference Protocol: During deployment, stylized images are directly evaluated by the same frozen BDM for classification or segmentation without any model retraining (Do-Tran et al., 21 Nov 2025).
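
The black-box constraint can be pictured as an interface that returns only a hard class index per image. The sketch below turns such predictions into one-hot prompt targets; `toy_bdm` is a hypothetical stand-in for the frozen model (it simply thresholds mean intensity) and is not the classifier used in the paper.

```python
def one_hot(label, num_classes):
    """One-hot vector for a 0-based class index."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def pseudo_label(predict, images, num_classes):
    """Query a frozen black-box model and keep only its hard decisions.

    predict: callable returning a class index; no logits, confidences,
    or gradients are requested from it.
    """
    targets = []
    for image in images:
        label = predict(image)
        if not 0 <= label < num_classes:
            raise ValueError(f"unexpected class index {label}")
        targets.append(one_hot(label, num_classes))
    return targets


def toy_bdm(image):
    """Hypothetical black box: class 1 if mean intensity exceeds 0.5."""
    pixels = [p for row in image for p in row]
    return int(sum(pixels) / len(pixels) > 0.5)


bright = [[0.9, 0.8]]
dark = [[0.1, 0.2]]
print(pseudo_label(toy_bdm, [bright, dark], num_classes=2))
# [[0.0, 1.0], [1.0, 0.0]]
```

At deployment the same callable would be applied to stylized images, so any error in these pseudo labels propagates directly into the prompt supervision.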

6. Experimental Evaluation

The experimental setup uses four breast-ultrasound datasets (BUSBRA, BUSI, UCLM, and UDIAT), organized into 12 cross-domain adaptation tasks (one per ordered source→target pair, with 70/30 train/test splits). Evaluation metrics include:

  • Distribution Alignment: Kernel Inception Distance (KID) between stylized source and target images.
  • Classification Performance: Accuracy and AUC on the black-box downstream model.
  • Segmentation Performance: Dice coefficient and IoU using SAMUS segmentation on stylized images.
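
For reference, the sketch below enumerates the 12 tasks as ordered pairs of the four datasets and computes the two overlap metrics on flattened binary masks. Reading "pairwise" as ordered (source, target) pairs is inferred from the task count, and returning 1.0 when both masks are empty is a convention chosen here, not something stated in the paper. KID is omitted because it requires Inception features.

```python
from itertools import permutations

DATASETS = ["BUSBRA", "BUSI", "UCLM", "UDIAT"]
# Ordered (source, target) pairs: 4 * 3 = 12 adaptation tasks.
TASKS = list(permutations(DATASETS, 2))


def dice_and_iou(pred, truth):
    """Dice coefficient and IoU for two flattened binary masks.

    Values lie in [0, 1]; multiply by 100 for percentage-style reporting.
    """
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    n_pred = sum(1 for p in pred if p)
    n_truth = sum(1 for t in truth if t)
    union = n_pred + n_truth - inter
    if union == 0:
        return 1.0, 1.0  # both masks empty: treated as a perfect match
    return 2 * inter / (n_pred + n_truth), inter / union


print(len(TASKS))  # 12
print(dice_and_iou([1, 1, 0, 0], [1, 0, 0, 0]))
```

In the second call the masks share one pixel out of a two-pixel prediction and a one-pixel ground truth, giving a Dice of 2/3 and an IoU of 1/2.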

UI-Styler outperforms state-of-the-art UIT baselines (TransColor, S2WAT, Mamba-ST), attaining the lowest KID in 10 out of 12 tasks. Gains in downstream metrics are substantial: classification accuracy increases by 2–12 percentage points, Dice coefficient by 1–3 points. For instance, on UCLM→BUSI, accuracy improves from 75.0% (prior best) to 80.0%, and Dice from 77.11 to 80.22. Feature ablation isolates PM and CP contributions: PM alone reduces KID by 20–40% and boosts AUC by 5–8 points; adding CP yields an additional +3–5 points in accuracy and +1–2 in Dice. t-SNE analysis reveals PM reduces domain gap but leaves class clusters at the BDM decision boundary, while CP aligns samples more distinctly to their correct classes. Stylized images processed by UI-Styler show higher BDM confidence (median >0.8) with reduced prediction variance (Do-Tran et al., 21 Nov 2025).

7. Practical Considerations and Limitations

Key implementation considerations:

  • Patch-based ViT encoding ($P=8$), high-dimensional token embeddings ($d=512$), Adam optimizer, and convolutional upsampling decoder.
  • Prompts are global class prototypes; extending to multi-scale or spatially localized prompts is suggested for finer structure preservation.
  • Although demonstrated for binary breast-ultrasound tasks, UI-Styler generalizes to multi-class and other imaging modalities by increasing $C$ and the prompt bank's capacity (Do-Tran et al., 21 Nov 2025).

Limiting factors include the potentially insufficient granularity of global prompts and the reliance on the quality of the BDM's pseudo labels, since no true labels from the target domain are used. Extensions could include multi-scale prompting and broader modality adaptation.


For related research on non-medical UI style transfer and citizen-led personalization in user interface design, see "ImagineNet: Restyling Apps Using Neural Style Transfer" (Fischer et al., 2020) and "Citizen-Led Personalization of User Interfaces" (Alves et al., 2024). However, UI-Styler remains unique in its class-aware, black-box-constrained UIT approach tailored for diagnostic imaging.
