Magnitude and system dependence of embedding shifts between originals and clones across accents
Determine the magnitude of shifts in deep speaker-embedding space between original utterances and their voice-cloned counterparts, and ascertain whether these shifts differ between heavily accented Mandarin speech and socially standard Mandarin speech and whether they depend on the choice of voice-cloning system.
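One way to make "magnitude of shift" concrete is sketched below. It is an illustration under stated assumptions, not the method of the cited paper: it assumes each utterance has already been mapped to a fixed-length speaker embedding by some model, it uses cosine distance as the shift measure, and it averages that distance per (accent, system) group. The function names, the group labels, and the toy two-dimensional vectors are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def mean_shift_by_group(pairs):
    """Average original-to-clone distance per (accent, system) group.

    `pairs` is an iterable of (accent, system, original_emb, clone_emb).
    The labels are hypothetical; real data would come from a speaker encoder.
    """
    groups = defaultdict(list)
    for accent, system, original_emb, clone_emb in pairs:
        groups[(accent, system)].append(cosine_distance(original_emb, clone_emb))
    return {key: mean(values) for key, values in groups.items()}


# Toy embeddings, invented purely for illustration.
pairs = [
    ("standard", "system_A", [1.0, 0.0], [1.0, 0.0]),
    ("accented", "system_A", [1.0, 0.0], [0.0, 1.0]),
    ("accented", "system_B", [1.0, 0.0], [2.0, 0.0]),
]
shifts = mean_shift_by_group(pairs)
```

Comparing the per-group means across accent labels and across systems corresponds to the two comparisons named in the problem statement. A real analysis would also need a statistical test and a baseline for within-speaker variation between original utterances, both of which this sketch leaves out.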
References
However, it is not clear how large these shifts are, whether they differ for accented versus standard speech, or whether they depend on the voice-cloning system.
— Acoustic and perceptual differences between standard and accented Chinese speech and their voice clones
(2604.01562 - Yang et al., 2 Apr 2026) in Introduction (Section 1)