Analysing Shortcomings of Statistical Parametric Speech Synthesis

Published 28 Jul 2018 in eess.AS and cs.SD | (1807.10941v1)

Abstract: Output from statistical parametric speech synthesis (SPSS) remains noticeably worse than natural speech recordings in terms of quality, naturalness, speaker similarity, and intelligibility in noise. There are many hypotheses regarding the origins of these shortcomings, but these hypotheses are often kept vague and presented without empirical evidence that could confirm and quantify how a specific shortcoming contributes to imperfections in the synthesised speech. Throughout the speech synthesis literature, surprisingly little work is dedicated to identifying the perceptually most important problems in speech synthesis, even though such knowledge would be of great value for creating better SPSS systems. In this book chapter, we analyse some of the shortcomings of SPSS. In particular, we discuss issues with vocoding and present a general methodology for quantifying the effect of any of the many assumptions and design choices that hold SPSS back. The methodology is accompanied by an example that carefully measures and compares the severity of perceptual limitations imposed by vocoding as well as other factors such as the statistical model and its use.
