Multi-Task Learning for Front-End Text Processing in TTS

Published 12 Jan 2024 in cs.CL (arXiv:2401.06321v1)

Abstract: We propose a multi-task learning (MTL) model for jointly performing three tasks that are commonly solved in a text-to-speech (TTS) front-end: text normalization (TN), part-of-speech (POS) tagging, and homograph disambiguation (HD). Our framework utilizes a tree-like structure with a trunk that learns shared representations, followed by separate task-specific heads. We further incorporate a pre-trained LLM to utilize its built-in lexical and contextual knowledge, and study how to best use its embeddings so as to most effectively benefit our multi-task model. Through task-wise ablations, we show that our full model trained on all three tasks achieves the strongest overall performance compared to models trained on individual or sub-combinations of tasks, confirming the advantages of our MTL framework. Finally, we introduce a new HD dataset containing a balanced number of sentences in diverse contexts for a variety of homographs and their pronunciations. We demonstrate that incorporating this dataset into training significantly improves HD performance over only using a commonly used, but imbalanced, pre-existing dataset.
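The trunk-and-heads layout described in the abstract lends itself to a compact sketch. The PyTorch code below is a minimal, illustrative rendering only: a shared encoder consumes embeddings from a pre-trained language model, three task-specific heads (TN, POS, HD) branch off the shared representation, and training sums the per-task losses. The module choices, dimensions, label counts, and equal loss weighting here are assumptions for illustration, not the authors' actual configuration.

```python
# Minimal sketch of a trunk-and-heads multi-task front-end.
# All sizes and label counts below are illustrative assumptions.
import torch
import torch.nn as nn

class MTLFrontEnd(nn.Module):
    def __init__(self, embed_dim=768, hidden_dim=256,
                 n_tn_labels=20, n_pos_labels=17, n_hd_labels=2):
        super().__init__()
        # Shared trunk: consumes pre-trained LM token embeddings and learns
        # a representation common to all three front-end tasks.
        self.trunk = nn.LSTM(embed_dim, hidden_dim,
                             batch_first=True, bidirectional=True)
        # Separate task-specific heads branch off the shared representation.
        self.tn_head = nn.Linear(2 * hidden_dim, n_tn_labels)    # text normalization
        self.pos_head = nn.Linear(2 * hidden_dim, n_pos_labels)  # POS tagging
        self.hd_head = nn.Linear(2 * hidden_dim, n_hd_labels)    # homograph disambiguation

    def forward(self, lm_embeddings):
        # lm_embeddings: (batch, seq_len, embed_dim), e.g. from a frozen LM
        shared, _ = self.trunk(lm_embeddings)
        return {
            "tn": self.tn_head(shared),
            "pos": self.pos_head(shared),
            "hd": self.hd_head(shared),
        }

# Joint training sums the per-task losses (equal weights assumed here).
model = MTLFrontEnd()
x = torch.randn(2, 10, 768)  # stand-in for pre-trained LM embeddings
targets = {t: torch.randint(0, 2, (2, 10)) for t in ("tn", "pos", "hd")}
logits = model(x)
loss = sum(nn.functional.cross_entropy(logits[t].transpose(1, 2), targets[t])
           for t in logits)
loss.backward()
```

In such a setup, the paper's task-wise ablations correspond to dropping one or more heads (and their loss terms) and comparing against the full three-task model.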

