Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling

Published 21 Sep 2023 in cs.LG (arXiv:2309.11983v3)

Abstract: Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks such as speech recognition, where order must be preserved between the input and target sequences. However, CTC has so far been applied only to deterministic sequence models, whose latent spaces are discontinuous and sparse, making them less capable of handling data variability than variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable, order-preserving sequence models. Specifically, we derive two versions of the novel variational CTC under two reasonable assumptions: first, that the variational latent variables at each time step are conditionally independent; and second, that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound on the model log-likelihood, and present computationally tractable forms for implementing them.
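The standard (deterministic) CTC objective that the paper extends marginalizes over all blank-augmented alignment paths that collapse to the target sequence, computed with a forward recursion. As background, here is a minimal pure-Python sketch of that forward algorithm; it is not the paper's variational loss, and the function and variable names are illustrative (it assumes a non-empty label sequence and per-frame log-probabilities that already include a blank symbol):

```python
import math

def ctc_forward(log_probs, labels, blank=0):
    """Return log p(labels | inputs) via the standard CTC forward algorithm.

    log_probs: list of length T; log_probs[t][k] is the log-probability of
               symbol k at frame t (symbol `blank` is the CTC blank).
    labels:    non-empty target label sequence, without blanks.
    """
    # Extended label sequence: blanks interleaved between and around labels.
    ext = [blank]
    for y in labels:
        ext.extend([y, blank])
    S, T = len(ext), len(log_probs)
    NEG_INF = float("-inf")

    def logsumexp(xs):
        m = max(xs)
        if m == NEG_INF:
            return NEG_INF
        return m + math.log(sum(math.exp(x - m) for x in xs))

    # alpha[t][s]: log total probability of all path prefixes that emit
    # ext[:s+1] within the first t+1 frames.
    alpha = [[NEG_INF] * S for _ in range(T)]
    alpha[0][0] = log_probs[0][ext[0]]
    alpha[0][1] = log_probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            cands = [alpha[t - 1][s]]              # stay on the same symbol
            if s >= 1:
                cands.append(alpha[t - 1][s - 1])  # advance by one
            # Skip the intervening blank, allowed only when the current
            # symbol is a non-blank that differs from the one two back.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[t - 1][s - 2])
            alpha[t][s] = logsumexp(cands) + log_probs[t][ext[s]]
    # Valid complete paths end on the last label or the trailing blank.
    return logsumexp([alpha[T - 1][S - 1], alpha[T - 1][S - 2]])
```

For example, with two frames, a vocabulary {0: blank, 1: 'a'}, and per-frame probabilities p(blank) = 0.4 and p('a') = 0.6, the three alignments collapsing to "a" are (a, a), (blank, a), and (a, blank), so p("a") = 0.36 + 0.24 + 0.24 = 0.84, which the recursion reproduces. The variational losses proposed in the paper wrap recursions of this kind inside an evidence lower bound over the latent variables.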
