Papers
Topics
Authors
Recent
Search
2000 character limit reached

A novel pyramidal-FSMN architecture with lattice-free MMI for speech recognition

Published 26 Oct 2018 in cs.SD and eess.AS | (1810.11352v2)

Abstract: Deep Feedforward Sequential Memory Network (DFSMN) has shown superior performance on speech recognition tasks. Based on this work, we propose a novel network architecture which introduces pyramidal memory structure to represent various context information in different layers. Additionally, res-CNN layers are added in the front to extract more sophisticated features as well. Together with lattice-free maximum mutual information (LF-MMI) and cross entropy (CE) joint training criteria, experimental results show that this approach achieves word error rates (WERs) of 3.62% and 10.89% respectively on Librispeech and LDC97S62 (Switchboard 300 hours) corpora. Furthermore, Recurrent neural network LLM (RNNLM) rescoring is applied and a WER of 2.97% is obtained on Librispeech.

Citations (15)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (3)

Collections

Sign up for free to add this paper to one or more collections.