Multi-task Recurrent Model for Speech and Speaker Recognition

Published 31 Mar 2016 in cs.CL, cs.LG, cs.NE, and stat.ML | (1603.09643v4)

Abstract: Although highly correlated, speech and speaker recognition have been regarded as two independent tasks and studied by two communities. This is certainly not how people behave: we decipher both speech content and speaker traits at the same time. This paper presents a unified model that performs speech and speaker recognition simultaneously. The model is based on a unified neural network in which the output of one task is fed to the input of the other, leading to a multi-task recurrent network. Experiments show that the joint model outperforms the task-specific models on both tasks.
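
The cross-task feedback described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the recurrent unit, which signal is fed across, or the timing, so this sketch assumes plain tanh recurrent cells, assumes each component receives the other component's softmax output from the previous frame, and uses untrained random weights. All names (`Component`, `run_joint`, the layer sizes) are hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows (untrained)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class Component:
    """One task-specific recurrent component (tanh cell: an assumption)."""

    def __init__(self, n_feat, n_hidden, n_out, n_other_out, rng):
        # Input = acoustic features + own previous hidden state
        #         + the other task's previous output (the cross-task link).
        self.w_in = make_matrix(n_hidden, n_feat + n_hidden + n_other_out, rng)
        self.w_out = make_matrix(n_out, n_hidden, rng)
        self.n_hidden = n_hidden
        self.n_out = n_out

    def step(self, feats, hidden, other_out):
        """Advance one frame; return (new hidden state, output posterior)."""
        pre = matvec(self.w_in, feats + hidden + other_out)
        new_hidden = [math.tanh(a) for a in pre]
        return new_hidden, softmax(matvec(self.w_out, new_hidden))


def run_joint(frames, speech, speaker):
    """Run both components over the same frames with mutual feedback."""
    h_sp = [0.0] * speech.n_hidden
    h_spk = [0.0] * speaker.n_hidden
    # Before the first frame, each task sees a uniform posterior from the other.
    y_sp = [1.0 / speech.n_out] * speech.n_out
    y_spk = [1.0 / speaker.n_out] * speaker.n_out
    outputs = []
    for feats in frames:
        # Both steps read the other task's output from the previous frame.
        h_sp, new_y_sp = speech.step(feats, h_sp, y_spk)
        h_spk, new_y_spk = speaker.step(feats, h_spk, y_sp)
        y_sp, y_spk = new_y_sp, new_y_spk
        outputs.append((y_sp, y_spk))
    return outputs


# Toy usage: 5 frames of 4-dim features, 3 speech classes, 2 speakers.
rng = random.Random(0)
speech = Component(n_feat=4, n_hidden=8, n_out=3, n_other_out=2, rng=rng)
speaker = Component(n_feat=4, n_hidden=8, n_out=2, n_other_out=3, rng=rng)
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
posteriors = run_joint(frames, speech, speaker)
```

Each entry of `posteriors` holds a pair of per-frame distributions, one over the toy speech classes and one over the toy speakers. Under these assumptions, joint training would sum the two task losses and backpropagate through both components and the links between them.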