Papers
Topics
Authors
Recent
Search
2000 character limit reached

Lip Reading Using Convolutional Auto Encoders as Feature Extractor

Published 31 May 2018 in cs.CV | (1805.12371v1)

Abstract: Visual recognition of speech using the lip movement is called Lip-reading. Recent developments in this nascent field uses different neural networks as feature extractors which serve as input to a model which can map the temporal relationship and classify. Though end to end sentence level Lip-reading is the current trend, we proposed a new model which employs word level classification and breaks the set benchmarks for standard datasets. In our model we use convolutional autoencoders as feature extractors which are then fed to a Long short-term memory model. We tested our proposed model on BBC's LRW dataset, MIRACL-VC1 and GRID dataset. Achieving a classification accuracy of 98% on MIRACL-VC1 as compared to 93.4% of the set benchmark (Rekik et al., 2014). On BBC's LRW the proposed model performed better than the baseline model of convolutional neural networks and Long short-term memory model (Garg et al., 2016). Showing the features learned by the models we clearly indicate how the proposed model works better than the baseline model. The same model can also be extended for end to end sentence level classification.

Citations (10)

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.