
Communications of the ACM

ACM TechNews

Google's DeepMind AI Can Lip-Read TV Shows Better Than a Pro



Researchers are using deep-learning techniques to create an enhanced lip-reading system.

Credit: Getty Images

Researchers at Google's DeepMind and the University of Oxford are applying deep-learning techniques to a massive dataset of BBC TV programs to create a lip-reading system that can perform better than professional lip readers.

The artificial intelligence (AI) system was trained on 5,000 hours of footage from six TV programs that aired between January 2010 and December 2015. Because the clips' audio and video streams were sometimes out of sync, the researchers first prepared the dataset by training a computer system to associate sounds with mouth shapes; using that mapping, the system estimated how far each pair of streams had drifted apart and realigned them.
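The article does not describe the alignment method in detail. A minimal sketch of one common way to estimate such an offset is to cross-correlate two feature tracks and take the lag at the correlation peak; the signals and the `lag` value below are invented purely for illustration:

```python
import numpy as np

# Hypothetical example: two 1-D feature tracks sampled at the same frame rate,
# e.g. audio energy and a mouth-openness measure, where the video trails the
# audio by a fixed number of frames.
rng = np.random.default_rng(0)
audio = rng.standard_normal(500)
lag = 7  # true misalignment in frames (assumed for this sketch)
video = np.concatenate([np.zeros(lag), audio[:-lag]])

# Cross-correlate the tracks; the argmax of the full correlation gives the
# estimated offset of the video track relative to the audio track.
corr = np.correlate(video, audio, mode="full")
est_lag = int(np.argmax(corr)) - (len(audio) - 1)

# Shift the video track back by the estimated lag to realign the streams.
realigned = np.roll(video, -est_lag)
```

The DeepMind/Oxford pipeline learned the sound-to-mouth-shape association with a neural network rather than a hand-built correlation, but the realignment idea is the same: estimate the drift, then shift one stream to match the other.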

The AI's lip-reading performance was then tested on TV programs broadcast between March and September 2016, accurately deciphering 46.8% of all words without any errors. In comparison, a professional lip reader deciphered just 12.4% of words correctly on a dataset of 200 clips. Many of the AI's errors were small, such as missing an "s" at the end of a word.
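The figures above count a word as correct only if it is transcribed exactly, so even a missing trailing "s" registers as an error. A simplified illustration of that scoring (real benchmarks typically use an edit-distance-based word error rate; the positional comparison and the sample sentences here are assumptions for the sketch):

```python
def word_accuracy(predicted: str, reference: str) -> float:
    """Fraction of reference words matched exactly, position by position.

    A deliberate simplification of lip-reading evaluation: any deviation,
    even a single missing letter, makes the whole word count as wrong.
    """
    pred, ref = predicted.split(), reference.split()
    correct = sum(p == r for p, r in zip(pred, ref))
    return correct / len(ref)

# A missing "s" on the last word costs the entire word:
score = word_accuracy("we will now hear the new", "we will now hear the news")
```

Here five of six words match, so the score is 5/6 even though only one letter differs, which is why "small" errors still depress the reported percentages.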

Researchers believe automatic lip readers could have significant practical potential, with applications ranging from improved hearing aids to speech recognition in loud environments.

From New Scientist

 

Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA


 
