Phones and smart home devices can now understand fairly complex commands, thanks to self-teaching recurrent neural nets and other recent innovations. However, the task of providing accurate transcriptions of long blocks of human conversation remains beyond the abilities of even today's most advanced software.
"If you have people transcribe conversational speech over the telephone, the error rate is around 4%," says Microsoft's Xuedong Huang. "If you put all the systems together--IBM and Google and Microsoft and all the best combined--amazingly the error rate will be around 8%."
Part of the long-standing problem with speech technology is that companies are still trying to determine how to make money from it, and Massachusetts Institute of Technology researcher Jim Glass says that question remains unanswered.
Researchers say functional transcription is only a matter of time, although how much time remains an open question.
Nevertheless, researchers say the technology has the potential to unlock vast archives of oral histories, make podcasts easier for speed readers to consume, and be a world-changing boon for journalists everywhere.
Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA