
Communications of the ACM

ACM TechNews

Why Our Crazy-Smart AI Still Sucks at Transcribing Speech


Artist's conception of a digital representation of speech.

The task of providing accurate transcriptions of long blocks of actual human conversation remains beyond the abilities of today's most advanced software.

Credit: Then One/WIRED

Phones and smart home devices can now understand fairly complex commands, thanks to self-teaching recurrent neural nets and other recent innovations. However, the task of providing accurate transcriptions of long blocks of human conversation remains beyond the abilities of even today's most advanced software.

"If you have people transcribe conversational speech over the telephone, the error rate is around 4%," says Microsoft's Xuedong Huang. "If you put all the systems together--IBM and Google and Microsoft and all the best combined--amazingly the error rate will be around 8%."

Part of the long-standing problem with speech technology is that companies are still trying to determine how to make money from it, and Massachusetts Institute of Technology researcher Jim Glass says that question remains unanswered.

Researchers say functional transcription is only a matter of time, although how much time remains an open question.

Nevertheless, researchers say the technology has the potential to unlock vast archives of oral histories, make podcasts easier to consume for speed readers, and be a world-changing boon for journalists everywhere.

From Wired

 

Abstracts Copyright © 2016 Information Inc., Bethesda, Maryland, USA


 
