The key to improving speech recognition accuracy is simply mixing all available speech datasets together to train one large AI model, according to a recent study by a team of researchers affiliated with Google Research and Google Brain. They claim an AI model named SpeechStew that was trained on a range of speech corpora achieves state-of-the-art or near-state-of-the-art results on a variety of speech recognition benchmarks.
They describe their work in "SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network."
To build SpeechStew, the Google researchers combined all available labeled and unlabeled speech recognition data curated by the community over the years. They tested a general-purpose SpeechStew model on a number of benchmarks and found that it not only outperformed previously developed models but also demonstrated an ability to adapt to challenging new tasks.